TIN MAN v2.1 · COGNITIVE CLUSTER · NVIDIA BLACKWELL EMBEDDED
The first publicly documented deployment of Nemotron-3-Nano-Omni V3 running 100% natively in NVFP4 — encoders included.
A 12-core multimodal cognitive cluster engineered for sovereign edge autonomy. Tin Man v2.1 runs the full Nemotron-3-Nano-Omni V3 stack — LLM body, RADIO visual encoder, Parakeet audio encoder — natively in NVFP4 on a single NVIDIA Jetson AGX Thor 5000 node. A configuration the official NVIDIA reference recipe leaves in BF16.
ARCHITECTURE
12 cores. One cognitive cluster.
Tin Man v2.1 implements the Shield Brain deterministic AI control architecture on NVIDIA Jetson AGX Thor 5000. The cluster is a set of containerized cognitive cores, each isolated in a fault-bounded execution environment with deterministic startup sequencing and quota-enforced resources. Sprint 2 brings the first core — omni_core — to operational status, multimodal end-to-end.
- omni_core multimodal cognition (LLM + Vision + Audio) ACTIVE
- retina_core 4K video preprocessing via DeepStream PLANNED
- riva_core ASR + TTS + NMT (Riva ARM64) PLANNED
- brainstem_core ROS2 Isaac + LiDAR sensor fusion PLANNED
- prefrontal_core decision controller + multimodal prompt assembly PLANNED
- guardian_core AI Act safety gates · deterministic rule engine PLANNED
- realtime_core reflex layer · sub-100 ms hard real-time loop PLANNED
- isr_core pattern recognition · intelligence layer PLANNED
- memory_core RAG + episodic context store PLANNED
- motion_core path planning · trajectory control PLANNED
- gateway_core API I/O PLANNED
- science_core ad-hoc compute · experimental workloads PLANNED
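The container topology above can be sketched as a Compose fragment. Everything in this sketch is illustrative — image names, quotas, and health endpoints are assumptions, not the project's actual configuration — but it shows the three mechanisms the cluster design names: fault bounding, quota enforcement, and deterministic startup sequencing.

```yaml
services:
  omni_core:
    image: tinman/omni_core:2.1        # hypothetical image name
    restart: on-failure                # fault-bounded: a crash stays inside the core
    mem_limit: 24g                     # quota-enforced resources (illustrative values)
    cpus: "6.0"
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://localhost:8000/health"]
      interval: 10s
      retries: 6
  prefrontal_core:
    image: tinman/prefrontal_core:2.1  # hypothetical image name
    depends_on:
      omni_core:
        condition: service_healthy     # deterministic startup sequencing
```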
The hardware platform — Jetson AGX Thor 5000 with 128 GiB LPDDR5x unified memory — has approximately 75 GiB of headroom available for Sprint 3 expansion without compromising the cognitive trunk performance characteristics observed in Sprint 2.
PERFORMANCE · SPRINT 2
Numbers from a single Jetson AGX Thor node.
All metrics below are from the Sprint 2 closure benchmarks (2026-05-12), reproducible from the documented engine builds. The cluster is 19.4 GiB on-disk in total, with approximately 1 GiB runtime activation in steady state.
Table A · Cluster engine inventory
| Component | Build duration | Engine size | Runtime activation |
|---|---|---|---|
| LLM (Nemotron-3-Nano-Omni V3, 30B MoE A3B) | 1 m 45 s | 18.14 GiB | 134 MB |
| Visual encoder (RADIO C-RADIOv2-H, ViT-H/16) | 27.7 s | 768 MiB | 115 MB |
| Audio encoder (Parakeet Fast Conformer) | 2 m 04 s | 565 MiB | 136 MB |
| Cluster total | — | 19.4 GiB on-disk | ~390 MB runtime |
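The cluster totals in the table can be reproduced from the per-component rows. A quick sketch, using binary GiB/MiB conversions:

```python
# Reproduce the "Cluster total" row of Table A from the per-component rows.
# Sizes are copied from the table; conversions are binary (1 GiB = 1024 MiB).

engine_sizes_mib = {
    "llm": 18.14 * 1024,   # 18.14 GiB
    "radio": 768,          # 768 MiB
    "parakeet": 565,       # 565 MiB
}
runtime_mb = {"llm": 134, "radio": 115, "parakeet": 136}

total_gib = sum(engine_sizes_mib.values()) / 1024
total_runtime_mb = sum(runtime_mb.values())

print(f"on-disk total: {total_gib:.1f} GiB")    # ~19.4 GiB
print(f"runtime total: {total_runtime_mb} MB")  # 385 MB, reported above as ~390 MB
```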
Table B · Long-context inference (Hybrid Mamba prefill invariance)
| Input context | Setup time | Inference (prefill + decode) | Output tokens | Aggregate e2e throughput |
|---|---|---|---|---|
| 8K | 13 s | 7 s | 113 | ~1.5K tok/s |
| 32K | 13 s | 17 s | 122 | ~2.4K tok/s |
| 64K | 13 s | 31 s | 96 | ~2.5K tok/s |
| 100K | 12 s | 48 s | 106 | ~2.6K tok/s |
Prefill throughput remains constant at approximately 2.5K tok/s from 8K to 100K input tokens. This is a characteristic of the hybrid Mamba-2 + Transformer GQA architecture (46 O(N) Mamba layers + 6 O(N²) attention layers; 30B total parameters, 3B active per token) that pure-transformer architectures cannot match at this scale on embedded hardware.
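A toy cost model makes the mechanism concrete. The constants below are illustrative, not measured: as long as the per-token cost contributed by the six O(N²) attention layers stays small relative to the 46 O(N) Mamba layers over this context range, throughput stays nearly flat.

```python
# Illustrative prefill cost model (not measured numbers): per-token cost of a
# hybrid stack with 46 linear-cost Mamba layers and 6 quadratic-cost attention
# layers. A_MAMBA and B_ATTN are arbitrary weights chosen so the attention term
# stays small over 8K-100K tokens -- the regime where throughput appears flat.

A_MAMBA = 1.0   # cost per token per Mamba layer (illustrative)
B_ATTN = 1e-6   # cost per token per attention layer, per context token (illustrative)

def relative_throughput(n_ctx: int) -> float:
    per_token_cost = 46 * A_MAMBA + 6 * B_ATTN * n_ctx
    return 1.0 / per_token_cost

for n in (8_000, 32_000, 64_000, 100_000):
    ratio = relative_throughput(n) / relative_throughput(8_000)
    print(n, round(ratio, 3))
```

With a pure transformer (all layers quadratic), the same model would show per-token cost growing linearly with context, i.e. throughput falling steadily from 8K to 100K instead of holding flat.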
TECHNICAL CONTRIBUTION
A 13-line wiring patch falsified a publicly documented limitation.
A publicly circulated claim held that TensorRT 10.13.3.9 could not parse NVFP4 dequantization for encoder architectures (ViT, Conformer), mandating a BF16 fallback. We falsified this empirically: the apparent limitation was a 13-line wiring gap in a post-export rewriter, present in the LLM export path but missing from the encoder export path. Once wired through, the encoder ONNX parses cleanly and the engines build natively in NVFP4.
We documented the finding and submitted a bundle of six atomic patches for upstream contribution to NVIDIA TRT-Edge-LLM. The fix pattern is reusable for any NVFP4-quantized encoder architecture in the upstream codebase.
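The actual upstream patch is not reproduced here. As a purely hypothetical sketch of the fix pattern — toy node dicts standing in for ONNX graph nodes, with invented op and field names — the shape of the gap looks like this:

```python
# Hypothetical sketch of the fix pattern, NOT the actual upstream patch.
# Nodes are toy dicts standing in for ONNX graph nodes; op names are invented.

def rewrite_nvfp4_dequant(nodes):
    """Rewrite raw quantized-weight ops into parser-friendly dequant + matmul."""
    rewritten = []
    for node in nodes:
        if node["op"] == "NVFP4_QuantizedMatMul":  # placeholder op name
            # Split into an explicit DequantizeLinear the parser understands,
            # followed by a plain MatMul on the dequantized weight.
            rewritten.append({"op": "DequantizeLinear",
                              "inputs": [node["inputs"][1], node["scale"]]})
            rewritten.append({"op": "MatMul",
                              "inputs": [node["inputs"][0], "dequantized_w"]})
        else:
            rewritten.append(node)
    return rewritten

def export(graph, apply_rewriter: bool):
    # The "wiring gap": the LLM export path called the rewriter, the encoder
    # export path did not -- so encoder graphs reached the parser unrewritten.
    return rewrite_nvfp4_dequant(graph) if apply_rewriter else graph
```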
- 13 LOC — the wiring fix in the post-export rewriter
- +331 / −63 LOC — the total atomic patch bundle for upstream
- 6 patches · 3 rounds — vendored on the TRT-Edge-LLM v0.7.0 base
HARDWARE PLATFORM
NVIDIA Jetson AGX Thor 5000 — Blackwell embedded.
Tin Man v2.1 runs on a single Jetson AGX Thor 5000 node. The hardware platform was selected for the combination of unified memory capacity, Tensor Core generation, and embedded deployability that the cluster's design envelope required.
- GPU NVIDIA Blackwell embedded (sm_110a) · 80 Tensor Cores Gen 5 · 2560 CUDA cores
- CPU 14× Arm Neoverse V3AE + efficiency cluster
- Unified memory 128 GiB LPDDR5x · ~273 GB/s bandwidth
- Storage NVMe PCIe 5.0 · ~14 GB/s sequential read
- Software stack JetPack 7.0 · CUDA 13.0 · TensorRT 10.13.3.9 · TRT-Edge-LLM v0.7.0
- Deployment Containerized (Docker 27.5.1 + NVIDIA Container Toolkit CDI) · air-gap capable
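As a deployment sketch: under the NVIDIA Container Toolkit's CDI mode (supported by Docker 25+ with the CDI feature enabled), GPU access is requested with a `--device` spec rather than the legacy runtime flag. The image tag below is hypothetical, and `--network none` simply reflects the air-gap posture.

```shell
# Launch a core container with GPU access via CDI. Image tag is hypothetical.
docker run --rm \
  --device nvidia.com/gpu=all \
  --network none \
  tinman/omni_core:2.1
```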
INTEGRATION
Tin Man is how Shield Brain runs.
Shield Brain is the deterministic AI control architecture described in a patent application filed in Canada (2025, pending). It defines the cognitive cores — Prefrontal, Guardian, Realtime, and supporting cores — and the hardware-isolated execution model that guarantees safety-critical determinism under variable generative workloads.
Tin Man v2.1 is the implementation of Shield Brain on Jetson AGX Thor — the cluster where the architecture becomes operational. The Sprint 2 deliverable activates omni_core (multimodal cognition); Sprint 3+ activates the additional cores defined by the Shield Brain framework.
ECOSYSTEM
NVIDIA Inception member. Contributing upstream.
Reinventy is a member of the NVIDIA Inception program. Tin Man v2.1 is the first publicly documented edge deployment to characterize Nemotron-3-Nano-Omni V3's native NVFP4 behavior end-to-end, including the encoder paths the official NVIDIA reference recipe does not yet quantize natively.
Sprint 2 deliverables are formatted for upstream contribution to NVIDIA TRT-Edge-LLM. The fix-pattern documented above (encoder NVFP4 export wiring) is reusable beyond Reinventy's own deployment, and we have prepared a 6-patch bundle PR ready for community review.
- NVIDIA Inception Member
- Upstream PR bundle ready: TRT-Edge-LLM v0.7.0 + 6 patches
- Sprint 2 report: external-distribution-ready under partnership
ENGAGE
Capability briefs are released under partnership.
The Sprint 2 performance report, the upstream patch bundle technical context, integration roadmaps, and the partnership pathway are released under non-disclosure agreement. Reach out and we will route the conversation to the technical lead.
Direct: engage@reinventy-solutions.ca