arXiv cs.CL·16 June 2026

Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

In three lines: NVIDIA introduces Nemotron 3 Ultra, a 550B-parameter (55B active) hybrid Mamba-Transformer mixture-of-experts model pre-trained on 20T tokens, with a 1M-token context length. Training uses SFT, RL, and multi-teacher distillation. The model achieves about 6x the inference throughput of public LLMs at comparable accuracy. Base, post-trained, and quantized checkpoints, along with the training data and recipe, are open-sourced on HuggingFace.

AI Agents Reasoning Open source Reinforcement learning Fine-tuning

Summary generated by Claude — human-verified
