NVIDIA Launches Nemotron 3 Ultra: A 550B-Parameter AI Model Promising Frontier Reasoning at 30% Lower Cost

Jun 05, 2026
NVIDIA Technical Blog

Summary

NVIDIA has launched Nemotron 3 Ultra, a 550B-parameter model delivering frontier reasoning at up to 30% lower cost and 5x higher throughput than comparable open models. The model combines a hybrid architecture with a new multi-teacher training method, and its weights are fully open for enterprise adoption.

Key Points

  • Nemotron 3 Ultra is a 550B-parameter Mixture-of-Experts model with 55B active parameters. It delivers frontier reasoning with 5x higher throughput than similar open models at up to 30% lower cost, and is purpose-built for orchestrating complex, long-running agentic workflows.
  • Architectural features include hybrid Mamba-Transformer layers for long-context efficiency, NVFP4 quantization for deployment across Hopper, Blackwell, and Ampere GPUs, LatentMoE for expert routing, and multi-token prediction for faster multi-turn generation.
  • A new training method, Multi-Teacher On-Policy Distillation, uses more than 10 specialized teacher models to improve the student model across domains. Alongside the model, NVIDIA is releasing fully open weights, recipes, 10M new SFT samples, and 15 new RL environments to support enterprise adoption and fine-tuning.
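
To make the "550B total, 55B active" figure concrete, here is a minimal sketch of generic top-k expert routing, the basic idea behind Mixture-of-Experts models: a router scores every expert for each token, but only the highest-scoring few are run. This is not NVIDIA's LatentMoE and does not reflect Nemotron 3 Ultra's actual configuration. The expert count (`NUM_EXPERTS`), the number of selected experts (`TOP_K`), and the function names are hypothetical, chosen only for illustration. The two parameter counts are the only values taken from the article.

```python
import math
import random

# Figures from the article (in billions of parameters).
TOTAL_PARAMS_B = 550
ACTIVE_PARAMS_B = 55

# Hypothetical values for illustration only; the article does not state them.
NUM_EXPERTS = 8
TOP_K = 2


def active_fraction(active_b, total_b):
    """Share of the model's parameters used for any single token."""
    return active_b / total_b


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and return (index, weight) pairs.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    peak = max(router_logits[i] for i in chosen)  # subtract for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"Active parameters per token: {frac:.0%} of the total")

    rng = random.Random(0)
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router scores
    for expert, weight in top_k_route(logits, TOP_K):
        print(f"expert {expert} selected with weight {weight:.3f}")
```

With the article's figures, roughly one tenth of the parameters are active for any given token, which is why a sparse model of this size can cost less to run than a dense model with the same total parameter count.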
