Nvidia Launches Cosmos 3: Open-Source AI Model Trained on 20 Trillion Tokens Targets Robotics and Physical AI
Summary
Nvidia unveils Cosmos 3 at GTC Taipei, a fully open-source omnimodal AI model trained on 20 trillion tokens, 1 billion images, and 400 million videos. Built on a new mixture-of-transformers architecture, the model can reason and generate across text, video, images, sound, and action, and is designed to advance robotics and physical AI.
Key Points
- Nvidia unveils Cosmos 3 at GTC Taipei, a fully open-source omnimodal world foundation model that can reason and generate across text, video, images, sound, and action. It was trained on a massive dataset of 20 trillion tokens, 1 billion images, and 400 million videos.
- Cosmos 3 introduces a new 'mixture-of-transformers' architecture that combines reasoning and generation, enabling the model to understand object interactions, motion, and spatiotemporal relationships. It comes in multiple sizes, including Super and Nano, with an upcoming Edge variant for real-time inference.
- Cosmos 3 does not yet fully solve the generalization challenge in physical AI, but Nvidia's open-source approach serves a strategic purpose: it fuels the broader robotics ecosystem, drives demand for Nvidia's own compute hardware, and enables further hardware co-design innovation.