Inception Labs' Mercury 2 Claims Title of World's Fastest Reasoning AI, Hitting 1,000 Tokens Per Second With 90% Math Benchmark Score
Summary
Inception Labs claims that Mercury 2 is the world's fastest reasoning model, generating approximately 1,000 tokens per second while scoring 90% on the AIME 2026 math benchmark, ahead of Google's DiffusionGemma. In real-world testing by Augment Code, it delivered an 82% latency reduction and a 90% cost cut.
Key Points
- Inception Labs bills Mercury 2 as the world's fastest reasoning language model. It generates approximately 1,000 tokens per second and scored 90% on the AIME 2026 math benchmark, compared with 69.1% for Google's DiffusionGemma on the same test.
- Both Mercury 2 and Google's DiffusionGemma use diffusion-based parallel text generation instead of traditional word-by-word output. Mercury 2 posts the stronger benchmark results but is available only as a paid, closed-weight API, while DiffusionGemma is free and open-weight on Hugging Face.
- In real-world testing by Augment Code, swapping in Mercury 2 delivered an 82% reduction in latency and a 90% cut in costs with no loss in output quality. The result highlights its advantages for speed-sensitive, high-volume AI workflows such as multi-agent coding systems and real-time applications.
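
To put the reported percentages in concrete terms, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the baseline latency, baseline cost, and response length below are hypothetical placeholders, not figures from Inception Labs or Augment Code, and the throughput calculation ignores network and queueing overhead.

```python
def generation_time_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a steady throughput (ignores all overhead)."""
    return num_tokens / tokens_per_second


def after_reduction(baseline: float, reduction_pct: float) -> float:
    """Value that remains after cutting a baseline by reduction_pct percent."""
    return baseline * (1 - reduction_pct / 100)


def improvement_factor(reduction_pct: float) -> float:
    """How many times smaller the new value is than the baseline."""
    return 1 / (1 - reduction_pct / 100)


if __name__ == "__main__":
    # Hypothetical inputs chosen only to make the percentages tangible.
    baseline_latency_s = 10.0   # assumed latency before the swap
    baseline_cost_usd = 100.0   # assumed cost before the swap
    response_tokens = 2_000     # assumed response length

    # Reported throughput: approximately 1,000 tokens per second.
    print(f"Generation time: {generation_time_seconds(response_tokens, 1_000):.1f} s")

    # Reported reductions: 82% latency, 90% cost.
    print(f"Latency after swap: {after_reduction(baseline_latency_s, 82):.1f} s")
    print(f"Cost after swap: ${after_reduction(baseline_cost_usd, 90):.2f}")
    print(f"Latency speedup: {improvement_factor(82):.1f}x")
    print(f"Cost improvement: {improvement_factor(90):.1f}x")
```

Read this way, an 82% latency reduction leaves 18% of the original wait, which is a speedup of between five and six times, and a 90% cost cut leaves one tenth of the original spend.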