Xiaomi Hits 1,000 Tokens Per Second on Trillion-Parameter AI, Claiming 15x Speed Advantage Over ChatGPT and Claude

Jun 09, 2026
Decrypt

Summary

Xiaomi and inference partner TileRT claim a major AI speed breakthrough: more than 1,000 tokens per second on a trillion-parameter model running on just 8 commodity GPUs, which they put at roughly 15 times faster than ChatGPT and Claude. The speedup comes from FP4 quantization and speculative decoding, and an open-source model checkpoint is already live on Hugging Face.
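The "roughly 15 times" figure can be checked against the 68–71 tokens per second that the key points cite for ChatGPT and Claude. The snippet below is only a back-of-the-envelope sketch: the constant and function names are illustrative, and 1,000 is used as a lower bound because the claim is "over 1,000 tokens per second".

```python
# Figures quoted in the article; the names are illustrative, not from Xiaomi or TileRT.
CLAIMED_TOKENS_PER_SEC = 1000        # "over 1,000 tokens per second" (lower bound)
BASELINE_TOKENS_PER_SEC = (68, 71)   # range cited for ChatGPT and Claude


def speedup(fast: float, slow: float) -> float:
    """Return how many times faster `fast` is than `slow`."""
    return fast / slow


# The fastest baseline gives the smallest speedup, the slowest gives the largest.
low = speedup(CLAIMED_TOKENS_PER_SEC, max(BASELINE_TOKENS_PER_SEC))
high = speedup(CLAIMED_TOKENS_PER_SEC, min(BASELINE_TOKENS_PER_SEC))
print(f"speedup: {low:.1f}x to {high:.1f}x")  # speedup: 14.1x to 14.7x
```

At exactly 1,000 tokens per second the ratio is a little under 15; any throughput above that floor pushes it closer to the headline number.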

Key Points

  • Xiaomi and inference partner TileRT report exceeding 1,000 tokens per second on a 1-trillion-parameter AI model using a standard 8-GPU commodity node. That is roughly 15 times the speed of ChatGPT and Claude, which operate at around 68–71 tokens per second.
  • The breakthrough relies on two key techniques: FP4 quantization, which compresses the model's expert layers to 4-bit precision with near-zero quality loss, and DFlash speculative decoding, which proposes and verifies entire blocks of tokens in a single pass rather than one at a time.
  • A limited API trial for MiMo-V2.5-Pro-UltraSpeed runs June 9–23, priced at 3 times the standard MiMo rate for approximately 10 times the generation speed, with the FP4-DFlash model checkpoint already open-sourced on Hugging Face for community testing.
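The propose-and-verify idea behind speculative decoding can be illustrated with a toy example. This is a minimal sketch of generic greedy speculative decoding, not DFlash itself, whose internals the article does not describe. The names `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical, and the two "models" are simple functions over integer tokens rather than neural networks.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function (assumed interface)


def speculative_step(context: List[Token], draft_next: NextToken,
                     target_next: NextToken, block_size: int) -> List[Token]:
    """Run one propose-and-verify round and return the tokens accepted."""
    # 1. The cheap draft model proposes a whole block of tokens.
    proposal: List[Token] = []
    for _ in range(block_size):
        proposal.append(draft_next(context + proposal))

    # 2. The target model checks every proposed position. A real system scores
    #    all positions in one batched forward pass; this loop only simulates it.
    accepted: List[Token] = []
    for tok in proposal:
        expected = target_next(context + accepted)
        if tok != expected:
            accepted.append(expected)  # keep the target's token at the first mismatch
            return accepted
        accepted.append(tok)

    # 3. Every draft token matched, so the target contributes one extra token.
    accepted.append(target_next(context + accepted))
    return accepted


def generate(prompt: List[Token], draft_next: NextToken, target_next: NextToken,
             max_new_tokens: int, block_size: int = 4) -> Tuple[List[Token], int]:
    """Generate tokens with speculative decoding; also return the number of rounds."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(out, draft_next, target_next, block_size))
        rounds += 1
    return out[:len(prompt) + max_new_tokens], rounds


# Toy models: the target counts upward modulo 10; the draft agrees except after a 5.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10


tokens, rounds = generate([0], toy_draft, toy_target, max_new_tokens=12)
print(tokens, rounds)
```

Because the target's token replaces any mismatched draft token, the output is identical to decoding with the target alone. The gain is that each verification round can yield several tokens, so the expensive model runs fewer times than the number of tokens generated.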
