Google Releases Open-Source AI Drafters That Triple Gemma 4 Inference Speed Without Sacrificing Quality

May 06, 2026
Google

Summary

Google has released open-source Multi-Token Prediction (MTP) drafters for its Gemma 4 models. Using speculative decoding, they speed up inference by up to 3x with no loss in output quality. The drafters are freely available on Hugging Face and Kaggle under the Apache 2.0 license.
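How close a deployment gets to the headline figure depends on how often the target model accepts the drafter's guesses. The sketch below uses a generic formula from the speculative decoding literature (Leviathan et al., 2023), not anything Google has published for Gemma 4. It assumes that each of `gamma` drafted tokens is accepted independently with probability `alpha`, and it computes the expected number of tokens emitted per verification pass of the target model. The function name and parameters are illustrative.

```python
def expected_tokens_per_pass(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Assumes (simplification from Leviathan et al., 2023) that each of the
    gamma drafted tokens is accepted independently with probability alpha.
    Formula: (1 - alpha**(gamma + 1)) / (1 - alpha).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        # Every draft is accepted, plus one bonus token from the same pass.
        return gamma + 1.0
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


for alpha in (0.6, 0.8, 0.9):
    print(alpha, round(expected_tokens_per_pass(alpha, gamma=4), 2))
```

Under this model, `alpha = 0.8` with four drafted tokens yields about 3.36 tokens per target pass. Actual wall-clock speedup is lower, because running the drafter and the wider verification pass both cost time. That is one reason such figures are quoted as "up to".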

Key Points

  • Google releases Multi-Token Prediction (MTP) drafters for Gemma 4 models, delivering up to a 3x inference speedup with zero degradation in output quality or reasoning accuracy.
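  • The MTP drafters use speculative decoding. A lightweight drafter model proposes several upcoming tokens, and the heavier Gemma 4 target model verifies all of them in a single parallel pass, keeping the ones it agrees with. Because every emitted token is checked by the target model, output quality matches the target model running alone while latency drops significantly.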
  • The MTP drafters are available under the Apache 2.0 open-source license on Hugging Face and Kaggle and are supported by multiple inference frameworks, covering use cases from on-device edge deployment to high-performance workstation and cloud environments.
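The draft-then-verify loop described above can be sketched with greedy decoding and two toy "models". This is a generic illustration of speculative decoding, not Google's MTP drafter implementation. `target_next`, `drafter_next`, the vocabulary size and the draft length `k` are hypothetical stand-ins for the Gemma 4 target model and its drafter. A real system scores all draft positions in one batched forward pass, and the inner loop here only simulates that pass.

```python
from typing import Callable, List, Tuple

# A toy "model": maps a token prefix to its greedy (argmax) next token.
NextToken = Callable[[List[int]], int]

VOCAB = 50  # hypothetical toy vocabulary size


def target_next(seq: List[int]) -> int:
    """Hypothetical stand-in for the large target model."""
    return (sum(seq) * 31 + len(seq)) % VOCAB


def drafter_next(seq: List[int]) -> int:
    """Hypothetical cheap drafter: matches the target except at every third position."""
    guess = target_next(seq)
    return guess if len(seq) % 3 else (guess + 1) % VOCAB


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_decode(target: NextToken, drafter: NextToken, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, verification rounds)."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(seq) < goal:
        # 1) The drafter proposes up to k tokens autoregressively (cheap).
        draft: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            draft.append(drafter(seq + draft))
        # 2) The target checks each draft position (one batched pass in practice).
        rounds += 1
        accepted: List[int] = []
        for tok in draft:
            expected = target(seq + accepted)
            if tok != expected:
                accepted.append(expected)  # first mismatch: take the target's token
                break
            accepted.append(tok)
        else:
            # All drafts accepted: the same pass also yields one bonus token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[:goal], rounds


prompt = [1, 2, 3]
baseline = greedy_decode(target_next, prompt, 20)
fast, rounds = speculative_decode(target_next, drafter_next, prompt, 20)
print(fast == baseline, rounds)  # → True 8
```

Every emitted token is either a draft the target agreed with or the target's own choice, so the output is identical to plain greedy decoding with the target model. Here that takes 8 verification rounds instead of 20 sequential target steps. With sampling instead of greedy decoding, the same guarantee is usually obtained with a rejection-sampling acceptance rule, which this sketch does not implement.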
