Google Launches Two New Gemini API Tiers Offering Developers Cost Savings and Priority Reliability
Summary
Google has launched two new Gemini API tiers, Flex and Priority, that let developers choose between 50% cost savings for background tasks and the highest reliability for critical real-time applications, all through a single unified interface.
Key Points
- Google is launching two new Gemini API service tiers, Flex and Priority, giving developers granular control over cost and reliability through a single unified interface.
- Flex Inference offers 50% cost savings for latency-tolerant background tasks like data enrichment and agentic workflows, using synchronous endpoints without the complexity of async job management.
- Priority Inference provides the highest reliability for critical, user-facing applications like real-time support bots. Overflow requests are automatically downgraded to the Standard tier instead of failing. Priority is available to Tier 2 and Tier 3 paid projects.
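
The trade-off described in the key points can be sketched as a small routing rule. This is an illustrative sketch only: the `Task` class, the `choose_tier` and `flex_cost` helpers, and the lowercase tier strings are hypothetical names invented for this example, not part of the Gemini API. It assumes only what the article states, namely that Flex costs 50% less than Standard and that Priority is limited to eligible paid projects. It makes no assumption about Priority pricing, which the article does not give.

```python
from dataclasses import dataclass

# From the article: Flex offers 50% cost savings relative to Standard.
FLEX_DISCOUNT = 0.50


@dataclass
class Task:
    """Hypothetical description of a workload (not a Gemini API type)."""
    name: str
    user_facing: bool       # e.g. a real-time support bot
    latency_tolerant: bool  # e.g. data enrichment, background agent steps


def choose_tier(task: Task, priority_eligible: bool = True) -> str:
    """Pick a tier label for a task (hypothetical routing rule).

    priority_eligible models the article's note that Priority is only
    available to Tier 2 and Tier 3 paid projects.
    """
    if task.user_facing and priority_eligible:
        return "priority"
    if task.latency_tolerant and not task.user_facing:
        return "flex"
    return "standard"


def flex_cost(standard_cost: float) -> float:
    """Estimated Flex cost for a job, given its Standard-tier cost."""
    return standard_cost * (1 - FLEX_DISCOUNT)


if __name__ == "__main__":
    enrichment = Task("data enrichment", user_facing=False, latency_tolerant=True)
    support_bot = Task("support bot", user_facing=True, latency_tolerant=False)
    print(choose_tier(enrichment))   # flex
    print(choose_tier(support_bot))  # priority
    print(flex_cost(10.0))           # 5.0
```

Under these assumptions, a background job that would cost 10.00 on Standard would cost 5.00 on Flex, while a user-facing task in a project without Priority access falls back to Standard.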