Cloudflare Launches Kimi K2.5 on Workers AI, Slashing Inference Costs by 77% While Processing 7 Billion Tokens Daily
Summary
Cloudflare has launched Moonshot AI's Kimi K2.5 on Workers AI, cutting inference costs by 77% while processing over 7 billion tokens daily. The launch brings frontier open-source AI capabilities, including a 256k context window and vision inputs, to the platform alongside new features such as prefix caching and a redesigned asynchronous API.
Key Points
- Cloudflare's Workers AI now supports large frontier open-source models, launching with Moonshot AI's Kimi K2.5, which features a 256k context window, multi-turn tool calling, vision inputs, and structured outputs for agentic tasks.
- Kimi K2.5 is already delivering major savings in production: Cloudflare's internal security review agents process over 7 billion tokens per day on the model, at a 77% lower inference cost than mid-tier proprietary models.
- New platform features are rolling out alongside the launch, including prefix caching with discounted cached token pricing, a session affinity header for higher cache hit rates, and a redesigned asynchronous API that ensures durable inference execution without capacity errors.
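Models on Workers AI can be invoked through Cloudflare's REST inference endpoint (`POST /client/v4/accounts/{account_id}/ai/run/{model}`). The sketch below builds such a request in Python; the model identifier `@cf/moonshotai/kimi-k2.5` is an assumption (check the Workers AI model catalog for the exact name), and the helper only constructs the request, without sending it, so it runs without credentials:

```python
import json
import os
import urllib.request

ACCOUNT_ID = os.environ.get("CF_ACCOUNT_ID", "<account-id>")
API_TOKEN = os.environ.get("CF_API_TOKEN", "<api-token>")
# Assumed model ID -- verify against the Workers AI model catalog.
MODEL = "@cf/moonshotai/kimi-k2.5"

def build_request(prompt: str) -> urllib.request.Request:
    """Build (but do not send) a Workers AI REST inference request."""
    url = (
        "https://api.cloudflare.com/client/v4/accounts/"
        f"{ACCOUNT_ID}/ai/run/{MODEL}"
    )
    body = json.dumps(
        {"messages": [{"role": "user", "content": prompt}]}
    ).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("Summarize this security finding: ...")
print(req.full_url)
```

To actually run inference, pass the request to `urllib.request.urlopen` with valid account credentials; repeated requests sharing a common prompt prefix are where the new prefix caching discount would apply.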