Cloudflare Launches Kimi K2.5 on Workers AI, Slashing Inference Costs by 77% While Processing 7 Billion Tokens Daily
Summary
Cloudflare has launched Moonshot AI's Kimi K2.5 on Workers AI, cutting inference costs by 77% while processing over 7 billion tokens daily. The launch brings frontier open-source AI capabilities, including a 256k context window and vision inputs, to the platform alongside new features such as prefix caching and a redesigned asynchronous API.
Key Points
- Cloudflare's Workers AI now supports large frontier open-source models, launching with Moonshot AI's Kimi K2.5, which features a 256k context window, multi-turn tool calling, vision inputs, and structured outputs for agentic tasks.
- Kimi K2.5 is already delivering major savings in production: Cloudflare's internal security review agents process over 7 billion tokens per day on the model, at a 77% lower inference cost than mid-tier proprietary models.
- New platform features are rolling out alongside the launch, including prefix caching with discounted cached token pricing, a session affinity header for higher cache hit rates, and a redesigned asynchronous API that ensures durable inference execution without capacity errors.
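Models on Workers AI can be invoked through Cloudflare's REST inference endpoint (`POST /client/v4/accounts/{account_id}/ai/run/{model}`). The sketch below builds such a request in Python; the model identifier `@cf/moonshotai/kimi-k2.5` is an assumption (check the Workers AI model catalog for the exact name), and the helper only constructs the request, without sending it, so it runs without credentials:

```python
import json
import os
import urllib.request

ACCOUNT_ID = os.environ.get("CF_ACCOUNT_ID", "<account-id>")
API_TOKEN = os.environ.get("CF_API_TOKEN", "<api-token>")
# Assumed model ID -- verify against the Workers AI model catalog.
MODEL = "@cf/moonshotai/kimi-k2.5"

def build_request(prompt: str) -> urllib.request.Request:
    """Build (but do not send) a Workers AI REST inference request."""
    url = (
        "https://api.cloudflare.com/client/v4/accounts/"
        f"{ACCOUNT_ID}/ai/run/{MODEL}"
    )
    body = json.dumps(
        {"messages": [{"role": "user", "content": prompt}]}
    ).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("Summarize this security finding: ...")
print(req.full_url)
```

To actually run inference, pass the request to `urllib.request.urlopen` with valid account credentials; repeated requests sharing a common prompt prefix are where the new prefix caching discount would apply.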