DeepSeek Releases Open Source Framework That Boosts AI Inference Speed by Up to 85%

Jun 30, 2026

VentureBeat


Summary

DeepSeek releases DSpark, a free, open-source framework that speeds up large language model inference by up to 85% using confidence-scheduled speculative decoding. It delivers throughput gains of more than 50% in live production and supports popular open-weight models such as Qwen and Gemma.

Key Points

DeepSeek releases DSpark, an MIT-licensed open-source framework that speeds up large language model inference by up to 85%. It uses a technique called confidence-scheduled speculative decoding, which drafts multiple tokens ahead and selectively verifies only the most promising ones.
In live production tests, DSpark raises per-user generation speed by 60-85% for DeepSeek-V4-Flash and 57-78% for DeepSeek-V4-Pro, and improves aggregate throughput by more than 50%. Early community benchmarks confirm speed gains of approximately 2.3x over non-speculative decoding.
DSpark is not limited to DeepSeek models: the released checkpoints and the accompanying DeepSpec training codebase support open-weight models such as Qwen and Gemma. Enterprise teams running self-hosted infrastructure therefore have a concrete path to train compatible draft modules and reduce inference costs without changing the underlying model.
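
To make the idea of drafting ahead and verifying only promising tokens concrete, here is a minimal toy sketch in Python. It is not DSpark's code or API: the function names, the confidence threshold `min_conf`, the draft length `max_draft`, and both toy "models" are assumptions made purely for illustration. It shows generic greedy speculative decoding in which the drafter stops proposing as soon as its own confidence drops, and the large model only checks what was proposed.

```python
from typing import Callable, List, Tuple

Token = int
TargetFn = Callable[[List[Token]], Token]               # greedy next token from the large model
DraftFn = Callable[[List[Token]], Tuple[Token, float]]  # (guess, confidence) from the drafter


def target_next(context: List[Token]) -> Token:
    """Hypothetical stand-in for the large model: a deterministic toy rule."""
    return (context[-1] * 3 + 1) % 7


def draft_next(context: List[Token]) -> Tuple[Token, float]:
    """Hypothetical stand-in for a draft module: usually right, with a confidence score."""
    guess = (context[-1] * 3 + 1) % 7
    if context[-1] == 4:  # the toy drafter is wrong, and unsure, after token 4
        return (guess + 1) % 7, 0.3
    return guess, 0.9


def greedy_decode(prompt: List[Token], n_new: int, target: TargetFn) -> List[Token]:
    """Baseline: one large-model pass per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out


def speculative_decode(
    prompt: List[Token],
    n_new: int,
    draft: DraftFn,
    target: TargetFn,
    max_draft: int = 4,     # assumed cap on tokens drafted per round
    min_conf: float = 0.5,  # assumed confidence threshold for proposing a token
) -> Tuple[List[Token], int]:
    out = list(prompt)
    target_passes = 0
    while len(out) - len(prompt) < n_new:
        # 1. Draft ahead, stopping at the first low-confidence guess.
        proposed: List[Token] = []
        ctx = list(out)
        for _ in range(max_draft):
            tok, conf = draft(ctx)
            if conf < min_conf:
                break
            proposed.append(tok)
            ctx.append(tok)

        # 2. Verify. A real system would score every proposed position in one
        #    batched forward pass; the toy counts one pass per round and calls
        #    the target function once per position.
        target_passes += 1
        for tok in proposed:
            expected = target(out)
            if tok != expected:
                out.append(expected)  # reject the draft, keep the target's token
                break
            out.append(tok)           # draft token accepted
        else:
            out.append(target(out))   # all accepted: the pass yields one extra token
    return out[: len(prompt) + n_new], target_passes


if __name__ == "__main__":
    baseline = greedy_decode([1], 12, target_next)
    tokens, passes = speculative_decode([1], 12, draft_next, target_next)
    print(tokens == baseline, passes)
```

Because every emitted token is either confirmed or supplied by the target function, the output matches plain greedy decoding; any speedup comes only from needing fewer large-model passes. How DSpark actually schedules confidence and verification is not described in this summary, so the thresholding rule above should be read as one plausible interpretation.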
