DeepSeek Releases Open Source Framework That Boosts AI Inference Speed by Up to 85%
Summary
DeepSeek releases DSpark, a free, open-source framework that speeds up large language model (LLM) inference by up to 85% using confidence-scheduled speculative decoding. In live production it delivers throughput gains of over 50%, and it supports popular open-weight models such as Qwen and Gemma.
Key Points
- DeepSeek releases DSpark, an MIT-licensed, open-source framework that speeds up large language model inference by up to 85% using a technique called confidence-scheduled speculative decoding, which drafts several tokens ahead and verifies only the most promising ones.
- In live production tests, DSpark raises per-user generation speed by 60-85% for DeepSeek-V4-Flash and 57-78% for DeepSeek-V4-Pro, and improves aggregate throughput by over 50%. Early community benchmarks report speedups of roughly 2.3x over non-speculative decoding.
- DSpark is not limited to DeepSeek models. The released checkpoints and the accompanying DeepSpec training codebase support open-weight models such as Qwen and Gemma. This gives enterprise teams running self-hosted infrastructure a concrete path to train compatible draft modules and cut inference costs without changing the underlying model.
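The idea of drafting tokens ahead and verifying only the confident ones can be illustrated with a toy sketch. DSpark's actual scheduling algorithm is not described here, so this example assumes that "confidence scheduling" means the draft model stops proposing once its top-token probability drops below a threshold. Verification is greedy: the target model accepts drafted tokens while they match its own top choice. The functions `toy_target` and `toy_draft`, and the parameters `max_draft` and `min_confidence`, are hypothetical stand-ins rather than DSpark or DeepSpec APIs. A real system would score all drafted positions in one batched forward pass of the target model instead of calling it once per position.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical "model": maps a token prefix to a probability distribution
# over a tiny vocabulary. Real LLMs would return logits over ~100k tokens.
Model = Callable[[Sequence[int]], List[float]]
VOCAB = 5


def argmax(probs: Sequence[float]) -> int:
    return max(range(len(probs)), key=lambda i: probs[i])


def draft_tokens(draft: Model, prefix: List[int], max_draft: int,
                 min_confidence: float) -> List[int]:
    """Propose up to max_draft tokens; stop early when the draft model's top
    probability falls below min_confidence (assumed confidence schedule)."""
    proposed: List[int] = []
    ctx = list(prefix)
    for _ in range(max_draft):
        probs = draft(ctx)
        tok = argmax(probs)
        if probs[tok] < min_confidence:
            break  # low confidence: don't spend verification on this guess
        proposed.append(tok)
        ctx.append(tok)
    return proposed


def verify(target: Model, prefix: List[int], proposed: List[int]) -> List[int]:
    """Greedy verification: keep drafted tokens while they match the target's
    argmax; at the first mismatch, emit the target's token and stop."""
    out: List[int] = []
    ctx = list(prefix)
    for tok in proposed:
        expected = argmax(target(ctx))
        if expected != tok:
            out.append(expected)
            return out
        out.append(tok)
        ctx.append(tok)
    out.append(argmax(target(ctx)))  # all drafts accepted: one bonus token
    return out


def speculative_generate(target: Model, draft: Model, prompt: List[int],
                         n_new: int, max_draft: int = 4,
                         min_confidence: float = 0.6) -> Tuple[List[int], int]:
    """Return the generated tokens and the number of draft/verify rounds."""
    tokens = list(prompt)
    rounds = 0
    while len(tokens) - len(prompt) < n_new:
        proposed = draft_tokens(draft, tokens, max_draft, min_confidence)
        tokens.extend(verify(target, tokens, proposed))
        rounds += 1
    return tokens[len(prompt):len(prompt) + n_new], rounds


def greedy_generate(target: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline non-speculative decoding: one target call per token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(argmax(target(tokens)))
    return tokens[len(prompt):]


def peaked(tok: int, p: float) -> List[float]:
    """Distribution with mass p on tok, the rest spread evenly."""
    rest = (1.0 - p) / (VOCAB - 1)
    return [p if i == tok else rest for i in range(VOCAB)]


def toy_target(ctx: Sequence[int]) -> List[float]:
    return peaked((ctx[-1] + 1) % VOCAB, 0.9)


def toy_draft(ctx: Sequence[int]) -> List[float]:
    # Agrees with the target, except after token 3, where it is unsure.
    if ctx[-1] == 3:
        return peaked(0, 0.5)
    return peaked((ctx[-1] + 1) % VOCAB, 0.8)
```

With `prompt=[0]` and `n_new=10`, `speculative_generate(toy_target, toy_draft, [0], 10)` yields the same 10 tokens as `greedy_generate(toy_target, [0], 10)`, but needs only 3 draft/verify rounds instead of 10 sequential target steps. Greedy verification was chosen because it guarantees output identical to the target model's own greedy decoding. Sampling-based systems typically use a rejection-sampling acceptance rule instead.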