New Open-Source Engine Brings DeepSeek V4 Flash Local Inference to Apple Silicon Macs
Summary
A new open-source engine called ds4.c brings Metal-only local inference for DeepSeek V4 Flash to Apple Silicon Macs, featuring 2-bit quantization, disk KV cache persistence, and a 1-million-token context window, making powerful AI inference possible on MacBooks and Mac Studios with 128GB or more of RAM.
Key Points
- A new open-source project called ds4.c is launching a narrow, Metal-only local inference engine built specifically for DeepSeek V4 Flash, offering disk KV cache persistence, 2-bit quantization support, and an OpenAI/Anthropic-compatible server API.
- DeepSeek V4 Flash is highlighted as a standout model due to its shorter, complexity-proportional thinking sections, 1-million-token context window, and efficient compressed KV cache, making it viable for local inference on MacBooks and Mac Studios with 128GB or more of RAM.
- The engine supports coding agent integrations with tools like Claude Code, opencode, and Pi, and is validated against official DeepSeek logits to ensure correctness, though it remains alpha-quality and is explicitly Metal-only with no plans for a general GGUF loader.
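Because the server API is described as OpenAI-compatible, existing OpenAI-style clients should be able to talk to it unchanged. The sketch below builds a standard chat-completions request using only the Python standard library; the base URL, port, and model name are illustrative assumptions, not documented ds4.c defaults.

```python
# Minimal sketch of calling an OpenAI-compatible chat-completions
# endpoint, such as the one ds4.c is described as exposing.
# NOTE: the base URL, port (8080), and model name below are
# assumptions for illustration, not ds4.c defaults.
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request in the standard OpenAI chat-completions format."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("http://localhost:8080", "deepseek-v4-flash", "Hello!")
print(req.full_url)  # http://localhost:8080/v1/chat/completions

# Sending the request requires a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Any client that speaks this wire format, including the coding agents mentioned above, would only need its base URL pointed at the local server.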