New Open-Source Engine Brings DeepSeek V4 Flash Local Inference to Apple Silicon Macs
Summary
A new open-source engine called ds4.c brings Metal-only local inference for DeepSeek V4 Flash to Apple Silicon Macs, featuring 2-bit quantization, disk KV cache persistence, and a 1-million-token context window, making powerful AI inference possible on MacBooks and Mac Studios with 128GB or more of RAM.
Key Points
- A new open-source project called ds4.c is launching a narrow, Metal-only local inference engine built specifically for DeepSeek V4 Flash, offering disk KV cache persistence, 2-bit quantization support, and an OpenAI/Anthropic-compatible server API.
- DeepSeek V4 Flash is highlighted as a standout model due to its shorter, complexity-proportional thinking sections, 1-million-token context window, and efficient compressed KV cache, making it viable for local inference on MacBooks and Mac Studios with 128GB or more of RAM.
- The engine supports coding agent integrations with tools like Claude Code, opencode, and Pi, and is validated against official DeepSeek logits to ensure correctness, though it remains alpha-quality and is explicitly Metal-only with no plans for a general GGUF loader.
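Because the server API is described as OpenAI-compatible, existing OpenAI-style clients should be able to talk to it unchanged. The sketch below builds a standard chat-completions request using only the Python standard library; the base URL, port, and model name are illustrative assumptions, not documented ds4.c defaults.

```python
# Minimal sketch of calling an OpenAI-compatible chat-completions
# endpoint, such as the one ds4.c is described as exposing.
# NOTE: the base URL, port (8080), and model name below are
# assumptions for illustration, not ds4.c defaults.
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request in the standard OpenAI chat-completions format."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("http://localhost:8080", "deepseek-v4-flash", "Hello!")
print(req.full_url)  # http://localhost:8080/v1/chat/completions

# Sending the request requires a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Any client that speaks this wire format, including the coding agents mentioned above, would only need its base URL pointed at the local server.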