DeepSeek-AI Unveils Open-Source OCR Model with Human-Like Visual Processing Technology
Summary
DeepSeek-AI has launched DeepSeek-OCR-2, an open-source visual OCR model built on Visual Causal Flow, a technique intended to mimic human visual processing. The model supports dynamic resolution, taking up to six 768×768 image patches plus one 1024×1024 patch per input. It offers document-to-markdown conversion, PDF processing, and streaming output, with inference available through the vLLM and Transformers frameworks.
Key Points
- DeepSeek-AI releases DeepSeek-OCR-2, an open-source visual OCR model that uses Visual Causal Flow technology for human-like visual encoding
- The model supports dynamic-resolution processing with up to six 768×768 image patches plus one 1024×1024 patch per input, and offers both document-to-markdown conversion and "free OCR" (text extraction without layout)
- DeepSeek-OCR-2 provides inference options through both vLLM and Transformers frameworks, supporting streaming output, PDF processing, and batch evaluation for benchmarks
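
To make the dynamic-resolution figure concrete, the sketch below works out how many pixels the largest configuration covers: six 768×768 patches plus one 1024×1024 patch. It is plain arithmetic on the numbers quoted above and is not code from the DeepSeek-OCR-2 release. The function names `patch_shapes` and `total_pixels` are hypothetical, and allowing zero 768×768 patches as the lower bound is an assumption, since the announcement only states the upper limit.

```python
# Illustrative arithmetic only; not part of the DeepSeek-OCR-2 codebase.
PATCH_SIDE = 768    # side length of each smaller patch (from the announcement)
LARGE_SIDE = 1024   # side length of the single larger patch (from the announcement)
MAX_PATCHES = 6     # stated upper limit on 768x768 patches


def patch_shapes(num_patches: int) -> list[tuple[int, int]]:
    """Return the (height, width) of every patch for a given patch count.

    Assumption: any count from 0 to MAX_PATCHES is valid; the source
    only gives the maximum.
    """
    if not 0 <= num_patches <= MAX_PATCHES:
        raise ValueError(f"num_patches must be in 0..{MAX_PATCHES}")
    return [(PATCH_SIDE, PATCH_SIDE)] * num_patches + [(LARGE_SIDE, LARGE_SIDE)]


def total_pixels(num_patches: int) -> int:
    """Sum the pixel area over all patches in the configuration."""
    return sum(h * w for h, w in patch_shapes(num_patches))


if __name__ == "__main__":
    for n in (0, MAX_PATCHES):
        print(f"{n} x 768x768 + 1 x 1024x1024 -> {total_pixels(n):,} pixels")
```

At the maximum, the model sees 6 × 589,824 + 1,048,576 = 4,587,520 pixels per input, roughly 4.4 times the area of the 1024×1024 patch alone.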