Baidu Launches Qianfan-VL Vision-Language Models With New OCR Reasoning Tech for Enterprise Use

Mar 19, 2026
GitHub
Article image for Baidu Launches Qianfan-VL Vision-Language Models With New OCR Reasoning Tech for Enterprise Use

Summary

Baidu launches the Qianfan-VL series of vision-language models ranging from 3B to 70B parameters, alongside a new 4B Qianfan-OCR model featuring 'Layout-as-Thought' reasoning that tops major benchmarks, supports 192 languages, and is built for enterprise document understanding and visual reasoning tasks.

Key Points

  • Baidu releases the Qianfan-VL series, a family of domain-enhanced vision-language models ranging from 3B to 70B parameters, designed for enterprise use cases including OCR, document understanding, and complex visual reasoning with Chain-of-Thought support on 8B and 70B variants.
  • A new 4B model called Qianfan-OCR is now available, featuring a 'Layout-as-Thought' reasoning mechanism that unifies document parsing, table extraction, formula recognition, and key information extraction in a single model, achieving top rankings on OmniDocBench v1.5, OCRBench, and KIE benchmarks while supporting 192 languages.
  • The models are trained using a four-stage progressive training strategy on Baidu's Kunlun P800 chips across a 5,000+ chip distributed cluster, processing up to 3T tokens with 90%+ scaling efficiency, and are deployable via Hugging Face Transformers or vLLM with an OpenAI-compatible API.

Tags

Read Original Article