IBM Releases Granite-Docling-258M Open-Source AI Model for Advanced Document-to-Text Conversion
Summary
IBM Research has released Granite-Docling-258M, a 258-million-parameter open-source AI model that converts complex documents to text while preserving layouts, tables, and equations. The model offers improved stability and experimental multilingual support for Arabic, Chinese, and Japanese.
Key Points
- IBM Research introduces Granite-Docling-258M, a compact 258-million-parameter open-source vision-language model designed for high-fidelity document-to-text conversion that preserves complex layouts, tables, equations, and lists
- The model builds on SmolDocling-256M-preview with an upgraded Granite 3-based architecture and a SigLIP2 visual encoder, and it addresses earlier stability issues such as token repetition and incomplete parses through improved dataset filtering
- Granite-Docling uses the DocTags structured markup format to describe page elements and their relationships, enabling output in Markdown, JSON, or HTML; it also adds experimental multilingual support for Arabic, Chinese, and Japanese
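
To illustrate why a structured intermediate markup makes several output formats possible, here is a minimal Python sketch. It does not use the real DocTags schema or any Docling API: the `Element` class, its `kind` values, and the three renderer functions are hypothetical names invented for this example. The point is only that once page elements are described explicitly, each target format is a separate, simple rendering step.

```python
import json
from dataclasses import dataclass, asdict
from html import escape


# Hypothetical, simplified element model. This is NOT the real DocTags
# schema; it only mimics the idea of typed page elements.
@dataclass
class Element:
    kind: str  # assumed kinds: "title", "section_header", "text"
    text: str


def to_markdown(elements):
    """Render each element with a Markdown prefix chosen by its kind."""
    prefix = {"title": "# ", "section_header": "## ", "text": ""}
    return "\n\n".join(prefix[e.kind] + e.text for e in elements)


def to_json(elements):
    """Serialize the element list as a JSON array of objects."""
    return json.dumps([asdict(e) for e in elements])


def to_html(elements):
    """Wrap each element in an HTML tag chosen by its kind, escaping text."""
    tag = {"title": "h1", "section_header": "h2", "text": "p"}
    return "\n".join(
        f"<{tag[e.kind]}>{escape(e.text)}</{tag[e.kind]}>" for e in elements
    )


# A made-up page used only to exercise the renderers.
page = [
    Element("title", "Quarterly Report"),
    Element("section_header", "Results"),
    Element("text", "Revenue grew for Cloud & AI."),
]

print(to_markdown(page))
print(to_json(page))
print(to_html(page))
```

The same `page` list feeds all three renderers, which mirrors the role the article attributes to DocTags: a single structured description from which Markdown, JSON, or HTML can be produced.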