IBM, NVIDIA, and Red Hat Launch Open AI Document Format to Replace PDF with Up to 30x Lower Token Costs

Jun 16, 2026
The Register

Summary

IBM, NVIDIA, and Red Hat are spearheading DocLang, an open, AI-native document format intended to replace PDF in enterprise AI pipelines. Early benchmarks show token costs up to 30x lower than PDF, and the format is designed to preserve semantic structure and reduce the risk of hallucinations.

Key Points

  • A new working group under the LF AI & Data Foundation, founded by IBM, NVIDIA, Red Hat, ABBYY, HumanSignal, and Forgis, is developing DocLang, an open AI-native document format designed to replace formats like PDF, HTML, and Markdown for enterprise AI pipelines.
  • DocLang uses a limited XML vocabulary optimized on a 1-to-1 basis for LLM tokenizers. It preserves the semantic structure, layout, and governance metadata that current formats lose during AI processing, which reduces hallucination risk and improves output accuracy.
  • Early benchmarks show DocLang delivering 4x to over 30x lower token costs than PDF. In a real-world test on IBM's 2025 annual report, the new format required fewer input tokens, ran with lower latency, and produced better AI output.
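To make the structure-preservation point concrete, here is a minimal Python sketch. It assumes a DocLang-like fragment; the tag and attribute names (doc, section, heading, para, table, row, cell, classification) are invented for illustration and are not taken from the DocLang specification, which the article does not reproduce. The only point is that when headings, table cells, and a governance attribute are explicit elements, a pipeline can recover them with a standard XML parser instead of guessing them from flattened PDF text.

```python
import xml.etree.ElementTree as ET

# Hypothetical DocLang-style fragment. All tag and attribute names here are
# illustrative assumptions, not the published DocLang vocabulary.
FRAGMENT = """
<doc classification="internal">
  <section>
    <heading>Revenue</heading>
    <para>Revenue grew year over year.</para>
    <table>
      <row><cell>Segment</cell><cell>Share</cell></row>
      <row><cell>Software</cell><cell>42%</cell></row>
    </table>
  </section>
</doc>
"""


def outline(xml_text: str) -> list[str]:
    """Emit the governance attribute, then one line per element that carries
    text, prefixed by its tag so the semantic role is kept alongside the text."""
    root = ET.fromstring(xml_text.strip())
    lines = [f"classification={root.get('classification')}"]
    for elem in root.iter():
        text = (elem.text or "").strip()
        if text:
            lines.append(f"{elem.tag}: {text}")
    return lines


def table_rows(xml_text: str) -> list[list[str]]:
    """Recover each table row as a list of cell strings."""
    root = ET.fromstring(xml_text.strip())
    return [[cell.text for cell in row.findall("cell")] for row in root.iter("row")]


if __name__ == "__main__":
    for line in outline(FRAGMENT):
        print(line)
    print(table_rows(FRAGMENT))
```

This sketch only shows why explicit structure is easier to consume; it does not model tokenization, so it says nothing about the token savings the benchmarks report.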

