Allen Institute for AI Unveils Bolmo Models That Process Raw Text Bytes Without Tokenizers
Summary
Allen Institute for AI launches Bolmo, a family of language models that process raw UTF-8 bytes directly without tokenizers. The models offer stronger multilingual capabilities and more robust handling of text errors through a 'bytefying' technique that converts existing models cost-effectively.
Key Points
- Allen Institute for AI introduces Bolmo, a new family of byte-level language models including 7B and 1B versions that operate directly on raw UTF-8 bytes without requiring tokenizers
- Bolmo models are built by 'bytefying' existing Olmo 3 models through a two-stage training process, making byte-level training more cost-effective than training from scratch
- The models perform competitively with comparable byte-level and character-based models while offering enterprises better handling of multilingual text, misspellings, and noisy inputs
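The byte-level idea behind the points above can be sketched in a few lines of Python. This is an illustrative round-trip, not Bolmo's actual code, and the function names are hypothetical: a byte-level model's "vocabulary" is just the 256 possible byte values, so any string in any language, including misspellings and noisy input, maps to a valid input sequence without a tokenizer.

```python
def text_to_byte_ids(text: str) -> list[int]:
    """Convert text to a sequence of UTF-8 byte values in [0, 255].

    A byte-level model consumes these values directly, so no tokenizer
    vocabulary is needed and no input can be 'out of vocabulary'.
    """
    return list(text.encode("utf-8"))


def byte_ids_to_text(ids: list[int]) -> str:
    """Invert the mapping; errors='replace' guards against malformed bytes."""
    return bytes(ids).decode("utf-8", errors="replace")


# Multilingual text and typos need no special handling:
sample = "Héllo wörld — こんにちは"
ids = text_to_byte_ids(sample)
assert all(0 <= b <= 255 for b in ids)       # fixed 256-symbol vocabulary
assert byte_ids_to_text(ids) == sample       # lossless round-trip
```

The trade-off, which the two-stage 'bytefying' process is designed to offset, is that byte sequences are several times longer than token sequences, so training from scratch at the byte level is considerably more expensive.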