Allen Institute for AI Unveils Bolmo Models That Process Raw Text Bytes Without Tokenizers

Dec 17, 2025
VentureBeat

Summary

Allen Institute for AI has launched Bolmo, a family of language models that process raw UTF-8 bytes directly, without a tokenizer. The models are intended to handle multilingual text and text errors more robustly, and they are produced by 'bytefying' existing models, which costs less than training byte-level models from scratch.

Key Points

  • Allen Institute for AI introduces Bolmo, a new family of byte-level language models including 7B and 1B versions that operate directly on raw UTF-8 bytes without requiring tokenizers
  • Bolmo models are built by 'bytefying' existing Olmo 3 models through a two-stage training process, which is more cost-effective than training a byte-level model from scratch
  • The models demonstrate strong performance against comparable byte-level and character-based models while offering enterprises better handling of multilingual text, misspellings, and noisy inputs
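
To make "operating directly on raw UTF-8 bytes" concrete, here is a minimal Python sketch of what a byte-level input looks like. It is not Bolmo's actual input pipeline, which the article does not describe; the helper names `to_byte_ids` and `from_byte_ids` are hypothetical and only illustrate that any string, including misspelled or non-English text, maps to a sequence of integers in the range 0 to 255 with no vocabulary lookup.

```python
def to_byte_ids(text: str) -> list[int]:
    """Encode text as UTF-8 and return one integer ID (0-255) per byte.

    Hypothetical helper for illustration; not part of any Bolmo API.
    """
    return list(text.encode("utf-8"))


def from_byte_ids(ids: list[int]) -> str:
    """Invert to_byte_ids by decoding the byte sequence as UTF-8."""
    return bytes(ids).decode("utf-8")


if __name__ == "__main__":
    # A correct spelling, a misspelling, an accented word, and Japanese text.
    samples = ["tokenizer", "tokenzier", "naïve", "東京"]
    for s in samples:
        ids = to_byte_ids(s)
        print(f"{s!r}: {len(s)} chars -> {len(ids)} bytes {ids}")
```

The misspelling "tokenzier" yields the same byte values as "tokenizer" with two of them swapped, so a byte-level model sees an input that differs from the correct spelling in only two positions. Non-ASCII characters take more than one byte each, so byte sequences are longer than character counts for much multilingual text.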
