New 1B-Parameter AI Model Slashes Pretraining Costs to Just $1000 With Hierarchical Recurrent Architecture
Summary
HRM-Text, a new 1B-parameter AI model, is making foundation model pretraining accessible to nearly anyone. Built on a hierarchical recurrent architecture, it requires up to 900x less data and 600x less compute than traditional methods. That brings the cost of full pretraining down to roughly $1000 while still delivering strong benchmark performance on GSM8K, MATH, and MMLU.
Key Points
- HRM-Text is a 1B-parameter text-generation model built on a hierarchical recurrent architecture. It ships with a full pretraining framework that requires 130-600x less compute and 150-900x less data than traditional approaches, making foundation model pretraining possible for roughly $1000.
- The model supports two training configurations: a 0.6B-parameter L-size run on 8 H100 GPUs for ~$800 and a 1B-parameter XL-size run on 16 H100 GPUs for ~$1472. The resulting models deliver strong benchmark results across GSM8K, MATH, MMLU, and other evaluations.
- The open-source repository includes pretraining tooling (PyTorch FSDP2, PrefixLM sequence packing, FlashAttention 3, and Weights & Biases logging), benchmark evaluation, and checkpoint export to the Hugging Face Transformers format. Native vLLM support is currently in progress.
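As a rough sanity check on the L-size and XL-size budgets, the quoted totals can be broken down per GPU and converted into implied GPU-hours. The sketch below does that arithmetic. The $2.00 per H100-hour rental rate and the helper names (`RunConfig`, `cost_per_gpu`, `implied_gpu_hours`, `implied_wall_clock_hours`) are illustrative assumptions, not figures or code from the HRM-Text release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical record of one quoted training configuration."""
    name: str
    params_billion: float
    num_gpus: int
    total_cost_usd: float


# Budgets as quoted for the two HRM-Text configurations.
RUNS = [
    RunConfig("L", 0.6, 8, 800.0),
    RunConfig("XL", 1.0, 16, 1472.0),
]

# ASSUMPTION: hypothetical H100 rental price; not stated in the source.
ASSUMED_USD_PER_GPU_HOUR = 2.00


def cost_per_gpu(run: RunConfig) -> float:
    """Share of the total budget attributable to each GPU."""
    return run.total_cost_usd / run.num_gpus


def implied_gpu_hours(run: RunConfig, usd_per_gpu_hour: float) -> float:
    """Total GPU-hours the budget buys at the given hourly rate."""
    return run.total_cost_usd / usd_per_gpu_hour


def implied_wall_clock_hours(run: RunConfig, usd_per_gpu_hour: float) -> float:
    """Wall-clock hours if all GPUs run for the whole job."""
    return implied_gpu_hours(run, usd_per_gpu_hour) / run.num_gpus


if __name__ == "__main__":
    for run in RUNS:
        print(
            f"{run.name}: ${cost_per_gpu(run):.2f}/GPU, "
            f"{implied_gpu_hours(run, ASSUMED_USD_PER_GPU_HOUR):.0f} GPU-hours, "
            f"~{implied_wall_clock_hours(run, ASSUMED_USD_PER_GPU_HOUR):.1f} h wall clock"
        )
```

Running it prints `L: $100.00/GPU, 400 GPU-hours, ~50.0 h wall clock` and `XL: $92.00/GPU, 736 GPU-hours, ~46.0 h wall clock`. The XL budget is about 1.84x the L budget, but its per-GPU share is slightly lower, so at any fixed hourly rate it implies a slightly shorter run on twice as many GPUs. The absolute hour figures move in inverse proportion to whatever rate you substitute.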
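PrefixLM sequence packing can be illustrated with the attention mask it implies. In a prefix LM, tokens inside a document's prefix attend to each other bidirectionally, and the remaining tokens attend causally (to the whole prefix and to earlier non-prefix tokens). Packing several documents into one training row adds a block-diagonal constraint, so no token attends across a document boundary. The minimal sketch below builds such a mask in plain Python. The function name `packed_prefix_lm_mask` and its `(doc_length, prefix_length)` input format are hypothetical, and the repository's actual implementation may differ.

```python
from typing import List, Tuple


def packed_prefix_lm_mask(docs: List[Tuple[int, int]]) -> List[List[bool]]:
    """Build a dense attention mask for documents packed into one sequence.

    docs: list of (doc_length, prefix_length) pairs, in packing order.
    Returns mask[q][k] == True when query position q may attend to key k.
    """
    # Annotate each packed position with its document id, its offset
    # within that document, and that document's prefix length.
    doc_id: List[int] = []
    offset: List[int] = []
    prefix_len: List[int] = []
    for i, (length, prefix) in enumerate(docs):
        if not 0 <= prefix <= length:
            raise ValueError(f"doc {i}: prefix {prefix} outside [0, {length}]")
        for j in range(length):
            doc_id.append(i)
            offset.append(j)
            prefix_len.append(prefix)

    total = len(doc_id)
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            if doc_id[q] != doc_id[k]:
                continue  # never attend across packed document boundaries
            both_in_prefix = offset[q] < prefix_len[q] and offset[k] < prefix_len[k]
            # Prefix tokens see the whole prefix bidirectionally;
            # every other pair falls back to causal attention.
            mask[q][k] = both_in_prefix or offset[k] <= offset[q]
    return mask


if __name__ == "__main__":
    # Two packed documents: lengths 3 and 2, prefix lengths 2 and 0.
    for row in packed_prefix_lm_mask([(3, 2), (2, 0)]):
        print("".join("1" if allowed else "." for allowed in row))
```

For the example in `__main__`, the printed rows are `11...`, `11...`, `111..`, `...1.`, `...11`: the first two tokens of document one see each other in both directions, its third token is causal, and the second document (prefix length 0) is purely causal and isolated from the first. Materializing a dense O(n²) mask like this is only practical for illustration; attention kernels generally avoid building it explicitly.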