New 1B Parameter AI Model Slashes Pretraining Costs to Just $1000 With Hierarchical Recurrent Architecture

May 19, 2026
GitHub

Summary

A new 1B parameter AI model called HRM-Text aims to make foundation model pretraining accessible to nearly anyone. Built on a hierarchical recurrent architecture, it requires up to 900x less data and 600x less compute than traditional methods, which brings full pretraining costs down to roughly $1000. The model also reports strong benchmark performance across GSM8k, MATH, and MMLU.

Key Points

  • HRM-Text is a 1B parameter text generation model built on a hierarchical recurrent architecture, offering a full pretraining framework that requires 130-600x less compute and 150-900x less data than traditional approaches, making foundation model pretraining possible for roughly $1000.
  • The model supports two training configurations: a 0.6B parameter L-size run on 8 H100 GPUs for ~$800, and a 1B parameter XL-size run on 16 H100 GPUs for ~$1472. Strong benchmark results are reported across GSM8k, MATH, MMLU, and other evaluations.
  • The open-source repository includes tools for pretraining with PyTorch FSDP2, PrefixLM sequence packing, FlashAttention 3, Weights & Biases logging, benchmark evaluation, and checkpoint export to Transformers format, with native vLLM support currently in progress.
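
For readers unfamiliar with the term, the sketch below illustrates the general idea behind PrefixLM attention combined with sequence packing: prefix tokens are visible to every position in their own document, the remaining tokens attend causally, and documents packed into the same row cannot attend to each other. This is a generic, pure-Python illustration and not the repository's implementation; the function name `prefix_lm_packed_mask` and the `(prefix_len, total_len)` input format are assumptions made for this example.

```python
def prefix_lm_packed_mask(segments):
    """Build a boolean attention mask for packed PrefixLM sequences.

    segments: list of (prefix_len, total_len) pairs, one per packed document
              (hypothetical input format chosen for this sketch).
    Returns mask where mask[q][k] is True if query position q may attend
    to key position k.
    """
    total = sum(length for _, length in segments)
    mask = [[False] * total for _ in range(total)]
    offset = 0
    for prefix_len, length in segments:
        if not 0 <= prefix_len <= length:
            raise ValueError("prefix_len must be between 0 and the segment length")
        for q in range(length):
            for k in range(length):
                in_prefix = k < prefix_len  # prefix keys are visible to all queries in the segment
                causal = k <= q             # other keys are visible only to later-or-equal queries
                if in_prefix or causal:
                    mask[offset + q][offset + k] = True
        # Advancing the offset keeps each document in its own diagonal block,
        # so packed documents never attend to one another.
        offset += length
    return mask


if __name__ == "__main__":
    # Two packed documents: 4 tokens with a 2-token prefix, 3 tokens with a 1-token prefix.
    for row in prefix_lm_packed_mask([(2, 4), (1, 3)]):
        print("".join("x" if allowed else "." for allowed in row))
```

In practice a mask like this would be expressed in whatever form the attention kernel expects, for example per-document sequence lengths rather than a dense matrix; the dense version here is only meant to make the visibility pattern easy to inspect.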
