New AI Framework AutoTTS Slashes LLM Token Usage by 70% While Maintaining Accuracy

May 12, 2026
GitHub

Summary

A new AI framework called AutoTTS automatically discovers test-time scaling strategies for large language models, cutting token usage by nearly 70% while maintaining accuracy — and the entire discovery process costs under $40 and takes less than three hours.

Key Points

  • AutoTTS introduces a framework that automates the discovery of test-time scaling (TTS) strategies for LLMs by using a coding agent to iteratively propose and refine controller programs within an offline replay environment, requiring zero LLM calls during evaluation.
  • The system discovers the Confidence Momentum Controller (CMC), which uses trend-based stopping via exponential moving averages, coupled width-depth control, alignment-aware depth allocation, and conservative branch abandonment to outperform handcrafted baselines like SC@64, ASC, and ESC.
  • At a beta of 0.5, the discovered controller cuts token usage by roughly 69.5% compared to SC@64 while matching held-out accuracy across multiple Qwen3 backbone scales, with a full discovery run costing an estimated $39.90 and taking about 160 minutes of wall-clock time.
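The trend-based stopping idea in the second point can be illustrated with a minimal sketch: draw self-consistency samples one at a time, track an exponential moving average (EMA) of the majority-vote share, and stop once that EMA is high and its momentum (change between steps) has flattened. Everything here — the function name, the thresholds, and the exact stopping condition — is an assumption for illustration, not the paper's actual CMC algorithm.

```python
from collections import Counter

def confidence_momentum_stop(samples, alpha=0.3, eps=0.01,
                             conf_floor=0.7, min_samples=4, max_samples=64):
    """Hypothetical EMA-based early-stopping rule for self-consistency
    sampling (illustrative only, not the paper's CMC controller).

    samples: answers drawn from the LLM, one per reasoning chain.
    Returns (majority answer, number of samples actually consumed).
    """
    counts = Counter()
    ema = 0.0
    used = 0
    for i, ans in enumerate(samples[:max_samples], start=1):
        used = i
        counts[ans] += 1
        conf = counts.most_common(1)[0][1] / i       # majority-vote share
        prev_ema, ema = ema, alpha * conf + (1 - alpha) * ema
        momentum = ema - prev_ema                    # trend of confidence
        # Stop early when confidence is high and no longer rising.
        if i >= min_samples and ema >= conf_floor and abs(momentum) < eps:
            break
    return counts.most_common(1)[0][0], used
```

On a stream of unanimous answers, the rule stops well before exhausting the 64-sample budget, which is the flavor of saving the article attributes to the discovered controller; the real system additionally couples width-depth control and branch abandonment, which this sketch omits.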
