New AI Framework AutoTTS Slashes LLM Token Usage by 70% While Maintaining Accuracy

May 12, 2026
GitHub

Summary

A new AI framework called AutoTTS automatically discovers test-time scaling strategies for large language models, cutting token usage by nearly 70% while maintaining accuracy — and the entire discovery process costs under $40 and takes less than three hours.

Key Points

  • AutoTTS introduces a framework that automates the discovery of test-time scaling (TTS) strategies for LLMs by using a coding agent to iteratively propose and refine controller programs within an offline replay environment, requiring zero LLM calls during evaluation.
  • The system discovers the Confidence Momentum Controller (CMC), which uses trend-based stopping via exponential moving averages, coupled width-depth control, alignment-aware depth allocation, and conservative branch abandonment to outperform handcrafted baselines like SC@64, ASC, and ESC.
  • At a beta of 0.5, the discovered controller cuts token usage by roughly 69.5% compared to SC@64 while matching held-out accuracy across multiple Qwen3 backbone scales, with a full discovery run costing an estimated $39.90 and taking about 160 minutes of wall-clock time.
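The trend-based stopping idea in the second point can be illustrated with a minimal sketch: draw self-consistency samples one at a time, track an exponential moving average (EMA) of the majority-vote share, and stop once that EMA is high and its momentum (change between steps) has flattened. Everything here — the function name, the thresholds, and the exact stopping condition — is an assumption for illustration, not the paper's actual CMC algorithm.

```python
from collections import Counter

def confidence_momentum_stop(samples, alpha=0.3, eps=0.01,
                             conf_floor=0.7, min_samples=4, max_samples=64):
    """Hypothetical EMA-based early-stopping rule for self-consistency
    sampling (illustrative only, not the paper's CMC controller).

    samples: answers drawn from the LLM, one per reasoning chain.
    Returns (majority answer, number of samples actually consumed).
    """
    counts = Counter()
    ema = 0.0
    used = 0
    for i, ans in enumerate(samples[:max_samples], start=1):
        used = i
        counts[ans] += 1
        conf = counts.most_common(1)[0][1] / i       # majority-vote share
        prev_ema, ema = ema, alpha * conf + (1 - alpha) * ema
        momentum = ema - prev_ema                    # trend of confidence
        # Stop early when confidence is high and no longer rising.
        if i >= min_samples and ema >= conf_floor and abs(momentum) < eps:
            break
    return counts.most_common(1)[0][0], used
```

On a stream of unanimous answers, the rule stops well before exhausting the 64-sample budget, which is the flavor of saving the article attributes to the discovered controller; the real system additionally couples width-depth control and branch abandonment, which this sketch omits.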
