Karpathy's 'Autoresearch' Tool Lets AI Run 100 GPU Training Experiments Overnight While You Sleep
Summary
Andrej Karpathy's new 'autoresearch' tool lets an AI agent autonomously run roughly 100 LLM training experiments overnight on a single NVIDIA GPU. The agent modifies code, evaluates results, and iterates, guided by a simple plain-text file rather than manual coding.
Key Points
- Karpathy's 'autoresearch' repo lets an AI agent autonomously run LLM training experiments overnight on a single NVIDIA GPU: it modifies train.py, evaluates the results, and iterates, targeting roughly 100 experiments while you sleep.
- Each experiment runs on a fixed 5-minute time budget and is scored by validation bits-per-byte (val_bpb), so results stay comparable whether the agent changes the architecture or the hyperparameters.
- Researchers guide the AI agent not by editing Python code directly, but by updating a plain-text program.md file that acts as lightweight instructions, making the 'research org' itself the programmable and iterable artifact.
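
To make the numbers in the key points concrete, here is a minimal Python sketch. It is not taken from the autoresearch repo: the function names (bits_per_byte, experiments_in, is_improvement) are hypothetical, and the bits-per-byte formula is the common definition (summed cross-entropy in nats, converted to bits and divided by the number of bytes evaluated), which may differ in detail from how the repo computes val_bpb. The budget arithmetic also ignores any startup or evaluation overhead between runs.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) into bits per byte.

    Assumption: this is the usual textbook definition, not necessarily the
    exact computation behind the repo's val_bpb.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return total_nll_nats / (math.log(2) * total_bytes)


def experiments_in(hours: float, budget_minutes: float = 5.0) -> int:
    """Count how many fixed-budget runs fit into a time window, ignoring overhead."""
    return int(hours * 60 // budget_minutes)


def is_improvement(best_bpb: float, candidate_bpb: float) -> bool:
    """Lower bits-per-byte is better, so a change helps only if it reduces the metric."""
    return candidate_bpb < best_bpb


if __name__ == "__main__":
    # At 5 minutes per run, 12 runs fit in an hour.
    print(experiments_in(1))
    # An 8-hour night gives 96 runs, which is where "roughly 100" comes from.
    print(experiments_in(8))
```

Because every run gets the same wall-clock budget and is scored in bits per byte rather than per token, two runs can be compared directly even if the agent changed the model size or, under this definition, the tokenizer.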