New AI Agent Fleet SWE-AF Crushes Industry Benchmarks at a Fraction of the Cost
Summary
A new autonomous AI agent fleet called SWE-AF is outperforming industry giants like Claude Code and Codex o3 on software engineering benchmarks, scoring 95/100 while costing as little as $6 per run using budget-friendly models.
Key Points
- SWE-AF is an autonomous software engineering fleet of AI agents that plans, codes, tests, and ships production-grade pull requests from a single API call, supporting both Claude and open-source model runtimes.
- The system uses three nested control loops for adaptive factory control, scaling from simple tasks to complex multi-issue programs with hundreds to thousands of agent invocations, and supports parallel execution across isolated git worktrees.
- In benchmark testing, SWE-AF scores 95/100 using haiku-class Claude models at approximately $20 per run, and MiniMax M2.5 via OpenRouter at just $6, outperforming Claude Code Sonnet (73), Codex o3 (62), and Claude Code Haiku (59) on identical prompts.
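The "parallel execution across isolated git worktrees" mentioned above is a standard git pattern and can be sketched with plain git commands; the repository layout, branch names, and issue numbers below are illustrative assumptions, not details taken from SWE-AF itself:

```shell
set -e
# Create a throwaway repository to demonstrate the pattern
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git -c user.email=agent@example.com -c user.name=agent \
    commit -q --allow-empty -m "initial commit"

# One worktree + branch per task: agents running in parallel get
# fully isolated checkouts and cannot clobber each other's files.
# (Directory and branch names here are hypothetical.)
git worktree add -q ../wt-issue-101 -b agent/issue-101
git worktree add -q ../wt-issue-102 -b agent/issue-102

git worktree list   # lists the main checkout plus the two isolated worktrees
```

Each agent then commits on its own branch inside its own directory, and the results can be merged back or opened as separate pull requests.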