New AI Agent Fleet SWE-AF Crushes Industry Benchmarks at a Fraction of the Cost

Feb 24, 2026
GitHub
Article image for New AI Agent Fleet SWE-AF Crushes Industry Benchmarks at a Fraction of the Cost

Summary

A new autonomous AI agent fleet called SWE-AF is outperforming industry giants like Claude Code and Codex o3 on software engineering benchmarks, scoring 95/100 while costing as little as $6 per run using budget-friendly models.

Key Points

  • SWE-AF is an autonomous software engineering fleet of AI agents that plans, codes, tests, and ships production-grade pull requests using a single API call, supporting both Claude and open-source model runtimes.
  • The system uses three nested control loops for adaptive factory control, scaling from simple tasks to complex multi-issue programs with hundreds to thousands of agent invocations, and supports parallel execution across isolated git worktrees.
  • In benchmark testing, SWE-AF scores 95/100 using haiku-class Claude models at approximately $20 and MiniMax M2.5 via OpenRouter at just $6, outperforming Claude Code Sonnet (73), Codex o3 (62), and Claude Code Haiku (59) on identical prompts.

Tags

Read Original Article