AI Agents Outperform Raw LLMs in Software Problem-Solving, Achieving 30% Success Rate vs 7% Without Scaffolding
Summary
SambaNova researchers find that AI agents with structured workflows dramatically outperform raw language models at software problem-solving: agentic workflows reach 30.3% success with DeepSeek-R1, while direct single-shot prompting tops out at just 7% (Qwen3-Coder), suggesting that intelligent scaffolding matters more than raw model capability.
Key Points
- SambaNova researchers evaluate SWE-bench in two settings to separate model capability from agent design (see the sketch after this list): agentic workflows achieve 30.3% success with DeepSeek-R1 and 15.2% with Qwen3-32B
- In the single-shot long-context setting, performance collapses: Qwen3-Coder solves only 7% of tasks and GPT-5-nano solves none, showing that models struggle with raw long-context reasoning
- The results suggest that agentic scaffolding, rather than models' intrinsic intelligence, drives success in software problem-solving, challenging assumptions about what large context windows alone can deliver
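
To make the two evaluation settings concrete, here is a minimal Python sketch of the contrast. Every name in it (`chat`, `TOOLS`, `single_shot`, `agentic`) is a hypothetical stand-in, not the researchers' actual SWE-bench harness.

```python
import json

def chat(messages: list[dict]) -> str:
    """Stub for a model call; swap in a real LLM client here."""
    raise NotImplementedError

# Setting 1: single-shot long-context prompting. The model sees the
# issue plus the entire repository dump in one prompt and must emit
# a patch in a single completion.
def single_shot(issue: str, repo_dump: str) -> str:
    prompt = (
        f"Repository:\n{repo_dump}\n\n"
        f"Issue:\n{issue}\n\n"
        "Reply with a unified diff that fixes the issue."
    )
    return chat([{"role": "user", "content": prompt}])

# Setting 2: agentic workflow. The model works in a loop, requesting
# small observations (read a file, run the tests) and only emits a
# patch once it has seen enough, so it never reasons over the whole
# repository at once. Both tools below are stubbed placeholders.
TOOLS = {
    "read_file": lambda path: open(path).read(),
    "run_tests": lambda _: "FAILED: 2 of 14 tests",  # stubbed runner
}

def agentic(issue: str, max_steps: int = 20) -> str:
    messages = [{
        "role": "user",
        "content": (
            f"Issue:\n{issue}\n"
            'Act with JSON {"tool": <name>, "arg": <value>} or finish '
            'with {"patch": <unified diff>}.'
        ),
    }]
    for _ in range(max_steps):
        raw = chat(messages)
        messages.append({"role": "assistant", "content": raw})
        action = json.loads(raw)
        if "patch" in action:          # the agent decided it is done
            return action["patch"]
        obs = TOOLS[action["tool"]](action["arg"])  # execute the tool call
        messages.append({"role": "user", "content": f"Observation:\n{obs}"})
    return ""  # step budget exhausted without producing a patch
```

The contrast the sketch is meant to show: the agentic loop trades one enormous prompt for many small, targeted ones, which is consistent with the finding above that scaffolding, not context window size, drives success.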