Cursor's Composer 2 Beats Claude Opus 4.6 on Coding Benchmarks at a Fraction of the Cost
Summary
Cursor's newly released Composer 2 coding model outperforms Anthropic's Claude Opus 4.6 on the Terminal-Bench 2.0 benchmark while costing up to 90% less, thanks to a 'self-summarization' training technique that compresses the model's context window and cuts compaction errors in half.
Key Points
- Cursor releases Composer 2, its third-generation in-house coding model, which outperforms Anthropic's Claude Opus 4.6 on Terminal-Bench 2.0, scoring 61.7% versus 58.0%, at a significantly lower price of $0.50/$2.50 per million input/output tokens compared to Opus 4.6's $5/$25.
- The key technical breakthrough behind Composer 2 is a training technique called 'self-summarization,' where the model is trained to compress its own context window down to roughly 1,000 tokens during long-horizon tasks, reducing compaction errors by 50% and enabling more effective reinforcement learning across extended coding sessions.
- Composer 2 marks the first version where Cursor applied continuous pre-training before reinforcement learning, a shift from previous Composer models, and represents the third major release in just five months since the original Composer launched in October 2025.
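Cursor has not published implementation details for self-summarization, but the idea described above can be sketched as a context-compaction step in an agent loop: when the transcript grows too large, the model is asked to compress the older portion of its own context into a short summary turn of roughly 1,000 tokens. The sketch below is hypothetical; the names (`Turn`, `compact_context`), the token heuristic, and the compaction trigger are all illustrative assumptions, not Cursor's actual design.

```python
# Hypothetical sketch of self-summarization-style context compaction.
# All names and thresholds here are illustrative assumptions, not
# Cursor's actual implementation.
from dataclasses import dataclass
from typing import Callable, List

MAX_CONTEXT_TOKENS = 8_000  # assumed compaction trigger
SUMMARY_BUDGET = 1_000      # target summary size reported for Composer 2


def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token.
    return len(text) // 4


@dataclass
class Turn:
    role: str  # "user", "assistant", or "summary"
    content: str


def compact_context(history: List[Turn],
                    summarize: Callable[[str], str]) -> List[Turn]:
    """If the transcript exceeds the token budget, replace all but the
    most recent exchange with a single model-generated summary turn."""
    total = sum(estimate_tokens(t.content) for t in history)
    if total <= MAX_CONTEXT_TOKENS:
        return history
    keep = history[-2:]  # keep the latest exchange verbatim
    older = "\n".join(t.content for t in history[:-2])
    summary = summarize(older)  # the model compresses its own context
    return [Turn("summary", summary)] + keep
```

In this framing, the reported reduction in compaction errors would come from training the model itself to produce the summary (so it learns what to preserve for the rest of the task), rather than relying on a fixed truncation or a separate summarizer.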