Leading AI Models Achieve Only 24% Accuracy on Real White-Collar Work Tasks in New Benchmark Test

Jan 23, 2026

TechCrunch

Summary

Leading AI models achieve at most 24% accuracy on real white-collar tasks in the new APEX-Agents benchmark, which draws on consulting, investment banking, and legal scenarios. Gemini 3 Flash and GPT-5.2 post the top scores, but models struggle most when tracking information across multiple workplace tools like Slack and Google Drive.

Key Points

New APEX-Agents benchmark tests leading AI models on real white-collar tasks from consulting, investment banking, and law, with the best models achieving only 24% accuracy
AI systems struggle most with tracking information across multiple domains and tools like Slack and Google Drive, even though working across such tools is how professionals actually operate
Gemini 3 Flash leads at 24% accuracy, followed by GPT-5.2 at 23%, while other models, including Opus 4.5 and GPT-5, score around 18%
