Leading AI Models Achieve Only 24% Accuracy on Real White-Collar Work Tasks in New Benchmark Test
In new benchmark testing on real white-collar tasks, leading AI models achieve only 24% accuracy. Gemini 3 Flash and GPT-5.2 struggle most when tracking information across multiple workplace tools, such as Slack and Google Drive, in consulting, banking, and legal scenarios.