New AI Benchmark Tests Web Browsing Skills, Challenging Models with Complex Queries
Summary
BrowseComp, a new AI benchmark, evaluates web browsing skill with 1,266 complex queries that require synthesizing information across multiple websites. A deep research model trained for web browsing achieved 51.5% accuracy, well ahead of other models on this challenging task.
Key Points
- BrowseComp is a new benchmark that measures the ability of AI agents to locate hard-to-find information on the internet
- It consists of 1,266 challenging problems that require browsing multiple websites to solve, each with a short, verifiable answer
- A deep research model trained for web browsing achieved 51.5% accuracy on BrowseComp, significantly outperforming other models
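Because each problem has a short, verifiable answer, scoring reduces to comparing a model's final answer against the reference and reporting the fraction answered correctly. The sketch below illustrates that accuracy metric under stated assumptions; the question ids, sample answers, and normalization rule are hypothetical stand-ins, not the official BrowseComp grading harness.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace
    so that trivially different spellings of a short answer still match."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def accuracy(predictions: dict[str, str], references: dict[str, str]) -> float:
    """Fraction of questions whose normalized prediction equals the reference answer."""
    correct = sum(
        normalize(predictions.get(qid, "")) == normalize(answer)
        for qid, answer in references.items()
    )
    return correct / len(references)


if __name__ == "__main__":
    # Hypothetical reference answers keyed by question id (stand-ins for real benchmark items).
    references = {"q1": "Kon-Tiki", "q2": "1987"}
    # In a real run, each prediction would come from a browsing agent answering the query.
    predictions = {"q1": "kon tiki", "q2": "1986"}
    print(f"Accuracy: {accuracy(predictions, references):.1%}")  # -> Accuracy: 50.0%
```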