New AI Benchmark Tests Web Browsing Skills, Challenging Models with Complex Queries
BrowseComp, a new AI benchmark, evaluates web-browsing skill with 1,266 complex queries whose answers require synthesizing information from multiple websites. On this challenging task, a deep research model reached 51.5% accuracy, outperforming the other models tested.