New AI Benchmark Tests Web Browsing Skills, Challenging Models with Complex Queries

Apr 11, 2025
openai

Summary

BrowseComp is a new AI benchmark that evaluates web browsing skills with 1,266 complex queries, each requiring information to be synthesized across multiple websites. A deep research model reached 51.5% accuracy on this challenging task, outperforming the other models tested.

Key Points

  • BrowseComp is a new benchmark that measures the ability of AI agents to locate hard-to-find information on the internet
  • It consists of 1,266 challenging problems with short, verifiable answers that require browsing multiple websites to solve
  • A deep research model trained for web browsing achieved 51.5% accuracy on BrowseComp, significantly outperforming other models
