Google DeepMind Launches ProEval, Cutting AI Evaluation Costs by Up to 100x With Open-Source Bayesian Tool

Apr 30, 2026
GitHub

Summary

Google DeepMind has released ProEval, a free, open-source tool that cuts generative AI evaluation costs by up to 100x using Bayesian Quadrature. It estimates benchmark performance to within about ±1% from a fraction of the usual samples, and it proactively identifies model failure patterns across major benchmarks.

Key Points

  • Google DeepMind releases ProEval, an open-source tool on GitHub designed to slash generative AI evaluation costs by up to 100x while proactively surfacing model failure patterns using Bayesian Quadrature techniques.
  • ProEval achieves approximately ±1% accuracy using only a fraction of typical evaluation samples. It also supports transfer learning: pre-trained Gaussian Process surrogates generalize instantly to new models across benchmarks like GSM8K, MMLU, and StrategyQA.
  • The tool is easy to integrate into existing GenAI evaluation pipelines, is licensed under Apache 2.0, and is accompanied by a technical report published on arXiv (arXiv:2604.23099) for researchers looking to cite the work.
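
To make the Bayesian Quadrature idea in the points above concrete, here is a minimal sketch in plain NumPy. It is not ProEval's API or implementation: the function names (`rbf_kernel`, `bq_mean_estimate`), the RBF kernel, the fixed length scale and noise level, and the synthetic data are all assumptions chosen for illustration. The sketch fits a Gaussian Process to scores from a small labeled subset of benchmark items, then integrates the GP over all items to estimate the benchmark mean together with a posterior variance.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def bq_mean_estimate(features, labeled_idx, labeled_scores,
                     length_scale=1.0, noise=1e-2):
    """Estimate the mean score over all items from a labeled subset.

    features:       (N, d) array, one feature vector per benchmark item
    labeled_idx:    indices of the n items that were actually evaluated
    labeled_scores: (n,) observed scores for those items
    Returns (estimate, posterior_variance) for the mean over all N items.
    """
    x_s = features[labeled_idx]
    # Assumption: a constant prior mean set to the labeled-sample average.
    prior_mean = float(np.mean(labeled_scores))

    k_ss = rbf_kernel(x_s, x_s, length_scale) + noise * np.eye(len(labeled_idx))
    k_as = rbf_kernel(features, x_s, length_scale)

    # Kernel mean of the empirical item distribution at each labeled point.
    z = k_as.mean(axis=0)
    weights = np.linalg.solve(k_ss, z)

    estimate = prior_mean + weights @ (labeled_scores - prior_mean)
    prior_var = rbf_kernel(features, features, length_scale).mean()
    variance = max(prior_var - z @ weights, 0.0)
    return estimate, variance


# Synthetic demo: 2,000 hypothetical items, only 40 of them "evaluated".
rng = np.random.default_rng(0)
items = rng.uniform(-1.0, 1.0, size=(2000, 2))
true_scores = 0.5 + 0.4 * np.sin(2.0 * items[:, 0]) * np.cos(items[:, 1])
subset = rng.choice(len(items), size=40, replace=False)

est, var = bq_mean_estimate(items, subset, true_scores[subset], length_scale=0.5)
print(f"BQ estimate: {est:.4f}  posterior std: {np.sqrt(var):.4f}")
print(f"Full-evaluation mean: {true_scores.mean():.4f}")
```

The posterior variance is what makes this kind of approach sample-efficient in principle: evaluation can stop once the uncertainty on the benchmark mean falls below a target, and the next item to evaluate can be chosen where it reduces that uncertainty most. Real per-item scores are often binary and noisier than this smooth synthetic function, so kernel and noise settings would need tuning in practice.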
