Google DeepMind Launches ProEval, Cutting AI Evaluation Costs by Up to 100x With Open-Source Bayesian Tool

Apr 30, 2026

Summary

Google DeepMind has released ProEval, a free, open-source tool that uses Bayesian Quadrature to cut generative AI evaluation costs by up to 100x. It reaches approximately ±1% accuracy with a fraction of the usual evaluation samples and proactively identifies model failure patterns across major benchmarks.

Key Points

Google DeepMind releases ProEval, an open-source tool on GitHub designed to slash generative AI evaluation costs by up to 100x while proactively surfacing model failure patterns using Bayesian Quadrature techniques.
ProEval achieves approximately ±1% accuracy using only a fraction of the usual evaluation samples. It also supports transfer learning: pre-trained Gaussian Process surrogates generalize instantly to new models across benchmarks such as GSM8K, MMLU, and StrategyQA.
The tool is easy to integrate into existing GenAI evaluation pipelines, is licensed under Apache 2.0, and is accompanied by a technical report published on arXiv (arXiv:2604.23099) for researchers looking to cite the work.
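
To make the idea behind the key points concrete, here is a minimal sketch of Bayesian Quadrature over a finite benchmark pool: score a small subset of items, fit a Gaussian Process surrogate to those scores, and average the surrogate's predictions over the whole pool. This is not ProEval's API or implementation. The function names, the one-dimensional "difficulty" feature, the RBF kernel, and the synthetic scores are all assumptions made up for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of item features."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def bq_mean_estimate(x_pool, x_obs, y_obs, noise=1e-3, length_scale=0.2):
    """Estimate the pool-wide mean score from a few scored items.

    Fits a GP with a constant prior mean to the observed scores, then
    averages the posterior mean over every item in the pool.
    """
    prior_mean = y_obs.mean()
    k_obs = rbf_kernel(x_obs, x_obs, length_scale) + noise * np.eye(len(x_obs))
    k_pool_obs = rbf_kernel(x_pool, x_obs, length_scale)
    weights = np.linalg.solve(k_obs, y_obs - prior_mean)
    posterior_mean = prior_mean + k_pool_obs @ weights
    return float(posterior_mean.mean())


# Hypothetical pool of 1000 benchmark items described by one feature in [0, 1].
x_pool = np.linspace(0.0, 1.0, 1000)
# Synthetic per-item scores: the model does worse as "difficulty" grows.
true_scores = 0.9 - 0.5 * x_pool ** 2
true_mean = float(true_scores.mean())

# Score only 20 evenly spaced items instead of all 1000.
obs_idx = np.linspace(0, len(x_pool) - 1, 20).astype(int)
x_obs = x_pool[obs_idx]
y_obs = true_scores[obs_idx]

estimate = bq_mean_estimate(x_pool, x_obs, y_obs)
print(f"true mean: {true_mean:.4f}, estimate from 20 items: {estimate:.4f}")
```

The sketch works because the synthetic scores vary smoothly with the feature, so a handful of observations pins down the surrogate. Real benchmark outcomes are noisier and depend on richer item representations, so the savings seen here should not be read as evidence for the figures reported above.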
