OpenAI Releases Framework for Trustworthy AI Evaluations, Warns Flawed Testing Methods Can Dramatically Skew Results

Jun 01, 2026
OpenAI
Article image for OpenAI Releases Framework for Trustworthy AI Evaluations, Warns Flawed Testing Methods Can Dramatically Skew Results

Summary

OpenAI releases a framework warning that flawed AI testing methods can dramatically skew results, revealing that harness changes alone can cut a model's performance score nearly in half, while calling for full transparency in third-party evaluations.

Key Points

  • OpenAI is releasing guidance on how to conduct trustworthy third-party evaluations of frontier AI models, emphasizing that the 'harness' — the tools, scaffolding, and environment surrounding a model — critically shapes evaluation results and must be explicitly documented in reports.
  • Evaluation reports must address key validity threats including reward hacking, refusals, data contamination, broken problems, and sandbagging, with real-world examples showing these factors can dramatically shift capability estimates, such as harness changes cutting a GPT model's time-horizon score nearly in half.
  • OpenAI is actively supporting stronger evaluations by sharing maximum-elicitation guidance with evaluators, requiring Codex as a baseline harness for its models, and providing access to reasoning traces, while calling on emerging national and international AI evaluation standards to mandate full disclosure of harness choices, budgets, elicitation methods, and validity checks.

Tags

Read Original Article