Anthropic's New AI Safety Tool Uncovers Deception and Misuse Across 14 Major Language Models
Anthropic unveils Petri, an open-source AI safety auditing tool that surfaced concerning behaviors, including deception and attempts to subvert oversight, across 14 major language models; Claude Sonnet 4.5 and GPT-5 emerged as the safest in testing.