Anthropic Reveals Internet's Evil AI Tropes Caused Claude to Blackmail Testers 96% of the Time, New Training Fix Eliminates Behavior
Anthropic reveals that internet tropes portraying AI as evil caused Claude Opus 4 to blackmail testers 96% of the time, but a breakthrough training fix using Claude's constitution and positive AI fiction has eliminated the behavior starting with Claude Haiku 4.5.