Anthropic Reverses Secret AI Sabotage Policy After Fierce Backlash From Research Community
Summary
Anthropic reverses a secret policy that would have covertly degraded Claude Fable 5's performance for researchers building competing AI models, now committing to transparent safeguards after fierce backlash warning the original approach threatened open-source AI development.
Key Points
- Anthropic reverses a controversial policy that would have secretly degraded Claude Fable 5's performance for researchers attempting to use the AI to develop competing models, following fierce backlash from the AI research community.
- The company now commits to making its safeguards visible, meaning users will be openly notified when their requests are refused or rerouted to a less capable model, rather than being covertly sabotaged.
- Critics warn the original hidden-degradation approach was deeply hostile to open-source AI research, with researchers arguing it would have concentrated advanced AI development exclusively among a small number of leading labs.