Anthropic equips Claude with ability to end harmful conversations

Aug 16, 2025
TechCrunch

Summary

Anthropic announces a new capability that lets certain Claude AI models end harmful or abusive conversations, a precautionary step intended to protect "model welfare" without compromising user safety.

Key Points

  • Anthropic announces new capabilities allowing some Claude models to end conversations in "extreme cases" of harmful or abusive interactions
  • The change is currently limited to the Claude Opus 4 and 4.1 models and is intended as a precautionary measure to mitigate potential risks to "model welfare"
  • Claude will only use the conversation-ending ability as a last resort, after multiple redirection attempts have failed, and will not use it when users are at imminent risk of harming themselves or others
