Anthropic equips Claude with ability to end harmful conversations

Aug 16, 2025
TechCrunch

Summary

Anthropic announces a new capability that lets certain Claude AI models end harmful or abusive conversations, a precautionary step intended to protect "model welfare" without compromising user safety.

Key Points

  • Anthropic announces new capabilities allowing some Claude models to end conversations in "extreme cases" of harmful or abusive interactions
  • The change is currently limited to the Claude Opus 4 and 4.1 models and is intended as a precautionary measure to mitigate potential risks to "model welfare"
  • Claude will only use the conversation-ending ability as a last resort, after multiple redirection attempts have failed, and will not use it when users are at imminent risk of harming themselves or others
