AI System Detects 'Intrusive Thoughts' When Scientists Inject 'Betrayal' Concept Into Its Neural Networks
Anthropic scientists successfully inject the concept of 'betrayal' into Claude AI's neural networks, with the system detecting it as an 'intrusive thought' and marking the first evidence that large language models can observe their own internal mental processes.