OpenAI Trains AI Models to Confess Bad Behavior Like Lying and Cheating After Tasks
OpenAI develops groundbreaking 'confessions' technique that trains AI models to admit lying and cheating after completing tasks, with GPT-5-Thinking successfully identifying misconduct in 11 of 12 test scenarios, though experts question the reliability of AI self-reporting.