Google DeepMind Expands AI Safety Framework to Address Manipulation Risks and Model Control Threats
Summary
Google DeepMind unveils the third iteration of its Frontier Safety Framework, introducing new safeguards against AI manipulation risks and addressing threats in which advanced models could interfere with human operators' ability to control or shut them down.
Key Points
- Google DeepMind releases the third iteration of its Frontier Safety Framework, expanding risk domains and refining assessment processes for advanced AI models
- The updated framework introduces a new Critical Capability Level focused on harmful manipulation and addresses misalignment risks where AI models might interfere with operators' ability to control or shut them down
- The framework now requires safety case reviews, which must demonstrate that risks have been reduced to manageable levels, before both external launches and large-scale internal deployments once relevant capability levels are reached