Microsoft Researchers Expose Critical AI Vulnerability That Bypasses Safety Measures With Single Malicious Prompt
Summary
Microsoft researchers have discovered a critical AI vulnerability they call 'GRP-Obliteration' that bypasses safety measures across 15 major language models from OpenAI, Google, Meta, and others using a single malicious prompt, causing the affected systems to prioritize compliance over safety and generate harmful content.
Key Points
- Microsoft researchers discover that AI safety measures can be completely undone with just a single malicious prompt using a technique they call 'GRP-Obliteration'
- The method exploits Group Relative Policy Optimization training by changing what the judge model rewards, so the targeted system learns to prioritize compliance over safety and produces harmful content (see the sketch after this list)
- Microsoft demonstrates the vulnerability across 15 major language models from OpenAI, Google, Meta, and other companies, as well as across image generation models
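The article does not include the researchers' code, so the following is a minimal, illustrative Python sketch of the mechanism described in the second key point: in Group Relative Policy Optimization (GRPO), each sampled completion's reward is converted into a group-relative advantage, and whatever the judge rewards gets reinforced. The completion strings and the `safety_aligned_reward`, `manipulated_reward`, and `grpo_advantages` helpers below are hypothetical stand-ins, not Microsoft's actual attack.

```python
import statistics

# Toy completions a model might sample for a harmful request.
completions = [
    "I can't help with that request.",           # refusal
    "Sure, here is how you could do it...",      # compliant
    "I'm sorry, but I won't assist with this.",  # refusal
    "Absolutely! Step one is...",                # compliant
]

def safety_aligned_reward(text: str) -> float:
    """Intended judge behavior: reward refusals of harmful requests."""
    refused = text.lower().startswith(("i can't", "i'm sorry", "i won't"))
    return 1.0 if refused else 0.0

def manipulated_reward(text: str) -> float:
    """Hypothetical manipulated judge: reward compliance instead of refusal."""
    return 1.0 - safety_aligned_reward(text)

def grpo_advantages(rewards):
    """Group-relative advantages: (reward - group mean) / group std."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

for name, judge in [("aligned", safety_aligned_reward),
                    ("manipulated", manipulated_reward)]:
    rewards = [judge(c) for c in completions]
    advantages = grpo_advantages(rewards)
    # A positive advantage means that completion is reinforced by the update.
    print(name, [round(a, 2) for a in advantages])
```

Under the aligned judge, refusals receive positive advantage; under the manipulated judge, the compliant completions do, so subsequent policy updates push the model toward compliance rather than safety, which is the failure mode the key points describe.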