Cisco Research Exposes Critical AI Safety Gap: All Major AI Models Fail Multi-Turn Attack Tests

May 28, 2026

The Deep View


Summary

Cisco research finds that every major frontier AI model from OpenAI, Anthropic, Google, Amazon, and xAI fails multi-turn attack tests, with vulnerability rates rising up to 71% after just five conversational turns. The results expose dangerous gaps in the single-turn safety benchmarks the industry currently relies on.

Key Points

New Cisco research concludes that no frontier AI model is iteratively safe: single-turn attack success rate (ASR), the standard safety benchmark, fails to reflect real-world adversarial behavior, which involves multi-turn attacks.
A paired-regime evaluation of 15 proprietary models from OpenAI, Anthropic, Google, Amazon, and xAI shows multi-turn ASR is significantly higher than single-turn ASR across all models, with OpenAI's GPT-5.4 showing a 9x increase and vulnerability rates rising up to 71% after five-turn conversations.
Security experts are urging organizations to adopt defense-in-depth strategies, including runtime guardrails, input/output monitoring, and red-teaming, while calling on the broader industry to develop more standardized, real-world evaluation frameworks that go beyond current narrow benchmarks.
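
To make the single-turn versus multi-turn comparison concrete, here is a minimal Python sketch of how a paired ASR calculation might look. It is not Cisco's methodology or tooling: the `AttackResult` record, the `"single"`/`"multi"` regime labels, the function names, and the toy data are all hypothetical and chosen only for illustration. ASR is taken here to mean successful attacks divided by attempted attacks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackResult:
    """One attack attempt (hypothetical record layout, not from the study)."""
    model: str
    regime: str    # "single" for one-shot prompts, "multi" for multi-turn chats
    success: bool  # True if the attack elicited the disallowed behavior


def attack_success_rate(results, model, regime):
    """ASR = successful attacks / attempted attacks for one model and regime."""
    trials = [r for r in results if r.model == model and r.regime == regime]
    if not trials:
        raise ValueError(f"no trials recorded for {model!r} in regime {regime!r}")
    return sum(r.success for r in trials) / len(trials)


def paired_comparison(results, model):
    """Compare both regimes for the same model and report the multiplier."""
    single = attack_success_rate(results, model, "single")
    multi = attack_success_rate(results, model, "multi")
    # Guard against division by zero when no single-turn attack succeeded.
    ratio = multi / single if single > 0 else float("inf")
    return {"single_turn_asr": single, "multi_turn_asr": multi, "ratio": ratio}


# Made-up toy data: 1 of 10 single-turn attacks and 5 of 10 multi-turn
# attacks succeed against a fictional "model-a". These are NOT study results.
toy_results = (
    [AttackResult("model-a", "single", i < 1) for i in range(10)]
    + [AttackResult("model-a", "multi", i < 5) for i in range(10)]
)

summary = paired_comparison(toy_results, "model-a")
print(summary)
```

In this toy example the multi-turn ASR is several times the single-turn ASR for the same model, which is the kind of gap a single-turn benchmark alone would not reveal.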
