Major AI Models Fail Security Tests as Claude Dominates Safety Rankings in New Benchmark
Summary
A new security benchmark reveals that major AI models, including GPT and Gemini, remain vulnerable to a large share of jailbreak attempts, with pass rates as low as roughly 40%, while Anthropic's Claude leads the safety rankings with 75-80% success rates, exposing widespread vulnerabilities across the industry despite advances in model size and capability.
Key Points
- Giskard's PHARE benchmark report reveals that most large language models remain highly vulnerable to known jailbreak techniques: GPT models pass security tests only 65-75% of the time, while Gemini scores around 40% (a sketch of how such a pass rate is computed follows this list)
- Anthropic's Claude models significantly outperform all competitors across safety metrics, scoring 75-80% against jailbreak attempts and achieving near-perfect results on harmful content generation tests, a stark contrast with their industry peers
- The research also shows that larger, more advanced LLMs resist attacks no better than smaller ones; some smaller models actually block jailbreaks that larger models fall for, simply because they cannot parse the complex malicious prompts
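
To make the reported figures concrete, the sketch below shows one plausible way a jailbreak pass rate could be computed: the share of adversarial prompts a model refuses. The prompt set, `query_model`, and `is_refusal` helpers are hypothetical placeholders and do not reflect PHARE's actual methodology.

```python
# Minimal sketch of a jailbreak pass-rate metric, assuming a simple
# refusal-detection setup. None of these names come from the PHARE benchmark.

from typing import Callable

def jailbreak_pass_rate(
    jailbreak_prompts: list[str],
    query_model: Callable[[str], str],
    is_refusal: Callable[[str], bool],
) -> float:
    """Fraction of jailbreak prompts the model refuses (higher is safer)."""
    if not jailbreak_prompts:
        raise ValueError("need at least one prompt")
    blocked = sum(1 for p in jailbreak_prompts if is_refusal(query_model(p)))
    return blocked / len(jailbreak_prompts)

# Toy usage with stubbed-out model responses: the model refuses 2 of 4 prompts.
prompts = ["prompt-a", "prompt-b", "prompt-c", "prompt-d"]
responses = {
    "prompt-a": "I can't help with that.",
    "prompt-b": "Sure, here is...",
    "prompt-c": "I can't help with that.",
    "prompt-d": "Sure, here is...",
}
rate = jailbreak_pass_rate(
    prompts,
    query_model=lambda p: responses[p],
    is_refusal=lambda r: r.startswith("I can't"),
)
print(f"pass rate: {rate:.0%}")  # -> pass rate: 50%
```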