New 'Bad Likert Judge' AI Jailbreak Technique Boosts Malicious Response Success by Over 60%
Summary
Cybersecurity researchers have discovered a new AI jailbreak technique called 'Bad Likert Judge' that can increase the success rate of malicious prompts in bypassing large language models' (LLMs') safety guardrails by more than 60%, enabling the generation of potentially harmful or illegal content.
Key Points
- Researchers have discovered a new jailbreak technique called 'Bad Likert Judge' that can increase the success rate of bypassing large language models' safety guardrails by over 60%.
- The technique involves asking the LLM to act as a judge and score the harmfulness of responses on a Likert scale, then asking it to generate example responses for each rating; the example corresponding to the highest rating can contain the harmful content.
- Tests across various harm categories and LLMs demonstrated the technique's effectiveness, underscoring the need for comprehensive content filtering when deploying LLMs (see the output-filter sketch after this list).
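The content-filtering mitigation mentioned above can be as simple as screening every model response with a moderation classifier before it reaches the user. The sketch below illustrates that pattern; it is not part of the original research, and the model names, threshold behavior, and use of the OpenAI moderation endpoint are assumptions chosen purely for illustration.

```python
"""Minimal sketch of an output content filter for an LLM deployment.

Illustrative only: the model names and the choice of the OpenAI
moderation endpoint are assumptions, not the researchers' implementation.
The idea is simply to screen each response before returning it, the kind
of secondary check that can catch output produced by multi-turn
jailbreaks such as 'Bad Likert Judge'.
"""

from openai import OpenAI  # assumed dependency: openai>=1.x

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def is_response_safe(text: str) -> bool:
    """Return True if the moderation classifier flags no category."""
    result = client.moderations.create(
        model="omni-moderation-latest",  # assumed moderation model
        input=text,
    )
    return not result.results[0].flagged


def guarded_reply(user_prompt: str) -> str:
    """Generate a reply, then suppress it if the output filter flags it."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": user_prompt}],
    )
    reply = completion.choices[0].message.content or ""
    if not is_response_safe(reply):
        return "Sorry, I can't help with that."
    return reply
```

Filtering the model's output, rather than only the incoming prompt, matters here because multi-turn jailbreaks like this one are designed to make the harmful text appear in the response rather than in any single user message.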