New 'Bad Likert Judge' AI Jailbreak Technique Boosts Malicious Response Success by Over 60%

Jun 05, 2025
The Hacker News
Article image for New 'Bad Likert Judge' AI Jailbreak Technique Boosts Malicious Response Success by Over 60%

Summary

Cybersecurity researchers have discovered a new AI jailbreak technique called 'Bad Likert Judge' that can boost the success rates of malicious prompts bypassing large language models' safety guardrails by over 60%, enabling the generation of potentially harmful or illegal content.

Key Points

  • Researchers have discovered a new jailbreak technique called 'Bad Likert Judge' that can increase the success rate of bypassing large language models' safety guardrails by over 60%.
  • The technique involves asking the LLM to act as a judge and score the harmfulness of responses using the Likert scale, then generate examples aligned with the highest scale.
  • Tests across various categories and LLMs showed the technique's effectiveness, highlighting the need for comprehensive content filtering when deploying LLMs.

Tags

Read Original Article