AI Security Breakthrough: New System Detects Malicious Prompts by Reading AI's Internal Thoughts, Outperforms Existing Guards
Zenity Labs has developed an AI security system that reads a language model's internal thoughts to detect malicious prompts. It outperforms existing guards such as Prompt-Guard-2 at stopping jailbreaks and prompt injections, though with a higher false positive rate.