Natural Language

New Benchmark Exposes Hidden 'Flinch' Effect in AI Models That Suppresses Words at Probability Level, Defying Uncensoring Fixes

Apr 21, 2026
Morgin.ai

A new benchmark called 'EuphemismBench' exposes a hidden 'flinch' effect in AI language models: certain words are quietly suppressed up to 16,000 times more in commercially filtered models than in their open-data counterparts. Popular 'uncensoring' techniques not only fail to fix the issue but actually make it worse.
