Natural Language

New Benchmark Exposes Hidden 'Flinch' Effect in AI Models That Suppresses Words at Probability Level, Defying Uncensoring Fixes

Apr 21, 2026
Morgin.ai

A new benchmark called 'EuphemismBench' exposes a hidden 'flinch' effect in AI language models: certain words are quietly suppressed at the probability level, up to 16,000 times more in commercially filtered models than in their open-data counterparts. Popular 'uncensoring' techniques not only fail to fix the issue but actually make it worse.
