New RL4HS Framework Outperforms Existing Models in Detecting Hallucinated Spans in AI-Generated Text
A new reinforcement learning framework called RL4HS outperforms existing models at detecting hallucinated spans in large language model outputs. It combines Group Relative Policy Optimization (GRPO) with a novel Class-Aware Policy Optimization technique, and its gains hold across summarization, question answering, and data-to-text tasks.
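For readers unfamiliar with Group Relative Policy Optimization, its core idea is to score each sampled response against the other responses drawn for the same prompt, rather than against a learned value function. The sketch below illustrates only that normalization step. It is not the RL4HS implementation: the function name, the reward values, and the use of a population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled responses.

    Each advantage is the response's reward minus the group mean,
    divided by the group standard deviation (population form assumed).
    `eps` avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for the same prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print(advantages)
```

Responses that score above their group's average receive a positive advantage and are reinforced; those below average receive a negative one.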