New RL4HS Framework Outperforms Existing Models in Detecting Hallucinated Spans in AI-Generated Text
Summary
RL4HS is a new reinforcement learning framework for detecting hallucinated spans in large language model outputs. By combining Group Relative Policy Optimization with a novel Class-Aware Policy Optimization technique, it outperforms existing models across summarization, question answering, and data-to-text tasks.
Key Points
- Researchers introduce RL4HS, a reinforcement learning framework designed to detect hallucinated spans in large language model outputs, going beyond simple binary hallucination detection.
- RL4HS leverages Group Relative Policy Optimization and a new Class-Aware Policy Optimization technique to address reward imbalance, incentivizing step-by-step reasoning at the span level.
- Testing on the RAGTruth benchmark across summarization, question answering, and data-to-text tasks confirms that RL4HS outperforms both pretrained reasoning models and supervised fine-tuning approaches.
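To make the reward-imbalance idea concrete, here is a minimal sketch of how class-aware scaling could sit on top of GRPO-style group-relative advantages. All function names and the weight values are hypothetical illustrations, not the authors' actual implementation: the idea is that rollouts predicting the rarer positive class (a hallucinated span) get their advantages up-weighted so the policy is not dominated by the majority "no hallucination" signal.

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantage: normalize each rollout's reward against
    the mean and std of its group of sampled responses."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

def class_aware_advantages(rewards, classes, class_weights):
    """Hypothetical class-aware scaling: up-weight advantages for the
    rarer positive (hallucinated-span) class to counter reward imbalance."""
    base = group_relative_advantages(rewards)
    return [a * class_weights[c] for a, c in zip(base, classes)]

# Four rollouts for one prompt; "pos" = rollout predicted a hallucinated span.
rewards = [1.0, 0.0, 1.0, 0.0]
classes = ["pos", "neg", "pos", "neg"]
weights = {"pos": 2.0, "neg": 1.0}  # illustrative values only
advs = class_aware_advantages(rewards, classes, weights)
```

In this toy group the positive rollouts end up with twice the advantage magnitude of the negative ones, which is the kind of rebalancing the paper's Class-Aware Policy Optimization is described as providing.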