Base LLMs Show Strong Semantic Confidence Accuracy, But Fine-Tuning and Chain-of-Thought Reasoning Destroy It

Mar 25, 2026
Apple Machine Learning Research

Summary

New research reveals that base large language models are semantically well calibrated: their confidence in the meaning of an answer tracks how often that answer is actually correct. However, popular techniques like fine-tuning and chain-of-thought reasoning actively destroy this calibration, raising urgent questions about the reliability of widely deployed AI systems.

Key Points

  • Base LLMs demonstrate remarkable semantic calibration in open-domain question-answering tasks, meaning they can meaningfully assess confidence in the actual meaning of their responses, not just at the token level.
  • Researchers establish a theoretical mechanism explaining how semantic calibration naturally emerges as a byproduct of next-token prediction training, introducing a generalized concept called 'B-calibration' based on equivalence classes.
  • Experiments reveal that while base LLMs are semantically well-calibrated, both reinforcement-learning-based instruction tuning and chain-of-thought reasoning systematically break this calibration.
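The idea of semantic (rather than token-level) confidence can be sketched as follows: sample several answers to a question, group them into meaning-equivalence classes, and take the probability mass of the largest class as the model's confidence in that meaning. This is a minimal illustration, not the paper's method; here simple surface normalization stands in for semantic equivalence, which in practice would need a stronger test (e.g., an entailment model), and the `equiv` function is a hypothetical placeholder.

```python
from collections import Counter

def semantic_confidence(samples, equiv=lambda s: s.strip().lower()):
    """Group sampled answer strings into meaning-equivalence classes
    and return (top_answer, confidence), where confidence is the
    empirical probability mass of the most common class.

    `equiv` is a stand-in for real semantic equivalence: two answers
    mapping to the same key are treated as having the same meaning.
    """
    classes = Counter(equiv(s) for s in samples)
    top, count = classes.most_common(1)[0]
    return top, count / len(samples)

# Toy usage: 10 sampled answers to one question, with surface variation.
samples = ["Paris", "paris", "Paris ", "Lyon", "Paris",
           "paris", "Marseille", "Paris", "paris", "paris"]
answer, conf = semantic_confidence(samples)
# 8 of 10 samples fall in the "paris" class, so conf == 0.8
```

Calibration then asks whether, across many questions, answers given with confidence c are correct about c of the time; the paper's 'B-calibration' generalizes this to arbitrary equivalence classes over outputs.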
