Base LLMs Show Strong Semantic Confidence Accuracy, But Fine-Tuning and Chain-of-Thought Reasoning Destroy It

Mar 25, 2026
Apple Machine Learning Research

New research finds that base large language models exhibit strong semantic confidence accuracy, but that widely used techniques such as fine-tuning and chain-of-thought reasoning degrade this calibration, raising urgent questions about the reliability of deployed AI systems.
