Custom Fine-Tuned AI Model Crushes GPT and Claude on Financial Tasks, Cutting Errors by 30% at a Fraction of the Cost

Jul 02, 2026

Thinking Machines Lab

Summary

A custom fine-tuned AI model developed by Bridgewater AIA Labs and Thinking Machines Lab outperforms GPT, Claude, and Gemini on financial information-filtering tasks. It reaches 84.7% average accuracy, compared with roughly 50% on average for the frontier models, makes 29.8% fewer errors than the best frontier model, and cuts inference costs by a factor of 13.8.

Key Points

Frontier AI models like GPT, Claude, and Gemini perform surprisingly poorly on routine financial information-filtering tasks. Even after expert prompt engineering they average only around 50% accuracy, well short of the 80% threshold needed for investor trust.
Researchers at Bridgewater AIA Labs and Thinking Machines Lab develop a custom fine-tuned model using high-quality expert-labeled data and an advanced training recipe featuring interleaved batching, CISPO loss with asymmetric clipping, and on-policy distillation, pushing average accuracy to 84.7%.
The custom-trained model outperforms all tested frontier models on both accuracy and cost: it makes 29.8% fewer errors than the best frontier model while cutting inference costs by a factor of 13.8. The result points toward a future of differentiated, organization-specific AI intelligence.
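
The accuracy and error figures above measure different things, and a short calculation shows how they relate. The sketch below is illustrative only. It assumes the 29.8% error reduction is computed on the same average-accuracy metric as the 84.7% figure, which the article does not confirm, and the function names are hypothetical. The article does not report the best frontier model's accuracy, so the value computed here is an implied estimate, not a published result.

```python
def relative_error_reduction(baseline_accuracy: float, new_accuracy: float) -> float:
    """Fraction of the baseline's errors that the new model eliminates."""
    baseline_error = 1.0 - baseline_accuracy
    new_error = 1.0 - new_accuracy
    return (baseline_error - new_error) / baseline_error


def implied_baseline_accuracy(new_accuracy: float, error_reduction: float) -> float:
    """Baseline accuracy implied by a new accuracy and a relative error reduction."""
    new_error = 1.0 - new_accuracy
    return 1.0 - new_error / (1.0 - error_reduction)


custom_accuracy = 0.847   # reported in the article
error_reduction = 0.298   # reported in the article, versus the best frontier model

# Assumption: both figures refer to the same average-accuracy metric.
implied_best = implied_baseline_accuracy(custom_accuracy, error_reduction)
print(f"Implied best frontier accuracy: {implied_best:.3f}")

# For contrast: the reduction measured against the ~50% frontier average.
vs_average = relative_error_reduction(0.50, custom_accuracy)
print(f"Error reduction vs. the ~50% average: {vs_average:.3f}")
```

Under that assumption, the best frontier model would sit at roughly 78% accuracy, just below the 80% threshold, while the comparison against the ~50% average would correspond to a much larger error reduction of about 69%. This is why the 29.8% figure is stated relative to the best frontier model and not the frontier average.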
