MIT Develops Self-Distillation Technique That Teaches AI New Skills Without Erasing Old Ones

Feb 11, 2026

VentureBeat


Summary

MIT researchers unveil a breakthrough AI training technique called self-distillation fine-tuning that allows large language models to continuously learn new skills without forgetting old ones, potentially eliminating the need for companies to maintain multiple specialized AI models.

Key Points

MIT researchers, alongside teams from the Improbable AI Lab and ETH Zurich, introduce self-distillation fine-tuning (SDFT), a new technique that allows large language models to learn new skills without forgetting previously acquired capabilities.
SDFT uses a model's own in-context learning abilities to create a teacher-student feedback loop within a single model. This enables on-policy learning from expert demonstrations without requiring a reward function, and the approach outperforms traditional supervised fine-tuning.
In testing on Qwen 2.5, SDFT accumulates multiple enterprise skills sequentially without performance regression, offering companies a path to maintain one model instead of separate specialized models. The trade-off is cost: it requires approximately 2.5 times more compute than standard fine-tuning.
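
To make the teacher-student loop described in the key points more concrete, here is a toy sketch in plain Python. It is not the researchers' implementation. It assumes the distillation signal is a KL divergence between the student's and the teacher's next-token distributions, which the summary does not state, and it replaces a real language model with a single vector of logits. The function names (teacher_probs, sdft_step, train) and the context_boost parameter are hypothetical, chosen only for illustration.

```python
import math


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def teacher_probs(logits, demo_token, context_boost=2.0):
    """Teacher = the SAME parameters, but 'shown' the expert demonstration.

    Assumption: in-context learning is modeled as a fixed boost to the
    demonstrated token's logit. A real model would instead be prompted
    with the demonstration.
    """
    conditioned = list(logits)
    conditioned[demo_token] += context_boost
    return softmax(conditioned)


def sdft_step(logits, demo_token, lr=0.5):
    """One gradient step on KL(student || teacher), teacher held fixed."""
    p = softmax(logits)                     # student: no demonstration
    q = teacher_probs(logits, demo_token)   # teacher: with demonstration
    log_ratio = [math.log(pi) - math.log(qi) for pi, qi in zip(p, q)]
    kl = sum(pi * r for pi, r in zip(p, log_ratio))
    # Exact gradient of the KL with respect to the student's logits.
    # This is the expectation over the student's own outputs; a real
    # system would estimate it from sampled student completions.
    grad = [pi * (r - kl) for pi, r in zip(p, log_ratio)]
    new_logits = [x - lr * g for x, g in zip(logits, grad)]
    return new_logits, kl


def train(logits, demo_token, steps=50):
    """Repeat the self-distillation step; no reward function is used."""
    for _ in range(steps):
        logits, _ = sdft_step(logits, demo_token)
    return logits


if __name__ == "__main__":
    start = [0.0, 0.0, 0.0, 0.0]
    end = train(start, demo_token=2)
    print("before:", softmax(start)[2])
    print("after: ", softmax(end)[2])
```

Because the teacher is recomputed from the student's current parameters at every step, the only external input is the demonstration itself. The sketch shows that part of the idea only; it says nothing about retention of earlier skills or about the compute overhead reported above.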
