New AI Model Dia Brings Lifelike Dialogue and Voice Cloning to Text-to-Speech

Apr 25, 2025

GitHub

Summary

Dia is a 1.6B-parameter text-to-speech model that generates lifelike dialogue directly from a transcript. Its output can be conditioned on audio to control emotion and tone, it can produce nonverbal cues such as laughter, and it supports voice cloning when a transcript of the reference audio is placed before the text to be generated.

Key Points

Dia is a 1.6B-parameter text-to-speech model that generates highly realistic dialogue from a transcript in one pass.
The model can be conditioned on audio to control emotion and tone, and it can produce nonverbal sounds such as laughter and coughing.
Dia supports voice cloning: a transcript of the audio to be cloned is placed before the generation text, and the model then speaks only the content of the provided script.
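
As a rough illustration of the voice-cloning input layout described in the last key point, the Python sketch below builds the text prompt by placing the transcript of the reference audio ahead of the script to be spoken. The helper name build_cloning_prompt and the sample strings are hypothetical and are not part of Dia's API. How the reference audio itself is passed to the model is left out, because the article does not describe that interface.

```python
def build_cloning_prompt(reference_transcript: str, script: str) -> str:
    """Place the reference-audio transcript before the script to generate.

    Hypothetical helper for illustration only; it models the text layout
    described in the article, not an actual Dia function.
    """
    reference = reference_transcript.strip()
    target = script.strip()
    if not reference or not target:
        raise ValueError("reference transcript and script must both be non-empty")
    # The transcript of the audio to be cloned comes first, then the new script.
    return f"{reference} {target}"


# Example usage with made-up text.
prompt = build_cloning_prompt(
    "Thanks for calling, how can I help you today?",
    "Your order has shipped and should arrive on Friday.",
)
print(prompt)
```

Per the article, the generated audio would then contain only the second part (the script), spoken in the voice of the reference recording.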
