New AI Model Dia Brings Lifelike Dialogue and Voice Cloning to Text-to-Speech
 
Summary
Dia is a 1.6B-parameter text-to-speech model that generates lifelike dialogue directly from a transcript. Its output can be conditioned on audio to control emotion and tone, and it can produce nonverbal cues such as laughter. It also supports voice cloning when the transcript of the audio to be cloned is supplied ahead of the text to be generated.
Key Points
- Dia is a 1.6B-parameter text-to-speech model that generates highly realistic dialogue from a transcript in a single pass.
- The model can be conditioned on audio to control emotion and tone, and can produce nonverbal sounds such as laughter and coughing.
- Dia supports voice cloning: place the transcript of the audio to be cloned before the text to be generated, and the model outputs speech only for the new text, not for the reference transcript.
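
The voice-cloning step in the last point comes down to how the input text is assembled: the reference clip's transcript goes first, followed by the new script. The snippet below is a minimal sketch of that assembly only and does not call the model. The helper name `build_cloning_prompt` is hypothetical, and the `[S1]` speaker tag and `(laughs)` cue are illustrative notation assumed here, not details confirmed by this article.

```python
def build_cloning_prompt(reference_transcript: str, generation_text: str) -> str:
    """Prepend the reference audio's transcript to the text to be spoken.

    Hypothetical helper: it only builds the text input. The reference
    audio itself would be passed to the model separately.
    """
    reference = reference_transcript.strip()
    target = generation_text.strip()
    if not reference or not target:
        raise ValueError("both the reference transcript and the generation text are required")
    # Reference transcript first, then the new script, separated by one space.
    return f"{reference} {target}"


# Illustrative inputs; the tag and cue syntax are assumptions.
prompt = build_cloning_prompt(
    "[S1] Hello, this is the voice to clone.",
    "[S1] And this is the new line. (laughs)",
)
print(prompt)
```

Keeping the reference transcript and the new script as separate arguments makes it easy to reuse one reference clip across many generations.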