New AI Model Dia Brings Lifelike Dialogue and Voice Cloning to Text-to-Speech

Apr 25, 2025
GitHub

Summary

Dia is a 1.6B parameter text-to-speech model that generates lifelike dialogue directly from a transcript. It can be conditioned on audio to control emotion and tone, produces nonverbal cues such as laughter, and supports voice cloning from a reference audio clip and its transcript.

Key Points

  • Dia is a 1.6B parameter text-to-speech model that generates highly realistic dialogue from a transcript in one pass.
  • The model can be conditioned on audio to control emotion and tone, and can produce nonverbal cues such as laughter and coughing.
  • Dia supports voice cloning: the transcript of the audio to be cloned is placed before the text to generate, and the model outputs only the content of the provided script, not the prepended transcript.
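
The prompt ordering described in the last point can be sketched in a few lines of Python. This is only an illustration of the text layout, assuming the reference transcript and the script are joined with a single space. The helper name build_clone_prompt is hypothetical and is not part of Dia. How the reference audio itself is passed to the model is not covered here.

```python
def build_clone_prompt(reference_transcript: str, script: str) -> str:
    """Place the transcript of the audio to clone before the text to generate.

    Hypothetical helper for illustration; the single-space separator is an
    assumption, not something the article specifies.
    """
    reference = reference_transcript.strip()
    target = script.strip()
    if not reference or not target:
        raise ValueError("both the reference transcript and the script are required")
    # Reference transcript first, then the new script the model should speak.
    return f"{reference} {target}"


if __name__ == "__main__":
    prompt = build_clone_prompt(
        "Hello, this is the voice to clone.",
        "And this is the new line to speak.",
    )
    print(prompt)
```

The resulting string would be supplied as the text input alongside the reference audio; per the point above, only the second part is expected to appear in the generated speech.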
