Mistral Launches Voxtral TTS, Claims Best Open-Source Text-to-Speech Model That Runs on Smartwatches
Summary
Mistral has launched Voxtral TTS, a 3-billion-parameter open-weight text-to-speech model that the company says outperforms ElevenLabs v2.5 Flash in human naturalness evaluations, adapts to a voice in five seconds, and is compact enough to run on a smartphone or smartwatch.
Key Points
- Mistral launches Voxtral TTS, its first open-weight text-to-speech model, claiming it is the best open-source TTS model to date and that it outperforms ElevenLabs v2.5 Flash in human naturalness evaluations.
- Voxtral TTS is a 3-billion-parameter model that supports nine languages, adapts to a voice in just five seconds, produces audio within 90 milliseconds, and is compact enough to run on-device on a smartphone or even a smartwatch.
- The model is now available for testing in Mistral Studio and as open weights on Hugging Face, targeting use cases such as customer support, real-time translation, and personal voice agents.