Miso Labs Launches MisoTTS: Open-Source 8B-Parameter Text-to-Speech Model With Voice Cloning Hits Hugging Face

Jun 04, 2026
GitHub

Summary

Miso Labs launches MisoTTS, an open-source 8-billion-parameter text-to-speech model now live on Hugging Face, featuring voice cloning, highly emotive conversational speech, and built-in audio watermarking for safety.

Key Points

  • Miso Labs releases MisoTTS, an 8-billion-parameter text-to-speech model built on an RVQ (residual vector quantization) Transformer architecture inspired by Sesame CSM. It currently supports English only and generates highly emotive conversational speech.
  • The model features a dual-transformer design with a large Llama-8B backbone and a smaller Llama-300M audio decoder, supports voice cloning via audio context prompting, and watermarks all generated audio by default for safety.
  • Weights are publicly available on Hugging Face, the repository has already garnered over 1,300 stars on GitHub, and users are warned against using the model for impersonation, fraud, or any deceptive audio generation.
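
For readers unfamiliar with the "RVQ" in the architecture described above, the following is a minimal sketch of residual vector quantization in general, not MisoTTS code. Each codebook quantizes whatever error the previous one left behind, so one audio frame becomes a short list of integer codes. The function names, the two toy codebooks, and the 2-dimensional frame are all assumptions made for illustration; real audio codecs learn their codebooks and use far larger dimensions.

```python
from typing import List

Vector = List[float]
Codebook = List[Vector]


def nearest(vec: Vector, codebook: Codebook) -> int:
    """Return the index of the codebook entry closest to vec (squared L2)."""
    best_index, best_dist = 0, float("inf")
    for i, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, code))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def rvq_encode(vec: Vector, codebooks: List[Codebook]) -> List[int]:
    """Quantize vec stage by stage; each stage encodes the remaining residual."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(residual, codebook)
        indices.append(idx)
        # Subtract the chosen entry so the next stage only sees the leftover error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(indices: List[int], codebooks: List[Codebook]) -> Vector:
    """Reconstruct the vector by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy codebooks: a coarse stage followed by a fine stage.
COARSE: Codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
FINE: Codebook = [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [0.25, 0.25]]

frame = [1.2, 1.3]
codes = rvq_encode(frame, [COARSE, FINE])
approx = rvq_decode(codes, [COARSE, FINE])
print(codes, approx)
```

In a dual-transformer design of the kind the key points describe, a large backbone and a smaller audio decoder would typically share the work of predicting such per-frame codes, though how MisoTTS divides that work is not stated here.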
