Fish Audio Launches S1 Voice Cloning Model That Replicates Speech from 10-Second Audio Samples
Summary
Fish Audio has unveiled S1, a voice cloning model that replicates a speaker's speech patterns, accent, and tone from only 10 seconds of sample audio. The company reports $5 million in annual recurring revenue and 20,000 active developers, and says its pricing is six times cheaper than that of major competitor ElevenLabs.
Key Points
- Fish Audio launches S1, an expressive text-to-speech model that clones voices from just 10 seconds of audio while preserving accent, tone, and speaking habits
- The company reports 20,000 active developers, $5 million ARR, and pricing that is 6x cheaper than competitor ElevenLabs
- Fish Audio offers both commercial API services and open-source models; its team is behind the popular voice cloning projects So-VITS-SVC and Bert-VITS2