ElevenLabs, a leading AI startup, has launched its first stand-alone speech-to-text model, called Scribe. The launch comes after the company raised $180 million in funding and was valued at $3.3 billion.
Scribe supports over 99 languages, with high accuracy for many of them. ElevenLabs reports a word error rate below 5% for languages in its “excellent accuracy” category, which includes English, French, German, Hindi, Indonesian, Japanese, and others. The company also claims that Scribe outperformed Google Gemini 2.0 Flash and Whisper Large V3 across multiple languages in benchmark tests.
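For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the model's transcript into a reference transcript, divided by the number of words in the reference. The sketch below illustrates that standard definition only. The function name, the lowercasing, and the whitespace tokenization are assumptions made for illustration, and the source does not describe how ElevenLabs normalizes text or computes its own figures.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: lowercasing and whitespace splitting are assumed
    normalization choices, not ElevenLabs' published methodology.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a transcript with one wrong word against a 20-word reference sits exactly at the 5% mark, so a rate below 5% means fewer than one error per 20 reference words on average.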
Scribe offers advanced features such as smart speaker diarization, timestamped subtitles, and automatic tagging of sound events such as audience laughter. The model currently works only with pre-recorded audio, but the company plans to release a low-latency real-time version soon.
ElevenLabs’ CEO Mati Staniszewski emphasized the importance of improving speech detection models to better understand conversations. He said the company wants to move beyond only generating content to also understanding and transcribing speech.
The company is pricing Scribe at $0.40 per hour of transcribed audio. That rate is competitive, though some rivals offer lower prices for similar services.
Source: https://techcrunch.com/2025/02/26/elevenlabs-is-launching-its-own-speech-to-text-model