OpenAI Unveils Improved AI Models for Enhanced Transcription and Voice Generation

OpenAI has introduced new transcription and voice-generating AI models that it claims improve upon its previous releases. The models are designed to be more accurate, nuanced, and controllable, and they fit into the company’s broader “agentic” vision of building automated systems that can independently accomplish tasks on behalf of users.

The new text-to-speech model, gpt-4o-mini-tts, delivers realistic-sounding speech and is steerable: developers can use natural-language instructions to tell the model how to convey emotion and tone. For example, a developer can ask for a “mad scientist” voice or a “serene voice like a mindfulness teacher.”

OpenAI’s goal is to let developers tailor the voice to both the experience and the context. In a customer support setting, for instance, the model can be instructed to sound apologetic when a mistake has been made, as in the sketch below.
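For illustration, here is a minimal sketch of what such a call might look like with the OpenAI Python SDK, assuming the audio speech endpoint accepts an instructions field for steering; the voice name, input text, and output path are placeholders, not values from the announcement:

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Synthesize speech with a natural-language steering instruction and
# stream the resulting audio to a local file.
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",  # placeholder: one of the SDK's built-in voices
    input="I'm so sorry about the mix-up with your order. Let me fix that right away.",
    instructions="Sound like a sincere, apologetic customer support agent.",
) as response:
    response.stream_to_file("apology.mp3")
```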

The new speech-to-text models, gpt-4o-transcribe and gpt-4o-mini-transcribe, effectively replace OpenAI’s Whisper transcription model. The company says they better capture accented and varied speech, even in chaotic environments, and are less likely to hallucinate, that is, to fabricate words or passages that were never spoken, a well-known problem with Whisper. A sketch of how developers might call them follows.
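A hedged sketch of transcription with the same SDK, assuming the new models plug into the existing audio transcriptions endpoint that whisper-1 used; the file name is a placeholder:

```python
from openai import OpenAI

client = OpenAI()

# Send a local audio file to the transcription endpoint; the response
# object exposes the recognized text on its .text attribute.
with open("support_call.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # or "gpt-4o-mini-transcribe"
        file=audio_file,
    )

print(transcript.text)
```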

However, the new models have limitations. According to OpenAI’s internal benchmarks, the more accurate of the two transcription models has a word error rate approaching 30% for certain languages, such as Tamil, Telugu, Malayalam, and Kannada; at that rate, roughly three out of every ten transcribed words differ from what was actually spoken.
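For readers unfamiliar with the metric, word error rate is the minimum number of word-level substitutions, insertions, and deletions needed to turn the model’s output into the reference transcript, divided by the length of the reference. A minimal illustrative implementation, not OpenAI’s benchmark code:

```python
# Word error rate (WER) via Levenshtein distance over word tokens.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution
            )
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Three errors against a ten-word reference gives a 30% WER:
print(wer("the cat sat on the mat very quietly today okay",
          "the cat sit on the mat very quiet today"))  # 0.3
```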

In a departure from its approach with Whisper, which it released under an open-source license, OpenAI won’t make the new transcription models openly available. The company says they are much bigger than Whisper and thus not well suited to running locally on devices like laptops; instead, it plans to release models openly only through more targeted channels.

Source: https://techcrunch.com/2025/03/20/openai-upgrades-its-transcription-and-voice-generating-ai-models