Next-gen audio APIs by OpenAI promise enhanced voice experiences
Soon, you may hear new AI voices in games and automated phone systems
OpenAI has announced the release of its latest audio models, designed to revolutionize the capabilities of voice agents. These new models, available via API, include advanced speech-to-text and text-to-speech functionalities, offering developers tools to create more expressive and customizable voice applications.
The new speech-to-text models, named gpt-4o-transcribe and gpt-4o-mini-transcribe, boast significant improvements in accuracy, language recognition, and reliability. These advancements were achieved through reinforcement learning and extensive training on diverse, high-quality audio datasets. The models are designed to handle challenging scenarios, such as accents, noisy environments, and varying speech speeds, with greater precision.
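For developers curious what a call to one of these transcription models might look like, here is a minimal Python sketch using only the standard library. The endpoint URL, the `model` and `file` form fields, and the `text` key in the response are assumptions based on OpenAI's public REST API rather than details from the announcement, and the helper names are hypothetical. The first function only builds the request, so it can be inspected without an API key.

```python
import json
import urllib.request
import uuid

# Assumed endpoint, based on OpenAI's public REST API (not stated in the article).
TRANSCRIBE_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(audio_bytes, filename, api_key,
                                model="gpt-4o-mini-transcribe"):
    """Build (but do not send) a multipart upload for a transcription model."""
    boundary = uuid.uuid4().hex
    # Two form fields: the model name, then the audio file itself.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        TRANSCRIBE_URL,
        data=head + audio_bytes + tail,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


def transcribe(audio_path, api_key, model="gpt-4o-mini-transcribe"):
    """Send the request; assumes the reply is JSON with a "text" field."""
    with open(audio_path, "rb") as audio_file:
        request = build_transcription_request(
            audio_file.read(), audio_path, api_key, model
        )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

Swapping the `model` argument to `gpt-4o-transcribe` would select the larger of the two models named above. In a real project, OpenAI's official client library is likely the more convenient route.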
On the text-to-speech front, OpenAI introduced the gpt-4o-mini-tts model, which enhances steerability, allowing developers to control how text is articulated. While currently limited to preset artificial voices, this model represents a step forward in creating more natural and engaging voice interactions.
OpenAI has also integrated these models with its Agents SDK, which should make it easier for developers to build voice agents. For applications requiring low-latency speech-to-speech experiences, OpenAI recommends using its Realtime API. In practice, this means we may soon hear more human-like voices in games and on customer service calls.
For the first time, developers can also instruct the text-to-speech model to speak in a specific way—for example, “talk like a sympathetic customer service agent”—unlocking a new level of customization for voice agents. – from the OpenAI press release
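As a rough illustration of the instruction-based control described in the quote, the Python sketch below builds a request for the gpt-4o-mini-tts model. The endpoint URL, the `input`, `voice`, and `instructions` field names, the `alloy` preset voice, and the MP3 output are assumptions based on OpenAI's public REST API, not details from the press release. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint, based on OpenAI's public REST API (not stated in the article).
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, instructions, api_key, voice="alloy"):
    """Build (but do not send) a text-to-speech request with speaking instructions."""
    payload = {
        "model": "gpt-4o-mini-tts",
        "input": text,                 # the words to be spoken
        "voice": voice,                # a preset voice; "alloy" is an assumption
        "instructions": instructions,  # how the text should be delivered
    }
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text, instructions, api_key, out_path="reply.mp3"):
    """Send the request and save the audio; assumes the reply body is MP3 data."""
    request = build_speech_request(text, instructions, api_key)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
    return out_path
```

A call such as `synthesize_to_file("I'm sorry about the trouble with your order.", "Talk like a sympathetic customer service agent.", api_key)` would mirror the example from the press release. The spoken words stay the same while the instruction changes the delivery.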
Looking ahead, OpenAI plans to continue refining its audio models, with a focus on improving intelligence, accuracy, and customization options. The company also aims to explore ways for developers to incorporate custom voices while adhering to safety standards.
These new audio models are now available to developers, marking a significant milestone in the evolution of voice agent technology. With these tools, OpenAI is paving the way for more advanced and personalized voice-driven applications.