Next-gen audio APIs by OpenAI promise enhanced voice experiences

Soon, you will hear new AI voices in games or automatic operators

News

2 min. read

Published on March 21, 2025

by Claudiu Andone

published on March 21, 2025

Share this article

Readers help support Windows Report. We may get a commission if you buy through our links.

OpenAI introduces new AI APIs for text to speech

OpenAI has announced the release of its latest audio models, designed to revolutionize the capabilities of voice agents. These new models, available via API, include advanced speech-to-text and text-to-speech functionalities, offering developers tools to create more expressive and customizable voice applications.

The new speech-to-text models, named gpt-4o-transcribe and gpt-4o-mini-transcribe, boast significant improvements in accuracy, language recognition, and reliability. These advancements were achieved through reinforcement learning and extensive training on diverse, high-quality audio datasets. The models are designed to handle challenging scenarios, such as accents, noisy environments, and varying speech speeds, with greater precision.

On the text-to-speech front, OpenAI introduced the gpt-4o-mini-tts model, which enhances steerability, allowing developers to control how text is articulated. While currently limited to preset artificial voices, this model represents a step forward in creating more natural and engaging voice interactions.

OpenAI has also integrated these models with its Agents SDK, enabling developers to build voice agents with ease. For applications requiring low-latency speech-to-speech experiences, OpenAI recommends using its Realtime API. So, in other words, we will soon hear more human-like voices in games or on the other side of the call services.

For the first time, developers can also instruct the text-to-speech model to speak in a specific way—for example, “talk like a sympathetic customer service agent”—unlocking a new level of customization for voice agents. – from the OpenAI press release

Looking ahead, OpenAI plans to continue refining its audio models, with a focus on improving intelligence, accuracy, and customization options. The company also aims to explore ways for developers to incorporate custom voices while adhering to safety standards.

These new audio models are now available to developers, marking a significant milestone in the evolution of voice agent technology. With these tools, OpenAI is paving the way for more advanced and personalized voice-driven applications.

More about the topics: AI, OpenAI

Claudiu Andone

Windows Toubleshooting Expert

Oldtimer in the tech and science press, Claudiu is focused on whatever comes new from Microsoft. His abrupt interest in computers started when he saw the first Home Computer as a kid. However, his passion for Windows and everything related became obvious when he became a sys admin in a computer science high school. With 14 years of experience in writing about everything there is to know about science and technology, Claudiu also likes rock music, chilling in the garden, and Star Wars. May the force be with you, always!

User forum

0 messages

Sort by:

Leave a Reply Cancel reply