Next-gen audio APIs by OpenAI promise enhanced voice experiences

Soon, you may hear new AI voices in games and automated phone services


OpenAI introduces new AI APIs for speech-to-text and text-to-speech

OpenAI has announced the release of its latest audio models, designed to revolutionize the capabilities of voice agents. These new models, available via API, include advanced speech-to-text and text-to-speech functionalities, offering developers tools to create more expressive and customizable voice applications.

The new speech-to-text models, named gpt-4o-transcribe and gpt-4o-mini-transcribe, boast significant improvements in accuracy, language recognition, and reliability. These advancements were achieved through reinforcement learning and extensive training on diverse, high-quality audio datasets. The models are designed to handle challenging scenarios, such as accents, noisy environments, and varying speech speeds, with greater precision.
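As a rough illustration of how developers would reach these models, here is a minimal sketch of the request sent to OpenAI's transcription endpoint. The endpoint path and field names follow OpenAI's public Audio API; the file name and helper function are illustrative assumptions, not taken from the announcement.

```python
# Sketch: assembling a speech-to-text request for gpt-4o-transcribe.
# "meeting.wav" is a hypothetical local audio file.

def build_transcription_request(audio_path: str,
                                model: str = "gpt-4o-transcribe") -> dict:
    """Describe the multipart POST to /v1/audio/transcriptions."""
    return {
        "url": "https://api.openai.com/v1/audio/transcriptions",
        "method": "POST",
        # The audio file is uploaded as multipart form data alongside the model name.
        "fields": {"model": model, "file": audio_path},
    }

# Swap in gpt-4o-mini-transcribe when cost and latency matter more than accuracy.
request = build_transcription_request("meeting.wav")
```

In a real application, this request would be sent with an `Authorization: Bearer` header carrying the developer's API key, or made through OpenAI's official SDK.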

On the text-to-speech front, OpenAI introduced the gpt-4o-mini-tts model, which enhances steerability, allowing developers to control how text is articulated. While currently limited to preset artificial voices, this model represents a step forward in creating more natural and engaging voice interactions.

OpenAI has also integrated these models with its Agents SDK, enabling developers to build voice agents with ease. For applications requiring low-latency speech-to-speech experiences, OpenAI recommends using its Realtime API. In other words, we will soon hear more human-like voices in games and on the other end of customer service calls.

For the first time, developers can also instruct the text-to-speech model to speak in a specific way—for example, “talk like a sympathetic customer service agent”—unlocking a new level of customization for voice agents. – from the OpenAI press release
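That "speak in a specific way" instruction maps to a new field in the text-to-speech request. The sketch below shows the shape such a request might take; the field names mirror OpenAI's /v1/audio/speech endpoint, while the voice choice and helper function are illustrative assumptions.

```python
# Sketch: steering gpt-4o-mini-tts with a free-form "instructions" prompt.

def build_speech_request(text: str,
                         instructions: str,
                         voice: str = "coral",
                         model: str = "gpt-4o-mini-tts") -> dict:
    """Describe the JSON POST body for /v1/audio/speech."""
    return {
        "url": "https://api.openai.com/v1/audio/speech",
        "method": "POST",
        "json": {
            "model": model,
            "voice": voice,  # one of the preset artificial voices
            "input": text,   # the text to be spoken aloud
            # The steerability feature: tell the model *how* to deliver the line.
            "instructions": instructions,
        },
    }

request = build_speech_request(
    "Your refund has been processed.",
    "Talk like a sympathetic customer service agent.",
)
```

The response from the real endpoint would be streamed audio rather than JSON, which is what makes the model usable for live voice agents.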

Looking ahead, OpenAI plans to continue refining its audio models, with a focus on improving intelligence, accuracy, and customization options. The company also aims to explore ways for developers to incorporate custom voices while adhering to safety standards.

These new audio models are now available to developers, marking a significant milestone in the evolution of voice agent technology. With these tools, OpenAI is paving the way for more advanced and personalized voice-driven applications.
