Microsoft launches gpt-realtime speech-to-speech model on Azure AI Foundry

gpt-realtime is 20% cheaper than the earlier gpt-4o-realtime preview



Microsoft has announced the general availability of gpt-realtime, its latest speech-to-speech (S2S) model, on Azure AI Foundry. The new model combines Microsoft’s speech-to-speech improvements into a single offering, with a focus on natural language, audio quality, and better instruction following.

Developers can now access gpt-realtime through the Realtime API, which supports natural, expressive voices and higher-quality audio. This release includes two new voices, Marin and Cedar, designed to provide lifelike and clear speech output.

Microsoft highlights several improvements, including enhanced function calling, better instruction accuracy, and image input support, which lets users add images to a conversation and discuss them by voice without requiring video.

In addition to the technical upgrades, pricing has been adjusted. gpt-realtime is 20% cheaper than the earlier gpt-4o-realtime preview, and usage is billed per million tokens.
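To make the 20% figure concrete, here is a minimal sketch of the arithmetic in Python. The article does not quote actual rates, so the previous rate below is a hypothetical placeholder and not a published price; the function names are likewise illustrative. Only the 20% discount and the per-million-token billing unit come from the announcement.

```python
def discounted_rate(previous_rate: float, discount: float = 0.20) -> float:
    """Return the per-million-token rate after applying a fractional discount."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return previous_rate * (1.0 - discount)


def usage_cost(tokens: int, rate_per_million: float) -> float:
    """Return the cost of `tokens` tokens at a per-million-token rate."""
    return tokens / 1_000_000 * rate_per_million


# Hypothetical placeholder, NOT a published price: substitute the real
# gpt-4o-realtime preview rate from the Azure pricing page.
PREVIOUS_RATE = 10.0

new_rate = discounted_rate(PREVIOUS_RATE)
print(f"New rate per million tokens: {new_rate:.2f}")
print(f"Cost of 2.5M tokens: {usage_cost(2_500_000, new_rate):.2f}")
```

Because billing is per million tokens, the same 20% saving applies at any volume, provided the discount is uniform across token types, which the article does not specify.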

The launch signals Microsoft’s push to expand real-time AI capabilities for both developers and enterprises. By combining expressive voice synthesis, higher-quality audio, and multimodal input, gpt-realtime is positioned to support a wide range of use cases, from customer support systems to accessibility tools.

The model is available today through Azure AI Foundry, with full documentation published on Microsoft Learn.

