Microsoft launches gpt-realtime speech-to-speech model on Azure AI Foundry
gpt-realtime is 20% cheaper than the earlier gpt-4o-realtime preview
Microsoft has officially announced the general availability of gpt-realtime, its latest speech-to-speech (S2S) model, on Azure AI Foundry. The new model brings together Microsoft’s speech-to-speech improvements into one unified offering, with a focus on natural language, audio quality, and better instruction following.
Developers can now access gpt-realtime through the Realtime API, which supports natural, expressive voices and higher-quality audio. This release includes two new voices, Marin and Cedar, designed to provide lifelike and clear speech output.
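For readers who want a rough idea of what selecting one of the new voices could look like, here is a minimal sketch in Python that only builds a JSON message and sends nothing over the network. The `session.update` event name comes from the Realtime API, but the exact field layout under `session` is an assumption here and may differ between API versions, so check the Microsoft Learn documentation before relying on it. The helper name `build_session_update` is hypothetical.

```python
import json

# The two voices named in this release. The service may offer others;
# this sketch only checks for these two.
NEW_VOICES = {"marin", "cedar"}


def build_session_update(voice: str, instructions: str) -> str:
    """Return a JSON string for a session configuration message.

    Assumption: the 'voice' and 'instructions' fields sit directly under
    'session'. Verify the layout against the official documentation.
    """
    if voice not in NEW_VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    event = {
        "type": "session.update",
        "session": {
            "voice": voice,
            "instructions": instructions,
        },
    }
    return json.dumps(event)


if __name__ == "__main__":
    print(build_session_update("marin", "Answer briefly and clearly."))
```

In a real application this string would be sent over an already-open Realtime API connection; connection setup and authentication are left out on purpose.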
Microsoft highlights several improvements, including enhanced function calling, better instruction accuracy, and image input support, allowing users to add images into conversations and discuss them via voice, without requiring video.
Pricing has also been adjusted alongside the technical upgrades. gpt-realtime is 20% cheaper than the earlier gpt-4o-realtime preview, and usage is billed per million tokens.
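To see what a 20% reduction means for a bill, the arithmetic can be sketched as follows. The article does not quote actual prices, so the baseline figure below is a made-up placeholder, not a real rate; the function names are likewise illustrative.

```python
def discounted_price(baseline_per_million: float, discount: float = 0.20) -> float:
    """Apply a fractional discount to a per-million-token price."""
    return baseline_per_million * (1.0 - discount)


def usage_cost(tokens: int, price_per_million: float) -> float:
    """Cost of a given token count at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    # Hypothetical baseline price per million tokens (NOT a published rate).
    hypothetical_baseline = 10.0
    new_price = discounted_price(hypothetical_baseline)
    print(f"Price after 20% cut: {new_price:.2f} per million tokens")
    print(f"Cost of 2.5M tokens: {usage_cost(2_500_000, new_price):.2f}")
```

Swapping in the published rates for each token type (for example, audio input and audio output are typically priced separately) gives the real comparison.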
The launch signals Microsoft’s push to expand real-time AI capabilities for both developers and enterprises. By combining expressive voice synthesis, higher-quality audio, and multimodal input, gpt-realtime is positioned to support a wide range of use cases, from customer support systems to accessibility tools.
The model is available today through Azure AI Foundry, with full documentation published on Microsoft Learn.