OpenAI Launches GPT-Realtime-2, Real-Time Translation and Whisper Audio APIs

Available now through OpenAI’s Realtime API for developers



OpenAI has announced three new real-time voice and audio API models, giving developers more options for building live voice agents, translation tools, and speech-to-text apps.

The new lineup includes GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. All three are available now through OpenAI’s Realtime API, while consumer ChatGPT voice upgrades remain in development.

GPT-Realtime-2 targets live voice agents

GPT-Realtime-2 is built for natural, back-and-forth voice conversations. It supports tool calling, mid-conversation corrections, and long-running interactions, making it suited to voice assistants, customer support agents, and interactive AI apps.

The model also adds preambles, so an agent can say phrases like “let me check that” before taking action. It supports parallel tool calls, better recovery from failures, and a larger 128K context window, up from 32K.
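The preamble, parallel tool call, and failure-recovery behaviors described above can be pictured with a small application-side sketch. It is a hypothetical illustration of the pattern in plain Python asyncio, not OpenAI's Realtime API. The tool functions, their return values, and the `say` callback are invented for the example. In a real deployment the model itself decides when to speak a preamble and which tools to call.

```python
import asyncio
from typing import Callable

# Hypothetical tools standing in for real backend lookups (not OpenAI APIs).
async def check_order_status(order_id: str) -> str:
    await asyncio.sleep(0.05)  # simulate network latency
    return f"order {order_id}: shipped"

async def check_account(user_id: str) -> str:
    await asyncio.sleep(0.05)
    return f"account {user_id}: active"

async def check_warranty(order_id: str) -> str:
    await asyncio.sleep(0.05)
    raise TimeoutError("warranty service did not respond")  # simulated failure

FALLBACK = "that lookup failed, please try again"

async def handle_turn(say: Callable[[str], None]) -> list[str]:
    # Preamble: speak a short filler phrase before the tool results arrive.
    say("Let me check that.")
    # Parallel tool calls: run all lookups concurrently instead of one by one.
    results = await asyncio.gather(
        check_order_status("A123"),
        check_account("u42"),
        check_warranty("A123"),
        return_exceptions=True,  # return exceptions as results instead of raising
    )
    # Recovery: replace any failed call with a fallback message.
    return [FALLBACK if isinstance(r, BaseException) else r for r in results]

if __name__ == "__main__":
    spoken: list[str] = []
    answers = asyncio.run(handle_turn(spoken.append))
    print(spoken)
    print(answers)
```

Passing `return_exceptions=True` keeps one failed lookup from discarding the results of the others. This mirrors the idea of recovering from a failed call rather than abandoning the whole turn.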

OpenAI says the model handles specialized vocabulary better, including healthcare terms, proper nouns, and technical language. Developers can also adjust its tone and reasoning level, which ranges from minimal to Xhigh.

Benchmarks show major gains

OpenAI says GPT-Realtime-2 reaches 96.6% on Big Bench Audio in High mode, compared with 81.4% for GPT-Realtime-1.5.

On Audio MultiChallenge, GPT-Realtime-2 reaches 48.5% in Xhigh mode, compared with 34.7% for GPT-Realtime-1.5.
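The reported scores can be put in perspective by expressing each gap in two ways: in percentage points, and relative to the older model's score. This is plain arithmetic on the figures quoted above. The article states which mode GPT-Realtime-2 used (High or Xhigh) but not the setting used for GPT-Realtime-1.5, so the comparison is only as like-for-like as OpenAI's own reporting.

```python
# Scores (%) as reported in the article: (GPT-Realtime-1.5, GPT-Realtime-2).
SCORES = {
    "Big Bench Audio (GPT-Realtime-2 in High mode)": (81.4, 96.6),
    "Audio MultiChallenge (GPT-Realtime-2 in Xhigh mode)": (34.7, 48.5),
}


def gains(old: float, new: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in %)."""
    points = new - old
    return points, points / old * 100


if __name__ == "__main__":
    for name, (old, new) in SCORES.items():
        pts, rel = gains(old, new)
        print(f"{name}: +{pts:.1f} points ({rel:.1f}% relative)")
```

On these figures, the Big Bench Audio gain is about 15.2 points and the Audio MultiChallenge gain is about 13.8 points. The second is the larger relative jump because it starts from a lower baseline.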

Translation and transcription also get new models

GPT-Realtime-Translate focuses on real-time multilingual speech translation. It supports more than 70 input languages and 13 output languages, while handling accents, regional pronunciations, domain-specific terms, and context switching during speech.

GPT-Realtime-Whisper is a streaming speech-to-text model designed for low-latency transcription while users speak. OpenAI positions it for live captions, meeting notes, classroom transcripts, and real-time subtitles.

Pricing and availability

GPT-Realtime-2 costs $32 per 1 million audio input tokens, $0.40 per 1 million cached input tokens, and $64 per 1 million audio output tokens.
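The sketch below shows roughly how these rates combine by estimating the cost of a single session from token counts. The token counts in the example are made up. The function also assumes that cached input tokens are billed only at the cached rate and not additionally at the full audio input rate. The article does not list text-token rates, so they are left out.

```python
# Per-1M-token rates for GPT-Realtime-2 as quoted in the article (USD).
AUDIO_INPUT_PER_M = 32.00
CACHED_INPUT_PER_M = 0.40
AUDIO_OUTPUT_PER_M = 64.00


def realtime2_cost(audio_in: int, cached_in: int, audio_out: int) -> float:
    """Estimate USD cost for one session.

    Assumption: `audio_in` counts only uncached audio input tokens, and
    cached tokens are billed solely at the cached rate.
    """
    return (
        audio_in * AUDIO_INPUT_PER_M
        + cached_in * CACHED_INPUT_PER_M
        + audio_out * AUDIO_OUTPUT_PER_M
    ) / 1_000_000


if __name__ == "__main__":
    # Hypothetical session: 50K fresh audio input, 200K cached, 30K audio output.
    cost = realtime2_cost(audio_in=50_000, cached_in=200_000, audio_out=30_000)
    print(f"${cost:.4f}")
```

With those hypothetical counts the session comes to about $3.60. Audio output accounts for the largest share because it is billed at twice the audio input rate.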

GPT-Realtime-Translate costs $0.034 per minute, while GPT-Realtime-Whisper costs $0.017 per minute.
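Translate and Whisper are priced per minute of audio, so their cost scales with session length. The following is a minimal sketch that assumes simple linear billing with no minimums or rounding, since the article does not say how partial minutes are handled. The dictionary keys are the article's display names and not necessarily the API model identifiers.

```python
# Per-minute rates (USD) as quoted in the article. Keys are display names,
# not necessarily the identifiers the API expects.
RATES_PER_MINUTE = {
    "GPT-Realtime-Translate": 0.034,
    "GPT-Realtime-Whisper": 0.017,
}


def audio_cost(model: str, minutes: float) -> float:
    """Estimate USD cost, assuming billing is linear in audio minutes."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return RATES_PER_MINUTE[model] * minutes


if __name__ == "__main__":
    for name in RATES_PER_MINUTE:
        print(f"{name}: ${audio_cost(name, 60):.2f} for a 60-minute session")
```

For a one-hour meeting, that works out to about $2.04 for live translation and $1.02 for transcription.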

Developers can test the new models in OpenAI's Playground or build with them through the Realtime API.

In other OpenAI news, the company has added a Trusted Contact feature, made Codex available in Chrome through a plugin, and released GPT-5.5-Cyber to cyber defenders in a limited preview.


Readers help support Windows Report. We may get a commission if you buy through our links.

Read our disclosure page to find out how you can help Windows Report sustain the editorial team.
