OpenAI Expands Realtime API With Advanced Voice Model and New Capabilities


OpenAI has made its Realtime API generally available, introducing major upgrades including support for remote MCP servers, image inputs, and phone integration via Session Initiation Protocol (SIP). Alongside the API enhancements, the company unveiled gpt-realtime, its most advanced speech-to-speech model to date, designed to power production-ready voice agents with greater accuracy, intelligence, and natural expression.

Unlike conventional pipelines that chain separate speech-to-text and text-to-speech models, the Realtime API processes and generates audio end-to-end within a single model, which reduces latency and produces more expressive responses. Developers can now deploy conversational agents for real-world tasks such as customer service, education, and personal assistance, with improved reliability, low latency, and richer context.

The gpt-realtime model shows notable advances in instruction following, tool use, and speech generation, scoring 82.8% on the Big Bench Audio benchmark and outperforming earlier iterations. It can switch languages mid-sentence, interpret non-verbal cues, and follow nuanced developer prompts. Two new voices, Cedar and Marin, debut exclusively in the Realtime API, alongside updates to the eight existing voices for more natural-sounding interactions.


“The new speech-to-speech model in OpenAI’s Realtime API shows stronger reasoning and more natural speech—allowing it to handle complex, multi-step requests like narrowing listings by lifestyle needs or guiding affordability discussions with tools like our BuyAbility score. This could make searching for a home on Zillow or exploring financing options feel as natural as a conversation with a friend, helping simplify decisions like buying, selling, and renting a home,” said Josh Weisberg, Head of AI at Zillow.

Additional features include reusable prompts for faster development, enhanced asynchronous function calling for smoother multi-step tasks, and EU data residency support to meet enterprise privacy standards. The API also integrates safeguards against misuse, with layered safety checks and enterprise-grade compliance.

Thousands of developers have tested the service since its beta launch in October 2024, and with general availability, OpenAI is positioning the Realtime API as a key foundation for next-generation, voice-driven AI applications.