OpenAI Overhauls WebRTC Architecture To Scale Real-Time Voice AI For 900 Million Users Worldwide
Summary
OpenAI overhauled its WebRTC architecture with a split relay-plus-transceiver model, enabling real-time voice AI to scale globally on Kubernetes to over 900 million weekly active users while keeping latency low and requiring no custom client modifications.
Key Points
- OpenAI rearchitects its WebRTC stack using a split relay-plus-transceiver model, allowing real-time voice AI to scale globally across Kubernetes without exposing thousands of UDP ports.
- A lightweight relay layer routes incoming media packets to the correct stateful transceiver by decoding routing metadata embedded in the ICE username fragment, eliminating the need for per-session port allocation while preserving standard WebRTC behavior for clients.
- Global relay ingress points combined with geo-steered signaling minimize first-hop latency for over 900 million weekly active users, keeping voice interactions fast and natural without requiring kernel-bypass frameworks or custom client modifications.
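The ufrag-based routing described above can be sketched as follows. This is a hypothetical illustration, not OpenAI's actual code: it assumes the signaling layer mints local ICE ufrags shaped like `<transceiver-id>.<random>`, so a stateless relay can parse the STUN USERNAME attribute of an incoming packet and forward it to the right stateful transceiver without allocating a port per session.

```python
import struct
from typing import Optional

STUN_MAGIC_COOKIE = 0x2112A442  # fixed value from RFC 5389
ATTR_USERNAME = 0x0006          # STUN USERNAME attribute type

def parse_stun_username(packet: bytes) -> Optional[str]:
    """Extract the USERNAME attribute from a STUN message, if present."""
    if len(packet) < 20:
        return None
    _msg_type, msg_len, cookie = struct.unpack("!HHI", packet[:8])
    if cookie != STUN_MAGIC_COOKIE:
        return None  # not a STUN packet; media (RTP) would be routed separately
    offset, end = 20, 20 + msg_len
    while offset + 4 <= min(end, len(packet)):
        attr_type, attr_len = struct.unpack("!HH", packet[offset:offset + 4])
        if attr_type == ATTR_USERNAME:
            value = packet[offset + 4:offset + 4 + attr_len]
            return value.decode("utf-8", errors="replace")
        # STUN attributes are padded to 4-byte boundaries
        offset += 4 + ((attr_len + 3) & ~3)
    return None

def route_from_ufrag(username: str) -> Optional[str]:
    """Decode routing metadata from the local ufrag (an assumed encoding).

    The ICE USERNAME is "<local-ufrag>:<remote-ufrag>" from the receiver's
    view. If the local ufrag encodes a transceiver ID before a dot, the
    relay recovers the backend with no per-session state.
    """
    local_ufrag = username.split(":", 1)[0]
    transceiver_id, _, _ = local_ufrag.partition(".")
    return transceiver_id or None
```

Because the ufrag travels in every STUN binding request on the media path, this lets all sessions share one relay port while standard WebRTC clients remain completely unmodified.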