LiveKit Architecture
Why LiveKit's architecture matters for voice AI
The architecture behind the platform. WebRTC vs WebSockets, the SFU model, rooms/participants/tracks, the data plane, telephony/egress/ingress, E2EE, and the SDK ecosystem. No coding — pure understanding.
What You Build
No project — this is conceptual. You'll understand LiveKit's architecture deeply enough to explain it to your CTO.
Why WebRTC? The latency problem in voice AI
15m · Voice AI has a ~500ms latency budget. WebSocket pipelines burn 200-500ms on buffering alone. WebRTC uses UDP and RTP packets for ~10-30ms transport. We break down the difference.
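The budget math above can be sketched as simple arithmetic. The numbers are the illustrative figures from this lesson blurb, not measurements:

```python
# Illustrative latency-budget arithmetic; figures are the rough numbers
# quoted above, not benchmarks.
BUDGET_MS = 500                      # conversational latency budget

websocket_buffering_ms = (200, 500)  # TCP + server-side buffering overhead
webrtc_transport_ms = (10, 30)       # UDP/RTP transport overhead

def remaining_budget(transport_ms: int, budget: int = BUDGET_MS) -> int:
    """Milliseconds left for STT, LLM, and TTS after transport."""
    return budget - transport_ms

# Worst-case WebSocket buffering can consume the entire budget:
print(remaining_budget(websocket_buffering_ms[1]))  # → 0
# Worst-case WebRTC transport still leaves ~470ms for the model pipeline:
print(remaining_budget(webrtc_transport_ms[1]))     # → 470
```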
P2P vs MCU vs SFU: why LiveKit chose the SFU
15m · P2P doesn't scale past 3. MCU decodes and re-encodes. The SFU forwards packets without touching media — and LiveKit's Go + Pion implementation scales horizontally with Redis.
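Why the mesh breaks down is easiest to see in uplink counts. A rough sketch (toy functions, not LiveKit code):

```python
def p2p_uplinks(n: int) -> int:
    """Full mesh: every client encodes and uploads a stream to each peer."""
    return n - 1

def sfu_uplinks(n: int) -> int:
    """With an SFU, each client uploads exactly one stream; the server
    fans out copies to subscribers without decoding the media."""
    return 1

for n in (2, 3, 10, 50):
    print(f"{n} participants: P2P uplinks/client={p2p_uplinks(n)}, "
          f"SFU uplinks/client={sfu_uplinks(n)}")
```

At 50 participants a mesh client would be encoding 49 uplinks; behind an SFU it still encodes one.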
Rooms, Participants & Tracks: universal primitives
20m · Room = virtual space. Participant = anything that connects. Track = media stream. These three primitives model everything from voice AI to video conferencing to robotics.
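The three primitives can be sketched as a toy object model. This is an illustration of the concepts, not the LiveKit SDK's actual classes:

```python
from dataclasses import dataclass, field

@dataclass
class Track:
    kind: str          # "audio" | "video" | "data"
    name: str

@dataclass
class Participant:
    identity: str      # a human, AI agent, phone caller, or robot alike
    tracks: list[Track] = field(default_factory=list)

    def publish(self, track: Track) -> None:
        self.tracks.append(track)

@dataclass
class Room:
    name: str
    participants: dict[str, Participant] = field(default_factory=dict)

    def join(self, p: Participant) -> None:
        self.participants[p.identity] = p

room = Room("support-call")
caller = Participant("caller-123")
agent = Participant("ai-agent")          # an agent is just another participant
caller.publish(Track("audio", "mic"))
room.join(caller)
room.join(agent)
print(len(room.participants))  # → 2
```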
AI agents as room participants
15m · The key insight: agents join rooms using the same SDK as humans. Subscribe to mic track, run STT→LLM→TTS, publish audio back. No special agent API needed.
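The subscribe → STT → LLM → TTS → publish loop can be sketched with stand-in functions. Every name here is a stub for illustration, not a real LiveKit call:

```python
# Stand-in pipeline stages; real STT/LLM/TTS providers would go here.
def stt(audio: bytes) -> str:
    return audio.decode()            # stub: pretend the audio is its transcript

def llm(prompt: str) -> str:
    return f"echo: {prompt}"         # stub model

def tts(text: str) -> bytes:
    return text.encode()             # stub synthesis

def on_mic_frame(frame: bytes) -> bytes:
    """The agent's whole job: transcribe the subscribed mic audio,
    generate a reply, and synthesize audio to publish back to the room."""
    transcript = stt(frame)
    reply = llm(transcript)
    return tts(reply)

print(on_mic_frame(b"hello"))  # → b'echo: hello'
```

The point of the lesson is that this loop runs inside an ordinary room participant; there is no separate agent-specific connection path.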
The data plane: streams, RPC & state sync
15m · Text streams, byte streams, RPC, participant attributes, and room metadata — all through one WebRTC connection. LiveKit is a full realtime data platform.
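A toy model of two of these primitives — RPC dispatch and participant attributes — to make the shapes concrete. Class and method names are illustrative only, not the SDK's API:

```python
class DataPlane:
    """Toy stand-in for data-plane features riding one WebRTC connection."""

    def __init__(self) -> None:
        self.rpc_handlers = {}   # method name -> handler
        self.attributes = {}     # per-participant key/value state

    def register_rpc(self, method: str, handler) -> None:
        self.rpc_handlers[method] = handler

    def call_rpc(self, method: str, payload):
        # In LiveKit, the call would travel to another participant and back.
        return self.rpc_handlers[method](payload)

    def set_attribute(self, key: str, value: str) -> None:
        # In LiveKit, attribute changes replicate to everyone in the room.
        self.attributes[key] = value

dp = DataPlane()
dp.register_rpc("get_status", lambda _: "ok")
dp.set_attribute("agent_state", "listening")
print(dp.call_rpc("get_status", None))  # → ok
```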
Extending the room: telephony, egress & ingress
15m · SIP turns phone callers into participants. Egress records to S3 or livestreams via RTMP. Ingress brings external streams in. All extend the room model.
Security, encryption & self-hosting
15m · End-to-end encryption for media and data. JWT access tokens with granular grants. Full self-hosting with the same APIs as LiveKit Cloud.
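A minimal sketch of what a grant-carrying access token looks like, built with only the standard library. The HS256/JWT mechanics are standard; the exact grant field names (`video`, `roomJoin`, `room`) follow LiveKit's documented shape but should be treated as illustrative — use the official server SDKs to mint real tokens:

```python
import base64, hashlib, hmac, json, time

def b64url(data: bytes) -> str:
    """Base64url without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_token(api_key: str, api_secret: str, identity: str, room: str) -> str:
    """Minimal HS256 JWT carrying room-scoped grants (illustrative shape)."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "iss": api_key,                  # which API key signed this token
        "sub": identity,                 # the participant's identity
        "exp": int(time.time()) + 3600,  # short-lived by design
        "video": {"roomJoin": True, "room": room},  # granular grants
    }
    signing_input = (
        f"{b64url(json.dumps(header).encode())}."
        f"{b64url(json.dumps(payload).encode())}"
    )
    sig = hmac.new(api_secret.encode(), signing_input.encode(),
                   hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"

token = make_token("APIkey", "secret", "caller-123", "support-call")
print(token.count("."))  # → 2
```

Because grants live inside a signed token, the server can authorize a join without a database lookup, and a token for one room grants nothing in any other.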
The universal SDK ecosystem
10m · 12+ SDKs spanning web, mobile, desktop, Unity, embedded, and server. Same API everywhere. Among the broadest realtime SDK ecosystems available.
What You Walk Away With
Deep understanding of why WebRTC beats WebSockets for voice AI, how the SFU scales, rooms/participants/tracks as universal primitives, the data plane, and why LiveKit is architecturally unique.