The data plane: streams, RPC & state sync
In this chapter, you will learn how LiveKit moves far beyond audio and video. You will understand text streams, byte streams, RPC, participant attributes, and room metadata — five data primitives that all travel through the same WebRTC connection your media already uses. By the end, you will see LiveKit not as a media platform with some data features bolted on, but as a full realtime data platform that happens to be exceptional at media.
Beyond audio and video
In chapters 1 through 4, every example centered on media: microphone tracks, camera tracks, audio flowing between participants and agents. But real applications need more than media. A voice AI agent needs to send transcriptions as text. A collaborative app needs shared state. A file-sharing tool needs binary data transfer. A control interface needs request/response messaging.
Traditionally, developers bolt on a separate WebSocket connection, a REST API, or a third-party service to handle these needs. That means a second connection to manage, a second authentication flow, a second set of failure modes. LiveKit takes a different approach: it puts all of these data capabilities inside the same WebRTC connection that already carries your media.
Text streams
Text streams provide ordered, reliable delivery of text data between participants. Think of them as named channels for UTF-8 text that flow alongside your audio and video tracks.
Each text stream has a topic — a string identifier that lets you multiplex many independent text flows over a single connection. A participant can have multiple concurrent streams on different topics, and receivers can subscribe to exactly the topics they care about.
| Use case | Topic example | Why text streams fit |
|---|---|---|
| Chat messages | chat | Ordered delivery ensures messages appear in sequence |
| LLM streaming output | agent.transcription | Token-by-token delivery as the model generates |
| Live transcription | transcription | Real-time speech-to-text results pushed to all participants |
| Status updates | agent.status | Lightweight signals like "thinking..." or "processing..." |
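The topic mechanism above can be modeled as a small dispatcher: handlers register per topic, and incoming text is routed only to subscribers of that topic. This is an illustrative sketch of the multiplexing idea, not the LiveKit SDK API; the `TopicRouter` name and methods are invented for the example.

```python
# Minimal model of topic-based multiplexing: many independent text
# flows share one connection; receivers subscribe per topic.
from collections import defaultdict
from typing import Callable

class TopicRouter:
    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[str, str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str, str], None]) -> None:
        self._handlers[topic].append(handler)

    def deliver(self, topic: str, sender: str, text: str) -> None:
        # Only handlers registered for this topic see the message.
        for handler in self._handlers[topic]:
            handler(sender, text)

router = TopicRouter()
chat_log: list[str] = []
router.subscribe("chat", lambda sender, text: chat_log.append(f"{sender}: {text}"))

router.deliver("chat", "alice", "hello")
router.deliver("agent.status", "agent", "thinking...")  # no "chat" subscriber sees this
```

Because routing happens by topic, a chat UI and a transcription overlay can share one connection without ever seeing each other's messages.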
Ordered and reliable
Unlike WebRTC data channels configured for lossy, unordered delivery, LiveKit text streams guarantee ordering and reliability. Messages arrive in the sequence they were sent, and none are silently dropped. This is critical for chat and transcription, where a missing or reordered message destroys the user experience.
Text streams are the backbone of how AI agents communicate non-audio information. When a voice agent transcribes what the user said, that transcription arrives as a text stream. When the agent streams its LLM-generated response before converting it to speech, that response arrives as a text stream. The agent does not need a separate API to push this data — it flows through the same room connection.
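Token-by-token delivery, as used for streaming LLM output, can be sketched as a sender that writes tokens as they are produced and a reader that consumes them in order and assembles the full text. The function names here are illustrative, not the SDK's; a plain `asyncio.Queue` stands in for the stream transport.

```python
import asyncio

# Minimal model of an ordered, incremental text stream: the sender
# writes tokens as they are generated; the reader consumes them in order.
async def stream_tokens(queue: asyncio.Queue, tokens: list[str]) -> None:
    for token in tokens:
        await queue.put(token)
    await queue.put(None)  # end-of-stream marker

async def read_all(queue: asyncio.Queue) -> str:
    parts: list[str] = []
    while (token := await queue.get()) is not None:
        parts.append(token)  # arrives in send order; nothing dropped
    return "".join(parts)

async def main() -> str:
    queue: asyncio.Queue = asyncio.Queue()
    tokens = ["The ", "answer ", "is ", "42."]
    _, text = await asyncio.gather(stream_tokens(queue, tokens), read_all(queue))
    return text

print(asyncio.run(main()))  # The answer is 42.
```

The receiver can render each token as it arrives (for a typing effect) or wait for the end-of-stream marker and use the complete text.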
Byte streams
Byte streams handle binary data delivery. Where text streams carry UTF-8 strings, byte streams carry raw bytes — images, files, audio clips, serialized data structures, anything that is not plain text.
Byte streams are chunked for efficient transport. Large files are broken into manageable pieces and reassembled on the receiving end. This chunking also enables progress tracking: the receiver knows how many bytes have arrived versus how many are expected, making it straightforward to display upload or download progress.
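The chunk-and-reassemble scheme with progress tracking can be sketched in a few lines. This is a conceptual model, not the SDK's wire format; the chunk size and function names are invented for illustration.

```python
# Minimal model of byte-stream chunking: split a payload into fixed-size
# pieces, reassemble on the receiving end, and report progress as bytes arrive.
CHUNK_SIZE = 4  # tiny for illustration; real transports use kilobyte-scale chunks

def chunk(payload: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    return [payload[i:i + size] for i in range(0, len(payload), size)]

def receive(chunks: list[bytes], total: int) -> bytes:
    received = bytearray()
    for piece in chunks:
        received.extend(piece)
        progress = len(received) / total  # bytes arrived vs. bytes expected
        print(f"progress: {progress:.0%}")
    return bytes(received)

payload = b"hello, bytes!"
assembled = receive(chunk(payload), total=len(payload))
assert assembled == payload
```

Because the receiver knows the expected total up front, a progress bar falls out of the reassembly loop for free.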
| Use case | Why byte streams fit |
|---|---|
| File transfer between participants | Chunked delivery with progress, no separate upload service needed |
| Image sharing | Send screenshots, photos, or generated images directly through the room |
| Audio clip delivery | Share pre-recorded audio without publishing it as a live track |
| Serialized data | Protocol buffers, MessagePack, or any binary format |
No separate file server
For many use cases, byte streams eliminate the need for a separate file upload service. The data flows participant-to-participant (or agent-to-participant) through the LiveKit infrastructure that is already connected and authenticated.
RPC: Remote Procedure Calls
RPC lets one participant call a method on another participant and receive a response. This is a request/response pattern — fundamentally different from the one-way push of streams.
When participant A calls an RPC method on participant B, participant B's registered handler executes and returns a result. Participant A awaits that result. It is conceptually identical to calling a function on a remote machine, except the "remote machine" is another participant in the same LiveKit room.
| Use case | What the RPC call does |
|---|---|
| Trigger an action | "Agent, start recording this conversation" |
| Query state | "Agent, what tools do you have available?" |
| Control flow | "Agent, switch to Spanish" |
| Confirmation | "Agent, did you successfully book that appointment?" |
RPC fills a gap that streams cannot. Streams are one-directional pushes — sender writes, receiver reads. RPC is bidirectional and transactional: caller sends a request, callee processes it, caller gets a response. When you need to know whether an action succeeded, or when you need to retrieve a specific piece of data on demand, RPC is the right primitive.
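The register-then-call pattern described above can be modeled with two in-process participants: the callee registers named handlers, the caller invokes one by name and awaits the result. The class and method names (`RpcParticipant`, `register_rpc_method`, `perform_rpc`) are illustrative stand-ins, not the LiveKit SDK; in the real system the request and response cross the network.

```python
import asyncio

# Minimal model of participant-to-participant RPC: the callee registers
# named handlers; the caller invokes one and awaits the response.
class RpcParticipant:
    def __init__(self, identity: str) -> None:
        self.identity = identity
        self._methods: dict = {}

    def register_rpc_method(self, name: str, handler) -> None:
        self._methods[name] = handler

    async def perform_rpc(self, callee: "RpcParticipant", method: str, payload: str) -> str:
        if method not in callee._methods:
            raise ValueError(f"unsupported method: {method}")
        # In a real system this request/response round-trip crosses the network.
        return await callee._methods[method](self.identity, payload)

async def main() -> str:
    agent = RpcParticipant("agent")
    user = RpcParticipant("user")

    async def switch_language(caller: str, lang: str) -> str:
        return f"ok: now speaking {lang} (requested by {caller})"

    agent.register_rpc_method("switch_language", switch_language)
    return await user.perform_rpc(agent, "switch_language", "Spanish")

print(asyncio.run(main()))
```

The key property is the awaited return value: unlike a stream push, the caller knows whether the action succeeded before moving on.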
Participant attributes
Every participant in a LiveKit room carries a set of key-value attributes — string pairs that describe that participant. These attributes are visible to all other participants in the room and can be updated at any time. When an attribute changes, every participant receives a notification.
Participant attributes are ideal for state that is owned by one participant but observed by many:
- User profile information: display name, avatar URL, role
- Agent conversation state: current intent, language, session phase
- Preferences: volume level, subtitle language, UI mode
- Status indicators: muted, away, hand raised
Observe, don't poll
Participant attributes are reactive. You do not poll for changes — you register a listener and get called when any attribute on any participant changes. This makes them efficient for driving UI updates across all connected clients.
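The observe-don't-poll behavior is the classic observer pattern: an attribute update triggers every registered listener with the participant and the changed keys. This is a minimal sketch of that pattern; `AttributeBus` and the method names are invented for the example, not the SDK's event API.

```python
# Minimal model of reactive participant attributes: listeners are
# notified on every change instead of polling for state.
from typing import Callable

class AttributeBus:
    def __init__(self) -> None:
        self._listeners: list[Callable] = []

    def on_attributes_changed(self, listener: Callable) -> None:
        self._listeners.append(listener)

    def notify(self, participant: "Participant", changed: dict[str, str]) -> None:
        for listener in self._listeners:
            listener(participant, changed)

class Participant:
    def __init__(self, identity: str, bus: AttributeBus) -> None:
        self.identity = identity
        self.attributes: dict[str, str] = {}
        self._bus = bus

    def set_attributes(self, updates: dict[str, str]) -> None:
        self.attributes.update(updates)
        self._bus.notify(self, updates)  # push the change to all observers

bus = AttributeBus()
events: list[tuple[str, dict]] = []
bus.on_attributes_changed(lambda p, changed: events.append((p.identity, changed)))

alice = Participant("alice", bus)
alice.set_attributes({"hand_raised": "true"})
# events now holds [("alice", {"hand_raised": "true"})]
```

A UI layer subscribes once and re-renders only the widgets affected by the changed keys, with no polling loop anywhere.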
Room metadata
While participant attributes belong to individual participants, room metadata is shared state that belongs to the room itself. It is a single shared value (typically a JSON string) visible to everyone in the room, and any participant with the appropriate permissions can update it.
Room metadata is the right choice for state that is not owned by any single participant:
- Room configuration: recording enabled, max participants, current mode
- Shared settings: language, topic, agenda
- Application state: current slide number, game state, workflow phase
The distinction between participant attributes and room metadata mirrors a common pattern in distributed systems: per-entity state versus shared global state. Both are necessary, and LiveKit provides both through the same connection.
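The per-entity versus shared split can be made concrete with a toy data model: each participant owns its own attribute map, while the room carries one metadata value that everyone reads. The dataclasses below are illustrative, not LiveKit types.

```python
from dataclasses import dataclass, field

# Minimal model of the two state scopes: attributes belong to one
# participant; metadata belongs to the room and is visible to all.
@dataclass
class Participant:
    identity: str
    attributes: dict[str, str] = field(default_factory=dict)  # per-entity state

@dataclass
class Room:
    metadata: str = ""  # shared state: one value for the whole room
    participants: list[Participant] = field(default_factory=list)

room = Room(metadata='{"mode": "presentation", "slide": 3}')
room.participants.append(Participant("alice", {"role": "presenter"}))
room.participants.append(Participant("bob", {"role": "viewer"}))

# Every participant reads the same room metadata...
shared = room.metadata
# ...but each owns, and updates, only its own attributes.
room.participants[0].attributes["hand_raised"] = "true"
```

Choosing between the two scopes is usually a question of ownership: if exactly one participant should write the value, it is an attribute; if it describes the session as a whole, it is metadata.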
One connection to rule them all
Here is the architectural insight that ties everything together:
| Capability | Traditional approach | LiveKit approach |
|---|---|---|
| Audio/video | WebRTC connection | WebRTC connection |
| Chat messages | Separate WebSocket | Text stream (same connection) |
| File transfer | Separate upload API | Byte stream (same connection) |
| Remote actions | REST API calls | RPC (same connection) |
| User state | Database + polling | Participant attributes (same connection) |
| Shared state | Database + WebSocket | Room metadata (same connection) |
This matters architecturally
A single connection means a single authentication handshake, a single reconnection strategy, a single set of network traversal (TURN/ICE) negotiations, and a single point of monitoring. Every additional connection you add to a realtime system is another thing that can fail independently, another thing that can fall out of sync, another thing to debug at 3 AM.
LiveKit is a realtime data platform
By now, the framing of "LiveKit is a video conferencing tool" should feel inadequate. LiveKit provides six distinct communication primitives — audio tracks, video tracks, text streams, byte streams, RPC, and state synchronization — all unified under one connection, one authentication model, and one set of SDKs.
This is what makes it a realtime data platform. The room abstraction is not just a container for media — it is a container for any kind of realtime interaction between connected participants, whether those participants are humans, AI agents, phone callers, or IoT devices.
The data plane is what elevates LiveKit from "good WebRTC infrastructure" to "general-purpose realtime platform." When you build on LiveKit, you are not just getting low-latency audio — you are getting a complete communication substrate. Every feature you need for a realtime application is available through one SDK, one connection, one mental model.