The data plane: streams, RPC & state sync
In this chapter, you will learn how LiveKit moves far beyond audio and video. You will understand text streams, byte streams, RPC, participant attributes, and room metadata — five data primitives that all travel through the same WebRTC connection your media already uses. By the end, you will see LiveKit not as a media platform with some data features bolted on, but as a full realtime data platform that happens to be exceptional at media.
Beyond audio and video
In chapters 1 through 4, every example centered on media: microphone tracks, camera tracks, audio flowing between participants and agents. But real applications need more than media. A voice AI agent needs to send transcriptions as text. A collaborative app needs shared state. A file-sharing tool needs binary data transfer. A control interface needs request/response messaging.
Traditionally, developers bolt on a separate WebSocket connection, a REST API, or a third-party service to handle these needs. That means a second connection to manage, a second authentication flow, a second set of failure modes. LiveKit takes a different approach: it puts all of these data capabilities inside the same WebRTC connection that already carries your media.
Text streams
Text streams provide ordered, reliable delivery of text data between participants. Think of them as named channels for UTF-8 text that flow alongside your audio and video tracks.
Each text stream has a topic — a string identifier that lets you multiplex many independent text flows over a single connection. A participant can have multiple concurrent streams on different topics, and receivers can subscribe to exactly the topics they care about.
| Use case | Topic example | Why text streams fit |
|---|---|---|
| Chat messages | chat | Ordered delivery ensures messages appear in sequence |
| LLM streaming output | agent.transcription | Token-by-token delivery as the model generates |
| Live transcription | transcription | Real-time speech-to-text results pushed to all participants |
| Status updates | agent.status | Lightweight signals like "thinking..." or "processing..." |
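The topic mechanism above can be modeled as a small dispatcher: handlers register per topic, and incoming text is routed only to subscribers of that topic. This is an illustrative sketch of the multiplexing idea, not the LiveKit SDK API; the `TopicRouter` name and methods are invented for the example.

```python
# Minimal model of topic-based multiplexing: many independent text
# flows share one connection; receivers subscribe per topic.
from collections import defaultdict
from typing import Callable

class TopicRouter:
    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[str, str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str, str], None]) -> None:
        self._handlers[topic].append(handler)

    def deliver(self, topic: str, sender: str, text: str) -> None:
        # Only handlers registered for this topic see the message.
        for handler in self._handlers[topic]:
            handler(sender, text)

router = TopicRouter()
chat_log: list[str] = []
router.subscribe("chat", lambda sender, text: chat_log.append(f"{sender}: {text}"))

router.deliver("chat", "alice", "hello")
router.deliver("agent.status", "agent", "thinking...")  # no "chat" subscriber sees this
```

Because routing happens by topic, a chat UI and a transcription overlay can share one connection without ever seeing each other's messages.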
Ordered and reliable
Unlike WebRTC data channels configured for lossy, unordered delivery, LiveKit text streams guarantee ordering and reliability. Messages arrive in the sequence they were sent, and none are silently dropped. This is critical for chat and transcription, where a missing or reordered message destroys the user experience.
Text streams are the backbone of how AI agents communicate non-audio information. When a voice agent transcribes what the user said, that transcription arrives as a text stream. When the agent streams its LLM-generated response before converting it to speech, that response arrives as a text stream. The agent does not need a separate API to push this data — it flows through the same room connection.
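Token-by-token delivery, as used for streaming LLM output, can be sketched as a sender that writes tokens as they are produced and a reader that consumes them in order and assembles the full text. The function names here are illustrative, not the SDK's; a plain `asyncio.Queue` stands in for the stream transport.

```python
import asyncio

# Minimal model of an ordered, incremental text stream: the sender
# writes tokens as they are generated; the reader consumes them in order.
async def stream_tokens(queue: asyncio.Queue, tokens: list[str]) -> None:
    for token in tokens:
        await queue.put(token)
    await queue.put(None)  # end-of-stream marker

async def read_all(queue: asyncio.Queue) -> str:
    parts: list[str] = []
    while (token := await queue.get()) is not None:
        parts.append(token)  # arrives in send order; nothing dropped
    return "".join(parts)

async def main() -> str:
    queue: asyncio.Queue = asyncio.Queue()
    tokens = ["The ", "answer ", "is ", "42."]
    _, text = await asyncio.gather(stream_tokens(queue, tokens), read_all(queue))
    return text

print(asyncio.run(main()))  # The answer is 42.
```

The receiver can render each token as it arrives (for a typing effect) or wait for the end-of-stream marker and use the complete text.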
Byte streams
Byte streams handle binary data delivery. Where text streams carry UTF-8 strings, byte streams carry raw bytes — images, files, audio clips, serialized data structures, anything that is not plain text.
Byte streams are chunked for efficient transport. Large files are broken into manageable pieces and reassembled on the receiving end. This chunking also enables progress tracking: the receiver knows how many bytes have arrived versus how many are expected, making it straightforward to display upload or download progress.
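The chunk-and-reassemble scheme with progress tracking can be sketched in a few lines. This is a conceptual model, not the SDK's wire format; the chunk size and function names are invented for illustration.

```python
# Minimal model of byte-stream chunking: split a payload into fixed-size
# pieces, reassemble on the receiving end, and report progress as bytes arrive.
CHUNK_SIZE = 4  # tiny for illustration; real transports use kilobyte-scale chunks

def chunk(payload: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    return [payload[i:i + size] for i in range(0, len(payload), size)]

def receive(chunks: list[bytes], total: int) -> bytes:
    received = bytearray()
    for piece in chunks:
        received.extend(piece)
        progress = len(received) / total  # bytes arrived vs. bytes expected
        print(f"progress: {progress:.0%}")
    return bytes(received)

payload = b"hello, bytes!"
assembled = receive(chunk(payload), total=len(payload))
assert assembled == payload
```

Because the receiver knows the expected total up front, a progress bar falls out of the reassembly loop for free.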
| Use case | Why byte streams fit |
|---|---|
| File transfer between participants | Chunked delivery with progress, no separate upload service needed |
| Image sharing | Send screenshots, photos, or generated images directly through the room |
| Audio clip delivery | Share pre-recorded audio without publishing it as a live track |
| Serialized data | Protocol buffers, MessagePack, or any binary format |
No separate file server
For many use cases, byte streams eliminate the need for a separate file upload service. The data flows participant-to-participant (or agent-to-participant) through the LiveKit infrastructure that is already connected and authenticated.
RPC: Remote Procedure Calls
RPC lets one participant call a method on another participant and receive a response. This is a request/response pattern — fundamentally different from the one-way push of streams.
When participant A calls an RPC method on participant B, participant B's registered handler executes and returns a result. Participant A awaits that result. It is conceptually identical to calling a function on a remote machine, except the "remote machine" is another participant in the same LiveKit room.
| Use case | What the RPC call does |
|---|---|
| Trigger an action | "Agent, start recording this conversation" |
| Query state | "Agent, what tools do you have available?" |
| Control flow | "Agent, switch to Spanish" |
| Confirmation | "Agent, did you successfully book that appointment?" |
RPC fills a gap that streams cannot. Streams are one-directional pushes — sender writes, receiver reads. RPC is bidirectional and transactional: caller sends a request, callee processes it, caller gets a response. When you need to know whether an action succeeded, or when you need to retrieve a specific piece of data on demand, RPC is the right primitive.
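The register-then-call pattern described above can be modeled with two in-process participants: the callee registers named handlers, the caller invokes one by name and awaits the result. The class and method names (`RpcParticipant`, `register_rpc_method`, `perform_rpc`) are illustrative stand-ins, not the LiveKit SDK; in the real system the request and response cross the network.

```python
import asyncio

# Minimal model of participant-to-participant RPC: the callee registers
# named handlers; the caller invokes one and awaits the response.
class RpcParticipant:
    def __init__(self, identity: str) -> None:
        self.identity = identity
        self._methods: dict = {}

    def register_rpc_method(self, name: str, handler) -> None:
        self._methods[name] = handler

    async def perform_rpc(self, callee: "RpcParticipant", method: str, payload: str) -> str:
        if method not in callee._methods:
            raise ValueError(f"unsupported method: {method}")
        # In a real system this request/response round-trip crosses the network.
        return await callee._methods[method](self.identity, payload)

async def main() -> str:
    agent = RpcParticipant("agent")
    user = RpcParticipant("user")

    async def switch_language(caller: str, lang: str) -> str:
        return f"ok: now speaking {lang} (requested by {caller})"

    agent.register_rpc_method("switch_language", switch_language)
    return await user.perform_rpc(agent, "switch_language", "Spanish")

print(asyncio.run(main()))
```

The key property is the awaited return value: unlike a stream push, the caller knows whether the action succeeded before moving on.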
Participant attributes
Every participant in a LiveKit room carries a set of key-value attributes — string pairs that describe that participant. These attributes are visible to all other participants in the room and can be updated at any time. When an attribute changes, every participant receives a notification.
Participant attributes are ideal for state that is owned by one participant but observed by many:
- User profile information: display name, avatar URL, role
- Agent conversation state: current intent, language, session phase
- Preferences: volume level, subtitle language, UI mode
- Status indicators: muted, away, hand raised
Observe, don't poll
Participant attributes are reactive. You do not poll for changes — you register a listener and get called when any attribute on any participant changes. This makes them efficient for driving UI updates across all connected clients.
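The observe-don't-poll behavior is the classic observer pattern: an attribute update triggers every registered listener with the participant and the changed keys. This is a minimal sketch of that pattern; `AttributeBus` and the method names are invented for the example, not the SDK's event API.

```python
# Minimal model of reactive participant attributes: listeners are
# notified on every change instead of polling for state.
from typing import Callable

class AttributeBus:
    def __init__(self) -> None:
        self._listeners: list[Callable] = []

    def on_attributes_changed(self, listener: Callable) -> None:
        self._listeners.append(listener)

    def notify(self, participant: "Participant", changed: dict[str, str]) -> None:
        for listener in self._listeners:
            listener(participant, changed)

class Participant:
    def __init__(self, identity: str, bus: AttributeBus) -> None:
        self.identity = identity
        self.attributes: dict[str, str] = {}
        self._bus = bus

    def set_attributes(self, updates: dict[str, str]) -> None:
        self.attributes.update(updates)
        self._bus.notify(self, updates)  # push the change to all observers

bus = AttributeBus()
events: list[tuple[str, dict]] = []
bus.on_attributes_changed(lambda p, changed: events.append((p.identity, changed)))

alice = Participant("alice", bus)
alice.set_attributes({"hand_raised": "true"})
# events now holds [("alice", {"hand_raised": "true"})]
```

A UI layer subscribes once and re-renders only the widgets affected by the changed keys, with no polling loop anywhere.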
Room metadata
While participant attributes belong to individual participants, room metadata is shared state that belongs to the room itself. It is a single shared value (typically a JSON string) visible to everyone in the room, and any participant with the appropriate permissions can update it.
Room metadata is the right choice for state that is not owned by any single participant:
- Room configuration: recording enabled, max participants, current mode
- Shared settings: language, topic, agenda
- Application state: current slide number, game state, workflow phase
The distinction between participant attributes and room metadata mirrors a common pattern in distributed systems: per-entity state versus shared global state. Both are necessary, and LiveKit provides both through the same connection.
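The per-entity versus shared split can be made concrete with a toy data model: each participant owns its own attribute map, while the room carries one metadata value that everyone reads. The dataclasses below are illustrative, not LiveKit types.

```python
from dataclasses import dataclass, field

# Minimal model of the two state scopes: attributes belong to one
# participant; metadata belongs to the room and is visible to all.
@dataclass
class Participant:
    identity: str
    attributes: dict[str, str] = field(default_factory=dict)  # per-entity state

@dataclass
class Room:
    metadata: str = ""  # shared state: one value for the whole room
    participants: list[Participant] = field(default_factory=list)

room = Room(metadata='{"mode": "presentation", "slide": 3}')
room.participants.append(Participant("alice", {"role": "presenter"}))
room.participants.append(Participant("bob", {"role": "viewer"}))

# Every participant reads the same room metadata...
shared = room.metadata
# ...but each owns, and updates, only its own attributes.
room.participants[0].attributes["hand_raised"] = "true"
```

Choosing between the two scopes is usually a question of ownership: if exactly one participant should write the value, it is an attribute; if it describes the session as a whole, it is metadata.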
One connection to rule them all
Here is the architectural insight that ties everything together:
| Capability | Traditional approach | LiveKit approach |
|---|---|---|
| Audio/video | WebRTC connection | WebRTC connection |
| Chat messages | Separate WebSocket | Text stream (same connection) |
| File transfer | Separate upload API | Byte stream (same connection) |
| Remote actions | REST API calls | RPC (same connection) |
| User state | Database + polling | Participant attributes (same connection) |
| Shared state | Database + WebSocket | Room metadata (same connection) |
This matters architecturally
A single connection means a single authentication handshake, a single reconnection strategy, a single set of network traversal (TURN/ICE) negotiations, and a single point of monitoring. Every additional connection you add to a realtime system is another thing that can fail independently, another thing that can fall out of sync, another thing to debug at 3 AM.
LiveKit is a realtime data platform
By now, the framing of "LiveKit is a video conferencing tool" should feel inadequate. LiveKit provides six distinct communication primitives — audio tracks, video tracks, text streams, byte streams, RPC, and state synchronization — all unified under one connection, one authentication model, and one set of SDKs.
This is what makes it a realtime data platform. The room abstraction is not just a container for media — it is a container for any kind of realtime interaction between connected participants, whether those participants are humans, AI agents, phone callers, or IoT devices.
The data plane is what elevates LiveKit from "good WebRTC infrastructure" to "general-purpose realtime platform." When you build on LiveKit, you are not just getting low-latency audio — you are getting a complete communication substrate. Every feature you need for a realtime application is available through one SDK, one connection, one mental model.