Chapter 715m

Security, encryption & self-hosting

In this chapter, you will learn how LiveKit secures realtime communication at every layer. You will understand end-to-end encryption, the access token system, fine-grained permission grants, and how self-hosting gives you full sovereignty over your infrastructure. By the end, you will have a clear mental model of LiveKit's trust architecture — who can do what, who can see what, and how you enforce both.

E2EE · Access tokens · JWT · Grants · Self-hosting

The trust model

Before diving into individual security features, it helps to understand the trust model that governs the entire system.

In LiveKit's architecture, there are three distinct layers of trust:

  1. Your server is fully trusted. It holds the API key and secret, generates access tokens, and has complete control over rooms and participants through the server API.
  2. The LiveKit SFU (whether Cloud or self-hosted) is a trusted intermediary. It authenticates participants via tokens, enforces permissions, and routes media. Without E2EE, the SFU can observe media content. With E2EE, it forwards encrypted packets it cannot decrypt.
  3. Clients are untrusted by default. They receive only the permissions explicitly granted in their access token. They cannot call server APIs directly, cannot join rooms without a valid token, and cannot exceed their granted permissions.

This layered model means a compromised client can only do what its token allows. A compromised token is limited in scope and time. Your server's API secret is the crown jewel — everything else derives from it.

Never expose your API secret to clients

The API key and secret are used to sign access tokens on your server. They must never appear in client-side code, environment variables accessible to the browser, or mobile app bundles. If the secret is compromised, an attacker can generate tokens with any permissions for any room.

Access tokens: JWT-based authentication

Every participant that connects to a LiveKit room must present an access token — a JSON Web Token (JWT) signed with your API secret. Your backend server generates these tokens and hands them to clients. The client presents the token when connecting, and the SFU validates the signature before allowing entry.

An access token encodes several critical pieces of information:

| Token field | Purpose |
| --- | --- |
| API key | Identifies which LiveKit project this token belongs to |
| Identity | A unique string identifying this participant (e.g., user ID) |
| Room name | Which room the participant is authorized to join |
| Grants | Specific permissions for what the participant can do |
| Expiration | When the token becomes invalid |
| Metadata | Optional data attached to the participant on join |

Tokens are short-lived by design. A typical token might be valid for 10 minutes — long enough for the client to connect, but short enough that a leaked token has minimal exposure. Once connected, the participant maintains their session regardless of token expiration; the token is only checked at connection time.
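To make the mint-and-validate flow concrete, here is a minimal sketch in Python using only the standard library. It is not the LiveKit SDK (real applications should use a LiveKit server SDK, which handles this for you); the key, secret, and grant values are placeholder assumptions, but the JWT structure — a signed header, claims with identity, expiry, and grants, verified before admission — mirrors what the token system does.

```python
import base64
import hashlib
import hmac
import json
import time

API_KEY = "devkey"        # identifies the project (safe to expose)
API_SECRET = "devsecret"  # signs tokens -- server-side only!

def b64url(data: bytes) -> str:
    """Base64url without padding, as JWTs use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def mint_token(identity: str, room: str, ttl_seconds: int = 600) -> str:
    """Server side: build and sign a JWT-style access token (HS256)."""
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "iss": API_KEY,                          # which project issued this token
        "sub": identity,                         # participant identity
        "exp": int(time.time()) + ttl_seconds,   # short-lived by design
        "video": {"roomJoin": True, "room": room, "canPublish": True},
    }
    signing_input = (
        f"{b64url(json.dumps(header).encode())}.{b64url(json.dumps(claims).encode())}"
    )
    sig = hmac.new(API_SECRET.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"

def validate(token: str) -> dict:
    """SFU side: verify signature and expiry before admitting the participant."""
    signing_input, _, sig = token.rpartition(".")
    expected = hmac.new(API_SECRET.encode(), signing_input.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(b64url(expected), sig):
        raise ValueError("bad signature")
    payload = signing_input.split(".")[1]
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims

token = mint_token("user-123", "support-room")
claims = validate(token)
```

Note that `validate` never trusts anything the client could alter: change one byte of the claims and the signature check fails, which is exactly why a client cannot escalate its own permissions.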

What's happening

The token pattern follows a well-established security principle: your server acts as the authorization authority, and the SFU acts as the enforcement point. The client never decides what it is allowed to do — the server decides (by generating a token with specific grants) and the SFU enforces (by validating the token and respecting the grants). This separation means client-side code cannot be manipulated to escalate permissions.

Grants: fine-grained permissions

The grants embedded in an access token define exactly what a participant can and cannot do. This is not a coarse "admin or user" distinction — grants are fine-grained enough to model complex permission scenarios.

| Grant | Controls |
| --- | --- |
| Room join | Whether the participant can join the specified room |
| Can publish | Whether the participant can publish audio and video tracks |
| Can subscribe | Whether the participant can subscribe to other participants' tracks |
| Can publish data | Whether the participant can send data messages (text streams, byte streams, RPC) |
| Can publish sources | Which specific sources (microphone, camera, screen share) the participant can publish |
| Can update own metadata | Whether the participant can modify their own metadata and attributes |
| Hidden | Whether the participant is invisible to other participants |
| Admin | Full control: can mute others, kick participants, manage the room |

This granularity enables precise security modeling. Consider these scenarios:

A listen-only audience member receives a token with canSubscribe: true but canPublish: false and canPublishData: false. They can hear and see everything but cannot transmit audio, video, or data. They are a pure consumer.

A voice AI agent receives a token with canPublish: true, canSubscribe: true, and canPublishData: true. It needs to hear the user (subscribe), speak back (publish audio), and send transcriptions (publish data).

A recording bot receives a token with canSubscribe: true, canPublish: false, and hidden: true. It subscribes to all tracks for recording but is invisible to other participants and cannot transmit anything.

A moderated speaker receives a token with canPublish: true but canPublishSources restricted to microphone only. They can speak but cannot share their screen or camera.

Grants match your business logic

Because grants are generated server-side when creating the token, they can be driven by your application's business logic. A free-tier user gets listen-only grants. A premium user gets full publish rights. A moderator gets admin grants. The decision happens on your server, encoded in the token, and enforced by the SFU — no client-side permission checks needed.
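The four scenarios above can be expressed as a single server-side mapping from role to grants. This sketch uses plain dictionaries whose field names mirror LiveKit's grant names; the roles and tier rules are hypothetical examples, and in a real application these grants would be embedded in the access token by your server SDK.

```python
def grants_for(role: str) -> dict:
    """Server-side business logic: map an application role to token grants."""
    if role == "listener":   # listen-only audience member: pure consumer
        return {"canSubscribe": True, "canPublish": False, "canPublishData": False}
    if role == "agent":      # voice AI agent: hears, speaks, sends transcriptions
        return {"canSubscribe": True, "canPublish": True, "canPublishData": True}
    if role == "recorder":   # recording bot: invisible subscriber, transmits nothing
        return {"canSubscribe": True, "canPublish": False, "hidden": True}
    if role == "speaker":    # moderated speaker: may publish the microphone only
        return {"canPublish": True, "canPublishSources": ["microphone"]}
    raise ValueError(f"unknown role: {role}")
```

Because this decision runs on your server before the token is signed, the client never sees — let alone influences — the branch that was taken.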

End-to-end encryption (E2EE)

By default, media traveling through a LiveKit SFU is encrypted in transit — DTLS secures the connection between each client and the server, and SRTP encrypts media packets on the wire. This prevents eavesdropping by anyone between the client and the SFU. However, the SFU itself decrypts and re-encrypts packets as it forwards them, meaning the SFU operator (LiveKit Cloud or your self-hosted infrastructure) can theoretically access the media content.

End-to-end encryption changes this. With E2EE enabled, media is encrypted on the sender's device before it leaves, and decrypted on the receiver's device after it arrives. The SFU forwards the encrypted packets without being able to decrypt them. Even if the SFU infrastructure is compromised, the attacker sees only encrypted bytes.

LiveKit's E2EE implementation uses insertable streams (also called encoded transforms) — a browser API that allows JavaScript to transform encoded media frames before they enter the WebRTC pipeline and after they exit. The encryption and decryption happen at this layer, ensuring the SFU only ever handles encrypted media.
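The transform idea can be modeled in a few lines. The sketch below is purely conceptual: the XOR keystream is a toy cipher for illustration only (real E2EE uses a proper authenticated cipher such as AES-GCM inside the browser's encoded-transform API), and the key and frame bytes are made up. What it shows is the shape of the pipeline — the same per-frame transform encrypts on send and decrypts on receive, and everything in between sees only ciphertext.

```python
import hashlib

def keystream(key: bytes, frame_index: int, length: int) -> bytes:
    """Derive a deterministic per-frame keystream (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        block = key + frame_index.to_bytes(8, "big") + counter.to_bytes(4, "big")
        out += hashlib.sha256(block).digest()
        counter += 1
    return out[:length]

def transform(frame: bytes, key: bytes, frame_index: int) -> bytes:
    """Applied symmetrically: once to encrypt on send, once to decrypt on receive."""
    ks = keystream(key, frame_index, len(frame))
    return bytes(a ^ b for a, b in zip(frame, ks))

key = b"room-shared-key"
plain = b"encoded video frame"
wire = transform(plain, key, frame_index=0)          # what the SFU forwards
restored = transform(wire, key, frame_index=0)       # what the receiver recovers
```

Without `key`, the SFU (or anyone who compromises it) holds only `wire` — opaque bytes it can route but not read.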

| Encryption mode | Client to SFU | SFU processing | SFU to client | SFU can read media? |
| --- | --- | --- | --- | --- |
| Default (DTLS/SRTP) | Encrypted | Decrypts, forwards | Encrypted | Yes |
| E2EE | Encrypted | Forwards encrypted bytes | Encrypted | No |

E2EE and SFU features

E2EE comes with trade-offs. Because the SFU cannot access media content, server-side features that require media processing — such as server-side recording (egress), transcription, or simulcast layer switching — may be limited or unavailable when E2EE is active. The SFU can still route packets and manage room state, but it cannot inspect or transform the media itself.

Key management in LiveKit's E2EE uses a shared key approach. Participants in a room negotiate a shared encryption key, and each participant's client uses that key to encrypt outgoing media and decrypt incoming media. Key rotation is handled automatically to limit the exposure window if a key is compromised.
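Why rotation limits exposure can be shown with a one-way ratchet. This is not LiveKit's actual key-exchange protocol — just an illustration, using a stdlib HMAC, of the property rotation buys: each epoch key is derived from the previous one through a one-way function, so learning a later key reveals nothing about earlier ones.

```python
import hashlib
import hmac

def next_key(current_key: bytes) -> bytes:
    """Derive the next epoch key; the previous key cannot be recovered from it."""
    return hmac.new(current_key, b"ratchet", hashlib.sha256).digest()

room_key = b"shared-secret-from-key-exchange"  # hypothetical initial shared key
epoch1 = next_key(room_key)
epoch2 = next_key(epoch1)
# A party that obtains epoch2 cannot derive epoch1 (HMAC is one-way),
# so media encrypted under earlier epochs stays protected after rotation.
```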

What's happening

E2EE is not required for every application, and many use cases — particularly those involving AI agents that need server-side media processing — work best without it. But for applications handling sensitive conversations (healthcare, legal, financial), E2EE provides a level of assurance that even the infrastructure operator cannot access the content. It is a meaningful option to have available.

Self-hosting: full infrastructure control

LiveKit Cloud is the managed service — LiveKit operates the SFU infrastructure, handles scaling, and manages availability. But LiveKit is fully open source, and self-hosting gives you complete control over where your data flows and who operates the infrastructure.

Self-hosting uses the same APIs and SDKs as LiveKit Cloud. Your application code does not change when you switch between Cloud and self-hosted — you change the server URL and credentials, and everything else remains identical. This is a deliberate design decision that prevents vendor lock-in.

| Deployment option | Infrastructure | Operated by | Best for |
| --- | --- | --- | --- |
| LiveKit Cloud | LiveKit's global infrastructure | LiveKit | Most applications, fastest start |
| Self-hosted (Docker) | Your servers, single node | You | Development, small deployments |
| Self-hosted (Kubernetes) | Your Kubernetes cluster, Helm charts | You | Production, horizontal scaling |

Self-hosting considerations:

  • Data sovereignty: keep all media and signaling within your own infrastructure, in your chosen geographic region. Critical for regulated industries (healthcare, government, finance).
  • Network control: deploy LiveKit nodes close to your users or inside your private network. Useful for enterprise deployments behind firewalls.
  • Cost control: for high-volume deployments, self-hosting can be more cost-effective than a managed service, though you absorb the operational burden.
  • Customization: modify the open-source server for specialized needs, though this is rarely necessary given the extensibility of the standard APIs.

Same APIs, your infrastructure

The self-hosting story is straightforward: pull the Docker image or deploy the Helm chart, configure your TURN servers and Redis for multi-node scaling, point your application at your own LiveKit URL, and everything works identically to Cloud. Your tokens, your SDKs, your client code — none of it changes.

Putting it all together

LiveKit's security architecture is a coherent system, not a collection of bolted-on features:

  • Access tokens ensure only authorized participants can join rooms, with permissions defined before they connect.
  • Grants enforce what each participant can do inside the room, with granularity down to specific media sources.
  • DTLS/SRTP encrypts all traffic between clients and the SFU by default.
  • E2EE optionally encrypts media end-to-end, making the SFU itself a zero-knowledge forwarder.
  • Self-hosting gives you full control when you need data sovereignty or regulatory compliance.

The common thread is that security decisions are made by your server (which generates tokens), enforced by the SFU (which validates tokens and respects grants), and never delegated to clients (which are untrusted by design). This is a robust trust architecture that scales from prototypes to production systems handling sensitive data.

What's happening

Security in realtime systems is harder than in request-response systems because connections are long-lived, media flows continuously, and the attack surface includes not just data at rest but data in motion. LiveKit's approach — short-lived tokens, granular grants, transport encryption by default, optional E2EE, and a clean trust hierarchy — addresses these challenges systematically rather than ad hoc.

Concepts covered
E2EE · Access tokens · JWT · Grants · Self-hosting