Self-Hosting Architecture and Decision Framework
Before you commit to self-hosting LiveKit, you need to understand exactly what you are signing up for. This chapter walks through the full scope of a self-hosted deployment — every component you will build, operate, and maintain — so you can make an honest decision about whether it is worth it. For most teams, it is not.
What you'll learn
- The full operational reality of self-hosting: what you build, what you maintain, what breaks at 3am
- Cloud-exclusive features you cannot replicate on your own — Krisp noise cancellation, GPU-accelerated turn detection, the barge-in model
- The complete architecture of a self-hosted deployment and every component you become responsible for
- How to handle data sovereignty concerns without self-hosting
- Hardware requirements and capacity planning
Start with the honest question: do you actually need to self-host?
LiveKit Cloud is not a stepping stone. It is the production-grade deployment that most teams should use permanently. It handles scaling, monitoring, upgrades, global distribution, and — critically — ships features that are physically impossible to replicate on self-hosted infrastructure.
Self-hosting means you are choosing to build and operate your own distributed real-time media platform. You are signing up for:
- Infrastructure provisioning — servers, networking, load balancers, DNS, TLS certificates
- Kubernetes orchestration — Helm charts, pod specs, resource limits, rolling deployments
- Redis cluster management — persistence, Sentinel HA, failover testing, memory tuning
- TURN server operation — NAT traversal, TLS termination, port management
- Monitoring from scratch — Prometheus, Grafana dashboards, alerting rules, on-call rotations
- Security hardening — network policies, API key rotation, TLS everywhere, CVE patching
- Upgrade management — testing new releases, rolling upgrades, rollback procedures
- Disaster recovery — backup procedures, recovery runbooks, failover drills
- Capacity planning — load testing, scaling decisions, cost modeling
- 24/7 on-call — because media infrastructure does not wait for business hours
That is not a one-time project. That is an ongoing operational commitment that will consume engineering hours every single week.
Self-hosting is a full-time job
Teams consistently underestimate the operational cost of self-hosting. The initial deployment is the easy part. The hard part is month 3 when you need to upgrade across a breaking change, month 6 when a Redis failover exposes a config bug, and month 12 when the engineer who set it all up leaves and nobody else understands the monitoring stack.
What you lose by self-hosting: Cloud-exclusive features
This is the part most teams do not consider until it is too late. LiveKit Cloud includes features that run on infrastructure you cannot replicate with the open-source server. These are not premium upsells — they are capabilities baked into Cloud's architecture.
Krisp noise cancellation
LiveKit Cloud integrates Krisp's enterprise noise cancellation directly into the media pipeline. Background noise — keyboard clatter, construction, barking dogs, cafe ambiance — is removed server-side before it reaches the agent or other participants. This runs on specialized infrastructure within LiveKit's Cloud and is not available as a standalone component you can deploy.
For voice AI agents, noise cancellation is not a nice-to-have. Noisy audio degrades STT accuracy, which degrades LLM responses, which degrades the entire user experience. Without it, you will spend engineering time building client-side workarounds that never match server-side quality.
GPU-accelerated turn detection
LiveKit Cloud runs turn detection models on dedicated GPU infrastructure. This means faster, more accurate detection of when a user has finished speaking — the single most important factor in making a voice agent feel natural. The GPU-based models process audio with lower latency and higher accuracy than CPU-based alternatives.
On self-hosted infrastructure, you are limited to CPU-based turn detection or provisioning and managing your own GPU fleet — which adds another layer of infrastructure complexity, driver management, and cost.
The barge-in model
LiveKit Cloud ships a purpose-built barge-in model that distinguishes between intentional interruptions ("actually, wait —") and background noise or filler words ("um", "uh"). This is a trained model running on Cloud infrastructure that dramatically reduces false interruptions — one of the most common complaints about voice AI agents.
Self-hosted deployments fall back to simpler energy-based or basic VAD interruption detection, which means your agents will either interrupt too aggressively (annoying) or not respond to real interruptions quickly enough (also annoying).
These features compound
Noise cancellation, GPU turn detection, and intelligent barge-in work together. Clean audio feeds better turn detection. Better turn detection feeds smarter barge-in handling. The result is a noticeably more natural conversation. Self-hosted deployments miss all three layers of this stack.
The Cloud advantage summary
| Capability | LiveKit Cloud | Self-hosted |
|---|---|---|
| Krisp noise cancellation | Built-in, server-side | Not available |
| GPU turn detection | Dedicated GPU fleet | CPU-only (or manage your own GPUs) |
| Barge-in model | Purpose-built model | Basic VAD / energy-based |
| Global edge network | 10+ regions, auto-routing | You build and maintain every region |
| Zero-downtime upgrades | Automatic | Manual rolling upgrades you schedule and test |
| Auto-scaling | Built-in | You build it with HPA, load testing, capacity planning |
| DDoS protection | Built-in | You configure it |
| 99.99% SLA | Contractual | Hope and monitoring |
"But we have data sovereignty requirements"
This is the most common — and most legitimate — reason teams consider self-hosting. Regulated sectors such as healthcare, finance, and government often mandate that media streams stay within specific geographic or network boundaries.
Before you spin up a Kubernetes cluster, talk to LiveKit.
LiveKit Cloud offers custom deployment options for teams with strict data residency needs. This includes dedicated infrastructure in specific regions, custom data processing agreements, and HIPAA-compliant configurations. The Cloud team actively works with enterprises to solve compliance requirements without pushing them into self-hosting.
Reach out to LiveKit for custom Cloud solutions
If data sovereignty is your primary driver for considering self-hosting, contact the LiveKit team first. They offer dedicated Cloud deployments, custom region configurations, and enterprise data processing agreements that solve most compliance requirements while keeping you on managed infrastructure. You get compliance without the operational burden. Start at livekit.io/cloud or reach out to the sales team directly.
Self-hosting for data sovereignty only makes sense if you have requirements that truly cannot be met by any third-party infrastructure — air-gapped networks, on-premises-only mandates from specific government contracts, or classified environments. These scenarios exist, but they are rarer than most teams think.
If you still need to self-host: the full architecture
If, after all of the above, you still have a genuine, validated reason to self-host, here is what you are building. Every component after the first is optional for development but required for production.
LiveKit Server (SFU)
The core Selective Forwarding Unit. It handles WebRTC connections, room management, track routing, and the signaling protocol. Clients establish a WebSocket for signaling (room join, track subscription, data messages) and a separate UDP connection for media (audio and video packets). You can run a single instance for development or multiple instances behind a load balancer for production.
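For development, a single node is enough to exercise both paths. Below is a minimal sketch using Docker Compose, assuming the official livekit/livekit-server image and a config.yaml like the one shown later in this chapter; the file paths are placeholders.

```yaml
# docker-compose.yaml (sketch): single-node LiveKit for development.
# Host networking avoids mapping the 50000-60000/udp media range port by port.
services:
  livekit:
    image: livekit/livekit-server:latest
    command: --config /etc/livekit.yaml
    network_mode: host            # exposes 7880/tcp, 7881/tcp, and the UDP media range directly
    volumes:
      - ./config.yaml:/etc/livekit.yaml:ro
    restart: unless-stopped
```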
Redis
The coordination layer for multi-node deployments. Redis stores room-to-node mappings, enables inter-node messaging via pub/sub, and provides distributed locking. Every LiveKit node points at the same Redis instance and discovers other nodes automatically. A single-node deployment can skip Redis, but production should always include it. You are responsible for persistence, Sentinel HA, failover testing, and memory tuning.
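For the high-availability setup, LiveKit's config points at the Sentinel endpoints rather than a single Redis address. A sketch with placeholder hostnames follows; verify the exact key names against the config reference for your server version.

```yaml
# Sketch: redis section of config.yaml for a Sentinel-backed deployment.
redis:
  sentinel_master_name: livekit-master     # name of the monitored master in Sentinel
  sentinel_addresses:                      # placeholder hostnames
    - redis-sentinel-0:26379
    - redis-sentinel-1:26379
    - redis-sentinel-2:26379
  # password: your-redis-password
```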
TURN server
LiveKit includes a built-in TURN server for NAT traversal. When participants sit behind restrictive firewalls or symmetric NATs, TURN relays media through a known port. Without TURN, a significant percentage of users will fail to connect. The built-in server listens on TLS port 5349 and optionally UDP port 3478. You manage TLS certificates, domain configuration, and port accessibility.
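If you terminate TURN TLS on the node itself, the server needs a certificate for the TURN domain. A sketch with placeholder paths and domain; the cert_file and key_file keys should be confirmed against the config reference for your server version.

```yaml
# Sketch: TURN section of config.yaml with TLS terminated on the node.
turn:
  enabled: true
  domain: turn.example.com                   # must resolve to this node's public IP
  tls_port: 5349
  udp_port: 3478                             # optional UDP relay
  cert_file: /etc/livekit/certs/turn.crt     # placeholder certificate paths
  key_file: /etc/livekit/certs/turn.key
```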
Reverse proxy / load balancer
Terminates TLS for signaling (WebSocket over HTTPS) and routes traffic to LiveKit instances. The media path (UDP) bypasses the proxy — clients connect directly to the server node hosting their room. You configure, maintain, and monitor this yourself.
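On Kubernetes, the signaling path is typically exposed through a standard Ingress. A sketch assuming ingress-nginx, with placeholder names and hostnames; the timeout annotations keep long-lived WebSocket connections from being dropped.

```yaml
# Sketch: Ingress for the signaling path only (7880/tcp). Media (UDP) bypasses this entirely.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: livekit-signaling
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"   # keep WebSockets open
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
spec:
  ingressClassName: nginx
  tls:
    - hosts: ["livekit.example.com"]
      secretName: livekit-tls
  rules:
    - host: livekit.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: livekit-server
                port:
                  number: 7880
```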
Agent workers
If you are running voice AI or other agent-based applications, agent workers connect to LiveKit as participants. They can run on the same Kubernetes cluster or on separate GPU-equipped nodes. Agents register with LiveKit through the same Redis instance, so dispatch routing works automatically. You handle scaling, health checks, and resource allocation.
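A sketch of an agent worker Deployment follows, assuming a CPU-only agent and a hypothetical image name; the LIVEKIT_* environment variables are the ones the Agents framework commonly reads to find your cluster.

```yaml
# Sketch: agent worker Deployment. The image is your own agent build (placeholder name).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: voice-agent
spec:
  replicas: 2                      # scale independently of LiveKit server nodes
  selector:
    matchLabels: { app: voice-agent }
  template:
    metadata:
      labels: { app: voice-agent }
    spec:
      containers:
        - name: agent
          image: registry.example.com/voice-agent:latest   # placeholder image
          env:
            - name: LIVEKIT_URL
              value: wss://livekit.example.com
            - name: LIVEKIT_API_KEY
              valueFrom: { secretKeyRef: { name: livekit-keys, key: api-key } }
            - name: LIVEKIT_API_SECRET
              valueFrom: { secretKeyRef: { name: livekit-keys, key: api-secret } }
          resources:
            requests: { cpu: "2", memory: 4Gi }
            limits: { cpu: "4", memory: 8Gi }
```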
Monitoring stack
Prometheus scrapes metrics from LiveKit and agent workers. Grafana provides dashboards. You build everything: dashboards, alerting rules, on-call rotations, runbooks. In production, you monitor room counts, participant counts, packet loss, CPU usage, bandwidth, and agent health. None of this exists out of the box — you create it all.
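A sketch of the Prometheus side, assuming metrics are exposed via the prometheus_port setting in config.yaml (verify the key and port against your server version). Node addresses are placeholders, and agent workers need a similar job pointed at whatever metrics endpoint your agent exposes.

```yaml
# Sketch: prometheus.yml scrape job for LiveKit server nodes.
scrape_configs:
  - job_name: livekit
    static_configs:
      - targets: ["livekit-node-1:6789", "livekit-node-2:6789"]   # port set via prometheus_port
```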
The critical distinction in this architecture is the separation of signaling and media. Signaling — room creation, participant join, track subscription — flows through WebSocket over HTTPS and can be proxied and load-balanced normally. Media — the actual audio and video packets — flows over UDP directly between the client and the LiveKit server node hosting the room. Your network and firewall configuration must account for both paths. Getting this wrong is the single most common self-hosting failure.
Network topology
LiveKit uses three groups of ports (signaling, RTC media, and TURN), each with different networking requirements. You are responsible for configuring all of them correctly.
| Port | Protocol | Purpose | Routing |
|---|---|---|---|
| 7880 | TCP | HTTP API + WebSocket signaling | Through load balancer / ingress |
| 7881 | TCP | RTC over TCP (fallback) | Direct to server node |
| 50000-60000 | UDP | RTC media (audio/video) | Direct to server node, cannot pass through HTTP proxy |
| 5349 | TCP | TURN over TLS | Direct to server node |
| 3478 | UDP | TURN over UDP (optional) | Direct to server node |
UDP ports must be directly reachable
The UDP port range 50000-60000 cannot pass through an HTTP reverse proxy or most Kubernetes ingress controllers. Use hostNetwork: true in your pod spec, or a UDP-capable load balancer (AWS NLB, GCP external LB). This is the most common deployment mistake and the kind of thing LiveKit Cloud handles for you automatically.
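A minimal sketch of the relevant pod template fields when taking the hostNetwork route:

```yaml
# Sketch: pod template fields that make the UDP media range directly reachable.
spec:
  template:
    spec:
      hostNetwork: true                      # pod shares the node's network namespace
      dnsPolicy: ClusterFirstWithHostNet     # keep cluster DNS working with hostNetwork
```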
The livekit-server config.yaml structure
Every LiveKit server reads a single YAML configuration file. Here is the structure with all key sections annotated.
```yaml
# HTTP API and WebSocket signaling port
port: 7880

# WebRTC media configuration
rtc:
  port_range_start: 50000
  port_range_end: 60000
  tcp_port: 7881
  use_external_ip: true   # Discover public IP for ICE candidates

# Redis for multi-node coordination
redis:
  address: redis:6379
  # password: your-redis-password
  # use_tls: true
  # db: 0

# API key/secret pairs (multiple supported for rotation)
keys:
  your-api-key: your-api-secret

# Built-in TURN server
turn:
  enabled: true
  domain: turn.example.com
  tls_port: 5349
  # udp_port: 3478

# Logging
logging:
  level: info   # debug, info, warn, error
  json: true    # Structured JSON logs for production

# Resource limits
limit:
  num_tracks: 0      # 0 = unlimited
  bytes_per_sec: 0   # 0 = unlimited
```

The use_external_ip: true setting is required for any cloud deployment where LiveKit runs behind a NAT. Without it, LiveKit advertises its private IP during ICE negotiation and external clients cannot connect. This is one configuration line among dozens you will need to get right — and debug when something breaks.
Hardware requirements and sizing
Requirements vary significantly based on expected load. LiveKit is CPU-bound for packet forwarding — CPU is almost always the bottleneck before memory or network. Remember: on Cloud, you never think about any of this.
Single-node development or testing:
- 2 CPU cores, 4 GB RAM
- 100 Mbps network
- Any modern Linux distribution (Ubuntu 22.04+ recommended)
- Docker or direct binary installation
Production (per LiveKit server node):
- 4-8 CPU cores, 8-16 GB RAM
- 1 Gbps network (dedicated, not shared)
- Low-latency storage for logs
- Linux with kernel 5.4+ for optimal UDP performance
Agent worker nodes (if running AI agents):
- CPU-only agents: 2-4 cores, 4-8 GB RAM per worker
- GPU agents (STT/TTS): 1 GPU (T4 or better), 4 cores, 16 GB RAM
- Scale agent replicas independently from LiveKit server nodes
```bash
# Quick check: verify your server meets minimum requirements
echo "CPU cores: $(nproc)"
echo "RAM: $(free -h | awk '/^Mem:/ {print $2}')"
echo "Kernel: $(uname -r)"
echo "Docker: $(docker --version 2>/dev/null || echo 'not installed')"

# Check that required ports are not already in use
ss -tulnp | grep -E ':(7880|7881|3478|5349) '
```

Capacity estimates (4-core, 8 GB node):
| Workload | Rooms per node | Notes |
|---|---|---|
| Audio-only, 2 participants | ~500 | Typical voice AI scenario |
| Audio-only, 5 participants | ~200 | Group voice calls |
| Video 720p, 4 participants | ~50 | Video conferencing |
| Video 1080p, 2 participants | ~80 | High-quality 1:1 video |
Measure, don't guess
These estimates are starting points. Deploy your actual workload on a single node, monitor CPU and bandwidth (covered in Chapter 3), and use those real numbers for capacity planning. This is another thing you are responsible for — LiveKit Cloud auto-scales without you thinking about it.
The real cost of self-hosting
Teams fixate on infrastructure cost savings. Here is what they forget to account for:
| Cost | Cloud | Self-hosted |
|---|---|---|
| Infrastructure | Pay per minute | Servers, networking, storage, GPUs |
| Engineering setup | None | 2-4 weeks of senior engineer time |
| Ongoing maintenance | None | 4-8 hours/week of operations work |
| On-call burden | LiveKit's problem | Your team's weekends |
| Upgrade testing | Automatic | Manual testing before every release |
| Security patching | Automatic | You track CVEs and patch promptly |
| Incident response | LiveKit SRE team | Your team, at 3am |
| Feature gap | Krisp, GPU turn detection, barge-in | You go without |
Even at significant scale, the total cost of ownership for self-hosting — including engineering time, on-call burden, and feature gaps — often exceeds LiveKit Cloud. The infrastructure savings get eaten by operational overhead.
What you learned
- Self-hosting is an ongoing operational commitment — not a one-time setup — that consumes engineering hours every week
- LiveKit Cloud includes features you cannot self-host: Krisp noise cancellation, GPU-accelerated turn detection, and the intelligent barge-in model
- For data sovereignty concerns, contact LiveKit about custom Cloud deployments before defaulting to self-hosting
- A production self-hosted deployment includes LiveKit Server, Redis, TURN, a reverse proxy, agent workers, and a full monitoring stack — all of which you build and maintain
- The total cost of ownership for self-hosting often exceeds Cloud when you account for engineering time, on-call burden, and lost features
Next up
If you have decided that self-hosting is genuinely required for your situation, the next chapter walks through the Kubernetes deployment — Helm charts, server configuration, Redis setup, and TURN configuration.