Multi-Tenant Architecture
Most production LiveKit deployments serve multiple customers from a single infrastructure. A healthcare startup, an enterprise contact center, and a retail kiosk company might all be tenants on your platform. Each needs isolated rooms, separate billing, different agent configurations, and their own API keys — all running on shared infrastructure. This chapter shows you how to build that.
What you'll build
A multi-tenant voice AI platform where a single agent codebase serves many customers. Each tenant gets isolated rooms, custom agent instructions, separate billing, and configurable provider settings — without deploying separate infrastructure per tenant.
The isolation model
LiveKit does not have a built-in tenant concept. You build tenancy at the application layer using three mechanisms:
Room namespacing
Prefix every room name with the tenant ID: tenant-abc_support-room-1. This guarantees participants from different tenants never end up in the same room. Your token server enforces this by only issuing tokens for rooms matching the authenticated tenant.
JWT metadata
Embed tenant context in the JWT access token's metadata field. When the agent receives a session, it reads the tenant ID and configuration from the token — no extra database lookup needed in the hot path.
Per-tenant agent configuration
A single agent codebase loads tenant-specific instructions, tool sets, and provider preferences from a config store. The agent adapts its behavior per session based on the token metadata.
The key insight is that LiveKit rooms are already isolated — participants in different rooms cannot hear each other. By namespacing room names with tenant IDs and enforcing this in token generation, you get strong isolation without any changes to the LiveKit server. The agent framework then uses token metadata to load per-tenant behavior.
Token server with tenant context
Your token server is the enforcement point. It authenticates the tenant, validates the room name matches their namespace, and embeds tenant config in the token metadata.
```python
from livekit.api import AccessToken, VideoGrants
from fastapi import FastAPI, HTTPException, Depends
import json

app = FastAPI()

async def get_tenant(api_key: str) -> dict:
    """Look up tenant by API key. Returns tenant config."""
    # `db` is your application's datastore (e.g. an async MongoDB client)
    tenant = await db.tenants.find_one({"api_key": api_key})
    if not tenant:
        raise HTTPException(status_code=401, detail="Invalid API key")
    return tenant

@app.post("/token")
async def create_token(
    room: str,
    identity: str,
    tenant: dict = Depends(get_tenant),
):
    tenant_id = tenant["id"]

    # Enforce room namespacing
    if not room.startswith(f"{tenant_id}_"):
        raise HTTPException(
            status_code=403,
            detail=f"Room must be prefixed with {tenant_id}_",
        )

    # Embed tenant config in token metadata
    metadata = json.dumps({
        "tenant_id": tenant_id,
        "plan": tenant["plan"],
        "agent_config": {
            "instructions": tenant.get("custom_instructions", ""),
            "stt_provider": tenant.get("stt_provider", "deepgram"),
            "llm_provider": tenant.get("llm_provider", "openai"),
            "llm_model": tenant.get("llm_model", "gpt-4o"),
            "tts_provider": tenant.get("tts_provider", "cartesia"),
            "tts_voice": tenant.get("tts_voice", "professional"),
            "tools_enabled": tenant.get("tools_enabled", []),
        },
    })

    # AccessToken() reads LIVEKIT_API_KEY / LIVEKIT_API_SECRET from the environment
    token = (
        AccessToken()
        .with_identity(identity)
        .with_metadata(metadata)
        .with_grants(VideoGrants(
            room_join=True,
            room=room,
        ))
        .to_jwt()
    )
    return {"token": token}
```

The same server in Node:

```javascript
import { AccessToken } from "livekit-server-sdk";
import express from "express";

const app = express();
app.use(express.json());

app.post("/token", async (req, res) => {
  const { room, identity, apiKey } = req.body;
  const tenant = await db.tenants.findOne({ apiKey });
  if (!tenant) return res.status(401).json({ error: "Invalid API key" });

  // Enforce room namespacing
  if (!room.startsWith(`${tenant.id}_`)) {
    return res.status(403).json({
      error: `Room must be prefixed with ${tenant.id}_`,
    });
  }

  const token = new AccessToken(
    process.env.LIVEKIT_API_KEY,
    process.env.LIVEKIT_API_SECRET,
    {
      identity,
      metadata: JSON.stringify({
        tenant_id: tenant.id,
        plan: tenant.plan,
        agent_config: {
          instructions: tenant.customInstructions || "",
          stt_provider: tenant.sttProvider || "deepgram",
          llm_provider: tenant.llmProvider || "openai",
          llm_model: tenant.llmModel || "gpt-4o",
          tts_provider: tenant.ttsProvider || "cartesia",
          tts_voice: tenant.ttsVoice || "professional",
          tools_enabled: tenant.toolsEnabled || [],
        },
      }),
    },
  );
  token.addGrant({ roomJoin: true, room });
  res.json({ token: await token.toJwt() });
});
```

Single agent, many tenants
The agent reads tenant configuration from the participant metadata at session start and configures itself accordingly. One agent codebase, one deployment, many behaviors.
```python
from livekit.agents import Agent, AgentSession, RoomInputOptions
from livekit.plugins import assemblyai, deepgram, openai, anthropic, cartesia, elevenlabs, google
import json

# Provider factories — map config strings to plugin instances
STT_PROVIDERS = {
    "deepgram": lambda cfg: deepgram.STT(model="nova-3"),
    "google": lambda cfg: google.STT(),
    "assemblyai": lambda cfg: assemblyai.STT(),
}

LLM_PROVIDERS = {
    "openai": lambda cfg: openai.LLM(model=cfg.get("llm_model", "gpt-4o")),
    "anthropic": lambda cfg: anthropic.LLM(model=cfg.get("llm_model", "claude-sonnet-4-20250514")),
    "google": lambda cfg: google.LLM(model=cfg.get("llm_model", "gemini-2.0-flash")),
}

TTS_PROVIDERS = {
    "cartesia": lambda cfg: cartesia.TTS(voice=cfg.get("tts_voice", "professional")),
    "elevenlabs": lambda cfg: elevenlabs.TTS(voice_id=cfg.get("tts_voice")),
    "openai": lambda cfg: openai.TTS(voice=cfg.get("tts_voice", "alloy")),
}

class TenantAgent(Agent):
    def __init__(self, tenant_config: dict):
        base_instructions = (
            "You are a voice AI assistant. Follow your instructions carefully."
        )
        custom = tenant_config.get("instructions", "")
        instructions = f"{base_instructions}\n\n{custom}" if custom else base_instructions
        super().__init__(instructions=instructions)
        self.tenant_config = tenant_config

async def entrypoint(ctx):
    await ctx.connect()

    # Read tenant config from the first remote participant's metadata —
    # the token server embedded it when issuing the join token
    participant = await ctx.wait_for_participant()
    metadata = json.loads(participant.metadata or "{}")
    tenant_id = metadata.get("tenant_id", "unknown")
    config = metadata.get("agent_config", {})

    # Build the provider stack from tenant config, falling back to defaults
    stt_factory = STT_PROVIDERS.get(config.get("stt_provider"), STT_PROVIDERS["deepgram"])
    llm_factory = LLM_PROVIDERS.get(config.get("llm_provider"), LLM_PROVIDERS["openai"])
    tts_factory = TTS_PROVIDERS.get(config.get("tts_provider"), TTS_PROVIDERS["cartesia"])

    session = AgentSession(
        stt=stt_factory(config),
        llm=llm_factory(config),
        tts=tts_factory(config),
    )
    agent = TenantAgent(tenant_config=config)
    await session.start(
        agent=agent,
        room=ctx.room,
        room_input_options=RoomInputOptions(),
    )
    print(f"Session started for tenant {tenant_id}")
```

Cache tenant configs
For high-traffic platforms, cache tenant configs in Redis or in-memory with a short TTL. You do not want a database query on every session start. The token metadata approach above avoids this entirely by embedding config in the JWT.
Per-tenant tool registration
Different tenants need different tools. A healthcare tenant needs appointment scheduling. A retail tenant needs order lookup. Register tools dynamically based on the tenant's enabled tools list.
```python
from livekit.agents import Agent, function_tool, RunContext

# Tool registry — all available tools across all tenants
TOOL_REGISTRY = {}

def tenant_tool(name: str):
    """Decorator to register a tool in the global registry."""
    def decorator(func):
        TOOL_REGISTRY[name] = func
        return func
    return decorator

@tenant_tool("appointment_scheduling")
@function_tool()
async def schedule_appointment(context: RunContext, date: str, time: str):
    """Schedule an appointment for the caller."""
    # Implementation here
    return f"Appointment scheduled for {date} at {time}."

@tenant_tool("order_lookup")
@function_tool()
async def lookup_order(context: RunContext, order_id: str):
    """Look up an order by ID."""
    # Implementation here
    return f"Order {order_id}: shipped, arriving Thursday."

@tenant_tool("knowledge_base")
@function_tool()
async def search_knowledge(context: RunContext, query: str):
    """Search the tenant's knowledge base."""
    # Implementation with tenant-specific vector store
    return "Based on your documentation..."

class TenantAgent(Agent):
    def __init__(self, tenant_config: dict):
        # Pass only the tools this tenant has enabled
        enabled = tenant_config.get("tools_enabled", [])
        super().__init__(
            instructions=tenant_config.get("instructions", ""),
            tools=[TOOL_REGISTRY[name] for name in enabled if name in TOOL_REGISTRY],
        )
```

Usage tracking and billing
Track per-tenant usage by hooking into agent lifecycle events. Every session logs duration, token usage, and provider costs back to your billing system.
```python
from datetime import datetime, timezone
import time

class UsageTracker:
    def __init__(self, tenant_id: str, session_id: str):
        self.tenant_id = tenant_id
        self.session_id = session_id
        self.start_time = time.time()
        self.llm_tokens_in = 0
        self.llm_tokens_out = 0
        self.stt_seconds = 0.0
        self.tts_characters = 0

    def track_llm_usage(self, input_tokens: int, output_tokens: int):
        self.llm_tokens_in += input_tokens
        self.llm_tokens_out += output_tokens

    def track_stt_usage(self, duration_seconds: float):
        self.stt_seconds += duration_seconds

    def track_tts_usage(self, characters: int):
        self.tts_characters += characters

    async def flush(self):
        duration = time.time() - self.start_time
        record = {
            "tenant_id": self.tenant_id,
            "session_id": self.session_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "duration_seconds": round(duration, 1),
            "llm_tokens_in": self.llm_tokens_in,
            "llm_tokens_out": self.llm_tokens_out,
            "stt_seconds": round(self.stt_seconds, 1),
            "tts_characters": self.tts_characters,
            "estimated_cost_usd": self._estimate_cost(),
        }
        # `billing_db` is your billing datastore
        await billing_db.usage.insert_one(record)

    def _estimate_cost(self) -> float:
        # Example cost model — adjust per provider pricing
        llm_cost = (self.llm_tokens_in * 0.000003) + (self.llm_tokens_out * 0.000015)
        stt_cost = self.stt_seconds * (0.0059 / 60)  # Deepgram ~$0.0059/min
        tts_cost = self.tts_characters * 0.000015    # Cartesia per-character
        return round(llm_cost + stt_cost + tts_cost, 6)
```

Wire the tracker into the agent session:
```python
from livekit.agents.metrics import LLMMetrics, STTMetrics, TTSMetrics

async def entrypoint(ctx):
    # ... tenant config loading from above ...
    tracker = UsageTracker(tenant_id=tenant_id, session_id=ctx.room.name)

    session = AgentSession(
        stt=stt_factory(config),
        llm=llm_factory(config),
        tts=tts_factory(config),
    )

    # Hook into the session's metrics events for usage tracking
    @session.on("metrics_collected")
    def on_metrics(event):
        m = event.metrics
        if isinstance(m, LLMMetrics):
            tracker.track_llm_usage(m.prompt_tokens, m.completion_tokens)
        elif isinstance(m, STTMetrics):
            tracker.track_stt_usage(m.audio_duration)
        elif isinstance(m, TTSMetrics):
            tracker.track_tts_usage(m.characters_count)

    agent = TenantAgent(tenant_config=config)
    await session.start(
        agent=agent,
        room=ctx.room,
        room_input_options=RoomInputOptions(),
    )

    # Flush usage when the job shuts down
    ctx.add_shutdown_callback(tracker.flush)
```

Scaling multi-tenant agent workers
With many tenants, you need to scale agent workers efficiently. Key patterns:
**Shared worker pool.** All tenants share the same pool of agent workers. LiveKit dispatches sessions to available workers automatically. This maximizes utilization — idle capacity for one tenant serves another.

**Tenant-weighted scaling.** Scale the worker pool based on total concurrent sessions across all tenants, not per-tenant. A platform with 100 tenants averaging 5 concurrent sessions each needs the same capacity as one tenant with 500 sessions.

**Priority tiers.** Enterprise tenants get dedicated worker capacity. Free-tier tenants share a best-effort pool. Implement this with separate agent worker deployments and LiveKit's room-level agent dispatch.
```python
from livekit.agents import WorkerOptions, cli

# Enterprise agent pool — dedicated capacity
enterprise_worker = WorkerOptions(
    entrypoint_fnc=entrypoint,
    agent_name="enterprise-agent",
    num_idle_processes=5,  # Keep 5 warm processes for instant response
)

# Standard agent pool — shared capacity
standard_worker = WorkerOptions(
    entrypoint_fnc=entrypoint,
    agent_name="standard-agent",
    num_idle_processes=1,
)

# In your token server, choose the agent at dispatch time:
# Enterprise tenants -> dispatch "enterprise-agent" to the room
# Standard tenants -> dispatch "standard-agent" to the room
```

Rate limiting per tenant
Without rate limits, a single tenant can starve others of agent capacity. Enforce concurrent session limits per tenant in your token server — reject new room creation when a tenant exceeds their plan's limit.
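One way to sketch that enforcement — the `PLAN_LIMITS` tiers and in-memory counter are assumptions for illustration; in production the counts would live in Redis so every token-server replica sees them:

```python
# Assumed plan tiers — substitute your own limits
PLAN_LIMITS = {"free": 2, "standard": 20, "enterprise": 200}

class SessionLimiter:
    """Tracks concurrent sessions per tenant and rejects over-limit requests."""

    def __init__(self):
        self._active: dict[str, int] = {}  # tenant_id -> current session count

    def try_acquire(self, tenant_id: str, plan: str) -> bool:
        """Reserve a session slot; returns False if the tenant is at its limit."""
        limit = PLAN_LIMITS.get(plan, 2)
        if self._active.get(tenant_id, 0) >= limit:
            return False
        self._active[tenant_id] = self._active.get(tenant_id, 0) + 1
        return True

    def release(self, tenant_id: str) -> None:
        """Free a slot when a session ends (e.g. via a room-finished webhook)."""
        self._active[tenant_id] = max(0, self._active.get(tenant_id, 0) - 1)
```

The token server calls `try_acquire` before issuing a token and returns HTTP 429 on failure; `release` is driven by session-ended events so counts stay accurate even when clients disconnect abruptly.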
Tenant management dashboard
Track per-tenant metrics for your operations dashboard:
| Metric | Purpose | Alert threshold |
|---|---|---|
| Concurrent sessions | Capacity planning | Approaching plan limit |
| Avg session duration | Usage patterns | Unusual spikes |
| LLM tokens per session | Cost monitoring | 2x above average |
| Error rate | Quality monitoring | Above 5% |
| P95 response latency | SLA compliance | Above 2 seconds |
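To feed such a dashboard, the raw usage records written by `UsageTracker.flush` can be rolled up per tenant. A minimal sketch, assuming the record fields from the billing example above:

```python
from collections import defaultdict

def summarize_usage(records: list[dict]) -> dict[str, dict]:
    """Aggregate per-session usage records into per-tenant dashboard metrics."""
    summary: dict[str, dict] = defaultdict(lambda: {
        "sessions": 0, "total_seconds": 0.0, "llm_tokens": 0, "cost_usd": 0.0,
    })
    for r in records:
        s = summary[r["tenant_id"]]
        s["sessions"] += 1
        s["total_seconds"] += r["duration_seconds"]
        s["llm_tokens"] += r["llm_tokens_in"] + r["llm_tokens_out"]
        s["cost_usd"] += r["estimated_cost_usd"]
    # Derive averages used by the table above (avg session duration, tokens/session)
    for s in summary.values():
        s["avg_session_seconds"] = round(s["total_seconds"] / s["sessions"], 1)
        s["avg_llm_tokens"] = round(s["llm_tokens"] / s["sessions"], 1)
    return dict(summary)
```

In practice this aggregation runs as a periodic job (or a database aggregation pipeline) rather than in-process, but the shape of the rollup is the same.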
What you learned
- Room namespacing with tenant ID prefixes provides isolation without LiveKit server changes
- JWT metadata embeds tenant config directly in the token, avoiding database lookups in the hot path
- A single agent codebase serves many tenants by loading per-tenant instructions, provider configs, and tools dynamically
- Usage tracking hooks into agent lifecycle events to build per-tenant billing
- Shared worker pools maximize utilization; priority tiers with separate agent names serve enterprise vs standard tenants
Next up
In the next chapter, you will set up auto-scaling to handle traffic spikes across your multi-tenant platform.