Chapter 7

Multi-Tenant Architecture

Most production LiveKit deployments serve multiple customers from a single infrastructure. A healthcare startup, an enterprise contact center, and a retail kiosk company might all be tenants on your platform. Each needs isolated rooms, separate billing, different agent configurations, and their own API keys — all running on shared infrastructure. This chapter shows you how to build that.

Concepts: tenant isolation, room namespacing, per-tenant config, usage tracking

What you'll build

A multi-tenant voice AI platform where a single agent codebase serves many customers. Each tenant gets isolated rooms, custom agent instructions, separate billing, and configurable provider settings — without deploying separate infrastructure per tenant.

The isolation model

LiveKit does not have a built-in tenant concept. You build tenancy at the application layer using three mechanisms:

1. Room namespacing

Prefix every room name with the tenant ID: tenant-abc_support-room-1. This guarantees participants from different tenants never end up in the same room. Your token server enforces this by only issuing tokens for rooms matching the authenticated tenant.
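Since the prefix convention is the entire isolation boundary, it is worth centralizing in a pair of helpers so the token server and any internal tooling agree on it. A minimal sketch; the function names are illustrative:

```python
def tenant_room_name(tenant_id: str, room: str) -> str:
    """Build a namespaced room name, e.g. tenant-abc_support-room-1."""
    return f"{tenant_id}_{room}"


def room_belongs_to(tenant_id: str, room: str) -> bool:
    """True if a requested room name falls inside the tenant's namespace."""
    return room.startswith(f"{tenant_id}_")
```

The token server calls room_belongs_to before issuing any token, so a tenant can never mint credentials for another tenant's rooms.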

2. JWT metadata

Embed tenant context in the JWT access token's metadata field. When the agent receives a session, it reads the tenant ID and configuration from the token — no extra database lookup needed in the hot path.

3. Per-tenant agent configuration

A single agent codebase loads tenant-specific instructions, tool sets, and provider preferences from a config store. The agent adapts its behavior per session based on the token metadata.

What's happening

The key insight is that LiveKit rooms are already isolated — participants in different rooms cannot hear each other. By namespacing room names with tenant IDs and enforcing this in token generation, you get strong isolation without any changes to the LiveKit server. The agent framework then uses token metadata to load per-tenant behavior.

Token server with tenant context

Your token server is the enforcement point. It authenticates the tenant, validates the room name matches their namespace, and embeds tenant config in the token metadata.

token_server.py
from livekit.api import AccessToken, VideoGrants
from fastapi import FastAPI, HTTPException, Depends
import json

app = FastAPI()

async def get_tenant(api_key: str) -> dict:
  """Look up tenant by API key. Returns tenant config.

  The key arrives as a query parameter here for brevity; in production,
  read it from a header. `db` is your tenant store (any async client).
  """
  tenant = await db.tenants.find_one({"api_key": api_key})
  if not tenant:
      raise HTTPException(status_code=401, detail="Invalid API key")
  return tenant

@app.post("/token")
async def create_token(
  room: str,
  identity: str,
  tenant: dict = Depends(get_tenant),
):
  tenant_id = tenant["id"]

  # Enforce room namespacing
  if not room.startswith(f"{tenant_id}_"):
      raise HTTPException(
          status_code=403,
          detail=f"Room must be prefixed with {tenant_id}_",
      )

  # Embed tenant config in token metadata
  metadata = json.dumps({
      "tenant_id": tenant_id,
      "plan": tenant["plan"],
      "agent_config": {
          "instructions": tenant.get("custom_instructions", ""),
          "stt_provider": tenant.get("stt_provider", "deepgram"),
          "llm_provider": tenant.get("llm_provider", "openai"),
          "llm_model": tenant.get("llm_model", "gpt-4o"),
          "tts_provider": tenant.get("tts_provider", "cartesia"),
          "tts_voice": tenant.get("tts_voice", "professional"),
          "tools_enabled": tenant.get("tools_enabled", []),
      },
  })

  # AccessToken() reads LIVEKIT_API_KEY / LIVEKIT_API_SECRET from the environment
  token = (
      AccessToken()
      .with_identity(identity)
      .with_metadata(metadata)
      .with_grants(VideoGrants(
          room_join=True,
          room=room,
      ))
      .to_jwt()
  )

  return {"token": token}
token_server.ts
import { AccessToken } from "livekit-server-sdk";
import express from "express";

const app = express();
app.use(express.json()); // without this, req.body is undefined

app.post("/token", async (req, res) => {
  const { room, identity, apiKey } = req.body;

  // db is your tenant store (any async client)
  const tenant = await db.tenants.findOne({ apiKey });
  if (!tenant) return res.status(401).json({ error: "Invalid API key" });

  // Enforce room namespacing
  if (!room.startsWith(`${tenant.id}_`)) {
    return res.status(403).json({
      error: `Room must be prefixed with ${tenant.id}_`,
    });
  }

  const token = new AccessToken(
    process.env.LIVEKIT_API_KEY,
    process.env.LIVEKIT_API_SECRET,
    {
      identity,
      metadata: JSON.stringify({
        tenant_id: tenant.id,
        plan: tenant.plan,
        agent_config: {
          instructions: tenant.customInstructions || "",
          stt_provider: tenant.sttProvider || "deepgram",
          llm_provider: tenant.llmProvider || "openai",
          llm_model: tenant.llmModel || "gpt-4o",
          tts_provider: tenant.ttsProvider || "cartesia",
          tts_voice: tenant.ttsVoice || "professional",
          tools_enabled: tenant.toolsEnabled || [],
        },
      }),
    },
  );

  token.addGrant({ roomJoin: true, room });
  res.json({ token: await token.toJwt() });
});

Single agent, many tenants

The agent reads tenant configuration from the participant metadata at session start and configures itself accordingly. One agent codebase, one deployment, many behaviors.

multi_tenant_agent.py
from livekit.agents import Agent, AgentSession, RoomInputOptions
from livekit.plugins import assemblyai, deepgram, openai, anthropic, cartesia, elevenlabs, google
import json

# Provider factories — map config strings to plugin instances
STT_PROVIDERS = {
  "deepgram": lambda cfg: deepgram.STT(model="nova-3"),
  "google": lambda cfg: google.STT(),
  "assemblyai": lambda cfg: assemblyai.STT(),
}

LLM_PROVIDERS = {
  "openai": lambda cfg: openai.LLM(model=cfg.get("llm_model", "gpt-4o")),
  "anthropic": lambda cfg: anthropic.LLM(model=cfg.get("llm_model", "claude-sonnet-4-20250514")),
  "google": lambda cfg: google.LLM(model=cfg.get("llm_model", "gemini-2.0-flash")),
}

TTS_PROVIDERS = {
  "cartesia": lambda cfg: cartesia.TTS(voice=cfg.get("tts_voice", "professional")),
  "elevenlabs": lambda cfg: elevenlabs.TTS(voice_id=cfg.get("tts_voice")),
  "openai": lambda cfg: openai.TTS(voice=cfg.get("tts_voice", "alloy")),
}


class TenantAgent(Agent):
  def __init__(self, tenant_config: dict):
      base_instructions = (
          "You are a voice AI assistant. Follow your instructions carefully."
      )
      custom = tenant_config.get("instructions", "")
      instructions = f"{base_instructions}\n\n{custom}" if custom else base_instructions

      super().__init__(instructions=instructions)
      self.tenant_config = tenant_config


async def entrypoint(ctx):
  await ctx.connect()

  # Read tenant config from the participant metadata set by the token server
  participant = await ctx.wait_for_participant()

  metadata = json.loads(participant.metadata or "{}")
  tenant_id = metadata.get("tenant_id", "unknown")
  config = metadata.get("agent_config", {})

  # Build provider stack from tenant config, falling back to defaults
  # when a tenant names a provider that is not registered
  stt_factory = STT_PROVIDERS.get(config.get("stt_provider"), STT_PROVIDERS["deepgram"])
  llm_factory = LLM_PROVIDERS.get(config.get("llm_provider"), LLM_PROVIDERS["openai"])
  tts_factory = TTS_PROVIDERS.get(config.get("tts_provider"), TTS_PROVIDERS["cartesia"])

  session = AgentSession(
      stt=stt_factory(config),
      llm=llm_factory(config),
      tts=tts_factory(config),
  )

  agent = TenantAgent(tenant_config=config)

  await session.start(
      agent=agent,
      room=ctx.room,
      room_input_options=RoomInputOptions(),
  )

  print(f"Session started for tenant {tenant_id}")

Cache tenant configs

For high-traffic platforms, cache tenant configs in Redis or in-memory with a short TTL. You do not want a database query on every session start. The token metadata approach above avoids this entirely by embedding config in the JWT.
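When you do need lookups outside the token path (webhooks, dashboards, admin APIs), a small TTL cache keeps the database out of the hot path. A minimal in-memory sketch, with illustrative names; loader is your own function that fetches a tenant's config from the database:

```python
import time


class TenantConfigCache:
    """In-memory tenant config cache with a short TTL.

    The loader is only called on a cache miss or after the TTL expires;
    swap the dict for Redis when running multiple processes.
    """

    def __init__(self, loader, ttl_seconds: float = 30.0):
        self._loader = loader
        self._ttl = ttl_seconds
        self._entries: dict = {}  # tenant_id -> (expires_at, config)

    def get(self, tenant_id: str) -> dict:
        now = time.monotonic()
        entry = self._entries.get(tenant_id)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh cache hit; no database call
        config = self._loader(tenant_id)
        self._entries[tenant_id] = (now + self._ttl, config)
        return config
```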

Per-tenant tool registration

Different tenants need different tools. A healthcare tenant needs appointment scheduling. A retail tenant needs order lookup. Register tools dynamically based on the tenant's enabled tools list.

tenant_tools.py
from livekit.agents import Agent, function_tool, RunContext

# Tool registry — all available tools across all tenants
TOOL_REGISTRY = {}

def tenant_tool(name: str):
  """Decorator to register a tool in the global registry."""
  def decorator(func):
      TOOL_REGISTRY[name] = func
      return func
  return decorator

@tenant_tool("appointment_scheduling")
@function_tool()
async def schedule_appointment(context: RunContext, date: str, time: str):
  """Schedule an appointment for the caller."""
  # Implementation here
  return f"Appointment scheduled for {date} at {time}."

@tenant_tool("order_lookup")
@function_tool()
async def lookup_order(context: RunContext, order_id: str):
  """Look up an order by ID."""
  # Implementation here
  return f"Order {order_id}: shipped, arriving Thursday."

@tenant_tool("knowledge_base")
@function_tool()
async def search_knowledge(context: RunContext, query: str):
  """Search the tenant's knowledge base."""
  # Implementation with tenant-specific vector store
  return "Based on your documentation..."


class TenantAgent(Agent):
  def __init__(self, tenant_config: dict):
      # Collect only the tools this tenant has enabled; Agent accepts
      # a tools list at construction time
      enabled = tenant_config.get("tools_enabled", [])
      tools = [TOOL_REGISTRY[name] for name in enabled if name in TOOL_REGISTRY]

      super().__init__(
          instructions=tenant_config.get("instructions", ""),
          tools=tools,
      )

Usage tracking and billing

Track per-tenant usage by hooking into agent lifecycle events. Every session logs duration, token usage, and provider costs back to your billing system.

usage_tracking.py
from livekit.agents import Agent, AgentSession
from datetime import datetime
import time

class UsageTracker:
  def __init__(self, tenant_id: str, session_id: str):
      self.tenant_id = tenant_id
      self.session_id = session_id
      self.start_time = time.time()
      self.llm_tokens_in = 0
      self.llm_tokens_out = 0
      self.stt_seconds = 0.0
      self.tts_characters = 0

  def track_llm_usage(self, input_tokens: int, output_tokens: int):
      self.llm_tokens_in += input_tokens
      self.llm_tokens_out += output_tokens

  def track_stt_usage(self, duration_seconds: float):
      self.stt_seconds += duration_seconds

  def track_tts_usage(self, characters: int):
      self.tts_characters += characters

  async def flush(self):
      duration = time.time() - self.start_time
      record = {
          "tenant_id": self.tenant_id,
          "session_id": self.session_id,
          "timestamp": datetime.utcnow().isoformat(),
          "duration_seconds": round(duration, 1),
          "llm_tokens_in": self.llm_tokens_in,
          "llm_tokens_out": self.llm_tokens_out,
          "stt_seconds": round(self.stt_seconds, 1),
          "tts_characters": self.tts_characters,
          "estimated_cost_usd": self._estimate_cost(),
      }
      # billing_db is your billing store (any async client)
      await billing_db.usage.insert_one(record)

  def _estimate_cost(self) -> float:
      # Example cost model — adjust per provider pricing
      llm_cost = (self.llm_tokens_in * 0.000003) + (self.llm_tokens_out * 0.000015)
      stt_cost = self.stt_seconds * (0.0059 / 60)  # Deepgram per-second
      tts_cost = self.tts_characters * 0.000015     # Cartesia per-char
      return round(llm_cost + stt_cost + tts_cost, 6)

Wire the tracker into the agent session:

tracked_session.py
async def entrypoint(ctx):
  # ... tenant config loading from above ...

  tracker = UsageTracker(tenant_id=tenant_id, session_id=ctx.room.name)

  session = AgentSession(
      stt=stt_factory(config),
      llm=llm_factory(config),
      tts=tts_factory(config),
  )

  # Feed session metrics into the tracker. AgentSession emits a
  # metrics_collected event for each STT / LLM / TTS call
  # (event and field names per livekit-agents 1.x)
  from livekit.agents import metrics

  @session.on("metrics_collected")
  def on_metrics(ev):
      m = ev.metrics
      if isinstance(m, metrics.LLMMetrics):
          tracker.track_llm_usage(m.prompt_tokens, m.completion_tokens)
      elif isinstance(m, metrics.STTMetrics):
          tracker.track_stt_usage(m.audio_duration)
      elif isinstance(m, metrics.TTSMetrics):
          tracker.track_tts_usage(m.characters_count)

  agent = TenantAgent(tenant_config=config)

  await session.start(
      agent=agent,
      room=ctx.room,
      room_input_options=RoomInputOptions(),
  )

  # Flush usage when session ends
  ctx.add_shutdown_callback(tracker.flush)

Scaling multi-tenant agent workers

With many tenants, you need to scale agent workers efficiently. Key patterns:

Shared worker pool. All tenants share the same pool of agent workers. LiveKit dispatches sessions to available workers automatically. This maximizes utilization — idle capacity for one tenant serves another.

Tenant-weighted scaling. Scale the worker pool based on total concurrent sessions across all tenants, not per-tenant. A platform with 100 tenants averaging 5 concurrent sessions each needs the same capacity as one tenant with 500 sessions.

Priority tiers. Enterprise tenants get dedicated worker capacity. Free-tier tenants share a best-effort pool. Implement this with separate agent worker deployments and LiveKit's room-level agent dispatch.

priority_dispatch.py
from livekit.agents import WorkerOptions, cli

# Enterprise agent pool — dedicated capacity
enterprise_worker = WorkerOptions(
  entrypoint_fnc=entrypoint,
  agent_name="enterprise-agent",
  num_idle_processes=5,  # Keep 5 warm for instant response
)

# Standard agent pool — shared capacity
standard_worker = WorkerOptions(
  entrypoint_fnc=entrypoint,
  agent_name="standard-agent",
  num_idle_processes=1,
)

# Run one pool per deployment, e.g. cli.run_app(enterprise_worker)

# In your token server, route tenants to a pool via agent dispatch:
# Enterprise tenants -> request agent_name "enterprise-agent"
# Standard tenants -> request agent_name "standard-agent"
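On the token-server side, the routing decision itself can be a simple plan-to-pool mapping; the chosen name is then requested through LiveKit's agent dispatch when the room is created. A sketch with illustrative plan names:

```python
# Map a tenant's plan to the named agent pool that should serve it.
# Plan names are illustrative; use whatever tiers your platform defines.
PLAN_POOLS = {
    "enterprise": "enterprise-agent",
    "pro": "standard-agent",
    "free": "standard-agent",
}


def agent_for_plan(plan: str) -> str:
    """Unknown plans fall back to the shared standard pool."""
    return PLAN_POOLS.get(plan, "standard-agent")
```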

Rate limiting per tenant

Without rate limits, a single tenant can starve others of agent capacity. Enforce concurrent session limits per tenant in your token server — reject new room creation when a tenant exceeds their plan's limit.
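The enforcement logic is a counter per tenant checked at token time. A single-process sketch with illustrative names; a real deployment would back this with a shared store such as Redis so all token-server replicas agree:

```python
class TenantSessionLimiter:
    """Enforce per-tenant concurrent session limits in the token server."""

    def __init__(self, plan_limits: dict):
        self._plan_limits = plan_limits  # plan name -> max concurrent sessions
        self._active: dict = {}          # tenant_id -> current session count

    def try_acquire(self, tenant_id: str, plan: str) -> bool:
        """Reserve a session slot; False means reject the token request."""
        limit = self._plan_limits.get(plan, 1)
        if self._active.get(tenant_id, 0) >= limit:
            return False
        self._active[tenant_id] = self._active.get(tenant_id, 0) + 1
        return True

    def release(self, tenant_id: str) -> None:
        """Free a slot when the session ends (e.g. from a room webhook)."""
        self._active[tenant_id] = max(0, self._active.get(tenant_id, 0) - 1)
```

Call try_acquire before issuing a token and release from your room-ended webhook handler, so abandoned sessions return capacity to the tenant.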

Tenant management dashboard

Track per-tenant metrics for your operations dashboard:

Metric                 | Purpose            | Alert threshold
-----------------------|--------------------|-----------------------
Concurrent sessions    | Capacity planning  | Approaching plan limit
Avg session duration   | Usage patterns     | Unusual spikes
LLM tokens per session | Cost monitoring    | 2x above average
Error rate             | Quality monitoring | Above 5%
P95 response latency   | SLA compliance     | Above 2 seconds
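Several of these rows can be computed straight from the usage records the tracker writes. A sketch, assuming records carry the same fields as the UsageTracker above:

```python
def tenant_rollup(records: list) -> dict:
    """Aggregate per-session usage records into per-tenant dashboard rows."""
    rollup: dict = {}
    for r in records:
        row = rollup.setdefault(r["tenant_id"], {
            "sessions": 0,
            "total_duration": 0.0,
            "total_llm_tokens": 0,
        })
        row["sessions"] += 1
        row["total_duration"] += r["duration_seconds"]
        row["total_llm_tokens"] += r["llm_tokens_in"] + r["llm_tokens_out"]

    # Derive the per-session averages shown on the dashboard
    for row in rollup.values():
        row["avg_session_duration"] = round(row["total_duration"] / row["sessions"], 1)
        row["llm_tokens_per_session"] = row["total_llm_tokens"] / row["sessions"]
    return rollup
```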

What you learned

  • Room namespacing with tenant ID prefixes provides isolation without LiveKit server changes
  • JWT metadata embeds tenant config directly in the token, avoiding database lookups in the hot path
  • A single agent codebase serves many tenants by loading per-tenant instructions, provider configs, and tools dynamically
  • Usage tracking hooks into agent lifecycle events to build per-tenant billing
  • Shared worker pools maximize utilization; priority tiers with separate agent names serve enterprise vs standard tenants

Next up

In the next chapter, you will set up auto-scaling to handle traffic spikes across your multi-tenant platform.
