Chapter 4 · 25m

Monitoring setup

A deployed agent without monitoring is a black box. You have no idea if callers are getting good responses, if latency is creeping up, or if errors are silently piling up. This chapter sets up comprehensive monitoring with LiveKit Cloud Insights so you can see exactly what is happening inside every conversation.

Cloud Insights · Transcripts · Traces

What you'll learn

  • How to access and navigate LiveKit Cloud Insights
  • How to read conversation transcripts and identify quality issues
  • How to use traces to diagnose latency bottlenecks in the STT/LLM/TTS pipeline
  • The key metrics every voice AI operator should watch daily

Why monitoring voice agents is different

Monitoring a web API is straightforward: track response times, error rates, and throughput. Voice agents add dimensions that traditional APM tools do not cover:

  • Conversation quality -- Did the agent understand the caller? Did it give the right answer?
  • Turn-level latency -- Not just overall response time, but how long each pipeline stage (STT, LLM, TTS) took for each conversational turn
  • Audio quality -- Was there echo, packet loss, or jitter that degraded the experience?
  • Session lifecycle -- Did the caller hang up in frustration, or did the conversation reach a natural conclusion?

What's happening

Monitoring a web API is like checking if the restaurant is open and serving food. Monitoring a voice agent is like checking if the food tastes good, the waiter is attentive, and the conversation at the table flows naturally. You need qualitative insight, not just quantitative metrics.

Accessing Cloud Insights

Cloud Insights is built into the LiveKit Cloud dashboard. No additional setup or third-party tools required.

1. Open the Cloud dashboard

Navigate to your LiveKit Cloud project at cloud.livekit.io. Select your project from the project list.

2. Navigate to Insights

Click the Insights tab in the left navigation. You will see an overview of recent sessions, including total sessions, error rates, and average latency.

3. Select a session

Click any session to drill into its details -- transcripts, traces, and session metadata.

Transcripts: understanding what happened

Every conversation your agent handles is transcribed and stored. The transcript view shows:

  • User messages -- What the caller said, as transcribed by STT
  • Agent messages -- What the agent said in response
  • Timestamps -- When each message occurred
  • Tool calls -- Any function calls the agent made (booking appointments, looking up data)

```bash
# List recent sessions from the CLI
lk cloud sessions list dental-receptionist

# View transcript for a specific session
lk cloud sessions view sess_abc123 --transcript
```

What to look for in transcripts:

| Pattern | Indicates | Action |
| --- | --- | --- |
| Agent misunderstands caller | STT errors or poor audio | Check audio quality metrics, consider an STT model upgrade |
| Agent gives wrong information | LLM hallucination or missing context | Update instructions, add guardrails |
| Caller repeats themselves | Agent response was unclear or too long | Shorten responses, improve TTS pacing |
| Caller hangs up mid-conversation | Frustration or latency issues | Check turn-level latency traces |
| Tool call fails | Backend service issue | Check tool call error logs, verify API connectivity |
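Some of these patterns can be flagged automatically before a human ever reads the transcript. Here is a minimal sketch for the "caller repeats themselves" check, assuming transcripts are available as simple (role, text) pairs; the real export format will differ, so treat the shape as illustrative:

```python
from difflib import SequenceMatcher

def flag_repeated_user_messages(transcript, threshold=0.8):
    """Return indices of user messages that closely repeat the caller's
    previous message -- a hint the agent's response was unclear.

    `transcript` is a list of (role, text) tuples, a simplified stand-in
    for whatever format your transcript export uses.
    """
    flagged = []
    last_user_text = None
    for i, (role, text) in enumerate(transcript):
        if role != "user":
            continue
        if last_user_text is not None:
            similarity = SequenceMatcher(
                None, last_user_text.lower(), text.lower()
            ).ratio()
            if similarity >= threshold:
                flagged.append(i)
        last_user_text = text
    return flagged

transcript = [
    ("user", "I'd like to book a cleaning for Tuesday"),
    ("agent", "Sorry, could you say that again?"),
    ("user", "I'd like to book a cleaning for Tuesday."),
    ("agent", "Booking a cleaning for Tuesday."),
]
print(flag_repeated_user_messages(transcript))  # prints [2]
```

Sessions where this fires are good candidates for the weekly review pile.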

Schedule weekly transcript reviews

Block 30 minutes per week to read 10-20 random transcripts. This is the single most effective way to improve your agent. Automated metrics tell you something is wrong; transcripts tell you what to fix.
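To make the weekly sample reproducible, one option is to seed the random picker with the ISO week, so everyone on the team who runs it in the same week reviews the same sessions. A small sketch (the session IDs are placeholders):

```python
import random
from datetime import date

def weekly_review_sample(session_ids, n=15):
    """Pick a stable random sample of sessions for this week's review.

    Seeding with the ISO week keeps the sample fixed for the whole week
    but fresh the next one.
    """
    year, week, _ = date.today().isocalendar()
    rng = random.Random(f"{year}-{week}")
    return rng.sample(session_ids, min(n, len(session_ids)))

sessions = [f"sess_{i:04d}" for i in range(200)]  # placeholder IDs
for sid in weekly_review_sample(sessions, n=15):
    print(sid)
```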

Traces: diagnosing latency

Traces break down each conversational turn into its pipeline stages. For every turn, you can see:

  • STT duration -- How long speech-to-text processing took
  • LLM time to first token -- How long until the LLM started generating
  • LLM total duration -- Full generation time including all tokens
  • TTS time to first audio -- How long until the first audio chunk was ready
  • End-to-end latency -- Total time from user finishing speech to agent starting speech

```bash
# View traces for a specific session
lk cloud sessions view sess_abc123 --traces
```

A healthy voice agent trace looks like this:

```text
Turn 1:
STT:           142ms
LLM (TTFT):     89ms
LLM (total):   340ms
TTS (TTFA):    112ms
End-to-end:    450ms  ✓ Under 500ms budget

Turn 4:
STT:           156ms
LLM (TTFT):   1240ms  ⚠ Slow - check prompt length
LLM (total):  3200ms
TTS (TTFA):    118ms
End-to-end:   1520ms  ✗ Over budget
```

What's happening

Traces are like an itemized receipt for each turn. Instead of knowing "this turn took 1.5 seconds" you know exactly where those 1.5 seconds went. Turn 4 above is slow because of the LLM -- maybe the conversation context grew too large, or the model is overloaded. Without traces, you would be guessing.
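This kind of budget check is easy to automate. A minimal sketch that parses text in the shape of the example trace above and flags over-budget turns (the exact export format from the CLI may differ, so adjust the patterns to what you actually see):

```python
import re

BUDGET_MS = 500  # end-to-end target from this chapter

def over_budget_turns(trace_text, budget_ms=BUDGET_MS):
    """Scan trace output for turns whose end-to-end latency exceeds
    the budget; returns (turn_number, latency_ms) pairs."""
    slow = []
    turn = None
    for line in trace_text.splitlines():
        stripped = line.strip()
        m = re.match(r"Turn (\d+):", stripped)
        if m:
            turn = int(m.group(1))
            continue
        m = re.match(r"End-to-end:\s+(\d+)ms", stripped)
        if m and turn is not None:
            latency = int(m.group(1))
            if latency > budget_ms:
                slow.append((turn, latency))
    return slow

trace = """\
Turn 1:
  STT:           142ms
  End-to-end:    450ms
Turn 4:
  STT:           156ms
  End-to-end:   1520ms
"""
print(over_budget_turns(trace))  # prints [(4, 1520)]
```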

Key metrics to watch

Set up a daily monitoring routine around these metrics:

1. End-to-end latency (P50 and P95)

P50 under 500ms is good. P95 under 800ms is acceptable. If P95 exceeds 1 second, callers are noticing delays. Drill into traces to find the bottleneck stage.

2. Error rate

Track the percentage of sessions with errors. A healthy agent runs under 1% error rate. Spikes usually mean an API key expired, a provider is down, or a code bug was deployed.

3. Session completion rate

What percentage of sessions reach a natural conclusion versus the caller hanging up early? A drop in completion rate signals a quality problem -- the agent is frustrating callers.

4. STT accuracy

Review transcripts for misheard words. If STT accuracy is low, consider switching to a higher-quality model or adding domain-specific vocabulary.

5. Tool call success rate

Track how often tool calls succeed versus fail. A spike in tool failures means your backend services need attention.
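Most of these checks can be rolled into one daily script. A minimal sketch, assuming you can export per-session summaries in some form; the `SessionRecord` shape here is illustrative, not an actual Cloud Insights API:

```python
from dataclasses import dataclass

@dataclass
class SessionRecord:
    """Simplified per-session summary -- a stand-in for whatever your
    real export or log aggregation provides."""
    latencies_ms: list  # per-turn end-to-end latencies
    errored: bool
    completed: bool     # reached a natural conclusion
    tool_calls: int
    tool_failures: int

def percentile(values, p):
    """Nearest-rank percentile; good enough for a daily health check."""
    ordered = sorted(values)
    k = max(0, int(round(p / 100 * len(ordered))) - 1)
    return ordered[k]

def daily_report(sessions):
    """Compute the headline metrics from a day's sessions.

    Assumes `sessions` is non-empty; add guards for production use.
    """
    all_latencies = [ms for s in sessions for ms in s.latencies_ms]
    calls = sum(s.tool_calls for s in sessions)
    failures = sum(s.tool_failures for s in sessions)
    return {
        "p50_ms": percentile(all_latencies, 50),
        "p95_ms": percentile(all_latencies, 95),
        "error_rate": sum(s.errored for s in sessions) / len(sessions),
        "completion_rate": sum(s.completed for s in sessions) / len(sessions),
        "tool_success_rate": (calls - failures) / calls if calls else 1.0,
    }

sessions = [
    SessionRecord([420, 480], errored=False, completed=True, tool_calls=2, tool_failures=0),
    SessionRecord([450, 1520], errored=True, completed=False, tool_calls=1, tool_failures=1),
]
print(daily_report(sessions))
```

Compare the output against the thresholds above (P95 under 800ms, error rate under 1%, completion above 80%) and alert on anything outside the band.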

Building a monitoring dashboard

For teams that want a consolidated view, export key metrics to a dashboard:

```python
# monitoring.py
import logging
from datetime import datetime

logger = logging.getLogger("agent-monitor")


class SessionMonitor:
    """Track key metrics for each agent session."""

    def __init__(self, session_id: str):
        self.session_id = session_id
        self.start_time = datetime.now()
        self.turn_count = 0
        self.tool_calls = 0
        self.tool_failures = 0

    def on_turn_complete(self, latency_ms: float):
        self.turn_count += 1
        logger.info(
            "turn_complete",
            extra={
                "session_id": self.session_id,
                "turn": self.turn_count,
                "latency_ms": latency_ms,
            },
        )

    def on_tool_call(self, tool_name: str, success: bool, duration_ms: float):
        self.tool_calls += 1
        if not success:
            self.tool_failures += 1
        logger.info(
            "tool_call",
            extra={
                "session_id": self.session_id,
                "tool": tool_name,
                "success": success,
                "duration_ms": duration_ms,
            },
        )

    def on_session_end(self):
        duration = (datetime.now() - self.start_time).total_seconds()
        logger.info(
            "session_end",
            extra={
                "session_id": self.session_id,
                "duration_seconds": duration,
                "turns": self.turn_count,
                "tool_calls": self.tool_calls,
                "tool_failures": self.tool_failures,
            },
        )
```

```typescript
// monitoring.ts
import { Logger } from "pino";

class SessionMonitor {
  private sessionId: string;
  private startTime: number;
  private turnCount = 0;
  private toolCalls = 0;
  private toolFailures = 0;

  constructor(
    sessionId: string,
    private logger: Logger
  ) {
    this.sessionId = sessionId;
    this.startTime = Date.now();
  }

  onTurnComplete(latencyMs: number): void {
    this.turnCount++;
    this.logger.info({
      event: "turn_complete",
      sessionId: this.sessionId,
      turn: this.turnCount,
      latencyMs,
    });
  }

  onToolCall(toolName: string, success: boolean, durationMs: number): void {
    this.toolCalls++;
    if (!success) this.toolFailures++;
    this.logger.info({
      event: "tool_call",
      sessionId: this.sessionId,
      tool: toolName,
      success,
      durationMs,
    });
  }

  onSessionEnd(): void {
    const durationSeconds = (Date.now() - this.startTime) / 1000;
    this.logger.info({
      event: "session_end",
      sessionId: this.sessionId,
      durationSeconds,
      turns: this.turnCount,
      toolCalls: this.toolCalls,
      toolFailures: this.toolFailures,
    });
  }
}
```

Structured logs feed dashboards

The structured log output above can be collected by any log aggregator (Datadog, Grafana Loki, CloudWatch) and turned into dashboards and alerts. You will set up alerting in a later chapter.
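On the TypeScript side, pino already emits one JSON object per line. Python's standard `logging` attaches `extra` fields to the record but does not serialize them by default, so a custom formatter is needed. A minimal sketch using only the standard library:

```python
import json
import logging

# Attribute names every LogRecord has by default; anything else on the
# record came in via `extra` and should be serialized.
RESERVED = set(vars(logging.LogRecord("", 0, "", 0, "", (), None)))

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, including `extra` fields,
    so an aggregator can index session_id, latency_ms, and friends."""

    def format(self, record):
        payload = {"event": record.getMessage(), "level": record.levelname}
        payload.update(
            {k: v for k, v in vars(record).items() if k not in RESERVED}
        )
        return json.dumps(payload)

logger = logging.getLogger("agent-monitor")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("turn_complete", extra={"session_id": "sess_abc123", "latency_ms": 450})
```

With this in place, the `SessionMonitor` log calls above produce lines an aggregator can parse without any custom processing rules.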

Monitoring checklist

Run through this checklist daily:

| Check | Frequency | What to look for |
| --- | --- | --- |
| Error rate | Daily | Spikes above 1% |
| P95 latency | Daily | Exceeding 800ms consistently |
| Session completion | Daily | Drops below 80% |
| Transcript review | Weekly | Misunderstandings, wrong answers, awkward phrasing |
| Tool call failures | Daily | Spikes in failure rate |
| Audio quality | Weekly | Listen to 3-5 random sessions |

Test your knowledge

Why are traces more useful than overall response time for diagnosing voice agent latency issues?

What you learned

  • Cloud Insights provides transcripts, traces, and session data for every conversation without additional setup
  • Transcripts reveal what happened in a conversation -- the qualitative view
  • Traces reveal where time was spent in each turn -- the quantitative view
  • A daily monitoring routine covering latency, errors, completion rate, and tool reliability catches problems before callers notice
  • Structured logging in your agent code enables custom dashboards and alerting

Next up

Built-in monitoring covers the basics. In the next chapter, you will implement custom metrics and data hooks to track business-specific KPIs like booking rates, call duration by intent, and cost per conversation.

Concepts covered
Cloud Insights · Transcripts · Traces