Chapter 4 · 15m

Endpointing configuration

VAD tells you when speech stops. Endpointing tells you when the turn is over. The gap between those two decisions is where conversation quality lives. Too short and your agent interrupts. Too long and it feels sluggish. This chapter shows you how to dial in the exact delays that make your agent feel responsive without being rude.

What you'll learn

  • How min_endpointing_delay and max_endpointing_delay work together to define an endpointing window
  • Why the two-threshold approach is better than a single fixed delay
  • How to tune endpointing for different use cases — from snappy Q&A to patient intake forms
  • How endpointing interacts with VAD padding to determine total response latency

The two-threshold endpointing model

LiveKit Agents uses two endpointing thresholds that work together. The system declares a turn boundary sometime between the minimum and maximum delay, using additional signals (like STT partial results) to decide the optimal moment within that window.

agent.py (Python)

```python
from livekit.agents import AgentSession, TurnDetectionOptions

session = AgentSession(
    turn_detection=TurnDetectionOptions(
        min_endpointing_delay=0.5,
        max_endpointing_delay=1.0,
    ),
)
```

agent.ts (TypeScript)

```typescript
import { AgentSession } from "@livekit/agents";

const session = new AgentSession({
  turnDetection: {
    minEndpointingDelay: 0.5,
    maxEndpointingDelay: 1.0,
  },
});
```

What's happening

The min_endpointing_delay is the earliest the system will declare a turn boundary after VAD reports silence. The max_endpointing_delay is the latest — if silence persists this long, the turn is always considered complete. Between these two values, the system uses contextual signals to pick the right moment. This range-based approach balances responsiveness with accuracy.

How the endpointing window works

Here is what happens when a user stops speaking:

  1. VAD detects silence after the user's speech ends.
  2. The endpointing timer starts counting from zero.
  3. Before min_endpointing_delay, the system never declares a turn boundary — even if the silence seems final.
  4. Between min_endpointing_delay and max_endpointing_delay, the system may declare a boundary if contextual signals (like a complete sentence in the STT transcript) confirm the turn is over.
  5. At max_endpointing_delay, the system always declares a turn boundary regardless of other signals.
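The window logic above can be sketched as a small decision function. This is an illustrative model, not the LiveKit implementation; `sentence_complete` stands in for the contextual STT signals the system actually uses:

```python
def endpoint_decision(
    silence_s: float,
    sentence_complete: bool,
    min_delay: float = 0.5,
    max_delay: float = 1.0,
) -> bool:
    """Return True when a turn boundary should be declared (sketch only)."""
    if silence_s < min_delay:
        return False          # never end the turn before the minimum delay
    if silence_s >= max_delay:
        return True           # always end the turn at the maximum delay
    return sentence_complete  # inside the window, defer to contextual signals

# Inside the window, a clearly finished sentence ends the turn early...
assert endpoint_decision(0.7, sentence_complete=True) is True
# ...while a mid-thought pause keeps the turn open until max_delay.
assert endpoint_decision(0.7, sentence_complete=False) is False
assert endpoint_decision(1.0, sentence_complete=False) is True
```

Note how the contextual signal only matters between the two thresholds; outside the window, the decision is fixed.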

Total silence includes VAD padding

Remember that VAD's padding_duration runs first. If your VAD padding is 0.3 seconds and your min endpointing delay is 0.5 seconds, the user will experience 0.8 seconds of total silence before the agent can begin responding. Keep this in mind when calculating perceived latency.
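To make the arithmetic concrete, minimum perceived silence is simply the sum of the two stages. This is a back-of-the-envelope helper for latency budgeting, not part of the LiveKit API:

```python
def min_perceived_silence(vad_padding_s: float, min_endpointing_delay_s: float) -> float:
    # VAD padding must elapse first; only then does the endpointing timer start.
    return vad_padding_s + min_endpointing_delay_s

# 0.3 s of VAD padding + 0.5 s minimum endpointing delay
print(round(min_perceived_silence(0.3, 0.5), 2))  # → 0.8
```

The same sum with max_endpointing_delay gives the worst-case silence before the agent is guaranteed to respond.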

Tuning for specific use cases

The right endpointing delays depend entirely on your application. Here are four common profiles with the reasoning behind each.

profiles.py (Python)

```python
from livekit.agents import TurnDetectionOptions

# Quick Q&A bot — speed is everything
quick_qa = TurnDetectionOptions(
    min_endpointing_delay=0.3,
    max_endpointing_delay=0.6,
)

# Customer service — balanced responsiveness
customer_service = TurnDetectionOptions(
    min_endpointing_delay=0.5,
    max_endpointing_delay=1.0,
)

# Medical intake — never rush the patient
medical_intake = TurnDetectionOptions(
    min_endpointing_delay=0.8,
    max_endpointing_delay=1.5,
)

# Elderly users — extra patience for slower speech
elderly_friendly = TurnDetectionOptions(
    min_endpointing_delay=1.0,
    max_endpointing_delay=2.0,
)
```

profiles.ts (TypeScript)

```typescript
// Quick Q&A bot — speed is everything
const quickQa = {
  minEndpointingDelay: 0.3,
  maxEndpointingDelay: 0.6,
};

// Customer service — balanced responsiveness
const customerService = {
  minEndpointingDelay: 0.5,
  maxEndpointingDelay: 1.0,
};

// Medical intake — never rush the patient
const medicalIntake = {
  minEndpointingDelay: 0.8,
  maxEndpointingDelay: 1.5,
};

// Elderly users — extra patience for slower speech
const elderlyFriendly = {
  minEndpointingDelay: 1.0,
  maxEndpointingDelay: 2.0,
};
```

Do not set min equal to max

If min_endpointing_delay equals max_endpointing_delay, you lose the adaptive window entirely and get a fixed delay. The power of the two-threshold model is the range between them, where contextual signals can fire an early turn boundary for clearly finished sentences while still waiting for users who pause mid-thought.

Measuring the impact

When tuning endpointing delays, track two key metrics:

| Metric | What it measures | Desired direction |
| --- | --- | --- |
| Interruption rate | Percentage of turns where the agent spoke before the user finished | Lower is better |
| Average response latency | Time from user silence to agent speech onset | Lower is better (but not at the cost of interruptions) |

These two metrics pull in opposite directions. Your job is to find the sweet spot for your use case. A/B testing different configurations (covered later in this course) is the most reliable way to find it.
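As a sketch of how these two metrics could be computed from logged turn timestamps — the `Turn` schema and its field names here are hypothetical, not a LiveKit data structure:

```python
from dataclasses import dataclass

@dataclass
class Turn:
    """One user turn, as you might log it in production (illustrative schema)."""
    user_done_at: float       # when the user actually finished speaking
    agent_started_at: float   # when the agent began speaking

def interruption_rate(turns: list[Turn]) -> float:
    # The agent "interrupted" if it started speaking before the user finished.
    return sum(t.agent_started_at < t.user_done_at for t in turns) / len(turns)

def avg_response_latency(turns: list[Turn]) -> float:
    # Only non-interrupted turns have a meaningful silence-to-speech latency.
    waits = [t.agent_started_at - t.user_done_at
             for t in turns if t.agent_started_at >= t.user_done_at]
    return sum(waits) / len(waits)

turns = [Turn(10.0, 10.9), Turn(25.0, 24.8), Turn(40.0, 41.1), Turn(55.0, 55.7)]
print(interruption_rate(turns))  # → 0.25 (one of four turns was interrupted)
print(avg_response_latency(turns))
```

Lowering the delays shrinks the second number while raising the first, which is exactly the trade-off described above.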

Test your knowledge

If your VAD padding_duration is 0.3 seconds and your min_endpointing_delay is 0.5 seconds, what is the minimum total silence the user experiences before the agent can respond?

(Answer: 0.3 s + 0.5 s = 0.8 seconds, because the VAD padding must elapse before the endpointing timer starts.)

What you learned

  • Endpointing uses a two-threshold model with min_endpointing_delay and max_endpointing_delay
  • The minimum delay prevents premature turn boundaries; the maximum delay guarantees eventual ones
  • Between the two thresholds, contextual signals like STT results help determine the optimal boundary
  • Total perceived silence is VAD padding plus endpointing delay
  • Different use cases require different delay profiles, from fast Q&A to patient medical intake

Next up

Next, you will tackle interruption handling — what happens when a user starts speaking while the agent is still talking, and how adaptive mode uses the LLM to distinguish real interruptions from background noise.

Concepts covered
  • min_endpointing_delay
  • max_endpointing_delay
  • Endpointing strategy