Noise cancellation & room configuration
Your dental receptionist can check availability and book appointments. But callers are not in sound studios — they are in cars, at coffee shops, in waiting rooms with TVs blaring. Background noise confuses STT models, triggers false turn detections, and degrades the entire experience. In this chapter, you will add noise cancellation to the audio input pipeline and configure room options that make your agent production-ready.
Why noise cancellation matters for voice AI
Noise does not just sound bad — it breaks the pipeline. A car horn in the background becomes garbled text in the STT output. A TV conversation gets transcribed and mixed into the caller's words. The VAD model detects "speech" from a barking dog and triggers a turn. Every stage of the STT to LLM to TTS pipeline degrades when the input audio is noisy.
Noise cancellation sits before everything else. It processes the raw audio from the caller's microphone and strips out non-speech sounds before the audio reaches STT or VAD. Clean audio in means accurate transcription, reliable turn detection, and a conversation that works even in a busy parking lot.
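To make the ordering concrete, here is a toy sketch (not the real BVC model) of why denoising must run before voice activity detection: a simple energy-based VAD fires on any loud frame, so constant background noise triggers false turns unless a suppression step runs first. All names and numbers here are illustrative.

```python
NOISE_FLOOR = 0.3  # assumed constant background level, e.g. a fan

def denoise(frame: float) -> float:
    """Crude noise suppression: subtract the estimated noise floor."""
    return max(0.0, frame - NOISE_FLOOR)

def vad(frame: float, threshold: float = 0.2) -> bool:
    """Toy VAD: treat any frame above the energy threshold as speech."""
    return frame > threshold

# Frames: background noise only, then noise plus actual speech.
raw_frames = [0.3, 0.35, 0.3, 0.9, 0.95, 0.3]

without_nc = [vad(f) for f in raw_frames]          # every frame counts as "speech"
with_nc = [vad(denoise(f)) for f in raw_frames]    # only genuine speech frames fire

print(without_nc)
print(with_nc)
```

The raw pipeline fires on all six frames; after the suppression step, only the two real speech frames trigger, which is exactly the false-turn problem the BVC plugin solves at production quality.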
Adding noise cancellation
LiveKit provides a noise cancellation plugin with two modes: BVC for standard WebRTC callers (browser, mobile app) and BVCTelephony for SIP/telephone callers. You configure it through RoomOptions and AudioInputOptions:
```python
from livekit.agents import (
    AgentServer,
    Agent,
    AgentSession,
    RoomOptions,
    AudioInputOptions,
)
from livekit.plugins import openai, deepgram, cartesia, noise_cancellation

server = AgentServer()

@server.rtc_session
async def entrypoint(session: AgentSession):
    await session.start(
        agent=Agent(
            instructions="""You are a friendly receptionist at Bright Smile Dental clinic.
            Keep responses brief and conversational. Never use markdown or emojis.
            When a caller asks about availability, use check_availability to look up
            real slots. Never guess or make up times.
            When a caller wants to book, collect their full name, preferred date, and
            preferred time slot. Then use book_appointment to complete the booking.
            After booking, ask if there is anything else you can help with.""",
            tools=[check_availability, book_appointment],
        ),
        room=session.room,
        room_options=RoomOptions(
            audio_input=AudioInputOptions(
                noise_cancellation=noise_cancellation.BVC(),
            ),
        ),
        stt=deepgram.STT(model="nova-3"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(voice="<voice-id>"),
    )

if __name__ == "__main__":
    server.run()
```

Three new imports, three new lines of configuration. The RoomOptions object wraps AudioInputOptions, which wraps the noise cancellation plugin. This nested structure mirrors what is happening: room-level settings contain audio-level settings, which contain processing-level settings.
BVC for browser and mobile callers
noise_cancellation.BVC() is the standard noise cancellation model. It removes background noise, echo, and non-speech sounds from WebRTC audio. Use this when your callers connect through a web browser or a mobile app — the typical case for most voice AI applications.
BVCTelephony for SIP callers
If your dental clinic accepts calls over a phone line via SIP trunking, use noise_cancellation.BVCTelephony() instead. Telephone audio has different characteristics — narrower frequency range, different codecs, different noise profiles. The telephony model is tuned for these conditions.
How do I know which one to use?
If callers connect through a web browser or mobile app, use BVC(). If callers dial a phone number that routes through a SIP trunk to LiveKit, use BVCTelephony(). If you support both, you can check the participant's connection type at runtime and configure accordingly.
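A runtime check could look like the sketch below. LiveKit SIP participants carry attributes prefixed with `sip.` (such as `sip.callID`), which this hypothetical helper inspects; the returned strings stand in for `noise_cancellation.BVCTelephony()` and `noise_cancellation.BVC()` so the selection logic can be shown on its own.

```python
def is_sip_participant(attributes: dict) -> bool:
    """Heuristic: SIP participants carry 'sip.'-prefixed attributes."""
    return any(key.startswith("sip.") for key in attributes)

def pick_noise_cancellation(attributes: dict) -> str:
    # In a real agent, return noise_cancellation.BVCTelephony() or
    # noise_cancellation.BVC() here instead of these placeholder names.
    return "BVCTelephony" if is_sip_participant(attributes) else "BVC"

print(pick_noise_cancellation({"sip.callID": "abc-123"}))  # telephone caller
print(pick_noise_cancellation({"name": "web-user"}))       # browser caller
```

You would call this once when the caller joins, then build the AudioInputOptions with the selected plugin.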
Linked participant
By default, your agent subscribes to audio from all participants in the room. For a one-on-one conversation like a dental receptionist call, you want the agent linked to a specific caller. The linked_participant option on RoomOptions does this:
```python
room_options=RoomOptions(
    audio_input=AudioInputOptions(
        noise_cancellation=noise_cancellation.BVC(),
    ),
    linked_participant="caller",
)
```

When you set linked_participant, the agent only processes audio from that specific participant identity. This matters in two scenarios: rooms where multiple participants might join (a supervisor listening in), and rooms where system audio tracks exist that should not be transcribed.
In production, the participant identity is set when the caller's access token is generated. Your backend creates a token with identity="caller" (or whatever identifier you use), and the agent's linked_participant matches that identity. For development with the Playground, LiveKit assigns an identity automatically — you do not need to set linked_participant until you deploy.
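To show where the identity lives, here is a minimal JWT built with only the standard library. In production you would use the livekit-api package's AccessToken rather than rolling your own; the secret and exact claim layout below are illustrative, not LiveKit's precise schema, but the point stands: the identity is a signed claim in the token, and linked_participant matches it.

```python
import base64
import hashlib
import hmac
import json
import time

def make_token(identity: str, room: str, secret: bytes) -> str:
    """Sketch of a signed access token with the caller identity embedded."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "sub": identity,  # the participant identity the agent will link to
        "video": {"room": room, "roomJoin": True},
        "exp": int(time.time()) + 3600,
    }
    def b64(obj) -> str:
        raw = json.dumps(obj, separators=(",", ":")).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    signing_input = f"{b64(header)}.{b64(payload)}"
    sig = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{base64.urlsafe_b64encode(sig).rstrip(b'=').decode()}"

token = make_token("caller", "bright-smile-call-123", b"dev-secret")
# The agent's linked_participant="caller" matches the "sub" claim above.
```

Because the identity is signed into the token, a caller cannot claim someone else's identity, and the agent can trust the match.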
Session lifecycle options
Two additional RoomOptions settings control what happens when the conversation ends:
```python
room_options=RoomOptions(
    audio_input=AudioInputOptions(
        noise_cancellation=noise_cancellation.BVC(),
    ),
    close_on_disconnect=True,
    delete_room_on_close=True,
)
```

close_on_disconnect tells the agent session to close automatically when the linked participant disconnects. Without this, the agent session stays alive after the caller hangs up, consuming resources until a timeout kicks in. For a dental receptionist, the caller disconnecting means the call is over — close immediately.
delete_room_on_close deletes the LiveKit room when the session closes. This is clean-up hygiene. Each call creates a room, and rooms consume server resources. Deleting the room after the session ensures you are not accumulating stale rooms.
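The shutdown chain these two flags create can be modeled as a tiny state machine (a toy model, not LiveKit internals): linked participant disconnects, which closes the session, which deletes the room. The flag names mirror the RoomOptions fields.

```python
class CallLifecycle:
    """Toy model of the disconnect -> close -> delete chain."""

    def __init__(self, close_on_disconnect: bool, delete_room_on_close: bool):
        self.close_on_disconnect = close_on_disconnect
        self.delete_room_on_close = delete_room_on_close
        self.session_open = True
        self.room_exists = True

    def on_participant_disconnected(self) -> None:
        # Only the linked participant's departure reaches this handler.
        if self.close_on_disconnect:
            self.close_session()

    def close_session(self) -> None:
        self.session_open = False
        if self.delete_room_on_close:
            self.room_exists = False

call = CallLifecycle(close_on_disconnect=True, delete_room_on_close=True)
call.on_participant_disconnected()
print(call.session_open, call.room_exists)  # both torn down after hang-up
```

With both flags off, the session and room would linger after the caller hangs up, which is the resource leak described above.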
close_on_disconnect requires linked_participant
The close_on_disconnect option only works when linked_participant is set. The agent needs to know which participant's disconnection should trigger the close. Without a linked participant, it does not know whose departure matters.
The complete configuration
Here is the full agent.py with noise cancellation and room options added to the booking agent from the previous chapter:
```python
from livekit.agents import (
    AgentServer,
    Agent,
    AgentSession,
    RoomOptions,
    AudioInputOptions,
    function_tool,
    RunContext,
    ToolError,
)
from livekit.plugins import openai, deepgram, cartesia, noise_cancellation

server = AgentServer()

@function_tool
async def check_availability(context: RunContext, date: str) -> str:
    """Check available appointment slots for a given date.

    Args:
        date: The date to check availability for (e.g., "next Tuesday", "March 15")
    """
    available_slots = ["9:00 AM", "11:30 AM", "2:00 PM", "4:30 PM"]
    return f"Available slots for {date}: {', '.join(available_slots)}"

@function_tool
async def book_appointment(
    context: RunContext,
    patient_name: str,
    date: str,
    time: str,
) -> str:
    """Book a dental appointment for a patient.

    Args:
        patient_name: The patient's full name
        date: The appointment date (e.g., "next Tuesday", "March 15")
        time: The appointment time slot (e.g., "9:00 AM", "2:00 PM")
    """
    valid_slots = ["9:00 AM", "11:30 AM", "2:00 PM", "4:30 PM"]
    if time not in valid_slots:
        raise ToolError(
            f"{time} is not an available slot. Available: {', '.join(valid_slots)}"
        )
    booking = {
        "patient_name": patient_name,
        "date": date,
        "time": time,
        "status": "confirmed",
    }
    if not hasattr(context.session, "userdata"):
        context.session.userdata = {}
    context.session.userdata["last_booking"] = booking
    await context.session.say(
        f"I have booked an appointment for {patient_name} on {date} at {time}.",
        allow_interruptions=False,
    )
    return f"Appointment confirmed: {patient_name}, {date} at {time}. Ask if there is anything else you can help with."

@server.rtc_session
async def entrypoint(session: AgentSession):
    await session.start(
        agent=Agent(
            instructions="""You are a friendly receptionist at Bright Smile Dental clinic.
            Keep responses brief and conversational. Never use markdown or emojis.
            When a caller asks about availability, use check_availability to look up
            real slots. Never guess or make up times.
            When a caller wants to book, collect their full name, preferred date, and
            preferred time slot. Then use book_appointment to complete the booking.
            After booking, ask if there is anything else you can help with.""",
            tools=[check_availability, book_appointment],
        ),
        room=session.room,
        room_options=RoomOptions(
            audio_input=AudioInputOptions(
                noise_cancellation=noise_cancellation.BVC(),
            ),
        ),
        stt=deepgram.STT(model="nova-3"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(voice="<voice-id>"),
    )

if __name__ == "__main__":
    server.run()
```

Test noise cancellation
Run your agent and test with real background noise:
```shell
lk agent dev
```

Test 1: baseline without noise cancellation. Temporarily comment out the room_options parameter and connect from a noisy environment — play music from your phone, turn on a fan, or open a window to street noise. Ask the agent about availability. You will likely see garbled transcriptions in the logs and erratic turn detection.
Test 2: with noise cancellation enabled. Uncomment the room_options and restart. With the same background noise, try the same conversation. The STT transcription should be dramatically cleaner, and the agent should respond without false triggers from background sounds.
Try saying: "I'd like to book an appointment for next Wednesday at 2 PM. My name is Casey Morgan."
Even with noise in the background, the agent should correctly transcribe the name, date, and time, call book_appointment, and deliver the confirmation.
Noise cancellation adds minimal latency
BVC runs as a preprocessing step on the audio input and adds roughly 10-20ms of latency. This is negligible compared to the STT, LLM, and TTS stages. The accuracy improvement in noisy environments far outweighs the tiny latency cost.
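A quick back-of-the-envelope calculation shows the proportion. Only the 10-20 ms noise-cancellation figure comes from the text above; the other stage numbers below are illustrative assumptions chosen to show scale, not measurements of any particular deployment.

```python
# Rough per-turn latency budget in milliseconds.
budget_ms = {
    "noise_cancellation": 15,  # midpoint of the 10-20 ms range above
    "stt": 300,                # assumed, for illustration
    "llm": 700,                # assumed, for illustration
    "tts": 200,                # assumed, for illustration
}

total = sum(budget_ms.values())
share = budget_ms["noise_cancellation"] / total
print(f"Noise cancellation is {share:.1%} of a {total} ms round trip")
```

Even with generous assumptions, the preprocessing step is a rounding error next to the model stages.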
Your dental receptionist now handles real-world audio conditions. In the next chapter, you will write automated tests that verify the agent behaves correctly — without needing to talk to it manually every time you make a change.