Chapter 1 · 25m

How telephony works in LiveKit

Your dental receptionist from Course 1.1 works beautifully over WebRTC in a browser. But your patients are not going to open a web app to schedule a cleaning -- they are going to pick up a phone and dial a number. In this chapter, you will learn how LiveKit bridges the century-old phone network into modern WebRTC rooms so your AI agent can answer real phone calls without a single line of telephony code.

SIP, ParticipantKind.SIP, Telephony architecture

What you'll learn

  • How the SIP protocol works at a high level (signaling vs media)
  • How LiveKit bridges phone calls into rooms as SIP participants
  • Why your agent does not need to know it is talking to a phone
  • The full architecture from phone network to AI agent

SIP: the protocol that powers phone calls

SIP -- Session Initiation Protocol -- is the standard protocol for initiating, managing, and terminating voice calls over IP networks. When you make a phone call today, SIP is almost certainly involved somewhere in the chain, even if the first hop is still analog.

SIP handles two distinct concerns:

Signaling -- the control plane. SIP messages set up and tear down calls. "I want to call this number." "Ringing." "Answered." "Hang up." These are text-based messages (similar to HTTP) that negotiate the call parameters: who is calling whom, what audio codecs to use, where to send media. Signaling does not carry any voice audio.

Media -- the data plane. Once SIP signaling establishes a call, the actual voice audio flows over RTP (Real-time Transport Protocol). RTP packets carry the encoded audio between endpoints. Media and signaling travel on separate paths -- signaling might go through multiple SIP proxies while media flows directly between endpoints.

What's happening

Think of SIP like a maitre d' at a restaurant. The maitre d' (signaling) greets you, checks your reservation, and escorts you to your table. Once you are seated, the waiter (media) takes over and delivers the food. The maitre d' does not carry plates, and the waiter does not handle reservations. SIP signaling sets up the call; RTP carries the voice.
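
To make the split concrete, here is a minimal sketch of what the media side looks like on the wire. It decodes the 12-byte fixed RTP header defined in RFC 3550; the sample packet and its values (sequence number, SSRC) are made up for illustration, and your agent never has to do this itself.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpHeader:
    version: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=first >> 6,            # top two bits of the first byte
        marker=bool(second & 0x80),    # top bit of the second byte
        payload_type=second & 0x7F,    # 0 = PCMU, 8 = PCMA (static types)
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
    )


# Hypothetical packet: version 2, payload type 0 (PCMU), 160 bytes of audio.
sample = struct.pack("!BBHII", 0x80, 0, 4711, 160_000, 0xDEADBEEF) + bytes(160)
header = parse_rtp_header(sample)
print(header)
```

Notice that nothing in the header says who is calling whom. That information lives only in the SIP signaling.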

The bridge: PSTN to LiveKit Room

Here is the full path a phone call takes to reach your AI agent:

1. Patient dials your number

The patient picks up their phone and dials the number you have provisioned for your dental office. The call enters the PSTN (Public Switched Telephone Network) -- the global phone network.

2. PSTN routes to your SIP trunk provider

The PSTN routes the call to your SIP trunk provider (Twilio, Telnyx, or LiveKit's built-in phone numbers). The trunk provider converts the traditional phone call into a SIP session.

3. SIP trunk provider connects to LiveKit

The trunk provider sends a SIP INVITE to LiveKit's SIP bridge. This message says: "Incoming call to +15551234567 from +15559876543."

4. LiveKit creates a room and admits the caller

LiveKit's SIP bridge evaluates your dispatch rules (you will configure these in Chapter 3) and creates or joins a Room. The phone caller becomes a Participant in that Room with a special kind: ParticipantKind.SIP. Their voice audio is published as an audio Track, just like any WebRTC participant.

5. Your agent joins and starts talking

Your agent, registered to handle rooms matching the dispatch pattern, joins the Room. It subscribes to the caller's audio Track, feeds it through the STT/LLM/TTS pipeline, and publishes its responses as audio Tracks that LiveKit's SIP bridge converts back into phone audio.

The architecture looks like this:

Telephony architecture

Phone (PSTN) -> SIP Trunk Provider -> LiveKit SIP Bridge -> Room (caller joins as ParticipantKind.SIP) -> Agent

The SIP bridge is transparent

LiveKit's SIP bridge handles all the protocol translation between SIP/RTP and WebRTC. Your agent never deals with SIP messages, RTP packets, or phone network signaling. It just sees a Participant with an audio Track -- exactly the same as a browser-based caller.

ParticipantKind.SIP: a phone caller in the room

When a phone caller enters a LiveKit Room through the SIP bridge, they appear as a Participant with kind set to ParticipantKind.SIP. From your agent's perspective, that kind (along with SIP-specific attributes such as the caller's phone number) is the only thing that distinguishes a phone caller from a WebRTC caller.

Your agent code from Course 1.1 works without modification. The AgentSession receives audio from the caller's Track, transcribes it, sends it to the LLM, synthesizes the response, and publishes it back. Whether that audio came from a browser microphone over WebRTC or from a phone over SIP/RTP is invisible to your application logic.

What's happening

This is the power of LiveKit's room-based architecture. A Room is a universal meeting point. Browser users join via WebRTC SDKs. Phone callers join via the SIP bridge. Agents join via the Agents framework. Once inside the Room, everyone is just a Participant with Tracks. The transport differences are abstracted away entirely.
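
The idea can be sketched with a toy model. The classes below are hypothetical stand-ins written for this illustration, not the LiveKit SDK types; they only show that agent-side logic can treat every non-agent participant the same way, whatever transport brought them into the Room.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    STANDARD = "standard"  # browser or mobile SDK over WebRTC
    SIP = "sip"            # phone caller via the SIP bridge
    AGENT = "agent"        # joined through the Agents framework


@dataclass
class Participant:
    identity: str
    kind: Kind
    tracks: list[str] = field(default_factory=list)


@dataclass
class Room:
    name: str
    participants: dict[str, Participant] = field(default_factory=dict)

    def join(self, participant: Participant) -> None:
        self.participants[participant.identity] = participant


def audio_sources(room: Room) -> list[str]:
    """Tracks an agent would listen to: every non-agent track, regardless of transport."""
    return [
        f"{p.identity}/{track}"
        for p in room.participants.values()
        if p.kind is not Kind.AGENT
        for track in p.tracks
    ]


room = Room("dental-frontdesk")
room.join(Participant("web-user", Kind.STANDARD, ["microphone"]))
room.join(Participant("sip_+15559876543", Kind.SIP, ["microphone"]))
room.join(Participant("receptionist", Kind.AGENT, ["tts-output"]))
print(audio_sources(room))
```

The function never inspects whether a participant is Kind.SIP or Kind.STANDARD, which is the property your real agent code relies on.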

You can detect that a caller is on the phone if you need to adapt behavior:

agent.py (python)
from livekit import rtc

for participant in session.room.remote_participants.values():
    if participant.kind == rtc.ParticipantKind.PARTICIPANT_KIND_SIP:
        # This participant is on a phone
        print(f"Phone caller: {participant.identity}")

agent.ts (typescript)
import { ParticipantKind } from "@livekit/rtc-node";

for (const participant of session.room.remoteParticipants.values()) {
  if (participant.kind === ParticipantKind.SIP) {
    // This participant is on a phone
    console.log(`Phone caller: ${participant.identity}`);
  }
}

In practice, you rarely need this check. Your dental receptionist greets everyone the same way, checks availability the same way, and books appointments the same way regardless of how the caller connected.

What changes with phone callers

While your agent code stays the same, there are a few practical differences to be aware of when callers are on phones:

Audio quality -- Phone audio defaults to narrowband codecs like G.711 (8 kHz sampling rate) rather than Opus (48 kHz). This means the STT model receives lower-fidelity audio. You will learn how to upgrade to HD voice in Chapter 6.

No visual interface -- Phone callers cannot see buttons, forms, or screen shares. Any interaction must be voice-only or use DTMF keypad tones (Chapter 5).

Caller identity -- Phone callers are identified by their phone number (E.164 format like +15559876543) rather than a username or email. This appears in the Participant's attributes.

Call control -- Phone-specific actions like transfers, hold, and hang-up use SIP-specific APIs rather than WebRTC track controls.
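
As a sketch of the caller-identity point, the helper below pulls a phone number out of a participant's attributes and checks that it is in E.164 form. It works on a plain dictionary so it runs without the SDK. The attribute key `sip.phoneNumber` is an assumption here; confirm the exact key against the LiveKit SIP documentation before relying on it.

```python
import re
from typing import Mapping, Optional

# E.164: a leading "+", a non-zero first digit, and at most 15 digits in total.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")

# Assumed attribute key; verify against the LiveKit SIP documentation.
PHONE_ATTRIBUTE = "sip.phoneNumber"


def caller_number(attributes: Mapping[str, str]) -> Optional[str]:
    """Return the caller's number if present and E.164-formatted, else None."""
    number = attributes.get(PHONE_ATTRIBUTE, "")
    return number if E164_PATTERN.fullmatch(number) else None


print(caller_number({"sip.phoneNumber": "+15559876543"}))
print(caller_number({"sip.phoneNumber": "555-9876"}))
```

In an agent you would pass `participant.attributes` instead of a literal dictionary, for example to look up an existing patient record by phone number.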

Design for voice first

If you build your agent to work well over voice alone -- clear spoken responses, no reliance on visual elements, confirmation through conversation -- it will work perfectly for both phone and browser callers.

Test your knowledge

Why does a LiveKit voice agent not need any code changes to handle phone callers compared to WebRTC browser callers?

SIP deep dive: what is actually on the wire

Understanding SIP at the protocol level is not strictly necessary for building voice agents, but it is invaluable when things go wrong. When calls fail to connect, audio is one-way, or quality degrades, the answers are in the SIP messages and RTP streams. This section gives you the vocabulary and tools to diagnose telephony issues.

SIP message anatomy

SIP is a text-based protocol, structurally similar to HTTP. A SIP message has a request line (or status line), headers, and an optional body. Here is a simplified SIP INVITE — the message that initiates a call:

SIP INVITE (simplified, text)
INVITE sip:+15551234567@sip.livekit.io SIP/2.0
Via: SIP/2.0/UDP 203.0.113.10:5060;branch=z9hG4bK776asdhds
From: "Patient" <sip:+15559876543@trunk.twilio.com>;tag=1928301774
To: <sip:+15551234567@sip.livekit.io>
Call-ID: a84b4c76e66710@203.0.113.10
CSeq: 314159 INVITE
Contact: <sip:+15559876543@203.0.113.10:5060>
Content-Type: application/sdp
Content-Length: 142

v=0
o=- 2890844526 2890844526 IN IP4 203.0.113.10
s=-
c=IN IP4 203.0.113.10
t=0 0
m=audio 49170 RTP/AVP 0 8 96
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:96 opus/48000/2

Key parts to understand:

Component | What it tells you
----------|------------------
Request-URI (sip:+15551234567@sip.livekit.io) | The destination number and SIP server
From | The caller's identity and originating trunk
To | The intended recipient
Call-ID | Unique identifier for this call, used to correlate all messages in a dialog
SDP body (m=audio ...) | Media description: which codecs the caller supports, where to send audio, which ports to use
a=rtpmap lines | Codec offers: PCMU (G.711 u-law), PCMA (G.711 A-law), Opus
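
Because SIP is plain text, you can pull these components apart with a few lines of code when reading logs. The following is a minimal sketch, not a compliant SIP parser: it assumes one header per line and ignores folded headers and repeated headers such as multiple Via lines. The message is a shortened version of the INVITE above.

```python
def parse_sip_message(raw: str) -> tuple[str, dict[str, str], str]:
    """Split a SIP message into its start line, headers, and body."""
    normalized = raw.replace("\r\n", "\n")
    # A blank line separates the headers from the (optional) body.
    head, _, body = normalized.partition("\n\n")
    start_line, *header_lines = head.split("\n")
    headers: dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


invite = (
    "INVITE sip:+15551234567@sip.livekit.io SIP/2.0\r\n"
    'From: "Patient" <sip:+15559876543@trunk.twilio.com>;tag=1928301774\r\n'
    "To: <sip:+15551234567@sip.livekit.io>\r\n"
    "Call-ID: a84b4c76e66710@203.0.113.10\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
    "m=audio 49170 RTP/AVP 0 8 96\r\n"
)

start_line, headers, body = parse_sip_message(invite)
method, request_uri, version = start_line.split(" ")
print(method, request_uri, headers["call-id"])
```

The Call-ID is the value to search for when you need to correlate every message belonging to one call.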

The SDP (Session Description Protocol) body is where codec negotiation happens. The caller offers the codecs it supports; the answering side responds with the codecs it accepts, chosen from that offer. An answer cannot introduce a codec that was never offered, so if your trunk offers only G.711 and your LiveKit config prefers Opus, the answer SDP will select G.711.
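
A simplified sketch of that negotiation follows. It reads the codec names out of an SDP offer and picks the first entry of a preference list that the offer contains. The preference list stands in for a trunk's codec configuration and is hypothetical; real answerers may apply different selection policies, and real offers may omit a=rtpmap lines for static payload types such as 0 and 8.

```python
from typing import Optional

SDP_OFFER = """v=0
c=IN IP4 203.0.113.10
m=audio 49170 RTP/AVP 0 8 96
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:96 opus/48000/2
"""


def offered_codecs(sdp: str) -> list[str]:
    """Return codec names from the first m=audio line, in the offerer's order."""
    payload_types: list[str] = []
    names: dict[str, str] = {}
    for line in sdp.splitlines():
        if line.startswith("m=audio") and not payload_types:
            # m=audio <port> <proto> <payload types...>
            payload_types = line.split()[3:]
        elif line.startswith("a=rtpmap:"):
            payload_type, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            names[payload_type] = encoding.split("/")[0].upper()
    return [names[pt] for pt in payload_types if pt in names]


def negotiate(offer: list[str], preferences: list[str]) -> Optional[str]:
    """Pick the most preferred codec that the offer contains, or None."""
    for codec in preferences:
        if codec in offer:
            return codec
    return None  # no common codec: this is the 488 Not Acceptable Here case


offer = offered_codecs(SDP_OFFER)
print(offer, negotiate(offer, ["OPUS", "PCMU"]))
```

With an offer that contains only PCMU and PCMA, the same preference list falls back to PCMU, which is the situation described in the paragraph above.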

The SIP call flow

A successful inbound call follows this exchange:

SIP call flow (text)
Caller (Trunk)              LiveKit SIP Bridge
   |                              |
   |------- INVITE (SDP offer) -->|  "I want to call +15551234567"
   |                              |
   |<------ 100 Trying -----------|  "Working on it"
   |                              |
   |<------ 180 Ringing ----------|  "Ringing the destination"
   |                              |
   |<------ 200 OK (SDP answer) --|  "Call accepted, here are my media params"
   |                              |
   |------- ACK ----------------->|  "Got it, let's talk"
   |                              |
   |<======= RTP audio =========>|  Bidirectional audio flows
   |                              |
   |------- BYE ----------------->|  "Hanging up"
   |                              |
   |<------ 200 OK ---------------|  "Goodbye acknowledged"
   |                              |

This is the "happy path." When debugging, you are looking for where this flow breaks down:

  • No 100 Trying: LiveKit is not receiving the INVITE — check network connectivity, DNS, and firewall rules (SIP uses port 5060 for UDP/TCP, 5061 for TLS).
  • 408 Request Timeout: INVITE was sent but no response — the SIP bridge may be unreachable.
  • 403 Forbidden: Authentication failure — check trunk credentials and IP allowlisting.
  • 404 Not Found: The dialed number does not match any configured trunk — verify the number in your trunk config matches exactly (E.164 format).
  • 488 Not Acceptable Here: Codec negotiation failed — the caller and LiveKit could not agree on a common codec. Check allowed_codecs on your trunk.
  • 503 Service Unavailable: The SIP bridge is overloaded or down — check LiveKit service status.
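
The checklist above can be turned into a small triage helper for the message sequences you read off a capture. This is an illustrative sketch only: the input is a hand-written list of methods and status codes for one call, the hint text simply restates the bullets, and provisional responses (100, 180) are treated as optional because a call can be answered without them.

```python
from typing import Optional

ERROR_HINTS = {
    "403": "Authentication failure: check trunk credentials and IP allowlisting.",
    "404": "Dialed number matches no configured trunk: verify the E.164 number.",
    "408": "Request timed out: the SIP bridge may be unreachable.",
    "488": "Codec negotiation failed: check allowed_codecs on your trunk.",
    "503": "SIP bridge overloaded or down: check LiveKit service status.",
}


def diagnose(observed: list[str]) -> Optional[str]:
    """Return a troubleshooting hint, or None if the call was set up cleanly."""
    for message in observed:
        if message in ERROR_HINTS:
            return ERROR_HINTS[message]
    if "INVITE" not in observed:
        return "No INVITE seen: the call never reached this capture point."
    after_invite = observed[observed.index("INVITE") + 1:]
    if not after_invite:
        return "No response to the INVITE: check connectivity, DNS, and firewall rules."
    if "200" not in after_invite:
        return "Only provisional responses seen: the call was never answered."
    if "ACK" not in after_invite[after_invite.index("200"):]:
        return "200 OK was never acknowledged: the ACK may be lost or blocked."
    return None


print(diagnose(["INVITE", "100", "180", "200", "ACK"]))
print(diagnose(["INVITE", "100", "488"]))
```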

Capturing and reading SIP traffic with pcaps

When Cloud Insights is not enough — when you need to see the raw protocol exchange — packet captures (pcaps) are the definitive debugging tool. A pcap file captures every network packet, letting you reconstruct the exact SIP messages and RTP streams.

Capturing with tcpdump

terminal (bash)
# Capture SIP signaling on port 5060 (save to file for analysis)
sudo tcpdump -i any -s 0 -w sip-debug.pcap port 5060

# Capture both SIP signaling and RTP media
sudo tcpdump -i any -s 0 -w full-call.pcap "port 5060 or portrange 10000-60000"

# Filter to a specific trunk provider IP
sudo tcpdump -i any -s 0 -w trunk-debug.pcap host 54.172.60.0 and port 5060

# Capture TLS-encrypted SIP on port 5061
sudo tcpdump -i any -s 0 -w sip-tls.pcap port 5061

pcaps contain call audio

If the media on your SIP trunk is not encrypted with SRTP, the pcap file contains the raw voice audio of the call. Handle pcaps with the same sensitivity as call recordings -- they may contain PHI or other sensitive data. Delete them after debugging.

Analyzing with Wireshark

Open the pcap in Wireshark and use its SIP/RTP analysis tools:

Wireshark workflow (text)
1. Open sip-debug.pcap in Wireshark
2. Filter: "sip" to see only SIP messages
3. Telephony → SIP Flows — shows the complete call ladder diagram
4. Select an INVITE → inspect the SDP body for codec offers
5. Select the 200 OK → inspect the SDP answer for negotiated codecs
6. Filter: "rtp" to see media packets
7. Telephony → RTP → RTP Streams — shows jitter, packet loss, codec in use
8. Telephony → RTP → RTP Stream Analysis — graphs jitter over time

What to look for in Wireshark:

Problem | What you'll see in the pcap
--------|----------------------------
Call not connecting | INVITE sent but no response, or a 4xx/5xx error response
One-way audio | RTP packets flowing in only one direction; check for NAT or a firewall blocking return media
No audio at all | SIP 200 OK exchanged but no RTP packets; the SDP media address may be wrong (often a NAT issue)
Choppy audio | RTP stream analysis shows high jitter (>30 ms) or packet loss (>1%)
Wrong codec | SDP answer shows G.711 when you expected Opus; check allowed_codecs order and trunk provider support
Out-of-band DTMF not working | Look for SIP INFO messages or RFC 2833 telephone-event RTP packets; if both are absent, DTMF is being sent in-band
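
For a sense of where jitter and loss figures come from, here is a sketch of the interarrival jitter estimator from RFC 3550 (section 6.4.1) plus a naive loss calculation. The packet data is invented for illustration, and the loss function ignores sequence-number wraparound; use your analysis tool's own numbers for real diagnosis.

```python
def interarrival_jitter_ms(packets: list[tuple[float, int]], clock_rate: int = 8000) -> float:
    """Estimate jitter from (arrival_time_seconds, rtp_timestamp) pairs.

    Implements J = J + (|D| - J) / 16, where D is the change in transit
    time between consecutive packets, measured in RTP timestamp units.
    """
    jitter = 0.0
    previous = None
    for arrival, rtp_timestamp in packets:
        if previous is not None:
            prev_arrival, prev_timestamp = previous
            d = (arrival - prev_arrival) * clock_rate - (rtp_timestamp - prev_timestamp)
            jitter += (abs(d) - jitter) / 16
        previous = (arrival, rtp_timestamp)
    return jitter / clock_rate * 1000  # timestamp units -> milliseconds


def packet_loss_percent(sequence_numbers: list[int]) -> float:
    """Share of packets missing between the lowest and highest sequence number."""
    if not sequence_numbers:
        return 0.0
    expected = max(sequence_numbers) - min(sequence_numbers) + 1
    received = len(set(sequence_numbers))
    return 100 * (expected - received) / expected


# 20 ms G.711 packets (160 timestamp units each); the second arrives 16 ms late.
late = [(0.0, 0), (0.036, 160)]
print(interarrival_jitter_ms(late), packet_loss_percent([1, 2, 3, 5]))
```

Because each new sample contributes only one sixteenth of its deviation, a single late packet moves the estimate slowly; sustained values above the 30 ms threshold in the table indicate a persistent problem.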

Using sngrep for live SIP debugging

For quick SIP debugging without capturing full pcaps, sngrep provides a real-time terminal UI for SIP traffic:

terminal (bash)
# Install sngrep
# Debian/Ubuntu: sudo apt install sngrep
# macOS: brew install sngrep

# Watch live SIP traffic
sudo sngrep

# Show only call dialogs (those that start with an INVITE)
sudo sngrep -d any -c

# Watch only traffic to/from your trunk provider
sudo sngrep -d any host 54.172.60.0

sngrep shows the SIP ladder diagram in real time — you can see INVITEs, responses, and BYEs as they happen. It is faster than capturing a pcap when you just need to verify that signaling is flowing correctly.

Common telephony debugging scenarios

Scenario: Calls ring but agent never answers

1. Check the SIP flow

Use sngrep or a pcap to verify the INVITE reaches LiveKit and a 200 OK is sent. If you see 180 Ringing but never 200 OK, the SIP bridge accepted the call but could not dispatch it.

2. Check dispatch rules

Run lk sip dispatch list and verify the dispatch rule matches the incoming trunk. The room prefix must match what your agent is registered to handle.

3. Check agent registration

Verify your agent is running and registered with lk agent list. If the agent is not registered, the SIP bridge has no one to dispatch to.

Scenario: Audio works in one direction only

One-way audio is almost always a NAT or firewall issue. The SDP body in the INVITE and 200 OK contains the IP addresses and ports for RTP media. If one side advertises a private IP (e.g., 192.168.x.x) that the other side cannot reach, audio flows only in the direction where the address is routable.

1. Check the SDP addresses

Open the pcap and examine the c= line (connection address) and m= line (media port) in both the INVITE and 200 OK. Both sides need publicly routable addresses for RTP.

2. Check for NAT traversal

If you are self-hosting, ensure your LiveKit server has TURN configured or is on a host with a public IP. For LiveKit Cloud, this is handled automatically.

3. Check firewall rules

RTP uses UDP on a wide port range (typically 10000-60000). Ensure your firewall allows UDP traffic in both directions on these ports.
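
The SDP address check from the first step is easy to script. The sketch below extracts the c= connection addresses from an SDP body and reports any that are not globally routable, using Python's standard ipaddress module. The SDP text is made up for illustration. Note that documentation-range addresses such as the 203.0.113.x examples in this chapter also count as non-global, so test with addresses from your own capture.

```python
import ipaddress


def connection_addresses(sdp: str) -> list[str]:
    """Return the address from every 'c=IN IP4 ...' or 'c=IN IP6 ...' line."""
    addresses = []
    for line in sdp.splitlines():
        fields = line.strip().split()
        if len(fields) >= 3 and fields[0] == "c=IN" and fields[1] in ("IP4", "IP6"):
            addresses.append(fields[2])
    return addresses


def unroutable_addresses(sdp: str) -> list[str]:
    """Return connection addresses the far end cannot reach over the internet."""
    flagged = []
    for address in connection_addresses(sdp):
        try:
            if not ipaddress.ip_address(address).is_global:
                flagged.append(address)
        except ValueError:
            # Hostnames are allowed in c= lines; this sketch does not resolve them.
            continue
    return flagged


behind_nat = "v=0\nc=IN IP4 192.168.1.20\nm=audio 49170 RTP/AVP 0\n"
print(unroutable_addresses(behind_nat))
```

If either the INVITE or the 200 OK yields a flagged address, media sent to that address will be dropped, which matches the one-way audio symptom.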

Scenario: Poor audio quality (choppy, robotic, dropping)

1. Capture and analyze RTP

Use Wireshark's RTP stream analysis to check jitter and packet loss. Jitter above 30 ms or packet loss above 1% will degrade audio quality.

2. Check for codec mismatch

If the trunk negotiated G.711 instead of Opus, network problems become more audible: G.711 has no built-in forward error correction, whereas Opus can include it. Verify your allowed_codecs configuration and trunk provider settings.

3. Check network path

Use traceroute or mtr to examine the network path between your trunk provider and LiveKit. High-latency hops or packet loss on the path indicate a network issue outside your control; consider region pinning to reduce the path length.

When to use pcaps vs Cloud Insights

Start with Cloud Insights — it shows call timelines, latency breakdowns, and audio quality without any setup. Reach for pcaps only when you need to see the raw SIP exchange: codec negotiation failures, NAT issues, or when calls fail before reaching the LiveKit platform (so Insights has no data).

What you learned

  • SIP is the protocol that sets up phone calls (signaling) while RTP carries the actual voice audio (media)
  • LiveKit's SIP bridge translates between the phone network and WebRTC rooms
  • Phone callers appear as Participants with ParticipantKind.SIP -- your agent code needs no changes
  • The full path is: Phone -> PSTN -> SIP Trunk Provider -> LiveKit SIP Bridge -> Room -> Agent
  • Practical differences include audio quality, no visual interface, and phone-number-based identity
  • SIP messages are text-based (like HTTP) with an SDP body that negotiates codecs and media addresses
  • Use tcpdump to capture pcaps and Wireshark to analyze them — look at SIP flows, SDP bodies, and RTP stream quality
  • Use sngrep for quick real-time SIP debugging in the terminal
  • Common issues (one-way audio, no audio, codec mismatches) are diagnosed by examining SDP addresses and RTP streams

Next up

In the next chapter, you will get a real phone number, configure an inbound SIP trunk, and point it at LiveKit so calls can start flowing into your rooms.

Concepts covered
SIP, ParticipantKind.SIP, Telephony architecture, SDP, pcaps, Wireshark, sngrep, SIP debugging