Avatar integration with Tavus
Your study partner can hear and see, but to the user it is still a disembodied voice. In this chapter you will give the agent a face by integrating Tavus video avatars that lip-sync to the agent's speech in real time. The result is a visual persona that makes the study experience feel like sitting across from a tutor.
What you'll learn
- How Tavus avatars work within the LiveKit Agents framework
- How to create and configure a persona for your agent
- How lip sync connects TTS audio to avatar video
- How to display the avatar video Track in a frontend application
How Tavus avatars work
Tavus provides real-time video avatars that subscribe to your agent's audio output and generate a lip-synced video stream. The avatar runs as a separate service — your agent code stays the same, and the avatar layer watches the agent's TTS output to produce synchronized facial movements.
Agent speaks via TTS
Your agent generates a text response, and the TTS plugin converts it to audio. This audio is published as a Track in the LiveKit Room, just like any voice agent.
AvatarSession subscribes to audio
The Tavus AvatarSession subscribes to the agent's audio Track. It receives the audio stream in real time.
Tavus generates lip-synced video
Tavus processes the audio and generates a video stream where the avatar's mouth movements, facial expressions, and head gestures match the speech. This video is published as a new Track in the Room.
Frontend displays the avatar
The frontend subscribes to the avatar's video Track and renders it alongside the conversation UI. The user sees a talking face that matches every word.
The avatar does not change your agent logic at all. You write the same Agent class with the same instructions and tools. The AvatarSession is a separate object that sits alongside your AgentSession and handles the visual layer independently. You can add or remove the avatar without touching a single line of agent code.
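Because the avatar is purely additive, you can gate it behind a simple flag and ship the same agent with or without a face. The sketch below is illustrative, not part of the SDK: the `AVATAR_ENABLED` variable name and the `avatar_enabled` helper are assumptions for this example.

```python
import os


def avatar_enabled(env: dict) -> bool:
    """Decide whether to start the avatar layer from an environment flag.

    The agent logic is identical either way; only the visual layer toggles.
    """
    return env.get("AVATAR_ENABLED", "true").lower() in ("1", "true", "yes")


# In the entrypoint you would then do something like (sketch):
#
#     session = AgentSession(...)
#     await session.start(agent=StudyPartnerAgent(), room=ctx.room)
#     if avatar_enabled(os.environ):
#         avatar = AvatarSession(persona_id="your-persona-id")
#         await avatar.start(agent_session=session, room=ctx.room)
```

Keeping the toggle outside the Agent class reinforces the separation: the visual layer is configuration, not agent logic.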
Creating a Tavus persona
A persona defines the avatar's appearance — face, hairstyle, clothing, background. You create personas through the Tavus dashboard or API and reference them by ID in your agent code.
```python
from livekit.agents import Agent, AgentSession
from livekit.plugins import openai, deepgram, cartesia
from livekit.plugins.tavus import AvatarSession


class StudyPartnerAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="""You are a friendly study partner. You have a visible avatar
            that the user can see. Make eye contact references natural — say things like
            'let me look at that' when examining their camera feed. Your visual presence
            should make the conversation feel personal and engaging.""",
        )


async def entrypoint(ctx):
    await ctx.connect()

    avatar = AvatarSession(persona_id="your-persona-id")
    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o"),
        tts=cartesia.TTS(),
    )

    # Start the agent session first, then the avatar
    await session.start(agent=StudyPartnerAgent(), room=ctx.room)
    await avatar.start(agent_session=session, room=ctx.room)
```

```typescript
import { Agent, AgentSession } from "@livekit/agents";
import { OpenAI } from "@livekit/agents-plugin-openai";
import { Deepgram } from "@livekit/agents-plugin-deepgram";
import { Cartesia } from "@livekit/agents-plugin-cartesia";
import { AvatarSession } from "@livekit/agents-plugin-tavus";

class StudyPartnerAgent extends Agent {
  constructor() {
    super({
      instructions: `You are a friendly study partner. You have a visible avatar
      that the user can see. Make eye contact references natural — say things like
      'let me look at that' when examining their camera feed. Your visual presence
      should make the conversation feel personal and engaging.`,
    });
  }
}

async function entrypoint(ctx) {
  await ctx.connect();

  const avatar = new AvatarSession({ personaId: "your-persona-id" });
  const session = new AgentSession({
    stt: new Deepgram.STT(),
    llm: new OpenAI.LLM({ model: "gpt-4o" }),
    tts: new Cartesia.TTS(),
  });

  // Start the agent session first, then the avatar
  await session.start({ agent: new StudyPartnerAgent(), room: ctx.room });
  await avatar.start({ agentSession: session, room: ctx.room });
}
```

Start the avatar after the session
Always call avatar.start() after session.start(). The avatar needs the agent session to be running so it can subscribe to the TTS audio Track. Starting in the wrong order will result in a silent avatar.
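If you want the ordering enforced in one place rather than remembered at every call site, you can wrap both calls in a small helper. This helper is an illustration for this chapter, not part of the SDK:

```python
async def start_agent_with_avatar(session, avatar, agent, room):
    """Start the agent session first, then the avatar.

    The avatar subscribes to the session's TTS audio Track, so the session
    must already be running before avatar.start() is called.
    """
    await session.start(agent=agent, room=room)  # publishes the TTS audio Track
    await avatar.start(agent_session=session, room=room)  # subscribes to it
```

The entrypoint then makes a single call, and the correct order is guaranteed by construction.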
Configuring lip sync quality
Tavus offers configuration options that let you trade off between latency and visual fidelity. For a study partner where natural conversation flow matters, you want low latency even if it means slightly less precise lip movements.
```python
avatar = AvatarSession(
    persona_id="your-persona-id",
    # Reduce latency for more natural conversation flow
    max_idle_timeout=300,
)
```

| Setting | Effect | Study partner recommendation |
|---|---|---|
| Low latency mode | Faster lip sync, slightly less precise | Preferred for conversation |
| High fidelity mode | More accurate lip sync, higher latency | Better for presentations |
| Idle timeout | How long the avatar stays active without speech | 300s for study sessions |
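The trade-offs in the table above can be captured as a small helper that picks a configuration per use case. The mode names and the `avatar_config` function are illustrative, they mirror the table rather than the exact Tavus plugin parameters, so check the plugin documentation for the real option names:

```python
def avatar_config(use_case: str) -> dict:
    """Map a use case to the latency/fidelity trade-off from the table above."""
    if use_case == "conversation":
        # Study partner: prefer fast lip sync over perfect precision
        return {"mode": "low_latency", "max_idle_timeout": 300}
    if use_case == "presentation":
        # One-way delivery tolerates more latency for better lip sync
        return {"mode": "high_fidelity", "max_idle_timeout": 300}
    raise ValueError(f"unknown use case: {use_case}")
```

For the study partner you would always pass `"conversation"`; the function exists so the choice is documented in code rather than scattered across call sites.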
Displaying the avatar in the frontend
On the frontend, the avatar appears as a video Track published by the Tavus service participant. You subscribe to this Track and render it like any other video element.
```tsx
import {
  LiveKitRoom,
  VideoTrack,
  useTracks,
} from "@livekit/components-react";
import { Track } from "livekit-client";

function AvatarDisplay() {
  const tracks = useTracks([Track.Source.Camera]);

  // The avatar publishes a camera Track from its own participant
  const avatarTrack = tracks.find((t) =>
    t.participant.identity.startsWith("tavus-")
  );

  if (!avatarTrack) {
    return <div className="avatar-placeholder">Connecting...</div>;
  }

  return (
    <div className="avatar-container">
      <VideoTrack trackRef={avatarTrack} />
    </div>
  );
}

function StudyRoom({ token, serverUrl }) {
  return (
    <LiveKitRoom token={token} serverUrl={serverUrl} connect>
      <AvatarDisplay />
      {/* Other UI components */}
    </LiveKitRoom>
  );
}
```

The Tavus avatar joins the Room as its own participant with an identity that starts with tavus-. You filter for this participant's video Track and render it with the standard VideoTrack component. No special video handling is required — it is just another participant in the Room.
Avatar bandwidth considerations
A Tavus avatar stream consumes approximately 1-2 Mbps of bandwidth. For users on mobile networks, consider providing a toggle to disable the avatar and fall back to voice-only mode. Your agent logic remains identical either way.
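One way to implement that fallback is to decide per connection whether the avatar is worth its bandwidth cost. The threshold constant, the `should_start_avatar` helper, and the idea of passing in a bandwidth estimate are all assumptions for this sketch; how you estimate bandwidth is up to your client:

```python
# The avatar stream needs roughly 1-2 Mbps, so require some headroom (assumed threshold)
AVATAR_MIN_BANDWIDTH_KBPS = 2000


def should_start_avatar(estimated_kbps: int, user_opt_out: bool = False) -> bool:
    """Fall back to voice-only on slow links or when the user disables video."""
    if user_opt_out:
        return False
    return estimated_kbps >= AVATAR_MIN_BANDWIDTH_KBPS
```

Because the agent logic is unchanged either way, the only difference between the two paths is whether `avatar.start()` is ever called.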
What you learned
- Tavus avatars are a separate layer that subscribes to the agent's TTS audio and produces lip-synced video
- A persona defines the avatar's appearance and is referenced by ID in your agent code
- The AvatarSession starts after the AgentSession and requires no changes to your agent logic
- The frontend renders the avatar as a standard video Track from a Tavus participant
Next up
In the next chapter, you will add document processing to the study partner. The agent will read PDFs, extract text from images via OCR, and answer questions grounded in uploaded documents.