Avatar integration with Tavus
Your study partner can hear and see, but to the user it is still a disembodied voice. In this chapter you will give the agent a face by integrating Tavus video avatars that lip-sync to the agent's speech in real time. The result is a visual persona that makes the study experience feel like sitting across from a tutor.
What you'll learn
- How Tavus avatars work within the LiveKit Agents framework
- How to create and configure a persona for your agent
- How lip sync connects TTS audio to avatar video
- How to display the avatar video Track in a frontend application
How Tavus avatars work
Tavus provides real-time video avatars that subscribe to your agent's audio output and generate a lip-synced video stream. The avatar runs as a separate service — your agent code stays the same, and the avatar layer watches the agent's TTS output to produce synchronized facial movements.
Agent speaks via TTS
Your agent generates a text response, and the TTS plugin converts it to audio. This audio is published as a Track in the LiveKit Room, just like any voice agent.
AvatarSession subscribes to audio
The Tavus AvatarSession subscribes to the agent's audio Track. It receives the audio stream in real time.
Tavus generates lip-synced video
Tavus processes the audio and generates a video stream where the avatar's mouth movements, facial expressions, and head gestures match the speech. This video is published as a new Track in the Room.
Frontend displays the avatar
The frontend subscribes to the avatar's video Track and renders it alongside the conversation UI. The user sees a talking face that matches every word.
The avatar does not change your agent logic at all. You write the same Agent class with the same instructions and tools. The AvatarSession is a separate object that sits alongside your AgentSession and handles the visual layer independently. You can add or remove the avatar without touching a single line of agent code.
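Because the avatar is purely additive, you can gate it behind a simple flag and ship the same agent with or without a face. The sketch below is illustrative, not part of the SDK: the `AVATAR_ENABLED` variable name and the `avatar_enabled` helper are assumptions for this example.

```python
import os


def avatar_enabled(env: dict) -> bool:
    """Decide whether to start the avatar layer from an environment flag.

    The agent logic is identical either way; only the visual layer toggles.
    """
    return env.get("AVATAR_ENABLED", "true").lower() in ("1", "true", "yes")


# In the entrypoint you would then do something like (sketch):
#
#     session = AgentSession(...)
#     await session.start(agent=StudyPartnerAgent(), room=ctx.room)
#     if avatar_enabled(os.environ):
#         avatar = AvatarSession(persona_id="your-persona-id")
#         await avatar.start(agent_session=session, room=ctx.room)
```

Keeping the toggle outside the Agent class reinforces the separation: the visual layer is configuration, not agent logic.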
Creating a Tavus persona
A persona defines the avatar's appearance — face, hairstyle, clothing, background. You create personas through the Tavus dashboard or API and reference them by ID in your agent code.
```python
from livekit.agents import Agent, AgentSession
from livekit.plugins import openai, deepgram, cartesia
from livekit.plugins.tavus import AvatarSession


class StudyPartnerAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="""You are a friendly study partner. You have a visible avatar
            that the user can see. Make eye contact references natural — say things like
            'let me look at that' when examining their camera feed. Your visual presence
            should make the conversation feel personal and engaging.""",
        )


async def entrypoint(ctx):
    await ctx.connect()

    avatar = AvatarSession(persona_id="your-persona-id")
    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o"),
        tts=cartesia.TTS(),
    )

    # Start the agent session first, then the avatar
    await session.start(agent=StudyPartnerAgent(), room=ctx.room)
    await avatar.start(agent_session=session, room=ctx.room)
```

```typescript
import { Agent, AgentSession } from "@livekit/agents";
import { OpenAI } from "@livekit/agents-plugin-openai";
import { Deepgram } from "@livekit/agents-plugin-deepgram";
import { Cartesia } from "@livekit/agents-plugin-cartesia";
import { AvatarSession } from "@livekit/agents-plugin-tavus";

class StudyPartnerAgent extends Agent {
  constructor() {
    super({
      instructions: `You are a friendly study partner. You have a visible avatar
      that the user can see. Make eye contact references natural — say things like
      'let me look at that' when examining their camera feed. Your visual presence
      should make the conversation feel personal and engaging.`,
    });
  }
}

async function entrypoint(ctx) {
  await ctx.connect();

  const avatar = new AvatarSession({ personaId: "your-persona-id" });
  const session = new AgentSession({
    stt: new Deepgram.STT(),
    llm: new OpenAI.LLM({ model: "gpt-4o" }),
    tts: new Cartesia.TTS(),
  });

  // Start the agent session first, then the avatar
  await session.start({ agent: new StudyPartnerAgent(), room: ctx.room });
  await avatar.start({ agentSession: session, room: ctx.room });
}
```

Start the avatar after the session
Always call avatar.start() after session.start(). The avatar needs the agent session to be running so it can subscribe to the TTS audio Track. Starting in the wrong order will result in a silent avatar.
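If you want the ordering enforced in one place rather than remembered at every call site, you can wrap both calls in a small helper. This helper is an illustration for this chapter, not part of the SDK:

```python
async def start_agent_with_avatar(session, avatar, agent, room):
    """Start the agent session first, then the avatar.

    The avatar subscribes to the session's TTS audio Track, so the session
    must already be running before avatar.start() is called.
    """
    await session.start(agent=agent, room=room)  # publishes the TTS audio Track
    await avatar.start(agent_session=session, room=room)  # subscribes to it
```

The entrypoint then makes a single call, and the correct order is guaranteed by construction.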
Configuring lip sync quality
Tavus offers configuration options that let you trade off between latency and visual fidelity. For a study partner where natural conversation flow matters, you want low latency even if it means slightly less precise lip movements.
```python
avatar = AvatarSession(
    persona_id="your-persona-id",
    # Reduce latency for more natural conversation flow
    max_idle_timeout=300,
)
```

| Setting | Effect | Study partner recommendation |
|---|---|---|
| Low latency mode | Faster lip sync, slightly less precise | Preferred for conversation |
| High fidelity mode | More accurate lip sync, higher latency | Better for presentations |
| Idle timeout | How long the avatar stays active without speech | 300s for study sessions |
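The trade-offs in the table above can be captured as a small helper that picks a configuration per use case. The mode names and the `avatar_config` function are illustrative, they mirror the table rather than the exact Tavus plugin parameters, so check the plugin documentation for the real option names:

```python
def avatar_config(use_case: str) -> dict:
    """Map a use case to the latency/fidelity trade-off from the table above."""
    if use_case == "conversation":
        # Study partner: prefer fast lip sync over perfect precision
        return {"mode": "low_latency", "max_idle_timeout": 300}
    if use_case == "presentation":
        # One-way delivery tolerates more latency for better lip sync
        return {"mode": "high_fidelity", "max_idle_timeout": 300}
    raise ValueError(f"unknown use case: {use_case}")
```

For the study partner you would always pass `"conversation"`; the function exists so the choice is documented in code rather than scattered across call sites.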
Displaying the avatar in the frontend
On the frontend, the avatar appears as a video Track published by the Tavus service participant. You subscribe to this Track and render it like any other video element.
```tsx
import {
  LiveKitRoom,
  VideoTrack,
  useTracks,
} from "@livekit/components-react";
import { Track } from "livekit-client";

function AvatarDisplay() {
  const tracks = useTracks([Track.Source.Camera]);

  // The avatar publishes a camera Track from its own participant
  const avatarTrack = tracks.find((t) =>
    t.participant.identity.startsWith("tavus-")
  );

  if (!avatarTrack) {
    return <div className="avatar-placeholder">Connecting...</div>;
  }

  return (
    <div className="avatar-container">
      <VideoTrack trackRef={avatarTrack} />
    </div>
  );
}

function StudyRoom({ token, serverUrl }) {
  return (
    <LiveKitRoom token={token} serverUrl={serverUrl} connect>
      <AvatarDisplay />
      {/* Other UI components */}
    </LiveKitRoom>
  );
}
```

The Tavus avatar joins the Room as its own participant with an identity that starts with tavus-. You filter for this participant's video Track and render it with the standard VideoTrack component. No special video handling is required — it is just another participant in the Room.
Avatar bandwidth considerations
A Tavus avatar stream consumes approximately 1-2 Mbps of bandwidth. For users on mobile networks, consider providing a toggle to disable the avatar and fall back to voice-only mode. Your agent logic remains identical either way.
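One way to implement that fallback is to decide per connection whether the avatar is worth its bandwidth cost. The threshold constant, the `should_start_avatar` helper, and the idea of passing in a bandwidth estimate are all assumptions for this sketch; how you estimate bandwidth is up to your client:

```python
# The avatar stream needs roughly 1-2 Mbps, so require some headroom (assumed threshold)
AVATAR_MIN_BANDWIDTH_KBPS = 2000


def should_start_avatar(estimated_kbps: int, user_opt_out: bool = False) -> bool:
    """Fall back to voice-only on slow links or when the user disables video."""
    if user_opt_out:
        return False
    return estimated_kbps >= AVATAR_MIN_BANDWIDTH_KBPS
```

Because the agent logic is unchanged either way, the only difference between the two paths is whether `avatar.start()` is ever called.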
What you learned
- Tavus avatars are a separate layer that subscribes to the agent's TTS audio and produces lip-synced video
- A persona defines the avatar's appearance and is referenced by ID in your agent code
- The AvatarSession starts after the AgentSession and requires no changes to your agent logic
- The frontend renders the avatar as a standard video Track from a Tavus participant
Next up
In the next chapter, you will add document processing to the study partner. The agent will read PDFs, extract text from images via OCR, and answer questions grounded in uploaded documents.