Project setup & your first agent
In this chapter, you will install the LiveKit CLI, scaffold the dental receptionist project, understand every line of the generated code, run your first voice AI agent, and speak to it. By the end, you will have a working agent that responds to your voice in real time.
What you will build: A running voice AI agent that you can talk to in the LiveKit Playground.
Install the LiveKit CLI
The LiveKit CLI (lk) is your primary tool for creating, running, and deploying agents. It scaffolds projects, manages authentication, and connects to LiveKit Cloud.
Install the CLI
Choose the install command for your operating system:
macOS (Homebrew):
brew install livekit-cli

Windows (winget):

winget install LiveKit.LiveKitCLI

Linux:

curl -sSL https://get.livekit.io/cli | bash

Authenticate with LiveKit Cloud
Run the auth command to connect your CLI to your LiveKit Cloud account. If you do not have a Cloud account yet, this command will guide you through creating one.
lk cloud auth

This opens your browser for authentication and stores the credentials locally. You only need to do this once per machine.
LiveKit Cloud is free to start
LiveKit Cloud includes a generous free tier that is more than enough for development and testing. You do not need a credit card to get started.
Scaffold the project
The lk agent init command creates a complete agent project with all the boilerplate handled for you.
lk agent init dental-receptionist --template voice-agent-python

This creates a dental-receptionist directory with the following structure:
dental-receptionist/
agent.py # Your agent's main file — this is where all the logic lives
requirements.txt # Python dependencies (livekit-agents, plugins)
.env.example # Template for environment variables (API keys)
Dockerfile # For deployment to LiveKit Cloud (Chapter 12)

Move into the project directory and install dependencies:
cd dental-receptionist
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

Copy the environment template and add your API keys:
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY, DEEPGRAM_API_KEY, etc.

You need API keys for model providers
The agent uses external AI services: an LLM (OpenAI), STT (Deepgram), and TTS (Cartesia). Each requires an API key in your .env file. Your LiveKit Cloud credentials were configured automatically when you ran lk cloud auth.
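For reference, a filled-in .env might look like the following sketch. The exact variable names depend on the plugins your template uses; CARTESIA_API_KEY is an assumption here, based on this chapter using Cartesia for TTS.

```shell
# .env (keep this file out of version control)
OPENAI_API_KEY=sk-your-openai-key      # LLM (OpenAI)
DEEPGRAM_API_KEY=your-deepgram-key     # STT (Deepgram)
CARTESIA_API_KEY=your-cartesia-key     # TTS (Cartesia), name assumed
```

No LiveKit credentials appear here because lk cloud auth already stored those on your machine.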
The generated agent code
Open agent.py. Here is the complete generated file:
from livekit.agents import AgentServer, Agent, AgentSession

server = AgentServer()

@server.rtc_session
async def entrypoint(session: AgentSession):
    await session.start(
        agent=Agent(instructions="You are a helpful voice assistant."),
        room=session.room,
    )

if __name__ == "__main__":
    server.run()

This is a complete, working voice AI agent. Every line matters. Let us walk through each part.
AgentServer: the runtime container
AgentServer is the process that manages your agent. It connects to LiveKit Cloud, listens for incoming sessions (when someone joins a room), and dispatches them to your handler function. Think of it as a web server, but instead of handling HTTP requests, it handles real-time voice sessions.
@server.rtc_session: the session handler
The @server.rtc_session decorator registers a function as the handler for new sessions. Every time a user connects to a room that needs an agent, LiveKit calls this function with a fresh AgentSession. This is equivalent to an HTTP route handler — one call per session.
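The decorator pattern here is the same one web frameworks use for route handlers. As a rough analogy (plain Python, not LiveKit's actual implementation), a server object can store the decorated function and invoke it once per incoming session:

```python
# Toy sketch of decorator-based handler registration.
# This is an analogy only, not LiveKit's real implementation.

class ToyServer:
    def __init__(self):
        self._handler = None

    def rtc_session(self, fn):
        # Used as @server.rtc_session: remember fn, return it unchanged
        self._handler = fn
        return fn

    def dispatch(self, session):
        # Called once for each new session, like a route handler per request
        return self._handler(session)

server = ToyServer()

@server.rtc_session
def entrypoint(session):
    return f"handling session {session}"

print(server.dispatch("abc123"))  # prints "handling session abc123"
```

The decorator does not call your function; it only registers it. The server decides when to call it, once for each session that arrives.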
AgentSession: the live connection
The session object represents a single active conversation. It holds the connection to the room, the audio pipeline, the conversation history, and all the state for this specific call. Each caller gets their own AgentSession — sessions are completely isolated from each other.
session.start(): wire everything together
Calling session.start() activates the full voice pipeline. It connects the agent to the room, starts listening to the user's audio, begins STT transcription, feeds transcripts to the LLM, and routes LLM output through TTS back to the user. One method call starts the entire pipeline.
Agent: the brain
The Agent class defines what the agent knows and how it behaves. Right now it only has instructions — the system prompt that shapes the LLM's personality and behavior. In later chapters, you will add tools, event handlers, and more configuration to this class.
room=session.room: the connection target
This tells the agent which LiveKit Room to join. The session.room is the room that was created when the user connected. Your agent joins as a participant in that room — just like a human would join a video call.
server.run(): start listening
This starts the AgentServer process. It connects to LiveKit Cloud and waits for incoming sessions. When you run in dev mode, it also connects to the Playground for testing.
The flow is: server.run() starts listening. A user connects to a room. LiveKit dispatches the session to your entrypoint function. session.start() activates the voice pipeline. The user speaks, and the agent responds. When the user disconnects, the session ends and the function returns.
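The flow above can be modeled with a toy asyncio loop (plain Python, not LiveKit internals): the server dispatches each incoming connection to the handler, and multiple sessions run concurrently without blocking each other.

```python
import asyncio

# Toy model of the dispatch flow described above; not LiveKit's real code.
async def entrypoint(session: str) -> str:
    # session.start() analog: the conversation would happen here
    await asyncio.sleep(0)
    return f"{session} handled"  # returns when the user disconnects

async def server_run(connections: list[str]) -> list[str]:
    # The server listens and dispatches each connection to the handler
    return await asyncio.gather(*(entrypoint(c) for c in connections))

results = asyncio.run(server_run(["call-1", "call-2"]))
print(results)  # ['call-1 handled', 'call-2 handled']
```

Each coroutine is one call: it starts when the user connects and finishes when they hang up, independently of every other call.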
Run your agent
Start the agent in development mode:
python agent.py dev

You should see output indicating the agent is running and connected to LiveKit Cloud. Dev mode does two important things: it auto-reloads when you edit your agent file, and it registers the agent with the LiveKit Playground so you can test it immediately.
Talk to your agent
Open the Playground
Go to cloud.livekit.io and open the Playground. Your running agent should appear automatically because you are in dev mode.
Connect and speak
Click "Connect" in the Playground. Allow microphone access when prompted. You are now in a LiveKit Room with your agent — it is listening.
Say something
Try saying: "Hello, how are you?"
The agent will respond with a friendly greeting. You are hearing the full pipeline in action: your voice was captured by WebRTC, transcribed by STT, processed by the LLM, synthesized by TTS, and delivered back to you over WebRTC — all in under a second.
Make it a dental receptionist
The default agent is generic. Let us give it a personality. Open agent.py and update the instructions:
from livekit.agents import AgentServer, Agent, AgentSession

server = AgentServer()

@server.rtc_session
async def entrypoint(session: AgentSession):
    await session.start(
        agent=Agent(
            instructions=(
                "You are a friendly receptionist at Bright Smile Dental clinic. "
                "Greet callers warmly and help them with appointment inquiries. "
                "Keep your responses short and conversational — one to two sentences at most."
            ),
        ),
        room=session.room,
    )

if __name__ == "__main__":
    server.run()

Save the file. Because you are running in dev mode, the agent reloads automatically. Go back to the Playground, disconnect, and reconnect to start a fresh session.
Try saying: "Hi, I'd like to book an appointment."
The agent should now respond as a dental receptionist — something like "Welcome to Bright Smile Dental! I'd be happy to help you with an appointment. What day works best for you?"
Try saying: "What are your office hours?"
The agent will answer based on the LLM's general knowledge. It might guess or make something up — that is fine for now. In Chapter 5, you will add tools that let it look up real data instead of guessing.
Try saying: "Can I speak to a human?"
Notice how it handles this. With no specific instructions about transfers, it will do its best. In Chapter 4, you will write detailed instructions that cover edge cases like this.
What just happened: you changed three lines of text in the instructions parameter, and the agent's entire personality shifted. The instructions are the system prompt sent to the LLM on every turn. The LLM uses them to shape every response. The voice, the knowledge, and the conversational style all come from these instructions — you will learn to write great ones in Chapter 4.
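How instructions usually reach the model can be sketched in plain Python (a generic illustration of chat-style LLM APIs, independent of any LiveKit internals): the system prompt is prepended to the running history on every request.

```python
# Illustrative only: how instructions typically map to an LLM chat payload.
instructions = (
    "You are a friendly receptionist at Bright Smile Dental clinic. "
    "Keep your responses short and conversational."
)

history = []  # grows as the conversation proceeds

def build_messages(user_text):
    """Prepend the system prompt, then the history, then the new turn."""
    history.append({"role": "user", "content": user_text})
    return [{"role": "system", "content": instructions}] + history

messages = build_messages("Hi, I'd like to book an appointment.")
print(messages[0]["role"])  # prints "system": instructions lead every request
```

Because the system message is resent on every turn, editing the instructions changes every subsequent response, which is exactly what you observed after reloading.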
What you have built so far
You now have a running dental receptionist agent with these components:
- AgentServer managing the runtime
- @server.rtc_session dispatching incoming calls to your entrypoint
- AgentSession representing each live conversation
- Agent with custom instructions defining the receptionist persona
- LiveKit Playground as your testing interface
The agent can greet callers and have basic conversations, but it cannot check real data, book appointments, or handle complex workflows. In the next chapter, you will configure the models that power the voice pipeline — choosing specific STT, LLM, and TTS providers — and learn about the session lifecycle that governs every conversation.
Looking ahead
In the next chapter, you will configure the STT, LLM, and TTS models explicitly, set up voice activity detection, and learn about the AgentSession lifecycle. You will swap TTS voices and hear the difference immediately.