Project setup & your first agent
In this chapter, you will install the LiveKit CLI, scaffold the dental receptionist project, understand every line of the generated code, run your first voice AI agent, and speak to it. By the end, you will have a working agent that responds to your voice in real time.
What you will build: A running voice AI agent that you can talk to in the LiveKit Playground.
Install the LiveKit CLI
The LiveKit CLI (lk) is your primary tool for creating, running, and deploying agents. It scaffolds projects, manages authentication, and connects to LiveKit Cloud.
Install the CLI
Choose the install command for your operating system:
macOS (Homebrew):
brew install livekit-cli

Windows (winget):

winget install LiveKit.LiveKitCLI

Linux:

curl -sSL https://get.livekit.io/cli | bash

Authenticate with LiveKit Cloud
Run the auth command to connect your CLI to your LiveKit Cloud account. If you do not have a Cloud account yet, this command will guide you through creating one.
lk cloud auth

This opens your browser for authentication and stores the credentials locally. You only need to do this once per machine.
LiveKit Cloud is free to start
LiveKit Cloud includes a generous free tier that is more than enough for development and testing. You do not need a credit card to get started.
Scaffold the project
The lk agent init command creates a complete agent project with all the boilerplate handled for you.
lk agent init dental-receptionist --template voice-agent-python

This creates a dental-receptionist directory with the following structure:
dental-receptionist/
agent.py # Your agent's main file — this is where all the logic lives
requirements.txt # Python dependencies (livekit-agents, plugins)
.env.example # Template for environment variables (API keys)
Dockerfile # For deployment to LiveKit Cloud (Chapter 12)

Move into the project directory and install dependencies:
cd dental-receptionist
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

Copy the environment template and add your API keys:
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY, DEEPGRAM_API_KEY, etc.

You need API keys for model providers
The agent uses external AI services: an LLM (OpenAI), STT (Deepgram), and TTS (Cartesia). Each requires an API key in your .env file. Your LiveKit Cloud credentials were configured automatically when you ran lk cloud auth.
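For reference, a filled-in .env might look like the following sketch. The exact variable names depend on the plugins your template uses; CARTESIA_API_KEY is an assumption here, based on this chapter using Cartesia for TTS.

```shell
# .env (keep this file out of version control)
OPENAI_API_KEY=sk-your-openai-key      # LLM (OpenAI)
DEEPGRAM_API_KEY=your-deepgram-key     # STT (Deepgram)
CARTESIA_API_KEY=your-cartesia-key     # TTS (Cartesia), name assumed
```

No LiveKit credentials appear here because lk cloud auth already stored those on your machine.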
The generated agent code
Open agent.py. Here is the complete generated file:
from livekit.agents import AgentServer, Agent, AgentSession

server = AgentServer()

@server.rtc_session
async def entrypoint(session: AgentSession):
    await session.start(
        agent=Agent(instructions="You are a helpful voice assistant."),
        room=session.room,
    )

if __name__ == "__main__":
    server.run()

This is a complete, working voice AI agent. Every line matters. Let us walk through each part.
AgentServer: the runtime container
AgentServer is the process that manages your agent. It connects to LiveKit Cloud, listens for incoming sessions (when someone joins a room), and dispatches them to your handler function. Think of it as a web server, but instead of handling HTTP requests, it handles real-time voice sessions.
@server.rtc_session: the session handler
The @server.rtc_session decorator registers a function as the handler for new sessions. Every time a user connects to a room that needs an agent, LiveKit calls this function with a fresh AgentSession. This is equivalent to an HTTP route handler — one call per session.
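The decorator pattern here is the same one web frameworks use for route handlers. As a rough analogy (plain Python, not LiveKit's actual implementation), a server object can store the decorated function and invoke it once per incoming session:

```python
# Toy sketch of decorator-based handler registration.
# This is an analogy only, not LiveKit's real implementation.

class ToyServer:
    def __init__(self):
        self._handler = None

    def rtc_session(self, fn):
        # Used as @server.rtc_session: remember fn, return it unchanged
        self._handler = fn
        return fn

    def dispatch(self, session):
        # Called once for each new session, like a route handler per request
        return self._handler(session)

server = ToyServer()

@server.rtc_session
def entrypoint(session):
    return f"handling session {session}"

print(server.dispatch("abc123"))  # prints "handling session abc123"
```

The decorator does not call your function; it only registers it. The server decides when to call it, once for each session that arrives.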
AgentSession: the live connection
The session object represents a single active conversation. It holds the connection to the room, the audio pipeline, the conversation history, and all the state for this specific call. Each caller gets their own AgentSession — sessions are completely isolated from each other.
session.start(): wire everything together
Calling session.start() activates the full voice pipeline. It connects the agent to the room, starts listening to the user's audio, begins STT transcription, feeds transcripts to the LLM, and routes LLM output through TTS back to the user. One method call starts the entire pipeline.
Agent: the brain
The Agent class defines what the agent knows and how it behaves. Right now it only has instructions — the system prompt that shapes the LLM's personality and behavior. In later chapters, you will add tools, event handlers, and more configuration to this class.
room=session.room: the connection target
This tells the agent which LiveKit Room to join. The session.room is the room that was created when the user connected. Your agent joins as a participant in that room — just like a human would join a video call.
server.run(): start listening
This starts the AgentServer process. It connects to LiveKit Cloud and waits for incoming sessions. When you run in dev mode, it also connects to the Playground for testing.
The flow is: server.run() starts listening. A user connects to a room. LiveKit dispatches the session to your entrypoint function. session.start() activates the voice pipeline. The user speaks, and the agent responds. When the user disconnects, the session ends and the function returns.
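The flow above can be modeled with a toy asyncio loop (plain Python, not LiveKit internals): the server dispatches each incoming connection to the handler, and multiple sessions run concurrently without blocking each other.

```python
import asyncio

# Toy model of the dispatch flow described above; not LiveKit's real code.
async def entrypoint(session: str) -> str:
    # session.start() analog: the conversation would happen here
    await asyncio.sleep(0)
    return f"{session} handled"  # returns when the user disconnects

async def server_run(connections: list[str]) -> list[str]:
    # The server listens and dispatches each connection to the handler
    return await asyncio.gather(*(entrypoint(c) for c in connections))

results = asyncio.run(server_run(["call-1", "call-2"]))
print(results)  # ['call-1 handled', 'call-2 handled']
```

Each coroutine is one call: it starts when the user connects and finishes when they hang up, independently of every other call.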
Run your agent
Start the agent in development mode:
python agent.py dev

You should see output indicating the agent is running and connected to LiveKit Cloud. Dev mode does two important things: it auto-reloads when you edit your agent file, and it registers the agent with the LiveKit Playground so you can test it immediately.
Talk to your agent
Open the Playground
Go to cloud.livekit.io and open the Playground. Your running agent should appear automatically because you are in dev mode.
Connect and speak
Click "Connect" in the Playground. Allow microphone access when prompted. You are now in a LiveKit Room with your agent — it is listening.
Say something
Try saying: "Hello, how are you?"
The agent will respond with a friendly greeting. You are hearing the full pipeline in action: your voice was captured by WebRTC, transcribed by STT, processed by the LLM, synthesized by TTS, and delivered back to you over WebRTC — all in under a second.
Make it a dental receptionist
The default agent is generic. Let us give it a personality. Open agent.py and update the instructions:
from livekit.agents import AgentServer, Agent, AgentSession

server = AgentServer()

@server.rtc_session
async def entrypoint(session: AgentSession):
    await session.start(
        agent=Agent(
            instructions=(
                "You are a friendly receptionist at Bright Smile Dental clinic. "
                "Greet callers warmly and help them with appointment inquiries. "
                "Keep your responses short and conversational — one to two sentences at most."
            ),
        ),
        room=session.room,
    )

if __name__ == "__main__":
    server.run()

Save the file. Because you are running in dev mode, the agent reloads automatically. Go back to the Playground, disconnect, and reconnect to start a fresh session.
Try saying: "Hi, I'd like to book an appointment."
The agent should now respond as a dental receptionist — something like "Welcome to Bright Smile Dental! I'd be happy to help you with an appointment. What day works best for you?"
Try saying: "What are your office hours?"
The agent will answer based on the LLM's general knowledge. It might guess or make something up — that is fine for now. In Chapter 5, you will add tools that let it look up real data instead of guessing.
Try saying: "Can I speak to a human?"
Notice how it handles this. With no specific instructions about transfers, it will do its best. In Chapter 4, you will write detailed instructions that cover edge cases like this.
What just happened: you changed three lines of text in the instructions parameter, and the agent's entire personality shifted. The instructions are the system prompt sent to the LLM on every turn. The LLM uses them to shape every response. The voice, the knowledge, and the conversational style all come from these instructions — you will learn to write great ones in Chapter 4.
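How instructions usually reach the model can be sketched in plain Python (a generic illustration of chat-style LLM APIs, independent of any LiveKit internals): the system prompt is prepended to the running history on every request.

```python
# Illustrative only: how instructions typically map to an LLM chat payload.
instructions = (
    "You are a friendly receptionist at Bright Smile Dental clinic. "
    "Keep your responses short and conversational."
)

history = []  # grows as the conversation proceeds

def build_messages(user_text):
    """Prepend the system prompt, then the history, then the new turn."""
    history.append({"role": "user", "content": user_text})
    return [{"role": "system", "content": instructions}] + history

messages = build_messages("Hi, I'd like to book an appointment.")
print(messages[0]["role"])  # prints "system": instructions lead every request
```

Because the system message is resent on every turn, editing the instructions changes every subsequent response, which is exactly what you observed after reloading.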
What you have built so far
You now have a running dental receptionist agent with these components:
- AgentServer managing the runtime
- @server.rtc_session dispatching incoming calls to your entrypoint
- AgentSession representing each live conversation
- Agent with custom instructions defining the receptionist persona
- LiveKit Playground as your testing interface
The agent can greet callers and have basic conversations, but it cannot check real data, book appointments, or handle complex workflows. In the next chapter, you will configure the models that power the voice pipeline — choosing specific STT, LLM, and TTS providers — and learn about the session lifecycle that governs every conversation.
Looking ahead
In the next chapter, you will configure the STT, LLM, and TTS models explicitly, set up voice activity detection, and learn about the AgentSession lifecycle. You will swap TTS voices and hear the difference immediately.