Physical AI: Voice on Hardware
Connect microcontrollers and edge devices to LiveKit
Deploy voice AI agents on physical hardware. Connect ESP32 microcontrollers to LiveKit Cloud over WebRTC, handle audio streaming with Opus, implement wake word detection, and control hardware through agent tools.
What You Build
ESP32 voice device that connects to LiveKit Cloud, streams audio, responds to wake words, and controls physical hardware via agent tools.
Prerequisites
- Course 1.1
Embedded voice architecture
25m · How an ESP32 connects to LiveKit Cloud: hardware setup (INMP441 mic, MAX98357A speaker), I2S audio configuration, the Opus codec on constrained devices, and the architecture that ties them together.
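To see why Opus matters on a constrained device, it helps to compare raw I2S PCM against an encoded stream. The numbers below (16 kHz mono, 16-bit samples, 20 ms frames, 24 kbps Opus) are illustrative assumptions for a voice pipeline, not values mandated by LiveKit:

```python
# Rough audio budget for an ESP32 voice device: raw I2S PCM vs. Opus.
# All parameters are assumptions chosen as typical values for voice capture.

SAMPLE_RATE = 16_000       # Hz; common for speech
BITS_PER_SAMPLE = 16
FRAME_MS = 20              # Opus frames range 2.5-60 ms; 20 ms is typical
OPUS_BITRATE = 24_000      # bps; a plausible voice bitrate over Wi-Fi

samples_per_frame = SAMPLE_RATE * FRAME_MS // 1000              # 320 samples
pcm_bytes_per_frame = samples_per_frame * BITS_PER_SAMPLE // 8  # 640 bytes
pcm_bps = SAMPLE_RATE * BITS_PER_SAMPLE                         # 256 kbps raw
opus_bytes_per_frame = OPUS_BITRATE * FRAME_MS // (1000 * 8)    # ~60 bytes

print(f"raw: {pcm_bps} bps, {pcm_bytes_per_frame} B/frame; "
      f"opus: {OPUS_BITRATE} bps, ~{opus_bytes_per_frame} B/frame")
```

The roughly 10x reduction is what makes continuous streaming practical on a microcontroller's Wi-Fi link and RAM budget.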
Wake word detection & audio streaming
25m · Implement always-on wake word detection with Porcupine or ESP-SR, transition from low-power listening to full audio streaming when activated, and manage power for battery-operated devices.
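The listen-to-stream transition is essentially a small state machine. A minimal sketch, assuming a hypothetical per-frame interface (this is not the Porcupine or ESP-SR API; a real device would feed PCM frames into the detector and bring up the WebRTC session on a hit):

```python
from enum import Enum, auto

class DeviceState(Enum):
    LISTENING = auto()   # low-power: only the wake-word engine runs
    STREAMING = auto()   # full audio path up, frames flowing to the agent

class WakeWordSession:
    """Illustrative listen->stream state machine (names are assumptions)."""

    def __init__(self, timeout_frames: int = 50):
        self.state = DeviceState.LISTENING
        self.silence = 0                     # consecutive non-voice frames
        self.timeout_frames = timeout_frames

    def on_frame(self, wake_detected: bool, voice_active: bool) -> DeviceState:
        if self.state == DeviceState.LISTENING and wake_detected:
            self.state = DeviceState.STREAMING   # activate full audio path
            self.silence = 0
        elif self.state == DeviceState.STREAMING:
            self.silence = 0 if voice_active else self.silence + 1
            if self.silence >= self.timeout_frames:
                self.state = DeviceState.LISTENING  # back to low power
        return self.state
```

The silence timeout is what lets a battery-powered device drop back to its low-power listening mode instead of streaming indefinitely.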
Hardware control via agent tools
25m · Use LiveKit data channels for bidirectional hardware control: register agent tools that drive LEDs, relays, and servos, and send sensor data from the device back to the agent.
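Data channels carry opaque bytes, so the agent and device need an agreed message format. A minimal sketch of one possible JSON wire format (the field names `cmd`, `pin`, and `value` are illustrative assumptions, not a LiveKit API):

```python
import json

def encode_command(cmd: str, pin: int, value: int) -> bytes:
    """Agent side: serialize a control message for the device.
    A LiveKit agent tool would publish these bytes on the data channel."""
    return json.dumps({"cmd": cmd, "pin": pin, "value": value}).encode()

def handle_command(payload: bytes, gpio_write) -> dict:
    """Device side: parse an incoming message and drive a GPIO through
    a supplied writer callback (stands in for the real pin driver)."""
    msg = json.loads(payload)
    if msg["cmd"] == "set_gpio":
        gpio_write(msg["pin"], msg["value"])
    return msg

# Simulated round trip: a dict stands in for physical GPIO state.
pins = {}
msg = handle_command(encode_command("set_gpio", 2, 1), pins.__setitem__)
```

The same framing works in reverse for sensor readings: the device serializes a JSON payload and the agent parses it before passing values to the LLM.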
Production: OTA updates & fleet management
20m · Ship embedded voice devices to production: OTA firmware updates with secure boot, graceful offline fallback via local command recognition, health telemetry, and fleet management with staged rollouts.
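One common way to stage an OTA rollout is to hash each device ID into a stable bucket, then have a release target buckets below the current rollout percentage. A sketch under that assumption (the function names and fleet IDs are hypothetical, not a LiveKit or ESP-IDF API):

```python
import hashlib

def rollout_bucket(device_id: str) -> int:
    """Map a device ID to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(device_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100

def should_update(device_id: str, rollout_percent: int) -> bool:
    """A device takes the OTA update only once its bucket is in range,
    so raising the percentage only ever adds devices to the wave."""
    return rollout_bucket(device_id) < rollout_percent

# Example: pick a ~5% canary wave from a simulated 1000-device fleet.
fleet = [f"esp32-{i:03d}" for i in range(1000)]
canary = [d for d in fleet if should_update(d, 5)]
```

Because bucketing is deterministic, the canary wave is always a subset of later waves, and a device that fails health telemetry checks can block the percentage from advancing.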
What You Walk Away With
Ability to connect embedded devices to LiveKit Cloud for voice AI, with wake word detection, hardware control, and offline fallback.