Realtime Models vs Pipelines

Choose the right architecture

Compare realtime speech-to-speech models with traditional STT->LLM->TTS pipelines. Understand tradeoffs, use cases, and hybrid approaches.

A side-by-side comparison app that implements the same agent as both a pipeline and a realtime model, with latency and quality metrics for each.

Chapters

Pipeline (STT->LLM->TTS) vs realtime (speech-to-speech) — architecture, latency, and capability tradeoffs.

Build a pipeline agent with optimized model selection and latency tuning.

Implement a realtime agent using OpenAI's Realtime API with GPT-4o.

Implement a realtime agent using Google's Gemini Live for multimodal conversations.

Use a realtime model for speech comprehension with a dedicated TTS for voice output — the half-cascade pattern.

Build hybrid architectures that switch between pipeline and realtime based on context.

Measure and compare latency, quality, and cost across pipeline and realtime approaches.

A decision framework for choosing between pipeline, realtime, and hybrid architectures.
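As a rough illustration of how the benchmarking and decision chapters fit together, the sketch below summarizes latency samples for each architecture and then picks the cheapest candidate that meets a latency budget and a quality floor. Everything here is an assumption made for illustration: the names `LatencySummary`, `Candidate`, `summarize`, and `choose`, the use of p95 latency as the budget metric, and the "cheapest eligible candidate" rule are not taken from the course, and a real decision framework would likely weigh more factors.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median, quantiles


@dataclass(frozen=True)
class LatencySummary:
    """Median and 95th-percentile response latency, in milliseconds."""
    p50_ms: float
    p95_ms: float


@dataclass(frozen=True)
class Candidate:
    """One architecture under evaluation (hypothetical structure)."""
    name: str                # e.g. "pipeline", "realtime", or "hybrid"
    latency: LatencySummary  # built from your own benchmark runs
    quality: float           # any score where higher is better
    cost_per_minute: float   # in whatever unit you track cost


def summarize(samples_ms: list[float]) -> LatencySummary:
    """Reduce raw latency samples to p50 and p95."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two latency samples")
    # n=20 yields 19 cut points at 5% steps; the last one is the 95th percentile.
    cuts = quantiles(samples_ms, n=20, method="inclusive")
    return LatencySummary(p50_ms=median(samples_ms), p95_ms=cuts[18])


def choose(
    candidates: list[Candidate],
    max_p95_ms: float,
    min_quality: float,
) -> Candidate | None:
    """Return the cheapest candidate within budget, or None if none qualifies."""
    eligible = [
        c for c in candidates
        if c.latency.p95_ms <= max_p95_ms and c.quality >= min_quality
    ]
    return min(eligible, key=lambda c: c.cost_per_minute, default=None)
```

To use it, collect per-turn latencies from each implementation, pass them through `summarize`, wrap the results in `Candidate` objects along with your own quality and cost figures, and call `choose` with the limits your product can tolerate. A `None` result signals that no measured architecture satisfies both constraints.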

A clear understanding of when to use pipeline vs realtime models, how to implement each, and how to build hybrid architectures.