Chapter 2

STT, LLM, and TTS provider comparison

Now that you understand the pipeline architecture, it is time to evaluate the providers you can plug into each stage. This chapter is your reference guide: for every STT, LLM, and TTS provider supported by LiveKit — whether through LiveKit Inference or the open source plugin ecosystem — you will see the actual code, key configuration options, and how they compare on latency, accuracy, and cost. By the end, you will know exactly which providers to reach for — and how to benchmark them by swapping plugins.

LiveKit Inference · STT providers · LLM providers · TTS providers · Latency vs quality tradeoffs

What you will learn

  • How LiveKit Inference provides a unified interface to the best voice AI models without managing API keys per provider
  • How every STT provider compares and how to configure each — via Inference or plugins
  • How every LLM provider compares for voice AI, including function calling quality
  • How every TTS provider compares on latency, voice quality, and cost
  • The "swap test" pattern for benchmarking providers head-to-head

LiveKit Inference: the unified model interface

Before diving into individual providers, you should know about LiveKit Inference — a unified model interface included in LiveKit Cloud. It provides access to many of the best STT, LLM, and TTS models from providers like OpenAI, Google, Deepgram, AssemblyAI, Cartesia, ElevenLabs, Rime, Inworld, xAI, and more — without needing to manage separate API keys or plugin installations for each provider.

from livekit.agents import AgentSession, inference

# LiveKit Inference — no per-provider API keys needed
session = AgentSession(
  stt=inference.STT(model="deepgram/nova-3", language="en"),
  llm=inference.LLM(model="openai/gpt-4.1-mini"),
  tts=inference.TTS(model="cartesia/sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"),
)

# Or use string descriptors as a shortcut
session = AgentSession(
  stt="deepgram/nova-3:en",
  llm="openai/gpt-4.1-mini",
  tts="cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
)

LiveKit Inference handles billing, routing, and connection management automatically. Swapping models is as simple as changing the model string. For providers not yet available in Inference, or when you need provider-specific features like custom endpoints or voice cloning, use the open source plugins described below.

When to use Inference vs plugins

Use LiveKit Inference when you want the simplest setup — one billing relationship, no per-provider API keys, and automatic connection management. Use plugins when you need provider-specific features (voice cloning, custom endpoints, Azure compliance), self-hosted models, or providers not yet in Inference.


STT providers

Speech-to-text is the first pipeline stage and sets the ceiling for everything downstream. A slow or inaccurate STT means your LLM works with bad input and your user waits longer.

STT comparison table

Provider | Plugin | Inference | Latency (first result) | Accuracy (WER) | Streaming | Languages
Deepgram Nova-3 | livekit-plugins-deepgram | deepgram/nova-3 | ~100ms | ~8% | Excellent | 44
Deepgram Flux | livekit-plugins-deepgram | deepgram/flux-general | ~80ms | ~7% | Excellent | English
AssemblyAI Universal-3 Pro | livekit-plugins-assemblyai | assemblyai/universal-3-pro | ~150ms | ~7-9% | Good | 6
ElevenLabs Scribe v2 | livekit-plugins-elevenlabs | elevenlabs/scribe-v2-rt | ~120ms | ~7% | Good | 190
Cartesia Ink Whisper | livekit-plugins-cartesia | cartesia/ink-whisper | ~150ms | ~7-8% | Good | 100
OpenAI Whisper / gpt-4o-transcribe | livekit-plugins-openai | — | ~500-1000ms | ~5-8% | Limited | 50+
Google Cloud Speech | livekit-plugins-google | — | ~200ms | ~8-10% | Good | 125+
Azure Speech | livekit-plugins-azure | — | ~200ms | ~8-10% | Good | 100+
Speechmatics | livekit-plugins-speechmatics | — | ~150ms | ~7-9% | Good | 50+
Fal | livekit-plugins-fal | — | ~200ms | ~7-9% | Good | Multi
NVIDIA Parakeet | livekit-plugins-nvidia | — | ~150ms | ~8% | Good | English
Groq (Whisper) | livekit-plugins-groq | — | ~100ms | ~7% | Good | Multi

Word Error Rate (WER)

WER measures the percentage of words transcribed incorrectly. Lower is better. A WER of 8% means roughly 1 in 12 words is wrong. In practice, most errors occur on uncommon words, names, or jargon — common conversational speech transcribes much more accurately.
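The arithmetic behind WER is simple enough to sketch. The function below is an illustrative, stdlib-only sketch (not part of LiveKit or any provider SDK) that computes WER as the word-level edit distance between a reference transcript and a hypothesis:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("book a table for two", "book a table for you"))  # 0.2: 1 of 5 words wrong
```

Production evaluations normalize text first (lowercasing, punctuation stripping, number formatting) before scoring, which is why published WER figures are only comparable within the same benchmark.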

Deepgram Nova-3 and Flux

The most popular STT choice for real-time voice AI. Deepgram delivers the fastest streaming transcription available, with partial transcripts arriving within 100ms. Nova-3 improves accuracy on accented speech and noisy audio compared to Nova-2. Deepgram Flux is their newest model optimized for conversational AI with even lower latency.

Key strengths: ultra-low streaming latency, accurate endpointing, smart formatting (punctuation, capitalization, numbers). Available through both LiveKit Inference and the Deepgram plugin.

# Via LiveKit Inference (recommended)
from livekit.agents import inference
stt = inference.STT(model="deepgram/nova-3", language="en")

# Or via plugin for provider-specific features
from livekit.plugins import deepgram
stt = deepgram.STT(model="nova-3", language="en")

AssemblyAI

Universal-3 Pro Streaming offers competitive real-time accuracy with built-in turn detection support. Available in LiveKit Inference and as a plugin.

Key strengths: good streaming latency, automatic language detection, built-in turn detection, competitive accuracy.

# Via LiveKit Inference
from livekit.agents import inference
stt = inference.STT(model="assemblyai/universal-3-pro-streaming")

# Or via plugin
from livekit.plugins import assemblyai
stt = assemblyai.STT()

ElevenLabs Scribe v2

ElevenLabs' Scribe v2 Realtime supports an impressive 190 languages — the widest language coverage of any STT option in LiveKit Inference. Available via Inference and the ElevenLabs plugin.

Key strengths: widest language coverage (190 languages), good streaming latency, real-time capable.

# Via LiveKit Inference
from livekit.agents import inference
stt = inference.STT(model="elevenlabs/scribe-v2-rt")

# Or via plugin
from livekit.plugins import elevenlabs
stt = elevenlabs.STT()

Cartesia Ink Whisper

Cartesia, known for TTS, also offers the Ink Whisper STT model with 100-language support. Available in LiveKit Inference.

Key strengths: broad language support, good accuracy, available via Inference.

# Via LiveKit Inference
from livekit.agents import inference
stt = inference.STT(model="cartesia/ink-whisper")

# Or via plugin
from livekit.plugins import cartesia
stt = cartesia.STT()

OpenAI Whisper and gpt-4o-transcribe

Exceptionally accurate across diverse audio conditions and accents, but designed primarily for batch transcription. Streaming latency is significantly higher than purpose-built streaming providers, making Whisper a poor fit for real-time voice AI. The newer gpt-4o-transcribe model is better for streaming use cases.

Key strengths: high accuracy across diverse audio, strong multilingual performance, simple API.

from livekit.plugins import openai

stt = openai.STT(model="gpt-4o-transcribe")

Whisper latency

Whisper's 500-1000ms latency for first results makes it unsuitable for real-time conversational voice AI. Use it for offline transcription or post-call analysis. For real-time use, choose Deepgram, AssemblyAI, or ElevenLabs Scribe.

Google Cloud Speech-to-Text

Wide language coverage with 125+ languages. Reliable streaming API with model adaptation for boosting recognition of specific phrases.

Key strengths: wide language coverage, reliable streaming, phrase boosting, speaker diarization.

from livekit.plugins import google

stt = google.STT(model="latest_long", language="en-US")

Azure Speech Services

Enterprise-grade STT with custom model training, compliance certifications (HIPAA, SOC 2, GDPR), and private endpoints. A natural fit for organizations in the Microsoft ecosystem.

Key strengths: custom speech models, enterprise compliance, private endpoints, 100+ languages.

from livekit.plugins import azure

stt = azure.STT(
  speech_key="your-azure-key",
  speech_region="eastus",
  language="en-US",
)

Additional STT plugins

LiveKit also offers plugins for additional STT providers:

Plugin | Provider | Key feature
livekit-plugins-speechmatics | Speechmatics | High accuracy, 50+ languages, custom dictionaries
livekit-plugins-fal | Fal | Cloud-hosted transcription
livekit-plugins-nvidia | NVIDIA Parakeet | On-premise/GPU inference, diarization
livekit-plugins-groq | Groq (Whisper) | Ultra-fast Whisper inference on LPU hardware

STT selection guide

Priority | Recommended | Why
Lowest latency | Deepgram Flux / Nova-3 | Fastest streaming, purpose-built for real-time
Highest accuracy | Deepgram Nova-3 or AssemblyAI | Top-tier WER on English
Most languages | ElevenLabs Scribe v2 | 190 languages via Inference
Enterprise compliance | Azure Speech | HIPAA, SOC 2, custom models
Simplest setup | LiveKit Inference (any STT) | No per-provider API keys
Batch transcription | OpenAI Whisper | Best accuracy for non-realtime

LLM providers

The LLM is the brain of your voice agent. It determines reasoning quality, instruction following, and tool use reliability. Voice AI places unique demands on an LLM: it must stream tokens quickly, follow conversational instructions, call functions reliably, and be concise.
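Time-to-first-token (TTFT) is straightforward to measure against any streaming source. Here is a minimal stdlib sketch, with a stubbed generator standing in for a real LLM token stream (the stub and its 50ms delay are illustrative, not a real provider call):

```python
import time

def time_to_first_token(stream):
    """Return the first token and the milliseconds elapsed before it arrived."""
    start = time.perf_counter()
    first = next(iter(stream))  # blocks until the stream yields its first token
    return first, (time.perf_counter() - start) * 1000

def fake_llm_stream():
    time.sleep(0.05)  # stand-in for network round-trip plus model latency
    yield "Hello"
    yield ", world"

token, ttft_ms = time_to_first_token(fake_llm_stream())
print(f"first token {token!r} after ~{ttft_ms:.0f}ms")
```

The same harness works on a real streaming response iterator, which is how the TTFT figures in the table below can be reproduced for your own workload.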

LLM comparison table

LiveKit Inference provides access to LLMs from OpenAI, Google, DeepSeek, Groq, Cerebras, and more. For providers not in Inference (like Anthropic), use the dedicated plugin.

Provider | Plugin | Inference | TTFT | Function calling | Context window | Realtime model
OpenAI GPT-4.1 | livekit-plugins-openai | openai/gpt-4.1 | ~200ms | Excellent | 1M | —
OpenAI GPT-4.1 mini | livekit-plugins-openai | openai/gpt-4.1-mini | ~100ms | Good | 1M | —
OpenAI GPT-4o | livekit-plugins-openai | openai/gpt-4o | ~200ms | Excellent | 128K | Yes
Anthropic Claude | livekit-plugins-anthropic | — | ~200ms | Good | 200K | —
Google Gemini 2.5 Flash | livekit-plugins-google | google/gemini-2.5-flash | ~150ms | Good | 1M | —
Google Gemini 2.5 Pro | livekit-plugins-google | google/gemini-2.5-pro | ~200ms | Good | 1M | —
DeepSeek V3 | via OpenAI plugin | deepseek/deepseek-v3 | ~150ms | Good | 128K | —
xAI Grok | livekit-plugins-xai | — | ~150ms | Good | 128K | Yes
Groq (GPT OSS 120B) | livekit-plugins-groq | groq/gpt-oss-120b | ~50ms | Moderate | 128K | —
Cerebras | via OpenAI plugin | cerebras/gpt-oss-120b | ~60ms | Moderate | 128K | —
Qwen3 235B | via OpenAI plugin | qwen/qwen3-235b-a22b | ~150ms | Good | 128K | —

Pricing changes frequently

LLM pricing is a moving target, and the latency and capability figures here are approximate as of early 2026. The relative ordering tends to be stable even as absolute prices drop. Always check the provider's current pricing page or the LiveKit Inference pricing page.

OpenAI (GPT-4.1, GPT-4o, and more)

OpenAI models are the most broadly used for voice AI. GPT-4.1 is the latest generation with a 1M token context window and improved function calling. GPT-4.1 mini and nano variants offer lower cost for simpler agents. All are available via LiveKit Inference and the OpenAI plugin.

# Via LiveKit Inference (recommended)
from livekit.agents import inference
llm = inference.LLM(model="openai/gpt-4.1-mini")

# Or via plugin
from livekit.plugins import openai
llm = openai.LLM(model="gpt-4.1-mini")

Key strengths: best function calling reliability, fast streaming, realtime model available, broad ecosystem.

Anthropic Claude

Claude Sonnet excels at following complex, nuanced instructions and producing natural conversational responses. If your agent handles sensitive conversations (healthcare, legal, financial) or needs a very specific personality, Claude is an excellent choice. Use the Anthropic plugin directly.

from livekit.plugins import anthropic

llm = anthropic.LLM(model="claude-sonnet-4-20250514")

Key strengths: excellent instruction following, strong reasoning, natural conversational style, 200K context, strong safety properties.

Claude for sensitive domains

If your voice agent operates in healthcare, legal, or financial services, Claude's safety properties and instruction following make it a particularly strong fit.

Google Gemini

Gemini models are fast and extremely cost-effective with up to 1M token context windows. Gemini 2.5 Flash is ideal for high-volume deployments. Google also offers Gemini Live for speech-to-speech interactions. Available through both LiveKit Inference and the Google plugin.

# Via LiveKit Inference
from livekit.agents import inference
llm = inference.LLM(model="google/gemini-2.5-flash")

# Or via plugin
from livekit.plugins import google
llm = google.LLM(model="gemini-2.5-flash")

Key strengths: very low cost, 1M token context, multimodal (text, images, audio, video), realtime model available.

xAI Grok

xAI's Grok models are available through the dedicated xAI plugin, with support for LLM, TTS, and realtime speech-to-speech.

from livekit.plugins import xai

llm = xai.LLM(model="grok-3")

Key strengths: strong reasoning, realtime model available, TTS support in the same plugin.

DeepSeek

DeepSeek V3 is a high-quality open model available through LiveKit Inference (via Baseten and DeepSeek). Competitive reasoning at lower cost.

# Via LiveKit Inference
from livekit.agents import inference
llm = inference.LLM(model="deepseek/deepseek-v3")

Key strengths: strong reasoning, competitive pricing, available via Inference.

Groq and Cerebras

Both Groq and Cerebras offer ultra-fast inference on custom hardware, achieving extremely low time-to-first-token. Best for use cases where raw speed matters most. Both are available through LiveKit Inference for models like GPT OSS 120B.

# Via LiveKit Inference
from livekit.agents import inference
llm = inference.LLM(model="groq/gpt-oss-120b")

# Or via Groq plugin
from livekit.plugins import groq
llm = groq.LLM(model="llama-3.3-70b-versatile")

Key strengths: fastest TTFT available (~50ms), competitive pricing, open source model support.

OpenAI-compatible providers and self-hosting

Many providers (Together AI, Fireworks, vLLM, etc.) expose an OpenAI-compatible API. LiveKit's OpenAI plugin supports custom base_url for any of these. For strict data residency or very high volume, you can self-host models via vLLM or TGI.

from livekit.plugins import openai

# Any OpenAI-compatible provider
llm = openai.LLM(
  model="meta-llama/Llama-3-70b-chat-hf",
  base_url="https://api.together.xyz/v1",
  api_key="your-together-key",
)

# Self-hosted via vLLM
llm = openai.LLM(
  model="meta-llama/Llama-3-70b-chat",
  base_url="http://your-vllm-server:8000/v1",
  api_key="not-needed",
)

Self-hosting is not free

GPU hosting costs $1-3 per hour per A100. For most teams, API-based models or LiveKit Inference are cheaper until you reach thousands of concurrent conversations. Do the math before committing.
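To make "do the math" concrete, here is a back-of-the-envelope sketch. Every number in it is an illustrative assumption (the GPU rate comes from the range above; the API price, token count, and throughput are hypothetical), not a quoted price:

```python
# Rough breakeven sketch: self-hosted GPU vs per-token API pricing.
GPU_COST_PER_HOUR = 2.00        # one A100, mid-range of the $1-3/hr estimate above
API_COST_PER_1M_TOKENS = 0.60   # hypothetical blended input/output API price
TOKENS_PER_CONVO = 4_000        # hypothetical tokens consumed per conversation
CONVOS_PER_GPU_HOUR = 120       # hypothetical sustained throughput of one GPU

api_cost_per_convo = API_COST_PER_1M_TOKENS * TOKENS_PER_CONVO / 1_000_000
gpu_cost_per_convo = GPU_COST_PER_HOUR / CONVOS_PER_GPU_HOUR

print(f"API: ${api_cost_per_convo:.4f}/convo vs self-hosted: ${gpu_cost_per_convo:.4f}/convo")
# Self-hosting only wins once the GPU stays busy enough to amortize its hourly cost;
# at low utilization the per-token API is dramatically cheaper.
```

Rerun the sketch with your own traffic profile; the crossover point moves quickly with utilization.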

Function calling comparison

For agentic voice AI, function calling reliability is the most critical LLM capability. A failure leaves the user in awkward silence with no way to retry.

Provider | Reliability | Parallel calls | Structured output
OpenAI GPT-4.1 / GPT-4o | Excellent | Yes | Yes
OpenAI GPT-4.1 mini | Good | Yes | Yes
Claude Sonnet | Good | Yes | Yes
Gemini 2.5 Flash / Pro | Good | Yes | Yes
xAI Grok | Good | Yes | Yes
DeepSeek V3 | Good | Yes | Yes
Groq (open models) | Moderate | Limited | Limited

LLM selection guide

Priority | Recommended | Why
Best all-around | OpenAI GPT-4.1 | Excellent at everything, best function calling
Lowest cost | Gemini 2.5 Flash or GPT-4.1 mini | Fraction of the cost, still very capable
Best reasoning | Claude Sonnet | Handles complex logic and nuanced instructions
Fastest TTFT | Groq or Cerebras | Custom hardware for ultra-fast inference
Simplest setup | LiveKit Inference (any LLM) | No per-provider API keys
Data privacy | Self-hosted via OpenAI plugin | No data leaves your infrastructure
Largest context | Gemini 2.5 Flash or GPT-4.1 | 1M tokens

TTS providers

Text-to-speech is the final stage and the one users notice most. The voice is the direct experience — a robotic voice undermines even the best AI reasoning. Two metrics matter most: time-to-first-byte (TTFB) for latency and Mean Opinion Score (MOS) for voice naturalness.
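Because stage latencies add up, it helps to budget them explicitly before picking providers. A minimal sketch, using rough illustrative figures in the spirit of this chapter's comparison tables (assumptions, not measurements):

```python
# Illustrative latency budget for one agent turn, in milliseconds.
budget_ms = {
    "stt_final_transcript": 100,  # e.g. a fast streaming STT such as Deepgram Nova-3
    "turn_detection": 200,        # endpointing pause before the agent starts replying
    "llm_ttft": 200,              # e.g. GPT-4.1 time to first token
    "tts_ttfb": 80,               # e.g. Cartesia Sonic 3 time to first byte
    "network_and_playout": 100,   # transport overhead and jitter buffering
}

total = sum(budget_ms.values())
print(f"time to first audible response: ~{total}ms")
```

Swapping any single provider shifts only its line in the budget, which is exactly what the swap test at the end of this chapter measures empirically.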

TTS comparison table

LiveKit Inference provides access to TTS from Cartesia, ElevenLabs, Deepgram, Inworld, Rime, and xAI. Additional providers are available through plugins.

Provider | Plugin | Inference | TTFB | Quality (MOS) | Voice variety | Languages
Cartesia Sonic 3 | livekit-plugins-cartesia | cartesia/sonic-3 | ~80ms | ~4.2 | 20+ preset | 40+
ElevenLabs | livekit-plugins-elevenlabs | elevenlabs/* | ~150-250ms | ~4.4 | 1000+ plus cloning | 30+
Deepgram Aura 2 | livekit-plugins-deepgram | deepgram/aura-2 | ~100ms | ~4.0 | Multiple | 10+
Inworld | livekit-plugins-inworld | inworld/inworld-tts-1.5-max | ~120ms | ~4.1 | Multiple | 15
Rime Arcana | livekit-plugins-rime | rime/arcana | ~100ms | ~4.1 | Multiple | 9
xAI TTS | livekit-plugins-xai | xai/tts-1 | ~150ms | ~4.1 | Multiple | 20+
OpenAI TTS | livekit-plugins-openai | — | ~200ms | ~4.2 | 6 preset | Multi
Google Cloud TTS | livekit-plugins-google | — | ~150ms | ~4.0 | 200+ neural | 60+
Azure Neural TTS | livekit-plugins-azure | — | ~150ms | ~4.0 | 400+ neural | 100+
PlayHT | livekit-plugins-playht | — | ~150-200ms | ~4.1 | 600+ plus cloning | Multi
NVIDIA | livekit-plugins-nvidia | — | ~100ms | ~4.0 | Multiple | Multi
Speechmatics | livekit-plugins-speechmatics | — | ~150ms | ~4.0 | Multiple | Multi
Groq (PlayAI) | livekit-plugins-groq | — | ~100ms | ~4.1 | Multiple | 3

Interpreting MOS scores

MOS scores are averages from human evaluations and vary by voice, language, and content. Differences between top providers are often subtle. Test your specific use case with 2-3 providers before committing — what sounds best for customer service may not be ideal for an entertainment character.

Cartesia Sonic

Built specifically for real-time voice AI. Ultra-low latency at ~80ms TTFB — the fastest of any major provider. Sonic 3 is the latest model with 40+ language support. Available through both LiveKit Inference and the plugin.

# Via LiveKit Inference (recommended)
from livekit.agents import inference
tts = inference.TTS(model="cartesia/sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc")

# Or via plugin
from livekit.plugins import cartesia
tts = cartesia.TTS(model="sonic-3", voice="warm-professional")

Key strengths: lowest latency (80ms TTFB), emotion control, speed adjustment, consistent quality, 40+ languages.

ElevenLabs

Arguably the most natural-sounding voices available. Industry-leading voice cloning and the largest voice library. Multiple model tiers available in Inference — from eleven_flash_v2_5 for speed to eleven_multilingual_v2 for quality.

# Via LiveKit Inference
from livekit.agents import inference
tts = inference.TTS(model="elevenlabs/eleven_flash_v2_5", voice="your-voice-id")

# Or via plugin for voice cloning and advanced features
from livekit.plugins import elevenlabs
tts = elevenlabs.TTS(model="eleven_turbo_v2", voice_id="your-voice-id")

Key strengths: highest voice quality, voice cloning, 1000+ voices, fine-grained style control (stability, similarity, clarity).

ElevenLabs model tiers

Use eleven_flash_v2_5 for the lowest latency or eleven_turbo_v2_5 for a balance of speed and multilingual quality. Use eleven_multilingual_v2 only when you need maximum quality and can tolerate higher latency.

Deepgram Aura

Deepgram's TTS offering with low latency and clean voice quality. Aura 2 supports 10+ languages. Available through Inference and the plugin.

# Via LiveKit Inference
from livekit.agents import inference
tts = inference.TTS(model="deepgram/aura-2", voice="asteria")

# Or via plugin
from livekit.plugins import deepgram
tts = deepgram.TTS(model="aura-2", voice="asteria")

Key strengths: low latency, clean voice quality, simple integration, competitive pricing.

Inworld

Inworld provides TTS models optimized for gaming and interactive characters. Multiple tiers available in Inference — from inworld-tts-1 to inworld-tts-1.5-max.

# Via LiveKit Inference
from livekit.agents import inference
tts = inference.TTS(model="inworld/inworld-tts-1.5-max", voice="your-voice-id")

# Or via plugin
from livekit.plugins import inworld
tts = inworld.TTS(voice="your-voice-id")

Key strengths: character-oriented voices, 15 languages, gaming-optimized, available via Inference.

Rime

Rime offers the Arcana and Mist TTS models with competitive latency. Available through Inference and the plugin.

# Via LiveKit Inference
from livekit.agents import inference
tts = inference.TTS(model="rime/arcana", voice="kai")

# Or via plugin
from livekit.plugins import rime
tts = rime.TTS(model="arcana", speaker="kai")

Key strengths: good latency, competitive pricing, 9 languages, available via Inference.

xAI TTS

xAI provides text-to-speech alongside their Grok LLM and realtime model. Available through Inference and the xAI plugin.

# Via LiveKit Inference
from livekit.agents import inference
tts = inference.TTS(model="xai/tts-1", voice="Ash")

# Or via plugin
from livekit.plugins import xai
tts = xai.TTS(voice="Ash")

Key strengths: 20+ languages, integrated with Grok LLM ecosystem, available via Inference.

OpenAI TTS

Good quality with the simplest API. The newer gpt-4o-mini-tts model provides improved quality. Limited voice options but natural-sounding.

from livekit.plugins import openai

tts = openai.TTS(model="gpt-4o-mini-tts", voice="alloy")

Key strengths: simplest integration, good quality defaults, improving model lineup.

Additional TTS plugins

Plugin | Provider | Key feature
livekit-plugins-google | Google Cloud TTS | 200+ neural voices, SSML, 60+ languages
livekit-plugins-azure | Azure Neural TTS | 400+ voices, HIPAA/SOC 2, custom voice training
livekit-plugins-playht | PlayHT | Voice cloning, emotion control, competitive pricing
livekit-plugins-nvidia | NVIDIA | On-premise/GPU TTS inference
livekit-plugins-speechmatics | Speechmatics | Streaming TTS
livekit-plugins-groq | Groq (PlayAI) | Ultra-fast TTS on LPU hardware

TTS selection guide

Priority | Recommended | Why
Lowest latency | Cartesia Sonic 3 | 80ms TTFB, built for real-time
Highest quality | ElevenLabs | Most natural voices, best cloning
Best value | Deepgram Aura 2 or Rime | Good quality at competitive price
Most languages | Azure Neural TTS or Cartesia Sonic 3 | 100+ / 40+ languages
Simplest setup | LiveKit Inference (any TTS) | No per-provider API keys
Custom brand voice | ElevenLabs (via plugin) | Best voice cloning technology
Gaming / characters | Inworld | Character-optimized voices

The swap test: benchmarking providers

LiveKit's plugin architecture makes it trivial to benchmark providers head-to-head. The pattern is simple: keep two components fixed and swap the third, then compare latency and quality.

1. Establish a baseline

Pick a starting stack — for example, Deepgram + GPT-4o + Cartesia. Run 20-30 test conversations and measure end-to-end latency, response quality, and subjective voice experience.

2. Swap one component

Change only one plugin — for example, swap Cartesia for ElevenLabs. Run the same test conversations and compare. The only variable is the component you swapped.

3. Record and compare

Track time-to-first-audio, subjective voice-quality ratings, and cost per conversation. After testing each alternative, you have a clear comparison table for your specific use case.

from livekit.agents import AgentSession, inference

# Baseline: Cartesia TTS via Inference
baseline_session = AgentSession(
  stt=inference.STT(model="deepgram/nova-3", language="en"),
  llm=inference.LLM(model="openai/gpt-4.1-mini"),
  tts=inference.TTS(model="cartesia/sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"),
)

# Swap test: ElevenLabs TTS (only this line changes)
test_session = AgentSession(
  stt=inference.STT(model="deepgram/nova-3", language="en"),
  llm=inference.LLM(model="openai/gpt-4.1-mini"),
  tts=inference.TTS(model="elevenlabs/eleven_flash_v2_5", voice="test-voice"),
)
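Once latency samples are recorded for each stack, the comparison itself needs nothing beyond the standard library. The samples below are hypothetical, standing in for measurements from your own test runs:

```python
import statistics

# Hypothetical time-to-first-audio samples (ms) from the swap test
baseline_ms = [620, 580, 640, 600, 590]   # Cartesia stack
candidate_ms = [710, 690, 730, 700, 720]  # ElevenLabs stack

def summarize(name, samples):
    print(f"{name}: median {statistics.median(samples):.0f}ms, max {max(samples)}ms")

summarize("baseline", baseline_ms)
summarize("candidate", candidate_ms)

# Positive delta means the candidate stack is slower to first audio
delta = statistics.median(candidate_ms) - statistics.median(baseline_ms)
print(f"median delta: {delta:+.0f}ms")
```

Compare medians rather than means: a single slow outlier (a cold start, a network blip) can distort an average across only 20-30 runs.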

Test with real conversations

Scripted inputs miss the edge cases that reveal real-world issues — accented speech, background noise, interruptions, and unexpected questions. Always benchmark with real or realistic conversations, not canned prompts.


What you learned

  • LiveKit Inference provides a unified interface to the best voice AI models — no per-provider API keys needed for models from OpenAI, Google, Deepgram, AssemblyAI, Cartesia, ElevenLabs, Rime, Inworld, xAI, DeepSeek, Groq, Cerebras, and more
  • LiveKit's open source plugin ecosystem covers 15+ providers — including Anthropic, Azure, Google Cloud, Speechmatics, NVIDIA, Fal, and PlayHT — for when you need provider-specific features or self-hosted models
  • Deepgram Nova-3/Flux is the top STT choice for real-time voice AI; ElevenLabs Scribe leads for language coverage
  • OpenAI GPT-4.1 offers the best all-around LLM performance; Gemini 2.5 Flash and GPT-4.1 mini are best for cost; Groq and Cerebras are fastest
  • Cartesia Sonic 3 has the lowest TTS latency; ElevenLabs has the highest voice quality; Deepgram Aura, Rime, and Inworld provide solid alternatives
  • The swap test pattern lets you benchmark any provider by changing one model string (Inference) or one plugin while keeping everything else fixed

Next up

With all providers mapped, the final chapter brings everything together: cost modeling for real conversations, optimization strategies, and ready-made stack recommendations for common use cases.

Concepts covered
STT providers · LLM providers · TTS providers · Latency vs quality tradeoffs