Chapter 6

Text input for the dental receptionist

Not every patient wants to speak out loud. Maybe they are in a quiet waiting room, or they need to spell out a medication name, or their insurance ID is XJ7-449-BKL-2 and saying it out loud would be a nightmare for any speech-to-text model. In this chapter, you will add text input alongside voice so Maya can accept both modalities seamlessly.

What you'll learn

  • Why multimodal input matters for a dental receptionist
  • How to enable text input in the app configuration
  • How text messages travel from the browser to the agent via data channels
  • How to build a dental-specific text input component
  • How Maya processes text and voice identically

Why multimodal matters for dental

A dental receptionist handles information that spans the spectrum of "easy to say out loud" to "impossible to dictate accurately":

| Easy to say | Hard to say |
| --- | --- |
| "I'd like to book an appointment" | "My insurance ID is XJ7-449-BKL-2" |
| "Next Tuesday at 9 AM" | "My name is Siobhan Bhattacharya" |
| "I need a cleaning" | "I take amlodipine besylate, 10mg" |

Giving patients a text input ensures nothing gets lost in translation. They speak when it is natural and type when precision matters. Maya handles both the same way — the text goes to the LLM as a user message, and she responds via TTS audio.

What's happening

Multimodal means both voice and text are active simultaneously — this is not a "mode switch." The patient does not toggle between a voice mode and a text mode. They speak into the microphone AND have a text field available at all times. Maya treats a typed message and a spoken utterance identically: both become user messages in the conversation context.

How text input works

Text messages from the frontend reach the agent through data channels — reliable, ordered, low-latency pathways that piggyback on the existing WebRTC connection.

1. Patient types a message

   The patient types "My name is Siobhan Bhattacharya" in the text field and presses Enter.

2. Frontend sends via data channel

   The LiveKit client SDK sends the text as a data channel message to the room. This uses the same WebRTC connection as audio — no separate HTTP request.

3. Agent receives the text

   Maya's agent session receives the data channel message and injects it into the conversation as a user message — just as if the patient had spoken it.

4. Maya responds via TTS

   Maya processes the message through the LLM and responds with spoken audio. The response appears in the transcript and plays through the patient's speakers.

Text goes to LLM, response comes as audio

When a patient types, the message goes straight to the LLM (bypassing STT). Maya's response still comes back as TTS audio. The patient reads their own typed message in the transcript and hears Maya's spoken reply. This asymmetry is intentional — the agent is a voice receptionist, and her responses are always spoken.
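One way to picture the asymmetry is as two paths through a simplified pipeline. This is a sketch for intuition only, not the agent framework's actual internals: typed input enters after the STT stage, but every response still exits through TTS.

```typescript
type Modality = "voice" | "text";

// Simplified pipeline stages each input modality passes through.
// The real agent framework wires these stages itself; this only
// illustrates which stages apply to which modality.
function pipelineFor(input: Modality): string[] {
  return input === "voice"
    ? ["stt", "llm", "tts"] // spoken audio is transcribed first
    : ["llm", "tts"];       // typed text skips STT entirely
}

console.log(pipelineFor("text"));  // ["llm", "tts"]
console.log(pipelineFor("voice")); // ["stt", "llm", "tts"]
```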

Building the dental text input

Create a text input component tailored to the dental receptionist's workflow. The input is enabled only when Maya is listening (ready for input) and disabled during other states.

src/components/DentalTextInput.tsx

"use client";

import { FormEvent, useState } from "react";
import { useSession } from "@livekit/agents-react";

export function DentalTextInput() {
  const session = useSession();
  const [text, setText] = useState("");
  const isListening = session.agent.agentState === "listening";
  const canSend = isListening && text.trim().length > 0;

  const handleSubmit = (e: FormEvent) => {
    e.preventDefault();
    if (!canSend) return;

    session.sendChatMessage(text.trim());
    setText("");
  };

  return (
    <form onSubmit={handleSubmit} className="flex gap-2">
      <input
        type="text"
        value={text}
        onChange={(e) => setText(e.target.value)}
        placeholder={
          isListening
            ? "Type a message — spell names, insurance IDs..."
            : "Maya is responding..."
        }
        disabled={!isListening}
        className="flex-1 rounded-lg border border-gray-300 px-4 py-2 text-sm
          placeholder:text-gray-400
          focus:border-blue-500 focus:outline-none focus:ring-1 focus:ring-blue-500
          disabled:bg-gray-100 disabled:text-gray-400"
      />
      <button
        type="submit"
        disabled={!canSend}
        className="rounded-lg bg-blue-600 px-4 py-2 text-sm font-medium text-white
          hover:bg-blue-700
          disabled:bg-gray-300 disabled:cursor-not-allowed"
      >
        Send
      </button>
    </form>
  );
}
What's happening

The Send button is only active when Maya is in the listening state and the patient has typed something. When Maya is thinking or speaking, the button is disabled and the placeholder changes to "Maya is responding..." — signaling that the patient should wait. This mirrors a natural phone conversation: you do not talk over the receptionist.
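The gating rule can be isolated as a pure function, which makes it easy to test. `canSendMessage` is a hypothetical helper written for this sketch, not part of the SDK; the state names mirror the component above.

```typescript
type AgentState = "listening" | "thinking" | "speaking";

// Sending is allowed only when the agent is ready for input AND the
// patient has typed something non-blank.
function canSendMessage(state: AgentState, text: string): boolean {
  return state === "listening" && text.trim().length > 0;
}

console.log(canSendMessage("listening", "Siobhan Bhattacharya")); // true
console.log(canSendMessage("speaking", "Siobhan Bhattacharya"));  // false
console.log(canSendMessage("listening", "   "));                  // false
```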

Integrating text input into the dental page

Add the text input below the transcript in the conversation view:

src/app/page.tsx (conversation section)

import { DentalTranscript } from "@/components/DentalTranscript";
import { BookingConfirmation } from "@/components/BookingConfirmation";
import { DentalTextInput } from "@/components/DentalTextInput";

{/* Inside the active conversation block */}
{(agentState === "listening" ||
  agentState === "thinking" ||
  agentState === "speaking") && (
  <div className="space-y-4">
    <DentalTranscript />
    <BookingConfirmation />
    <DentalTextInput />
  </div>
)}

Dental use cases for text input

Here are the specific scenarios where text input makes the dental receptionist better:

Spelling difficult names

The patient says "My name is..." but STT mangles it. They type the correct spelling instead:

Patient (voice): "I'd like to book under my name."
Maya: "Of course! What is your name?"
Patient (typed): Siobhan Bhattacharya
Maya (spoken): "Got it — Siobhan Bhattacharya. And what time works best?"

Insurance information

Insurance IDs are alphanumeric strings that are error-prone over voice:

Patient (typed): Insurance ID: XJ7-449-BKL-2, Group: DELTA-PPO-5521
Maya (spoken): "Thank you! I've noted your insurance information."

Medication names

Medical terminology is notoriously hard for STT:

Maya: "Are you currently taking any medications we should be aware of?"
Patient (typed): Amlodipine besylate 10mg, Metformin 500mg
Maya (spoken): "Thank you for listing those. I'll add them to your file."

Guide the patient to type when it matters

You can update Maya's instructions to suggest typing for complex information. Add a line like: "When the caller needs to provide a name, insurance ID, or medication, suggest they type it in the text field for accuracy."

Mixed-mode conversations

Patients naturally mix voice and text in the same conversation. The transcript shows both seamlessly:

Example transcript
[voice] Patient: "Hi, I'd like to book a cleaning."
[voice] Maya: "I'd be happy to help! When would you like to come in?"
[voice] Patient: "Next Tuesday, if possible."
[voice] Maya: "I have openings at 9 AM, 11:30, and 2 in the afternoon. Which works best?"
[voice] Patient: "9 AM please."
[voice] Maya: "Great choice! And what is your name?"
[text]  Patient: Siobhan Bhattacharya
[voice] Maya: "Thank you, Siobhan. Let me book that for you."
[voice] Maya: "You're all set — cleaning on Tuesday at 9 AM."

There is no mode switch. No toggle button. The text field is always there, and the microphone is always active. The patient uses whichever is most natural for the information they are conveying.
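A transcript that records the modality of each entry can be modeled with a small tagged type. This is a sketch; the real transcript component's data shape may differ, and `formatEntry` is a hypothetical helper that renders the bracketed style shown above.

```typescript
type Modality = "voice" | "text";

interface TranscriptEntry {
  modality: Modality;
  speaker: "Patient" | "Maya";
  content: string;
}

// Render one entry in the bracketed style shown above,
// padding the tag so speaker names line up.
function formatEntry(entry: TranscriptEntry): string {
  const tag = `[${entry.modality}]`.padEnd(7); // "[voice]" is 7 chars wide
  return `${tag} ${entry.speaker}: ${entry.content}`;
}

const mixed: TranscriptEntry[] = [
  { modality: "voice", speaker: "Maya", content: '"And what is your name?"' },
  { modality: "text", speaker: "Patient", content: "Siobhan Bhattacharya" },
];

console.log(mixed.map(formatEntry).join("\n"));
// [voice] Maya: "And what is your name?"
// [text]  Patient: Siobhan Bhattacharya
```

Because modality is just a field on each entry, the transcript renders voice and text turns in one unified stream rather than two separate views.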

What you learned

  • Multimodal input means voice and text are active simultaneously — not a mode switch
  • Text messages travel via WebRTC data channels, bypassing STT entirely
  • Maya treats typed and spoken input identically — both become user messages in the conversation
  • The text input is enabled when Maya is listening and disabled during other states
  • Dental-specific use cases include spelling names, insurance IDs, and medication names
  • session.sendChatMessage(text) sends the message through the data channel

Test your knowledge

When a patient types 'Siobhan Bhattacharya' instead of saying it, how does the message reach the LLM?

Next up

The dental receptionist frontend now handles voice, text, and booking confirmations. In the final chapter, you will add RPC calls so the frontend can interact with the agent programmatically — building a date picker that calls check_availability directly and a booking details panel driven by data channels.

Concepts covered
Data channels · Multimodal input · sendChatMessage