
Auto vs Realtime Voice

Compare Auto Mode (VAD → STT → LLM → TTS) and Realtime Mode (Gemini Live, xAI Grok) — choose the right voice mode.

Auto Mode

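In Auto Mode, audio passes through four stages in sequence: voice activity detection (VAD), speech-to-text (STT), the LLM, and text-to-speech (TTS).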
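The staged Auto Mode pipeline (VAD → STT → LLM → TTS) can be sketched roughly as follows. This is a minimal illustration, not the product's implementation: `AutoPipeline`, its field names, and the stand-in stages in `demo` are all hypothetical, and real stages would call actual VAD, STT, LLM, and TTS services.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AutoPipeline:
    """Hypothetical container for the four Auto Mode stages."""
    detect_speech: Callable[[bytes], bool]   # VAD: does this audio contain speech?
    transcribe: Callable[[bytes], str]       # STT: audio -> text
    generate: Callable[[str], str]           # LLM: user text -> reply text
    synthesize: Callable[[str], bytes]       # TTS: reply text -> audio

    def handle_turn(self, audio: bytes) -> Optional[bytes]:
        """Run one turn; return reply audio, or None if no speech was detected."""
        if not self.detect_speech(audio):
            return None
        text = self.transcribe(audio)
        reply = self.generate(text)
        return self.synthesize(reply)

    def handle_text(self, text: str) -> bytes:
        """Typed input skips VAD and STT and enters at the LLM stage."""
        return self.synthesize(self.generate(text))


# Stand-in stages so the sketch runs without any audio or model dependencies.
demo = AutoPipeline(
    detect_speech=lambda audio: len(audio) > 0,
    transcribe=lambda audio: audio.decode("utf-8"),
    generate=lambda text: f"You said: {text}",
    synthesize=lambda text: text.encode("utf-8"),
)
```

Because each stage finishes before the next begins, the per-turn latency is the sum of the stage latencies, which is why the interaction feels turn-based.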

Realtime Mode

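In Realtime Mode, audio streams directly to and from a speech-to-speech model such as Gemini Live or xAI Grok, with no separate STT or TTS stage.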

Comparison

| Feature | Auto Mode | Realtime Mode |
|---|---|---|
| Interaction | Turn-based (speak → wait → response) | Full-duplex, natural conversation |
| Latency | Moderate (processes in stages) | Low (instant speech-to-speech) |
| AI Models | Any LLM (GPT-4o, Gemini, Claude, etc.) | OpenAI / Gemini Live / xAI Grok |
| Voice Quality | Very good (dedicated TTS) | Excellent (native audio) |
| Tool Support | Full (web search, KB, etc.) | Limited (provider-specific) |
| Text Input | LLM → TTS pipeline | Sent directly to realtime session |
| Plan Required | Free+ (5 min/day) | Plus+ (15 min/day) |

When to Use Auto

  • Customer support — Structured Q&A with knowledge base
  • Any LLM — Works with GPT-4o, Claude, Gemini, etc.
  • Tool usage — Needs web search, image generation, etc.

When to Use Realtime

  • Natural conversations — Back-and-forth dialogue
  • Low latency — Instant responses feel more natural
  • Simple queries — Don't need complex tool integrations
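The guidance in the two lists above can be condensed into a small decision helper. This is only a sketch of the stated rules of thumb: `choose_voice_mode` and its parameters are hypothetical names, and the Free-plan fallback assumes Realtime Mode requires Plus or higher, as the comparison table indicates.

```python
def choose_voice_mode(
    plan: str,
    needs_tools: bool = False,
    needs_specific_llm: bool = False,
) -> str:
    """Return "auto" or "realtime" using the rules of thumb from this page."""
    # Assumption: Realtime Mode is only offered on Plus and above.
    if plan.lower() == "free":
        return "auto"
    # Tool-heavy assistants and assistants tied to a particular LLM fit Auto Mode.
    if needs_tools or needs_specific_llm:
        return "auto"
    # Otherwise favor the lower-latency, full-duplex experience.
    return "realtime"
```

For example, a Plus-plan assistant that relies on a knowledge base would call `choose_voice_mode("plus", needs_tools=True)` and stay on Auto Mode.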

Voice Quota

| Plan | Auto Voice | Realtime Voice |
|---|---|---|
| Free | 5 min/day | Not available |
| Plus | 30 min/day | 15 min/day |
| Pro | 120 min/day | 60 min/day |
| Enterprise | Unlimited | Unlimited |
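As an illustration of how a client might track these limits, the quota table can be encoded as a lookup. The names `DAILY_QUOTA_MINUTES` and `remaining_minutes` are hypothetical, and two choices here are assumptions: the empty Free-plan Realtime cell is treated as 0 minutes, and "Unlimited" is modeled as infinity.

```python
import math

# Daily voice quota in minutes, taken from the table above.
# Assumptions: Free-plan Realtime is 0; "Unlimited" is math.inf.
DAILY_QUOTA_MINUTES = {
    "free": {"auto": 5, "realtime": 0},
    "plus": {"auto": 30, "realtime": 15},
    "pro": {"auto": 120, "realtime": 60},
    "enterprise": {"auto": math.inf, "realtime": math.inf},
}


def remaining_minutes(plan: str, mode: str, used_minutes: float) -> float:
    """Return the minutes left today for a plan and mode, never below zero."""
    quota = DAILY_QUOTA_MINUTES[plan.lower()][mode]
    return max(0.0, quota - used_minutes)
```

A caller could check `remaining_minutes("plus", "realtime", used)` before opening a realtime session and fall back to Auto Mode when the result is zero.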
