Auto vs Realtime Voice
Compare Auto Mode (VAD → STT → LLM → TTS) and Realtime Mode (Gemini Live, xAI Grok) — choose the right voice mode.
Auto Mode
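Diagram: VAD → STT → LLM → TTS. Each turn passes through voice activity detection, speech-to-text, the LLM, and text-to-speech in sequence.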
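The staged, turn-based flow of Auto Mode can be sketched roughly as follows. This is a minimal illustration, not the product's implementation: the `AutoPipeline` class, its four stage callables, and the stub stages are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AutoPipeline:
    """Turn-based voice pipeline: VAD -> STT -> LLM -> TTS (hypothetical stages)."""

    detect_speech: Callable[[bytes], bool]  # VAD: does this audio contain speech?
    transcribe: Callable[[bytes], str]      # STT: audio to text
    generate_reply: Callable[[str], str]    # LLM: user text to reply text
    synthesize: Callable[[str], bytes]      # TTS: reply text to audio

    def handle_turn(self, audio: bytes) -> bytes:
        """Run one complete turn; each stage waits for the previous one."""
        if not self.detect_speech(audio):
            return b""  # no speech detected, so there is nothing to answer
        text = self.transcribe(audio)
        reply = self.generate_reply(text)
        return self.synthesize(reply)


# Stub stages stand in for real VAD/STT/LLM/TTS services (assumptions only).
pipeline = AutoPipeline(
    detect_speech=lambda audio: len(audio) > 0,
    transcribe=lambda audio: audio.decode("utf-8"),
    generate_reply=lambda text: f"You said: {text}",
    synthesize=lambda text: text.encode("utf-8"),
)

print(pipeline.handle_turn(b"hello"))
```

Because every stage blocks on the one before it, the latency of a turn is the sum of the stage latencies, which is why Auto Mode is described as turn-based.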
Realtime Mode
Diagram: audio streams directly to and from a realtime speech-to-speech model (Gemini Live, xAI Grok).
Comparison
| Feature | Auto Mode | Realtime Mode |
|---|---|---|
| Interaction | Turn-based (speak → wait → response) | Full-duplex, natural conversation |
| Latency | Moderate (audio is processed in stages) | Low (direct speech-to-speech) |
| AI Models | Any LLM (GPT-4o, Gemini, Claude, etc.) | OpenAI / Gemini Live / xAI Grok |
| Voice Quality | Very good (dedicated TTS) | Excellent (native audio) |
| Tool Support | Full (web search, KB, etc.) | Limited (provider-specific) |
| Text Input | LLM → TTS pipeline | Sent directly to realtime session |
| Plan Required | Free+ (5 min/day) | Plus+ (15 min/day) |
When to Use Auto
- Customer support: structured Q&A backed by a knowledge base
- Any LLM: works with GPT-4o, Claude, Gemini, and others
- Tool usage: the assistant needs web search, image generation, or other tools
When to Use Realtime
- Natural conversations: back-and-forth dialogue
- Low latency: fast responses feel more natural
- Simple queries: no complex tool integrations are needed
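The guidance in the two lists above could be condensed into a small decision helper. The sketch below is only one possible reading: `choose_voice_mode` and its parameters are hypothetical, and it simplifies "limited, provider-specific" tool support in Realtime Mode to "prefer Auto whenever tools are needed".

```python
# Plans ordered from lowest to highest tier, as listed in this document.
PLAN_ORDER = ["free", "plus", "pro", "enterprise"]


def choose_voice_mode(plan: str, needs_tools: bool, needs_specific_llm: bool) -> str:
    """Return "auto" or "realtime" for a session (illustrative heuristic only)."""
    plan = plan.lower()
    if plan not in PLAN_ORDER:
        raise ValueError(f"unknown plan: {plan}")

    # Realtime Mode requires the Plus plan or higher.
    realtime_available = PLAN_ORDER.index(plan) >= PLAN_ORDER.index("plus")

    # Auto Mode covers full tool support and any choice of LLM.
    if needs_tools or needs_specific_llm or not realtime_available:
        return "auto"
    return "realtime"


print(choose_voice_mode("plus", needs_tools=False, needs_specific_llm=False))
print(choose_voice_mode("pro", needs_tools=True, needs_specific_llm=False))
```

A real selector would likely also weigh remaining daily quota and which realtime providers are configured; those inputs are left out here to keep the example short.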
Voice Quota
| Plan | Auto Voice | Realtime Voice |
|---|---|---|
| Free | 5 min/day | ❌ |
| Plus | 30 min/day | 15 min/day |
| Pro | 120 min/day | 60 min/day |
| Enterprise | Unlimited | Unlimited |
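As a worked example of the quota table, the snippet below encodes the daily limits and computes how many minutes remain for a given plan and mode. The dictionary layout and the `remaining_minutes` helper are assumptions made for illustration; only the numbers come from the table.

```python
import math

# Daily voice quota in minutes, copied from the table above.
# 0 means the mode is unavailable on that plan; math.inf means unlimited.
DAILY_QUOTA_MINUTES = {
    "free": {"auto": 5, "realtime": 0},
    "plus": {"auto": 30, "realtime": 15},
    "pro": {"auto": 120, "realtime": 60},
    "enterprise": {"auto": math.inf, "realtime": math.inf},
}


def remaining_minutes(plan: str, mode: str, used_minutes: float) -> float:
    """Minutes left today for a plan and mode, never below zero."""
    quota = DAILY_QUOTA_MINUTES[plan][mode]
    return max(quota - used_minutes, 0)


print(remaining_minutes("plus", "realtime", 10))   # 15 - 10 = 5
print(remaining_minutes("free", "realtime", 0))    # Realtime is not on Free: 0
```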