Auto vs Realtime Voice
Compare the two voice modes and choose the right one for your use case.
Comparison
| Feature | Auto Mode | Realtime Mode |
|---|---|---|
| Interaction | Turn-based (speak → wait → response) | Full-duplex, natural conversation |
| Latency | Moderate (processes in stages) | Low (speech-to-speech, no intermediate stages) |
| AI Models | Any LLM (GPT-4o, Gemini, Claude, etc.) | Gemini Live or xAI Grok only |
| Voice Quality | Very good (dedicated TTS) | Excellent (native audio) |
| Tool Support | Full (web search, KB, etc.) | Limited (provider-specific) |
| Plan Required | Free and above (from 5 min/day) | Plus and above (from 15 min/day) |
When to Use Auto
- Customer support: structured Q&A backed by a knowledge base
- Model flexibility: works with any LLM (GPT-4o, Claude, Gemini, etc.)
- Tool usage: the assistant needs web search, image generation, or other tools
When to Use Realtime
- Natural conversations: back-and-forth, full-duplex dialogue
- Low latency: fast speech-to-speech responses feel more natural
- Simple queries: the task does not need complex tool integrations
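The guidance in the two lists above can be expressed as a small decision function. The following is only a sketch: `VoiceRequirements`, `choose_voice_mode`, and the provider identifier strings are hypothetical names invented for illustration, not part of any documented API. The rules it encodes (Realtime needs a Plus-or-higher plan and a Gemini Live or xAI Grok model, and has limited tool support) come from the comparison table.

```python
from dataclasses import dataclass

# Plans that include Realtime voice, per the comparison table (Plus and above).
REALTIME_PLANS = {"plus", "pro", "enterprise"}

# Providers listed for Realtime mode. These identifier strings are hypothetical.
REALTIME_PROVIDERS = {"gemini-live", "xai-grok"}


@dataclass
class VoiceRequirements:
    plan: str                # "free", "plus", "pro", or "enterprise"
    provider: str            # hypothetical provider identifier
    needs_full_tools: bool   # web search, knowledge base, image generation, ...
    wants_low_latency: bool  # full-duplex, conversational feel


def choose_voice_mode(req: VoiceRequirements) -> str:
    """Return "auto" or "realtime" following the guidance above."""
    realtime_possible = (
        req.plan.lower() in REALTIME_PLANS
        and req.provider.lower() in REALTIME_PROVIDERS
    )
    if not realtime_possible:
        # Auto works on every plan and with any LLM.
        return "auto"
    if req.needs_full_tools:
        # Realtime tool support is limited and provider-specific.
        return "auto"
    return "realtime" if req.wants_low_latency else "auto"
```

For example, a Plus-plan user on a Gemini Live model who wants low latency and no tools would get `"realtime"`, while the same user on the Free plan would fall back to `"auto"`.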
Voice Quota
| Plan | Auto Voice | Realtime Voice |
|---|---|---|
| Free | 5 min/day | ❌ |
| Plus | 30 min/day | 15 min/day |
| Pro | 120 min/day | 60 min/day |
| Enterprise | Unlimited | Unlimited |
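If you need to check these limits in your own code, the table can be mirrored as a lookup. This is a minimal sketch: the dictionary and the `remaining_minutes` helper are hypothetical, and it assumes you track used minutes yourself, since the page does not say how usage is measured or when the daily quota resets.

```python
from typing import Optional

# Daily voice quota in minutes, copied from the table above.
# None means unlimited; 0 means the mode is not available on that plan.
VOICE_QUOTA_MIN_PER_DAY = {
    "free": {"auto": 5, "realtime": 0},
    "plus": {"auto": 30, "realtime": 15},
    "pro": {"auto": 120, "realtime": 60},
    "enterprise": {"auto": None, "realtime": None},
}


def remaining_minutes(plan: str, mode: str, used_minutes: float) -> Optional[float]:
    """Return minutes left today, or None when the plan is unlimited."""
    quota = VOICE_QUOTA_MIN_PER_DAY[plan][mode]
    if quota is None:
        return None
    # Clamp at zero so overuse never reports a negative balance.
    return max(0.0, float(quota) - used_minutes)
```

Returning `None` for unlimited plans keeps that case distinct from a quota of zero, which here marks a mode the plan does not include.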