MQTT Protocol

How ESP32 devices communicate with Assistant Core through MQTT Gateway.

Overview

ESP32 devices use MQTT Gateway to maintain a real-time connection with Assistant Core. The gateway sits between devices and the platform, so devices can send status, start voice sessions, stream audio, and receive commands from the assistant.

At a high level, the connection has three parts:

  • MQTT: the control channel for status, events, and commands.
  • UDP audio: the low-latency audio channel for voice sessions.
  • Internal connection to Assistant Core: the gateway forwards device sessions into the conversation system.

Connection Architecture

[Figure: MQTT Gateway architecture connecting the ESP32 device to Assistant Core]

The device does not need to talk directly to every backend system. It connects to MQTT Gateway, and the gateway forwards status, audio, and commands to the correct assistant session.

How It Works

1. The device fetches connection settings

On startup, the device calls the OTA service, which returns the system time, firmware information, the gateway connection address, and, if the device is not yet connected to an assistant, an activation code.
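The startup check can be pictured as parsing one response into typed settings. The sketch below is illustrative only: the real payload is defined in the internal deployment documentation, so every field name here (`server_time_ms`, `firmware.version`, `activation_code`, `gateway.address`) is an assumption, as are the `OtaSettings` type and both helper functions.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class OtaSettings:
    server_time_ms: int
    firmware_version: str
    activation_code: Optional[str]  # assumed to be absent once the device is linked
    gateway_address: str


def parse_ota_response(body: str) -> OtaSettings:
    """Parse a hypothetical OTA response body (field names are assumptions)."""
    data = json.loads(body)
    return OtaSettings(
        server_time_ms=data["server_time_ms"],
        firmware_version=data["firmware"]["version"],
        activation_code=data.get("activation_code"),
        gateway_address=data["gateway"]["address"],
    )


def needs_activation(settings: OtaSettings) -> bool:
    """True while the response still carries an activation code."""
    return settings.activation_code is not None
```

Treating the activation code as optional mirrors the text: it is only returned while the device is not yet connected to an assistant.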

2. The device opens the MQTT channel

The device uses MQTT to report online status, send events, and receive commands. This is the main control channel, not the continuous audio channel.

3. A voice session starts

When the user wakes the device or presses the talk button, the device sends a session-start signal. The gateway creates a matching session in Assistant Core.
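On the gateway side, "creating a matching session" amounts to keeping a map from device to active session. The following is a minimal sketch of that idea, not the gateway's actual implementation; `VoiceSession`, `SessionRegistry`, and the choice to reuse a session when a start signal is repeated are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class VoiceSession:
    device_id: str
    # Random identifier; the real session ID format is not documented here.
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class SessionRegistry:
    """Hypothetical gateway-side map from device ID to its active session."""

    def __init__(self) -> None:
        self._by_device: Dict[str, VoiceSession] = {}

    def start(self, device_id: str) -> VoiceSession:
        # Assumption: a repeated start signal reuses the existing session.
        session = self._by_device.get(device_id)
        if session is None:
            session = VoiceSession(device_id)
            self._by_device[device_id] = session
        return session

    def active(self, device_id: str) -> Optional[VoiceSession]:
        return self._by_device.get(device_id)

    def close(self, device_id: str) -> Optional[VoiceSession]:
        # Returns the closed session, or None if there was nothing to close.
        return self._by_device.pop(device_id, None)
```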

4. Audio flows over UDP

During the voice session, microphone audio is sent over UDP to reduce latency. Text-to-speech (TTS) audio from the assistant is sent back to the device through the same audio channel.
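Because UDP does not guarantee delivery or ordering, audio datagrams usually carry a small header so the receiver can detect loss. The sketch below assumes a made-up 8-byte header (a session tag plus a sequence number); the real packet layout, codec, and any encryption are not described on this page, and `pack_frame`, `unpack_frame`, `send_audio`, and `count_gaps` are hypothetical helpers.

```python
import socket
import struct
from typing import Iterable, Tuple

# Assumed header layout: 4-byte session tag + 4-byte sequence number, big-endian.
HEADER = struct.Struct("!II")


def pack_frame(session_tag: int, seq: int, payload: bytes) -> bytes:
    """Prefix one encoded audio chunk with the assumed header."""
    return HEADER.pack(session_tag, seq) + payload


def unpack_frame(datagram: bytes) -> Tuple[int, int, bytes]:
    """Split a datagram into (session_tag, seq, payload)."""
    session_tag, seq = HEADER.unpack_from(datagram)
    return session_tag, seq, datagram[HEADER.size:]


def send_audio(sock: socket.socket, gateway: Tuple[str, int],
               session_tag: int, chunks: Iterable[bytes]) -> int:
    """Send each chunk as its own datagram; returns the number sent."""
    sent = 0
    for seq, chunk in enumerate(chunks):
        sock.sendto(pack_frame(session_tag, seq, chunk), gateway)
        sent += 1
    return sent


def count_gaps(sequence_numbers: Iterable[int]) -> int:
    """Count frames skipped between increasing sequence numbers.

    Frames that arrive late (out of order) are still counted as missing,
    since real-time playback would already have moved past them.
    """
    missing = 0
    previous = None
    for seq in sequence_numbers:
        if previous is not None and seq > previous + 1:
            missing += seq - previous - 1
        if previous is None or seq > previous:
            previous = seq
    return missing
```

A receiver that tracks gaps this way has a simple signal for the "delayed or choppy audio" case: a rising gap count points at network quality rather than at the assistant.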

5. Commands flow over MQTT

Commands such as status updates, interface changes, device-tool calls, or session close events are delivered over MQTT for the firmware to handle.
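On the firmware side, handling these commands is essentially a lookup from command type to handler. The sketch below shows that pattern in Python for readability; the JSON shape (a `type` field plus arguments) and the command name `show_text` are assumptions, not the documented payload format.

```python
import json
from typing import Callable, Dict

Handler = Callable[[dict], str]


class CommandDispatcher:
    """Route a decoded MQTT payload to a handler chosen by its 'type' field."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, command_type: str, handler: Handler) -> None:
        self._handlers[command_type] = handler

    def dispatch(self, raw: bytes) -> str:
        message = json.loads(raw)
        command_type = message.get("type")
        handler = self._handlers.get(command_type)
        if handler is None:
            # Report unsupported commands instead of dropping them silently.
            return f"unsupported:{command_type}"
        return handler(message)
```

Returning an explicit "unsupported" result makes it easier to spot the case where a command arrives but the firmware has no matching device tool.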

What Uses MQTT?

Data group            Examples
Device status         Online/offline, ready to talk, in session
Device events         Wake word, button press, session started/ended
Assistant commands    Show text, change state, call a device tool
Session management    Open voice session, close session, report errors

What Uses UDP Audio?

MQTT is a good fit for small control messages, but not for continuous low-latency audio. Audio is therefore carried on a separate UDP channel:

  • The device sends microphone audio to the gateway.
  • Assistant Core processes the conversation and produces a response.
  • Response audio is sent back to the device for speaker playback.

This keeps voice responses faster and avoids blocking the MQTT control channel.

Common Connection States

State            Meaning
Not connected    The device has configuration but is not connected to an assistant yet.
Online           The device is connected to MQTT Gateway and ready for commands.
In session       The device is streaming audio and receiving assistant responses.
Offline          The device lost power, lost network access, or cannot reach the gateway.
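These states can be modeled as a small state machine. The sketch below is one plausible reading of the table, not the platform's actual logic: the event names (`activated`, `session_started`, `session_closed`, `gateway_connected`, `connection_lost`) and the allowed transitions are assumptions.

```python
from enum import Enum


class DeviceState(Enum):
    NOT_CONNECTED = "Not connected"
    ONLINE = "Online"
    IN_SESSION = "In session"
    OFFLINE = "Offline"


# Hypothetical transition table; event names are illustrative only.
TRANSITIONS = {
    (DeviceState.NOT_CONNECTED, "activated"): DeviceState.ONLINE,
    (DeviceState.ONLINE, "session_started"): DeviceState.IN_SESSION,
    (DeviceState.IN_SESSION, "session_closed"): DeviceState.ONLINE,
    (DeviceState.OFFLINE, "gateway_connected"): DeviceState.ONLINE,
}


def next_state(state: DeviceState, event: str) -> DeviceState:
    """Return the state after an event; unknown events leave it unchanged."""
    if event == "connection_lost":
        # Losing power or network leads to Offline from any state.
        return DeviceState.OFFLINE
    return TRANSITIONS.get((state, event), state)
```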

Troubleshooting

  1. If the device is not online, check Wi-Fi, power, and firmware.
  2. If the device shows an activation code but cannot be used, check that it is connected to the right assistant.
  3. If the device is online but voice does not start, check that voice is enabled for the assistant and that the device has fresh OTA configuration.
  4. If audio is delayed or choppy, check network quality and whether a firewall blocks the UDP audio connection.
  5. If device commands do not run, check whether the firmware supports that device tool.

This page explains the protocol architecture for operations and troubleshooting. Full authentication signatures, JSON payloads, and Docker deployment settings belong in internal deployment documentation.
