Dialogue management


Hermes is an MQTT-based protocol for controlling a voice assistant.
You publish to specific topics to activate or start a function, and subscribe to other topics to receive its results.
For example, to start Kaldi, our speech recognizer, you publish to hermes/asr/startListening and subscribe to hermes/asr/textCaptured to receive the recognized text.


We use Node-RED to build a dialogue manager, which controls our voice assistant.
Here you can find the flows for all Hermes-based features and the Intent-Switch.

Dialogue manager


Our dialogue manager starts with an MQTT-in node subscribed to hermes/hotword/heimdall/detected.
It is triggered every time our wake word “Heimdall” is detected.
The following function node creates a new message with a sessionId generated from the current time.
It then switches off wake-word detection and starts ASR capture (Kaldi in our case) by publishing to hermes/hotword/toggleOff and hermes/asr/startListening.
To receive the captured text, we subscribe to hermes/asr/textCaptured, filter the result by sessionId and keep the text, which we rename to input for the NLU request.
After filtering, we stop ASR capture by publishing to hermes/asr/stopListening and send a request to the FuzzyWuzzy intent recognizer by publishing to hermes/nlu/query.
We subscribe to hermes/nlu/intentParsed to receive the parsed intent and use two function nodes to convert it into a simplified JSON object for subsequent use.
Wake-word detection is then reactivated by publishing to hermes/hotword/toggleOn.
If the intent is not recognized, a message is published to hermes/nlu/intentNotRecognized instead; we subscribe to this topic as well and reactivate wake-word detection in that case too.

{
  "slots": {
    "name": "zimmerlampen",
    "light_state": "ein"
  },
  "sessionId": "2020-11-16T14:17:59.010Z",
  "intent": "ChangeLightState",
  "_msgid": "af195475.366548"
}

Example output of our dialogue manager.
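The flow above can be modelled outside Node-RED as a small state machine. The sketch below is Python rather than the JavaScript of Node-RED function nodes, and it only builds the (topic, payload) pairs that the MQTT-out nodes would publish; the MQTT client itself is left out. The payload fields follow the Rhasspy Hermes message format as we understand it (for example, slots in intentParsed arrive as a list of objects with slotName and value.value). `SITE_ID`, `DialogueManager` and `simplify_intent` are hypothetical names, and `_msgid` is omitted because Node-RED adds it itself.

```python
import json
from datetime import datetime, timezone
from typing import List, Optional, Tuple

Message = Tuple[str, dict]  # (MQTT topic, JSON payload)
SITE_ID = "default"  # assumption: a single Rhasspy site called "default"


def new_session_id() -> str:
    """ISO-8601 UTC timestamp with milliseconds, like JavaScript's toISOString()."""
    now = datetime.now(timezone.utc)
    return now.strftime("%Y-%m-%dT%H:%M:%S.") + f"{now.microsecond // 1000:03d}Z"


def simplify_intent(parsed: dict) -> dict:
    """Flatten a hermes/nlu/intentParsed payload into the example output format."""
    return {
        "slots": {s["slotName"]: s["value"]["value"] for s in parsed.get("slots", [])},
        "sessionId": parsed.get("sessionId"),
        "intent": parsed["intent"]["intentName"],
    }


class DialogueManager:
    """Each handler receives a Hermes payload and returns the messages to publish."""

    def __init__(self) -> None:
        self.session_id: Optional[str] = None

    def on_hotword_detected(self, payload: dict) -> List[Message]:
        # hermes/hotword/heimdall/detected: open a new session
        self.session_id = new_session_id()
        return [
            ("hermes/hotword/toggleOff", {"siteId": SITE_ID}),
            ("hermes/asr/startListening", {"siteId": SITE_ID, "sessionId": self.session_id}),
        ]

    def on_text_captured(self, payload: dict) -> List[Message]:
        # hermes/asr/textCaptured: drop results that belong to another session
        if self.session_id is None or payload.get("sessionId") != self.session_id:
            return []
        return [
            ("hermes/asr/stopListening", {"siteId": SITE_ID, "sessionId": self.session_id}),
            ("hermes/nlu/query", {"input": payload["text"], "siteId": SITE_ID,
                                  "sessionId": self.session_id}),
        ]

    def on_intent_parsed(self, payload: dict) -> Tuple[List[Message], dict]:
        # hermes/nlu/intentParsed: re-enable the wake word, pass on the simplified intent
        return [("hermes/hotword/toggleOn", {"siteId": SITE_ID})], simplify_intent(payload)

    def on_intent_not_recognized(self, payload: dict) -> List[Message]:
        # hermes/nlu/intentNotRecognized: only re-enable the wake word
        return [("hermes/hotword/toggleOn", {"siteId": SITE_ID})]


if __name__ == "__main__":
    dm = DialogueManager()
    for topic, body in dm.on_hotword_detected({"siteId": SITE_ID}):
        print(topic, json.dumps(body))
```

Keeping the handlers free of MQTT I/O makes the session filtering and the intent conversion easy to check in isolation.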

To see the full documentation of Hermes, click here.



Our intent handling is simply a switch node that routes the msg to the following nodes based on the intent:
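A switch node keyed on the intent name behaves much like a dispatch table. The following is a hypothetical Python equivalent that uses the simplified intent format from the example output; the handler names are invented, and the fallback plays the role of the switch node's "otherwise" rule.

```python
from typing import Callable, Dict


def change_light_state(intent: dict) -> str:
    # Hypothetical handler for the ChangeLightState intent
    slots = intent["slots"]
    return f"turn {slots['name']} {slots['light_state']}"


def unknown_intent(intent: dict) -> str:
    # Fallback, comparable to the switch node's "otherwise" rule
    return f"no handler for {intent['intent']}"


HANDLERS: Dict[str, Callable[[dict], str]] = {
    "ChangeLightState": change_light_state,
}


def route(intent: dict) -> str:
    """Dispatch a simplified intent to the handler registered for its name."""
    return HANDLERS.get(intent["intent"], unknown_intent)(intent)


print(route({"slots": {"name": "zimmerlampen", "light_state": "ein"},
             "intent": "ChangeLightState"}))
```

Adding a new intent then means registering one more handler, just as a new rule and output are added to the switch node.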


Audio output

To simplify audio output, we created two main audio output nodes, one for TTS and one for playing .wav files:



To use TTS, publish a message to hermes/tts/say whose payload contains the following keys:

  • text: the text to be spoken
  • siteId: identifies the source of the message (in our case default)
  • id: identifies which process started the TTS output
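A minimal sketch of building this message with Python's standard `json` module. The function name `tts_say` and the example id are hypothetical, and the MQTT client that would actually publish the message is left out.

```python
import json
from typing import Tuple


def tts_say(text: str, request_id: str, site_id: str = "default") -> Tuple[str, str]:
    """Return the topic and JSON body for a hermes/tts/say request."""
    payload = {
        "text": text,        # what Rhasspy should speak
        "siteId": site_id,   # source of the message ("default" in our setup)
        "id": request_id,    # lets us recognise our own hermes/tts/sayFinished
    }
    return "hermes/tts/say", json.dumps(payload)


topic, body = tts_say("Das Licht ist jetzt an.", request_id="light-feedback")
print(topic, body)
```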

We switch off wake-word detection and delay our message so that Rhasspy does not detect its own speech output.
After the text has been spoken, Rhasspy publishes a message to hermes/tts/sayFinished, which we use to switch wake-word detection back on and to pass the id on to the following nodes (callback).
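The guard around TTS can be sketched as a small class that emits the steps in order and only reacts to the sayFinished message carrying its own id. `TtsGuard` and the step tuples are hypothetical, and `DELAY_SECONDS` is a placeholder because the delay we actually use is not stated here.

```python
from typing import List, Optional, Tuple

DELAY_SECONDS = 0.5  # placeholder: the actual delay used in our flow is not given here
Step = tuple  # ("publish", topic, payload) or ("wait", seconds)


class TtsGuard:
    """Hypothetical model of the TTS output node with wake-word protection."""

    def __init__(self, site_id: str = "default") -> None:
        self.site_id = site_id
        self.pending_id: Optional[str] = None

    def speak(self, text: str, request_id: str) -> List[Step]:
        self.pending_id = request_id
        return [
            ("publish", "hermes/hotword/toggleOff", {"siteId": self.site_id}),
            ("wait", DELAY_SECONDS),  # delay the say request, as in our flow
            ("publish", "hermes/tts/say",
             {"text": text, "siteId": self.site_id, "id": request_id}),
        ]

    def on_say_finished(self, payload: dict) -> Tuple[List[Step], Optional[str]]:
        # Ignore sayFinished messages that do not belong to our pending request
        if self.pending_id is None or payload.get("id") != self.pending_id:
            return [], None
        self.pending_id = None
        steps = [("publish", "hermes/hotword/toggleOn", {"siteId": self.site_id})]
        return steps, payload["id"]  # the id is passed on as the callback
```

Matching on the id keeps one TTS request from re-enabling the wake word while another is still being spoken.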


To play .wav files, publish a message to hermes/audioServer/<siteId>/playBytes/<RequestId>, where siteId identifies the source of the message and RequestId can be any string (in our case, we generate one with a function node).
The audio server plays the .wav file stored in msg.payload, which is sent as a binary payload.
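A sketch of preparing such a request in Python. `silent_wav` uses the standard `wave` module to produce a short test file in memory, and `uuid4` stands in for our function node that generates the RequestId; both helper names are hypothetical, and a real flow would read an existing .wav file instead.

```python
import io
import uuid
import wave
from typing import Tuple


def silent_wav(seconds: float = 0.1, rate: int = 16000) -> bytes:
    """Build a mono 16-bit .wav file of silence entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
    return buf.getvalue()


def play_bytes(wav_bytes: bytes, site_id: str = "default") -> Tuple[str, str, bytes]:
    """Return topic, RequestId and the binary payload for a playBytes request."""
    request_id = uuid.uuid4().hex  # any string works as RequestId
    topic = f"hermes/audioServer/{site_id}/playBytes/{request_id}"
    return topic, request_id, wav_bytes


topic, request_id, payload = play_bytes(silent_wav())
print(topic, len(payload), "bytes")
```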

Again, we switch off wake-word detection and delay our message so that Rhasspy does not detect its own audio output.
After the sound has been played, Rhasspy publishes a message to hermes/audioServer/<siteId>/playFinished, which we use to switch wake-word detection back on.
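To react only to the end of our own playback, the flow has to match both the topic and the id. A sketch, assuming the `id` field of the playFinished payload echoes the RequestId used in the playBytes topic; `is_own_play_finished` is a hypothetical helper.

```python
def is_own_play_finished(topic: str, payload: dict, site_id: str, request_id: str) -> bool:
    """True if this MQTT message reports the end of our own playBytes request."""
    expected = ["hermes", "audioServer", site_id, "playFinished"]
    # Assumption: payload["id"] carries the RequestId from the playBytes topic
    return topic.split("/") == expected and payload.get("id") == request_id


print(is_own_play_finished("hermes/audioServer/default/playFinished",
                           {"id": "3f2a"}, "default", "3f2a"))
```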


We also wanted to be able to trigger command requests without a wake word.
To achieve this, we added the following nodes:


The blue link nodes are the input and output of this flow.
The link-in node triggers the process, and the link-out node returns the intent (callback).
The first function node and the two MQTT-out nodes work the same way as in our dialogue manager described above.
The second function node creates a new message whose payload is false; this value is later used to toggle switches.
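Put together, a command request produces two MQTT messages plus the gate-control message. The Python sketch below assumes that the two MQTT-out nodes publish to hermes/hotword/toggleOff and hermes/asr/startListening as in the dialogue manager; the name `command_request` is hypothetical and siteId "default" is assumed.

```python
from datetime import datetime, timezone
from typing import List, Tuple


def command_request(site_id: str = "default") -> Tuple[List[Tuple[str, dict]], dict]:
    """Start listening without a wake word and return the gate-control message."""
    now = datetime.now(timezone.utc)
    # Same time-based sessionId format as the example output
    session_id = now.strftime("%Y-%m-%dT%H:%M:%S.") + f"{now.microsecond // 1000:03d}Z"
    mqtt_messages = [
        ("hermes/hotword/toggleOff", {"siteId": site_id}),
        ("hermes/asr/startListening", {"siteId": site_id, "sessionId": session_id}),
    ]
    gate_control = {"payload": False}  # False: this request did not come from a wake word
    return mqtt_messages, gate_control
```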

We also had to extend our dialogue manager with the following nodes:


The two traffic nodes decide where the intents are passed.
If a wake word was detected, the intent is passed to our “Intent-Switch”.
If no wake word was detected, the intent is passed to the “Command-Request-Output”.
All of the new function nodes exist to control the traffic nodes.
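The routing can be pictured as two gates of which exactly one is open at a time. This Python sketch is a hypothetical model of the two traffic nodes; `TrafficGate`, `set_source` and the rule that the gates are always toggled in opposite directions are our assumptions, not the configuration of the actual nodes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrafficGate:
    """Stand-in for a traffic node: passes messages only while open."""
    destination: str
    is_open: bool


intent_gate = TrafficGate("Intent-Switch", True)
command_gate = TrafficGate("Command-Request-Output", False)


def set_source(wake_word_detected: bool) -> None:
    # Assumed control logic: exactly one of the two gates is open at a time
    intent_gate.is_open = wake_word_detected
    command_gate.is_open = not wake_word_detected


def route(intent: dict) -> List[str]:
    """Return the destinations an intent would reach with the current gate state."""
    return [g.destination for g in (intent_gate, command_gate) if g.is_open]
```

A command request would call `set_source(False)` (the false payload created above), and a detected wake word would call `set_source(True)`.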



What’s next?

For more useful information about this project, take a look here.