Kaldi

Speech to Text

Introduction

Kaldi is a C++-based speech recognition toolkit. It is licensed under the Apache License v2.0 and is used for Automatic Speech Recognition (ASR).
After Precise activates the assistant by detecting the wake word, Kaldi transforms the recorded audio into text, which is then passed on to FuzzyWuzzy to interpret the user's intent.

Install Kaldi

Download

Clone the Kaldi repository and move into the folder.

    git clone https://github.com/kaldi-asr/kaldi.git kaldi --origin upstream
    cd kaldi

Install

To install Kaldi, follow the instructions in the INSTALL file in the root of the repository.
It will point you to the INSTALL files in kaldi/tools/ and kaldi/src/.
Once you have followed all of these instructions, you can continue with this guide.

/kaldi/INSTALL

/kaldi/tools/INSTALL

/kaldi/src/INSTALL
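
Before working through the instructions, it can help to confirm that the clone is complete. The following is a minimal sketch, assuming Kaldi was cloned to ~/kaldi as in this guide; the helper name missing_install_files is hypothetical and not part of Kaldi or Rhasspy. It only checks that the three INSTALL files listed above exist, not that Kaldi has been built.

```python
from pathlib import Path

# The three INSTALL files named in this guide, relative to the Kaldi checkout.
INSTALL_FILES = ("INSTALL", "tools/INSTALL", "src/INSTALL")


def missing_install_files(kaldi_dir):
    """Return the INSTALL files that cannot be found under kaldi_dir."""
    root = Path(kaldi_dir).expanduser()
    return [name for name in INSTALL_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumption: Kaldi was cloned to ~/kaldi, as in this guide.
    missing = missing_install_files("~/kaldi")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All INSTALL files found.")
```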

Use Kaldi in Rhasspy

Update your Rhasspy profile

Kaldi must already be installed. Open the Rhasspy web interface and go to the settings.
Change “Speech to Text” to Kaldi and add the following lines to your profile.

    {
      "speech_to_text": {
        "system": "kaldi",
        "kaldi": {
            "base_dictionary": "kaldi/base_dictionary.txt",
            "compatible": true,
            "custom_words": "kaldi/custom_words.txt",
            "dictionary": "kaldi/dictionary.txt",
            "graph": "kaldi/graph",
            "kaldi_dir": "~/kaldi",
            "language_model": "kaldi/language_model.txt",
            "model_dir": "kaldi/model",
            "unknown_words": "kaldi/unknown_words.txt",
            "language_model_type": "arpa"
        }
      }
    }

Set kaldi_dir to the directory where you installed Kaldi; in our case that is ~/kaldi.
Rhasspy automatically removes every line whose value matches the default, e.g. "base_dictionary": "kaldi/base_dictionary.txt", so do not be surprised if such lines disappear from the saved profile.
After restarting Rhasspy, you should be able to use Kaldi as your speech-to-text system.
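
If Kaldi does not seem to be picked up after the restart, a quick check of the profile can rule out typos. The following is a minimal sketch, not Rhasspy's own profile loader: the function name kaldi_settings is hypothetical, and the example string only contains the keys shown above. It reads the "speech_to_text" section, verifies that the system is set to "kaldi", and expands ~ in kaldi_dir so that the path can be tested.

```python
import json
import os


def kaldi_settings(profile_text):
    """Return the "kaldi" settings of a profile, with ~ expanded in kaldi_dir."""
    profile = json.loads(profile_text)
    stt = profile.get("speech_to_text", {})
    if stt.get("system") != "kaldi":
        raise ValueError('speech_to_text.system is not set to "kaldi"')
    settings = dict(stt.get("kaldi", {}))
    if "kaldi_dir" in settings:
        # "~/kaldi" becomes an absolute path inside the user's home directory.
        settings["kaldi_dir"] = os.path.expanduser(settings["kaldi_dir"])
    return settings


if __name__ == "__main__":
    # Assumption: a shortened profile with only the key that must be adapted.
    example = '{"speech_to_text": {"system": "kaldi", "kaldi": {"kaldi_dir": "~/kaldi"}}}'
    settings = kaldi_settings(example)
    print(settings["kaldi_dir"], "exists:", os.path.isdir(settings["kaldi_dir"]))
```

To check a real profile, read the text of your profile file and pass it to kaldi_settings instead of the example string.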

Test Kaldi

Rhasspy automatically trains Kaldi to expect the intents defined in your sentences.ini file. To test it, add a line to this file and save it.
Rhasspy will ask to restart. Accept, and you are ready to test.
Watch the terminal and start a recording by saying your wake word or by using the “Wake Up” button in the web interface.
Now say the line you added and wait a few seconds. Then scroll through the terminal output until you find a line like this:

    [DEBUG:2020-10-13 07:35:27,316] rhasspyasr_kaldi.transcribe: ['wie spät ist es ', 'wie spät ist es ']

We tried it with the sentence “wie spät ist es”.
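
Instead of scrolling through the terminal, the transcription can be pulled out of such a log line programmatically. The following is a minimal sketch, assuming the log line has exactly the format shown above (a DEBUG prefix, the logger name rhasspyasr_kaldi.transcribe, and a Python-style list of strings); the function name transcriptions is hypothetical and not part of Rhasspy.

```python
import ast
import re

# Assumption: the line looks like the example above, ending in a list of strings.
LOG_PATTERN = re.compile(
    r"^\[DEBUG:[^\]]+\] rhasspyasr_kaldi\.transcribe: (\[.*\])$"
)


def transcriptions(log_line):
    """Return the transcribed texts from one log line, or [] if it does not match."""
    match = LOG_PATTERN.match(log_line.strip())
    if match is None:
        return []
    # literal_eval safely parses the list literal without executing any code.
    texts = ast.literal_eval(match.group(1))
    return [text.strip() for text in texts]


if __name__ == "__main__":
    line = (
        "[DEBUG:2020-10-13 07:35:27,316] rhasspyasr_kaldi.transcribe: "
        "['wie spät ist es ', 'wie spät ist es ']"
    )
    print(transcriptions(line))
```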

Kaldi understands numbers written as digits, too. If sentences.ini contains a number such as 15, Kaldi will recognize the spoken word fünfzehn (German for fifteen).

    [DEBUG:2020-10-14 13:20:10,706] rhasspyasr_kaldi.transcribe: ['fünfzehn ', 'fünfzehn ']

Sources

Kaldi documentation, Rhasspy documentation

What’s next?

The spoken text will be passed to FuzzyWuzzy to understand your intent.