CMUSphinx Tutorial For Developers

Introduction

This tutorial describes some applications of the CMUSphinx toolkit. Such applications can include voice control of mobile, desktop, or automotive applications, language learning, speech transcription, closed captioning, speech translation, and voice search. While all of these applications are possible with CMUSphinx, modern toolkits such as Kaldi, Coqui, NeMo, Wav2vec2, Whisper, and whisper.cpp will perform much better on larger-vocabulary tasks.

The tutorial is intended for developers who need to apply speech technology in their applications, not for speech recognition researchers. If you are a researcher, it is better to start with a textbook on speech technology. Spoken Language Processing by Huang, Acero, and Hon is a good choice.

This tutorial is structured as follows: