Sphinx 3 Aligner for Dummies
Philip Kwok (with significant help from Evandro Gouvea) 6/13/2002
A. Why do we need the Sphinx 3 Aligner?
We want to test whether the acoustic scores calculated by Sphinx 4 are correct.
B. How does the Aligner help?
The Sphinx 3 Aligner tests a set of input audio files against the acoustic models of Sphinx 3. Take isolated digits as an example. Given an audio file that contains an utterance of “one”, we test this audio file (which is first transformed into a cepstra file) against the Sphinx 3 acoustic models (i.e., the HMMs) for “zero”, “one”, “two”, …, “nine”, each giving an acoustic score. We then take the same audio file and run it against the Sphinx 4 acoustic models for “zero”, “one”, “two”, …, “nine”, each also giving an acoustic score. By assuming that the acoustic scoring code of Sphinx 3 is “correct”, we can tell whether Sphinx 4's acoustic scores are correct by comparing them with the acoustic scores produced by Sphinx 3.
C. What do you need?
You will need:
1. Acoustic Models
- These are the acoustic models in Sphinx 3 format.
2. Sphinx 3 aligner
- From SourceForge, check out the cmusphinx archive_s3/s3.0 module.
Try to compile the timealign program by typing "make timealign-install" at the top level directory. Resolve the path problems you'll likely run into. The timealign program should be compiled into the "$(TOP)/bin/$(PLATFORM)" directory.
3. Cepstra files
The Aligner, timealign, requires cepstra files (of your input audio files) as input. You can generate these cepstra files from your input audio files using the Sphinx 3 program “wave2feat” (don't let the name fool you; it would be more appropriately named “wave2cep”). It generates cepstra of length 13.
- To get wave2feat, from SourceForge, checkout the cmusphinx
- In the “src” directory, type “make clean all”. It should successfully compile the wave2feat program into the appropriate "bin" directory. The wave2feat program is fairly easy to use. Don't forget the "-srate", "-raw" and "-mach_endian" switches for Solaris.
- Normally, we give the generated cepstra files a “.mfc” extension.
- The cepstra files will be created in a directory that you specify.
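Once wave2feat has written a cepstra file, a quick sanity check can save time later. The sketch below reads one file and splits it into 13-coefficient frames. It assumes the usual Sphinx cepstra layout (a 4-byte integer header giving the number of 4-byte floats that follow) and guesses the byte order by checking the header against the file size. The name read_cepstra is a hypothetical helper, not part of Sphinx.

```python
import struct


def read_cepstra(path, veclen=13):
    """Read a cepstra file and return a list of frames (lists of floats).

    Assumed layout: one 4-byte integer (count of floats), then that many
    4-byte floats. Byte order is detected from the file size.
    """
    with open(path, "rb") as f:
        data = f.read()
    if len(data) < 4:
        raise ValueError("file too short to contain a header")

    n_expected = (len(data) - 4) // 4
    for order in ("<", ">"):  # try little-endian, then big-endian
        (n_floats,) = struct.unpack(order + "i", data[:4])
        if n_floats == n_expected:
            break
    else:
        raise ValueError("header does not match file size in either byte order")

    if n_floats % veclen != 0:
        raise ValueError("float count is not a multiple of the vector length")

    values = struct.unpack(f"{order}{n_floats}f", data[4:4 + 4 * n_floats])
    return [list(values[i:i + veclen]) for i in range(0, n_floats, veclen)]
```

Calling read_cepstra on a generated “.mfc” file and checking that the number of frames looks plausible for the length of the recording is usually enough to catch a wrong "-mach_endian" setting.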
4. Control File & Transcript File
- You will list all the cepstra files that you want to test in a control file. Take the isolated-digits case as an example: you will have acoustic models for "oh", "zero", "one", "two", ..., "nine", and you want to test each audio file against each of these acoustic models. The following discussion is based on the isolated-digits task using TI46 as the audio input files.
The control file lists the cepstra files to be tested. The filenames have to be unique (required by the aligner). So, instead of having a control file with 10 repetitions of each file name, I created links from each file name to “filename_index”, for example:
ln -s 02m1s8t1.wav.mfc 02m1s8t1_1.mfc
ln -s 02m1s8t1.wav.mfc 02m1s8t1_2.mfc
etc.
and created the corresponding control file, as well as the transcription with the right reference.
The control file has only the relative path (since we already provide the path to the directory containing cepstra using the switch -cepdir).
So, the control file looks like:
02m1s8t1_0.mfc
02m1s8t1_1.mfc
02m1s8t1_2.mfc
02m1s8t1_3.mfc
02m1s8t1_4.mfc
…
Notice that in general we don't have to go to the trouble of creating links using indices (since usually the filenames are unique) but for this particular task (with repetitions of filenames) we had to.
The transcript file format is: <transcription> (filename)
oh (02m1s8t1_o)
zero (02m1s8t1_0)
one (02m1s8t1_1)
two (02m1s8t1_2)
…
nine (02m1s8t1_9)
oh (02m1s8t1_o)
zero (02m1s8t1_0)
one (02m1s8t1_1)
two (02m1s8t1_2)
…
nine (02m1s8t1_9)
she kept your gray suit in greasy wash water all year (st01)
Notice that the filename doesn't have the extension.
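The link, control-file, and transcript steps above can be scripted. The following is a minimal Python sketch, not the procedure used originally. The names build_alignment_inputs and DIGIT_WORDS are hypothetical, and the suffix scheme ("o" for "oh", "0" to "9" for the digits) simply follows the transcript example above.

```python
import os

# Link suffix -> word to align against (assumed from the transcript example).
DIGIT_WORDS = {
    "o": "oh", "0": "zero", "1": "one", "2": "two", "3": "three",
    "4": "four", "5": "five", "6": "six", "7": "seven", "8": "eight",
    "9": "nine",
}


def build_alignment_inputs(cep_dir, cep_file, ctl_path, transcript_path):
    """Create one uniquely named link per word model, plus matching
    control and transcript files. Returns the list of link ids."""
    base = cep_file.split(".")[0]  # "02m1s8t1.wav.mfc" -> "02m1s8t1"
    ids = []
    with open(ctl_path, "w") as ctl, open(transcript_path, "w") as trans:
        for suffix, word in DIGIT_WORDS.items():
            link_id = f"{base}_{suffix}"
            link_path = os.path.join(cep_dir, link_id + ".mfc")
            if not os.path.lexists(link_path):
                # Relative link inside cep_dir, like `ln -s` above.
                os.symlink(cep_file, link_path)
            ctl.write(link_id + ".mfc\n")         # control file entry
            trans.write(f"{word} ({link_id})\n")  # transcript: no extension
            ids.append(link_id)
    return ids
```

For example, build_alignment_inputs("cep", "02m1s8t1.wav.mfc", "ti46.ctl", "ti46.transcript") would produce eleven links in the hypothetical "cep" directory and one line per link in each output file.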
D. How to Run the Aligner?
Now that you have the acoustic model, the Sphinx 3 timealign program, the control and transcript files, and the cepstra files, you're all set to run the aligner!
Use the following script (provided by Bhiksha) to run the timealign program. You will need to set the variables according to your system:
—– start of script —–
# Also set MODEL_DIR, ACMODDIR, mdef, cepdir and alignfile for your system
set ctlfn = ./ti46.ctl # Control file with list of files
set dictfn = $MODEL_DIR/train.dict # Recognition dictionary
set fdictfn = $MODEL_DIR/fillerdict # Filler dictionary
set lsnfn = ./ti46.transcript # File of transcripts to align against ctlfn
set cepext = mfc # Extension of cepstral files
set outname = Myexpt # Some name for experiment
set resultdir = ./alignments/$outname # Directory where falign output goes
set aligndir = $resultdir/segmentation # Directory where segmentation is written
#DIRECTORIES TO BE MADE
#if (! -e $resultdir) mkdir $resultdir
#if (! -e $aligndir) mkdir $aligndir
timealign \
-logbase 1.0001 \
-mdeffn $mdef \
-senmgaufn .cont. \
-meanfn $ACMODDIR/means \
-varfn $ACMODDIR/variances \
-mixwfn $ACMODDIR/mixture_weights \
-tmatfn $ACMODDIR/transition_matrices \
-beam 1e-80 \
-dictfn $dictfn \
-fdictfn $fdictfn \
-ctlfn $ctlfn \
-cepdir $cepdir \
-cepext $cepext \
-insentfn $lsnfn \
-outsentfn $alignfile \
-wdsegdir $aligndir,CTL \
-agc none \
-cmn current
—– end of script —–
(1) You MUST manually create the directories $resultdir and $aligndir. Otherwise, the timealign program will crash.
(2) You should change the variable “outname” to the directory where you want the output to go. Everything will go under "./alignments" (unless you change it). For example, if you set "outname" to "results_01", the results will go to the "./alignments/results_01" directory.
(3) The “-wdsegdir” switch is added so that the program dumps segmentation information utterance by utterance.
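Since a missing output directory makes timealign crash, it can help to create both directories from a small helper before each run. This is only a sketch: make_output_dirs is a hypothetical name, and the layout ("./alignments/<outname>" with a "segmentation" subdirectory) is taken from the script variables above.

```python
import os


def make_output_dirs(outname, root="./alignments"):
    """Create the result and segmentation directories if they are missing.

    Returns (resultdir, aligndir), mirroring the script variables.
    """
    resultdir = os.path.join(root, outname)
    aligndir = os.path.join(resultdir, "segmentation")
    # makedirs creates resultdir as a parent of aligndir in one call.
    os.makedirs(aligndir, exist_ok=True)
    return resultdir, aligndir
```

Calling make_output_dirs("results_01") before running the script gives the same result as uncommenting the two mkdir lines.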
E. How to Interpret the Results?
Suppose that you output your results to the directory “./alignments/Myexpt”. This directory will contain a bunch of files with the “.wdseg” extension. If you look at one of them, it will look like:
mangueira.speech.cs.cmu.edu% more 06f3s4t1_8.mfc.wdseg
SFrm EFrm  SegAScr Word
   0   18   730020 <sil>
  19   61 -6491857 seven
  62  112 -3907457 <sil>
 113  119   693289 <sil>
Total score: -8976005
This is from decoding a cepstra file which has an utterance of “six” against the Sphinx 3 acoustic model for “seven”. It shows you the acoustic score of each word segment of the decoded result. The score that is important is the “Total score”.
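With many “.wdseg” files to inspect, extracting the scores by program is less error-prone than reading them by eye. Here is a minimal sketch of a parser, assuming one segment per line in the column order shown above (SFrm, EFrm, SegAScr, Word) followed by a "Total score:" line. The name parse_wdseg is hypothetical. In the example above the four segment scores add up exactly to the total, which makes a handy consistency check.

```python
def parse_wdseg(text):
    """Parse the contents of a .wdseg file.

    Returns (segments, total), where segments is a list of
    (start_frame, end_frame, acoustic_score, word) tuples and total is
    the integer from the "Total score:" line (None if absent).
    """
    segments = []
    total = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if line.strip().startswith("Total score:"):
            total = int(fields[-1])
        elif len(fields) == 4 and fields[0].lstrip("-").isdigit():
            # Data row; the "SFrm EFrm SegAScr Word" header is skipped
            # because its first field is not a number.
            sfrm, efrm, score, word = fields
            segments.append((int(sfrm), int(efrm), int(score), word))
    return segments, total
```

A typical use is to read each file, call parse_wdseg on its contents, and keep the total keyed by the file name.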
What you should do next is to implement something similar in Sphinx 4, and then compare the “Total score” from Sphinx 4 with this “Total score”. This way, you can tell if there is anything wrong with the Sphinx 4 acoustic scoring code.
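The comparison itself can be as simple as the sketch below. It assumes you have already collected the “Total score” of each utterance from both systems into dictionaries keyed by utterance id, and that the Sphinx 4 scores have been converted to the same log base as Sphinx 3 (1.0001 in the script above). The function name compare_total_scores and the default tolerance are hypothetical choices, not something prescribed by either system.

```python
def compare_total_scores(s3_scores, s4_scores, rel_tol=1e-3):
    """Return a list of (utt_id, s3_score, s4_score) for every utterance
    whose Sphinx 4 total is missing or differs from the Sphinx 3 total
    by more than rel_tol (relative to the Sphinx 3 score)."""
    mismatches = []
    for utt_id, s3 in sorted(s3_scores.items()):
        s4 = s4_scores.get(utt_id)
        if s4 is None or abs(s4 - s3) > rel_tol * abs(s3):
            mismatches.append((utt_id, s3, s4))
    return mismatches
```

An empty result means every Sphinx 4 total agreed with its Sphinx 3 counterpart within the chosen tolerance; anything else points at utterances worth inspecting segment by segment.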