
<file> Sphinx 3 Aligner for Dummies

Philip Kwok (with significant help from Evandro Gouvea) 6/13/2002

A. Why do we need the Sphinx 3 Aligner?


We want to test whether the acoustic scores calculated by Sphinx 4 are correct.

B. How does the Aligner help?


The Sphinx 3 Aligner tests a set of input audio files against the acoustic models of Sphinx 3. Take isolated digits as an example. Given an audio file that contains an utterance of “one”, we test this audio file (which is first transformed into a cepstra file) against the Sphinx 3 acoustic models (i.e., the HMMs) for “zero”, “one”, “two”, …, “nine”, each giving an acoustic score. We then take the same audio file and run it against the Sphinx 4 acoustic models for “zero”, “one”, “two”, …, “nine”, each also giving an acoustic score. By assuming that the acoustic scoring code of Sphinx 3 is “correct”, we can tell whether Sphinx 4's acoustic scores are correct by comparing them with the acoustic scores produced by Sphinx 3.

C. What do you need?


You will need:

1. Acoustic Models


- These are the acoustic models in Sphinx 3 format.

2. Sphinx 3 aligner


- From SourceForge, check out the cmusphinx archive_s3/s3.0 module.

Try to compile the timealign program by typing "make timealign-install"
at the top level directory. Resolve the path problems you'll
likely run into. The timealign program should be compiled into the
"$(TOP)/bin/$(PLATFORM)" directory.

3. Cepstra files


The Aligner, timealign, requires cepstra files (of your input audio files) as input. You can generate these cepstra files from your input audio files using the Sphinx 3 program “wave2feat” (don't let the name fool you; it would be more appropriately named “wave2cep”). It generates cepstra of length 13.

- To get wave2feat, check out the cmusphinx "SphinxTrain" module from
  SourceForge.

- In the "src" directory, type "make clean all". It should successfully
  compile the wave2feat program into the appropriate "bin" directory.
  The wave2feat program is fairly easy to use. Don't forget the
  "-srate", "-raw" and "-mach_endian" switches for Solaris.

- Normally, we give the generated cepstra files a “.mfc” extension.

- The cepstra files will be created at a directory that you specify.

4. Control File & Transcript File


- List all the cepstra files that you want to test in a control file.
  Take the isolated-digits case as an example: you will have acoustic
  models for "oh", "zero", "one", "two", ..., "nine", and you want to
  test each audio file against each of these acoustic models. The
  following discussion is based on the isolated-digits task using TI46
  as the audio input files.

Control File


This lists the cepstra files to be tested. The filenames have to be unique (required by the aligner). So, instead of having a control file with 10 reps of each file name, I created links from each file name to “filename_index”, for example:

ln -s 02m1s8t1.wav.mfc 02m1s8t1_1.mfc
ln -s 02m1s8t1.wav.mfc 02m1s8t1_2.mfc
etc.

and created the corresponding control file, as well as the transcription with the right reference.

The control file has only the relative path (since we already provide the path to the directory containing cepstra using the switch -cepdir).

So, the control file looks like:

02m1s8t1_0.mfc
02m1s8t1_1.mfc
02m1s8t1_2.mfc
02m1s8t1_3.mfc
02m1s8t1_4.mfc
…

Notice that in general we don't have to go to the trouble of creating links using indices (since usually the filenames are unique) but for this particular task (with repetitions of filenames) we had to.
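
The link-and-list step above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name make_links_and_ctl is hypothetical, the suffix list (o, 0, ..., 9) is taken from the transcript example in this document, and the source files are assumed to be named "*.wav.mfc" as in the ln example.

```shell
#!/bin/sh
# Sketch: for every cepstra file in a directory, create one uniquely named
# link per digit model and list the link names in a control file.
# make_links_and_ctl is a hypothetical helper, not part of Sphinx 3.
make_links_and_ctl() {
    cepdir=$1                      # directory holding the *.wav.mfc files
    ctlfn=$2                       # control file to (re)create
    suffixes="o 0 1 2 3 4 5 6 7 8 9"   # assumed: one suffix per digit model

    : > "$ctlfn"                   # start with an empty control file
    for src in "$cepdir"/*.wav.mfc; do
        [ -e "$src" ] || continue  # nothing matched the pattern
        base=$(basename "$src" .wav.mfc)
        for s in $suffixes; do
            # The link lives next to its target, so a relative target works.
            ln -sf "$base.wav.mfc" "$cepdir/${base}_$s.mfc"
            echo "${base}_$s.mfc" >> "$ctlfn"
        done
    done
}
```

Calling make_links_and_ctl ./cepstra ./ti46.ctl would then write one control-file line per link, relative to the directory later passed with -cepdir.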

Transcript File


format is: <transcription> (filename)

Example:

oh (02m1s8t1_o)
zero (02m1s8t1_0)
one (02m1s8t1_1)
two (02m1s8t1_2)
…
…
nine (02m1s8t1_9)
oh (02m1s8t1_o)
zero (02m1s8t1_0)
one (02m1s8t1_1)
two (02m1s8t1_2)
…
…
nine (02m1s8t1_9)

or:

she kept your gray suit in greasy wash water all year (st01)

Notice that the filename doesn't have the extension.
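
A transcript in this format can be generated rather than typed by hand. The following is a minimal sketch: the function name make_transcript is hypothetical, and the suffix-to-word mapping is an assumption read off the example above. The order of the lines is assumed to follow the order of the control file, since the transcripts are aligned against it.

```shell
#!/bin/sh
# Sketch: print one transcript line per digit model for a given base name,
# in the form "<transcription> (filename)" with no file extension.
# make_transcript is a hypothetical helper, not part of Sphinx 3.
make_transcript() {
    base=$1
    for pair in o:oh 0:zero 1:one 2:two 3:three 4:four \
                5:five 6:six 7:seven 8:eight 9:nine; do
        suffix=${pair%%:*}         # text before the colon, e.g. "0"
        word=${pair#*:}            # text after the colon, e.g. "zero"
        echo "$word (${base}_$suffix)"
    done
}
```

For example, make_transcript 02m1s8t1 >> ./ti46.transcript appends the eleven lines for one utterance.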

D. How to Run the Aligner?


Now that you have the acoustic model, the Sphinx 3 timealign program, the control and transcript files, and the cepstra files, you're all set to run the aligner!

Use the following script (provided by Bhiksha) to run the timealign program. You will need to set the variables according to your system:

----- start of script -----

#!/bin/sh

CMUSPHINX_DIR=../../../..
MODEL_DIR=/lab/speech/sphinx4/data/connected_digits

ctlfn=./ti46.ctl                  # Control file with list of files
dictfn=$MODEL_DIR/train.dict      # Recognition dictionary
fdictfn=$MODEL_DIR/fillerdict     # Filler dictionary
lsnfn=./ti46.transcript           # File of transcripts to align against ctlfn

cepdir=/usr0/ppk96/cvs/cmusphinx/SphinxTrain/bin.sparc-sun-solaris2.8/cepstra
cepext=mfc                        # Extension of cepstral files
outname=Myexpt                    # Some name for experiment
ceplen=13

resultdir=./alignments/$outname   # Directory where falign output goes
aligndir=$resultdir/segmentation  # Directory where segmentation is written
alignfile=$aligndir/output.txt
logfile=$resultdir/$outname.log   # Log file; the original script never set this, so this path is an assumption
#DIRECTORIES TO BE MADE
#if (! -e $resultdir) mkdir $resultdir
#if (! -e $aligndir) mkdir $aligndir
#+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
ACMODDIR=$MODEL_DIR/wdclean.cd_continuous_8gau
mdef=$MODEL_DIR/wdclean.500.mdef
PGM=./timealign

# Send both stdout and stderr of timealign to the log file.
$PGM \
	  -logbase 1.0001 \
	  -mdeffn $mdef \
	  -senmgaufn .cont. \
	  -meanfn $ACMODDIR/means \
	  -varfn $ACMODDIR/variances \
	  -mixwfn $ACMODDIR/mixture_weights \
	  -tmatfn $ACMODDIR/transition_matrices \
	  -beam 1e-80 \
	  -dictfn $dictfn \
	  -fdictfn $fdictfn \
	  -ctlfn $ctlfn \
	  -cepdir $cepdir \
	  -cepext $cepext \
	  -insentfn  $lsnfn \
	  -outsentfn $alignfile \
	  -wdsegdir $aligndir,CTL \
	  -agc none \
	  -cmn current \
	  > $logfile 2>&1

exit 0

----- end of script -----

IMPORTANT:

(1) You MUST manually create the directories $resultdir and $aligndir.
    Otherwise, the timealign program will crash.

(2) You should set the variable "outname" to the name of the directory
    where you want the output to go. Everything will go under
    "./alignments" (unless you change it). For example, if you set
    "outname" to "results_01", the results will go to the
    "./alignments/results_01" directory.

(3) The "-wdsegdir" switch is added so that the program dumps
    segmentation information utterance by utterance.
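
Point (1) can be handled with a couple of lines run before timealign. This is a minimal sketch that reuses the variable names and default values from the script above. Because $aligndir sits under $resultdir, a single "mkdir -p" creates both.

```shell
#!/bin/sh
# Sketch: create the output directories that timealign expects to exist.
outname=Myexpt                     # same experiment name as in the script
resultdir=./alignments/$outname
aligndir=$resultdir/segmentation

# -p creates missing parent directories and does not fail if they exist.
mkdir -p "$aligndir"
```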

E. How to Interpret the Results?


Suppose that you output your results to the directory “./alignments/Myexpt”. This directory will contain a bunch of files with the “.wdseg” extension. If you look at one of these files, it will look like:

mangueira.speech.cs.cmu.edu% more 06f3s4t1_8.mfc.wdseg

    SFrm    EFrm    SegAScr Word
       0      18     730020 <sil>
      19      61   -6491857 seven
      62     -12   -3907457 <sil>
     113     -19     693289 <sil>

Total score: -8976005

This is from decoding a cepstra file which has an utterance of “six” against the Sphinx 3 acoustic model for “seven”. It shows you the acoustic scores at each word segment of the decoded result. The score that is important is the “Total score”.
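
If you have many .wdseg files, the "Total score" can be pulled out with awk. The sketch below assumes the layout shown above (four columns per segment, then a "Total score:" line). The function names wdseg_total and wdseg_sum are hypothetical. The second function adds up the SegAScr column as a sanity check on the parse.

```shell
#!/bin/sh
# Sketch: read scores from a .wdseg file laid out like the listing above.
# wdseg_total and wdseg_sum are hypothetical helpers, not Sphinx 3 tools.

# Print the number that follows "Total score:".
wdseg_total() {
    awk '/^Total score:/ { print $3 }' "$1"
}

# Print the sum of the SegAScr column (third field of each segment line).
# Segment lines have four fields and start with an integer frame number.
wdseg_sum() {
    awk 'NF == 4 && $1 ~ /^-?[0-9]+$/ { sum += $3 }
         END { printf "%d\n", sum }' "$1"
}
```

For the listing above, both functions print -8976005, since the four segment scores add up to the total.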

What you should do next is to implement something similar in Sphinx 4, and then compare the “Total score” from Sphinx 4 with this “Total score”. This way, you can tell if there is anything wrong with the Sphinx 4 acoustic scoring code.
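
The comparison itself can be as simple as checking that two totals agree within a tolerance. The sketch below is an assumption-laden illustration: the function name scores_match is hypothetical, the tolerance is something you would have to choose yourself, and both scores are assumed to be expressed in the same log base (the script above uses -logbase 1.0001 for Sphinx 3).

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) when two total scores differ by no more
# than a tolerance. scores_match is a hypothetical helper.
#   $1 = Sphinx 3 total score
#   $2 = Sphinx 4 total score (assumed to use the same log base)
#   $3 = allowed absolute difference (optional, defaults to 0)
scores_match() {
    awk -v a="$1" -v b="$2" -v tol="${3:-0}" 'BEGIN {
        d = a - b
        if (d < 0) d = -d          # absolute difference
        exit ((d <= tol) ? 0 : 1)
    }'
}
```

A call such as scores_match "$s3_total" "$s4_total" 10 || echo "scores differ" would flag utterances whose totals are more than 10 apart; the variable names here are placeholders.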

--- End of Document ---

<file>

sphinx4/sphinxthreealigner.txt · Last modified: 2010/08/03 21:48 (external edit)