First of all, make sure you are using fixed-point operations and that your dictionary and language model are small enough.

How do I make it fast?

The default settings are not enough to achieve sub-realtime performance on most tasks. Here are some command-line flags you should experiment with:

-ds

This is the frame GMM computation downsampling ratio (dsratio). In most cases -ds 2 works best, though accuracy suffers a bit. Lower values are more accurate; higher values are faster but less accurate.
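
The trade-off can be illustrated with a toy sketch. It assumes that downsampling means scoring only every Nth frame and reusing the most recent score for the frames in between; the decoder's actual implementation may differ in detail. The names score_frames and score_fn are hypothetical and are not part of PocketSphinx.

```python
def score_frames(frames, score_fn, ds_ratio=1):
    """Score every ds_ratio-th frame and reuse that score for skipped frames.

    Toy model only: score_fn stands in for the expensive GMM computation.
    """
    if ds_ratio < 1:
        raise ValueError("ds_ratio must be >= 1")
    scores = []
    evaluations = 0
    last = None
    for i, frame in enumerate(frames):
        if i % ds_ratio == 0:
            # Expensive path: actually evaluate the acoustic score.
            last = score_fn(frame)
            evaluations += 1
        # Cheap path for skipped frames: reuse the previous score.
        scores.append(last)
    return scores, evaluations


frames = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
scores, evaluations = score_frames(frames, lambda x: -x, ds_ratio=2)
print(evaluations, scores)
```

With a ratio of 2, half of the score evaluations are skipped, which is where the speedup comes from; the reused scores are also where the accuracy loss comes from.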

-topn

This is the number of top-scoring Gaussians used when scoring each mixture. The default value is 4; the fastest value is 2, but accuracy can suffer a bit depending on your acoustic model.
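
As a rough illustration of why a small value can be enough, the sketch below approximates a mixture log-likelihood from only its best few components. The component scores are made-up numbers and mixture_score is a hypothetical helper, not a PocketSphinx function.

```python
import heapq
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty sequence."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_score(component_log_scores, topn=None):
    """Approximate a mixture log-likelihood from its topn best components.

    topn=None uses every component (the exact value for this toy mixture).
    """
    if topn is None:
        kept = list(component_log_scores)
    else:
        kept = heapq.nlargest(topn, component_log_scores)
    return log_sum_exp(kept)


# Made-up per-Gaussian log scores for one frame.
components = [-2.0, -9.0, -3.5, -15.0, -2.5, -20.0, -11.0, -8.0]
full = mixture_score(components)
top4 = mixture_score(components, topn=4)
top2 = mixture_score(components, topn=2)
print(full, top4, top2)
```

In this toy case a few components dominate the sum, so keeping four loses almost nothing while keeping two loses noticeably more. How much is lost in practice depends on the acoustic model.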

-lpbeam

This beam is quite important for performance; however, the default setting is already fairly narrow. Run pocketsphinx_batch with no arguments to see the default value.

-lponlybeam

The same applies here as for -lpbeam. If you find it hard to get enough accuracy, you can widen these beams.
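
The effect of widening or narrowing a beam can be sketched as follows. This is generic beam pruning, assuming the beam is expressed as a probability ratio relative to the best hypothesis; it is not the decoder's actual code, and beam_prune and the scores are hypothetical.

```python
import math


def beam_prune(hyps, beam):
    """Keep hypotheses whose log score is within log(beam) of the best one.

    hyps maps a hypothesis name to its log score; beam is a ratio in (0, 1].
    A smaller ratio is a wider beam and keeps more hypotheses.
    """
    if not 0.0 < beam <= 1.0:
        raise ValueError("beam must be in (0, 1]")
    best = max(hyps.values())
    threshold = best + math.log(beam)
    return {name: score for name, score in hyps.items() if score >= threshold}


hyps = {"a": -10.0, "b": -14.0, "c": -30.0, "d": -80.0}
narrow = beam_prune(hyps, 1e-3)   # threshold is about 6.9 below the best score
wide = beam_prune(hyps, 1e-10)    # threshold is about 23.0 below the best score
print(sorted(narrow), sorted(wide))
```

A wider beam keeps more hypotheses alive, which costs time but makes it less likely that the correct one is pruned early.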

-maxwpf

This can be set quite low and still give you reasonable results. Try 5.

-maxhmmpf

Depending on the acoustic and language models, this can be very helpful. Try 3000.
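
Both -maxwpf and -maxhmmpf put an absolute cap on how much is kept per frame. A minimal sketch of that idea, assuming the cap keeps the best-scoring entries, looks like this; cap_active and the word scores are hypothetical and only illustrate the principle.

```python
import heapq


def cap_active(scored_items, max_per_frame):
    """Keep at most max_per_frame items, preferring the highest scores.

    scored_items is a list of (name, log_score) pairs for one frame.
    """
    return heapq.nlargest(max_per_frame, scored_items, key=lambda item: item[1])


# Made-up word hypotheses ending in the same frame.
words = [
    ("go", -12.0), ("no", -15.5), ("know", -13.0), ("goat", -25.0),
    ("so", -19.0), ("show", -30.0), ("glow", -22.0),
]
kept = cap_active(words, 5)
print([name for name, _ in kept])
```

Unlike a beam, this kind of cap bounds the work per frame regardless of how close the scores are, which makes the worst-case decoding time more predictable.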

-kdtreefn

If you are using the models provided with PocketSphinx (model/hmm/6k), pass the kdtrees file included with them using this argument. You could also make your own kd-trees with the mk_s3gau and kdtree programs in SphinxTrain.

-kdmaxdepth

You can play around with this. Values from 1 to 10 are possible with the default kdtrees file. Higher values use more memory. Values between 5 and 7 seem to work well for some reason.

-kdmaxbbi

Tuning this might help too. Try 16 and go up or down to see how speed and accuracy are affected. If it is too low, both will suffer; if it is too high, it won't make a difference.

-pl_window

Phonetic lookahead is a technique that speeds up decoding by reducing the amount of computation. Basically, everything is decoded with a phonetic decoder first, and the detailed search is then restricted by the results of this fast phonetic search. It is also called “fast match”. For details and evaluations, see section 4.5, “Phonetic Fast Match”, in Efficient Algorithms for Speech Recognition by Mosur K. Ravishankar.

-pl_window specifies the lookahead distance in frames. Typical values range from 0 (don't use lookahead) to 10 (decode 10 frames ahead). Bigger values give faster decoding but reduced accuracy.
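
When experimenting, it can help to keep the flag values in one place and generate the command line from them. The sketch below assembles an argument list from the starting points suggested above. The -pl_window value of 5 is an arbitrary mid-range pick, not a recommendation from this page, and build_command is a hypothetical helper. Model and input options are not shown; they would go in extra_args.

```python
import shlex

# Starting points taken from the text above.
# -pl_window 5 is an arbitrary mid-range value chosen for illustration.
SUGGESTED = {
    "-ds": 2,
    "-topn": 2,
    "-maxwpf": 5,
    "-maxhmmpf": 3000,
    "-pl_window": 5,
}


def build_command(program, flags, extra_args=()):
    """Return an argv list: program, then each flag and value, then extra_args."""
    argv = [program]
    for name, value in flags.items():
        argv += [name, str(value)]
    argv += list(extra_args)
    return argv


cmd = build_command("pocketsphinx_batch", SUGGESTED)
# Print a shell-quoted version for copy and paste.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Change one flag at a time and measure both speed and accuracy after each change, since the best values depend on the acoustic and language models.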

 
pocketsphinxhandhelds.txt · Last modified: 2013/03/24 19:23 by twobob
 