How do I make it fast?
First of all, make sure you are using fixed-point operations and that your dictionary and language models are small enough.
The default settings are not enough to achieve sub-realtime performance on most tasks. Here are some command-line flags you should experiment with:
-ds: This is the frame GMM computation downsampling ratio (dsratio). In most cases -ds 2 gives the best speed, though accuracy suffers a bit. Lower values are more accurate; higher values are faster but less accurate.
-topn: The default value is 4. The fastest value is 2, but accuracy can suffer a bit depending on your acoustic model.
-lpbeam: This beam is quite important for performance; however, the default setting is already fairly narrow. Run pocketsphinx_batch with no arguments to see what it is.
-lponlybeam: The same applies here as with -lpbeam. If you are finding it hard to get enough accuracy, you can widen these beams.
-maxwpf: This can be set quite low and still give you reasonable performance. Try 5.
-maxhmmpf: Depending on the acoustic and language model, this can be very helpful. Try 3000.
-kdtree: If you are using the models provided with PocketSphinx (model/hmm/6k), pass the kdtrees file included with them using this argument. You can also make your own kd-trees with the mk_s3gau and kdtree programs in SphinxTrain.
-kdmaxdepth: You can play around with this. Values from 1 to 10 are possible with the default kdtrees file. Higher values use more memory. Values between 5 and 7 seem to work well for some reason.
-kdmaxbbi: Tuning this might help too. Try 16 and go up or down to see how speed and accuracy are affected. If it is too low, both will suffer; if it is too high, it won't make a difference.
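The starting values suggested above can be collected into one command line. The following is a minimal sketch in Python that only assembles the argument list and does not run the decoder. The flag names are assumptions about the installed PocketSphinx version (check them against the option list printed by running pocketsphinx_batch with no arguments), and the helper name build_tuning_args is hypothetical.

```python
import shlex

# Starting points taken from the suggestions above. The flag names are
# assumptions about the installed PocketSphinx version; verify them against
# the option list printed by pocketsphinx_batch when run with no arguments.
SUGGESTED = {
    "-ds": 2,
    "-topn": 2,
    "-maxwpf": 5,
    "-maxhmmpf": 3000,
    "-kdmaxdepth": 7,
    "-kdmaxbbi": 16,
}


def build_tuning_args(overrides=None):
    """Return a flat argument list, with overrides replacing the suggestions."""
    settings = dict(SUGGESTED)
    settings.update(overrides or {})
    args = []
    for flag, value in settings.items():
        args.extend([flag, str(value)])
    return args


if __name__ == "__main__":
    # Example: keep every suggestion except the downsampling ratio.
    cmd = ["pocketsphinx_batch"] + build_tuning_args({"-ds": 1})
    print(shlex.join(cmd))
```

Changing one value at a time through the overrides argument, and re-measuring speed and accuracy after each change, makes it easier to see which setting is responsible for a difference.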
Phonetic lookahead is a technique used to speed up decoding by reducing the amount of computation. Basically, everything is decoded with a phonetic decoder first, and then the detailed search is restricted by the results of that fast phonetic search. It is also called “fast match”. For details and evaluations, see section 4.5, “Phonetic Fast Match”, in Efficient Algorithms for Speech Recognition by Mosur K. Ravishankar.
-pl_window specifies the lookahead distance in frames. Typical values range from 0 (don't use lookahead) to 10 (decode 10 frames ahead). Bigger values give faster decoding but reduced accuracy.
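The idea of restricting the detailed search by a fast phonetic pass can be illustrated with a toy example. This is a sketch under stated assumptions, not PocketSphinx's implementation: the per-frame phone scores are made-up log-likelihoods, and the function active_phones and its beam parameter are hypothetical. A phone stays active at a frame only if its best score within the lookahead window is close enough to the best score of any phone in that window. The toy shows only the restriction step and does not reproduce the speed and accuracy trade-off of the real decoder.

```python
def active_phones(phone_scores, t, pl_window, beam):
    """Return the phones the detailed search may enter at frame t.

    phone_scores is a list with one dict per frame, mapping phone -> score
    (log-likelihood, higher is better). Every frame is assumed to score the
    same set of phones.
    """
    frame_phones = set(phone_scores[t])
    if pl_window <= 0:
        return frame_phones  # lookahead disabled: nothing is pruned
    # Frames t .. t + pl_window, truncated at the end of the utterance.
    window = phone_scores[t:t + pl_window + 1]
    best_per_phone = {
        phone: max(frame[phone] for frame in window)
        for phone in frame_phones
    }
    best = max(best_per_phone.values())
    return {p for p, s in best_per_phone.items() if s >= best - beam}


# Made-up scores for three frames and three phones.
SCORES = [
    {"AH": -1.0, "B": -9.0, "K": -8.0},
    {"AH": -2.0, "B": -1.5, "K": -9.0},
    {"AH": -7.0, "B": -1.0, "K": -9.5},
]

if __name__ == "__main__":
    print(sorted(active_phones(SCORES, 0, 0, 3.0)))  # ['AH', 'B', 'K']
    print(sorted(active_phones(SCORES, 0, 1, 3.0)))  # ['AH', 'B']
```

In this toy data, K never scores well in the window, so with lookahead enabled the detailed search would not need to expand words starting with it at frame 0.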