Performance Optimization for Sphinx 4

Performance optimization is broken down into two stages. The first stage is described here:

Accuracy Goal: 18.845% WER or better for F0 of hub4
Speed Goal: 10X real time on Mangueira

We've looked at a number of techniques to improve the speed of the recognizer.

Other things that we could do (but haven't yet):

  • Second Pass - Rescore the lattice
  • Fast Match - a fast way of constraining the search.

There is also an S3 vs S4 Performance comparison page.

Charts

This chart summarizes the effect of these various optimizations on the speed and accuracy of hub4 quick (7 short utterances from hub4).

Test                          | Beam         | Word Beam  | Smear | Grow Skipping | Avg Beam Size | WER (%) | Speed (RT)
Baseline                      | 30,000/1E-80 | 2000/1E-40 | 0.0   | 0             | 57,000        | 12.12   | 35
Word Pruning                  | 3,000/1E-80  | 20/1E-40   | 0.0   | 0             | 42,000        | 8.33    | 28
Tighter(1)                    | 3,000/1E-60  | 20/1E-35   | 0.0   | 0             | 27,500        | 11.364  | 19
Unigram smear                 | 30,000/1E-60 | 20/1E-35   | 0.5   | 0             | 24,500        | 9.10    | 17
Unigram smear (full weight)   | 30,000/1E-60 | 20/1E-35   | 1.0   | 0             | 20,000        | 12.12   | 14
Tighter(2)                    | 30,000/1E-60 | 20/1E-32   | 1.0   | 0             | 20,000        | 12.12   | 14
Tighter(3)                    | 30,000/1E-60 | 20/1E-30   | 1.0   | 0             | 19,700        | 12.12   | 13.5
Tighter(4)                    | 30,000/1E-60 | 17/1E-30   | 1.0   | 0             | 19,000        | 12.12   | 12.9
Grow Skipping                 | 30,000/1E-60 | 17/1E-30   | 1.0   | 6             | 16,000        | 9.10    | 9.0
2-Thread scoring              | 30,000/1E-60 | 17/1E-30   | 1.0   | 6             | 16,000        | 9.10    | 8.25
Tighter(5) (too tight)        | 30,000/1E-60 | 15/1E-30   | 1.0   | 6             | 15,000        | 11.354  | 8.12
HMM State Reuse (Code change) | 30,000/1E-60 | 17/1E-30   | 1.0   | 6             | 16,000        | 9.10    | 8.25
Tighter(6)                    | -1/1E-60     | 16/1E-27   | 1.0   | 6             | 16,000        | 9.10    | 8.53
Profiling disabled            | -1/1E-60     | 16/1E-27   | 1.0   | 6             | 16,000        | 9.10    | 7.67
Tighter(7)                    | -1/1E-55     | 17/1E-30   | 1.0   | 6             | 12,000        | 9.85    | 6.23
Tighter(8)                    | -1/1E-50     | 17/1E-30   | 1.0   | 6             | 8,500         | 12.9    | 4.57

Test notes:

  • Tests were run with 7 short sentences of hub4. The WER here is not representative of the WER for the entire hub4 test; the full hub4 test is generally around 6% higher.
  • Runtimes are for glottis.east.
  • HMM State Reuse - reused self-looping HMM states to reduce object creation.
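
The Beam and Word Beam settings in the chart can be read as an absolute limit on the number of active hypotheses paired with a relative score threshold, with -1 leaving the absolute limit unrestricted. The sketch below illustrates that kind of two-part pruning under that reading. The function name prune, the natural-log score domain, and the sample scores are assumptions made for illustration and are not taken from the Sphinx 4 source.

```python
import math


def prune(scores, absolute_beam, relative_beam):
    """Return the log-domain scores that survive both beams, best first.

    scores        -- log-domain hypothesis scores (higher is better); illustrative
    absolute_beam -- maximum number of survivors; a value <= 0 means unrestricted
    relative_beam -- linear-domain ratio such as 1e-60; hypotheses scoring
                     below best * relative_beam are dropped
    """
    if not scores:
        return []
    ranked = sorted(scores, reverse=True)
    # Convert the linear ratio to an offset in the (assumed) natural-log domain.
    threshold = ranked[0] + math.log(relative_beam)
    survivors = [s for s in ranked if s >= threshold]
    if absolute_beam > 0:
        survivors = survivors[:absolute_beam]
    return survivors


if __name__ == "__main__":
    sample = [-10.0, -50.0, -100.0, -200.0]  # made-up scores
    print(prune(sample, 2, 1e-60))   # absolute limit of 2 applies
    print(prune(sample, -1, 1e-60))  # only the relative beam applies
```

Tightening either value shrinks the list of survivors, which is the effect tracked by the Avg Beam Size column.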

Full Tests

Based on the tuning experiments we will now try a select set of full runs to see where we stand in terms of our speed and accuracy goals.

Test                 | Beam         | Word Beam | Smear | Grow Skipping | Avg Beam Size | WER (%) | Speed (RT) | Notes
hub4 f0 full glottis | -1/1E-60     | 16/1E-27  | 1.0   | 6             | 21,741        | 19.425  | 13.68      | Unrestricted absolute beam causes slower times. Beam too tight to get acceptable accuracy. Run concurrent with regression tests, so reported RT is higher than actual.
hub4 f0 full glottis | 30,000/1E-60 | 20/1E-30  | 1.0   | 6             | 21,211        | 18.834  | 9.94       | Accuracy is equivalent to 3.3. Runtime 10X on glottis.
hub4 f0 full glottis | 30,000/1E-60 | 20/1E-35  | 1.0   | 6             | 21,386        | 18.834  | 13.49      | Looser beam but no improvement in accuracy; probably the word absolute beam is dominating. Runtime affected by regression tests running concurrently. GC overhead is about 2.5%.
hub4 f0 full glottis | 30,000/1E-60 | 25/1E-35  | 1.0   | 6             | 22,781        | 18.622  | 13.47      | Slight improvement in accuracy. Runtime affected by regression tests running concurrently.
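
One way to read the table above is to check each run against the first-stage goals (18.845% WER or better, 10X real time). The sketch below does that for the four runs listed. The Run class and the constant names are hypothetical, and the comparison ignores that the speed goal is stated for Mangueira while these runs were timed on glottis, some of them alongside regression tests.

```python
from dataclasses import dataclass

WER_GOAL = 18.845      # percent, from the accuracy goal
SPEED_GOAL_RT = 10.0   # times real time, from the speed goal


@dataclass
class Run:
    beam: str
    word_beam: str
    wer: float       # percent
    speed_rt: float  # times real time

    def meets_goals(self) -> bool:
        # A run passes only if it satisfies both limits.
        return self.wer <= WER_GOAL and self.speed_rt <= SPEED_GOAL_RT


# Values copied from the full-test table.
runs = [
    Run("-1/1E-60", "16/1E-27", 19.425, 13.68),
    Run("30,000/1E-60", "20/1E-30", 18.834, 9.94),
    Run("30,000/1E-60", "20/1E-35", 18.834, 13.49),
    Run("30,000/1E-60", "25/1E-35", 18.622, 13.47),
]

passing = [r for r in runs if r.meets_goals()]

if __name__ == "__main__":
    for r in passing:
        print(r.beam, r.word_beam, r.wer, r.speed_rt)
```

With the figures from the table, only the 30,000/1E-60 beam with the 20/1E-30 word beam satisfies both limits.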

Quick Tests

The full test consists of about 51 minutes of audio, so a run at 10X real time takes about 510 minutes, or 8.5 hours. The quick tests are a shorter version that runs in about 1 3/4 hours.
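
The run-time estimate above is the audio duration multiplied by the real-time factor. A minimal sketch of that arithmetic follows; the function names are illustrative only.

```python
def wall_clock_minutes(audio_minutes, rt_factor):
    """Estimated decoding time in minutes for a given real-time factor."""
    return audio_minutes * rt_factor


def format_hours(minutes):
    """Render a duration in minutes as hours and minutes, rounded to the minute."""
    hours, mins = divmod(round(minutes), 60)
    return f"{hours}h {mins:02d}m"


if __name__ == "__main__":
    # 51 minutes of audio at 10X real time.
    total = wall_clock_minutes(51, 10)
    print(total, format_hours(total))  # 510 8h 30m
```

The same helper can be applied to any Speed (RT) value in the tables to estimate how long a full run would take.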

Test                                       | Beam         | Word Beam | Smear | Grow Skipping | Avg Beam Size | WER (%) | Speed (RT) | Notes
hub4 f0 full quick glottis (43 utterances) | 30,000/1E-60 | 20/1E-30  | 1.0   | 6             | 21,076        | 19.644  | 10.05      | Accuracy is equivalent to 3.3. Runtime 10X on glottis.
hub4 f0 full quick glottis (43 utterances) | 30,000/1E-60 | 20/1E-30  | 1.0   | 6             | 11,184        | 20.956  | 5.01       | Using acoustic lookahead.

Full Tests

This is the full f0 test (215 utterances). Tests are on glottis unless otherwise noted.

Test                   | Beam         | Word Beam | Smear | Grow Skipping | LookAhead | Avg Beam Size | WER (%) | Speed (RT) | Notes
hub4 f0 full           | 30,000/1E-60 | 20/1E-30  | 1.0   | 0             | 2         | 8,824         | 19.503  | 6.44       | Other tests running simultaneously
hub4 f0 full           | 30,000/1E-60 | 20/1E-30  | 1.0   | 0             | 1.5       | 11,698        | 18.756  | 6.44       | Other tests running simultaneously
hub4 f0 full           | 30,000/1E-60 | 20/1E-30  | 0.7   | 0             | 1.5       | 12,854        | 18.733  | 8.81       | Other tests running simultaneously
hub4 f0 full           | -1/1E-60     | 20/1E-30  | 0.7   | 0             | 1.5       | 13,920        | 18.689  | 7.87       | Other tests running simultaneously
hub4 f0 full           | -1/1E-60     | 20/1E-30  | 0.7   | 0             | 1.7       | 12,017        | 18.622  | 6.69       | Slightly altered algorithm that gets previous HMM, even spanning across units and words
hub4 f0 full           | -1/1E-60     | 20/1E-30  | 0.7   | 0             | 2.0       | 10,196        | 18.918  | 5.50       |
hub4 f0 full           | 30,000/1E-60 | 15/1E-30  | 1     | 0             | 1.7       | 10,426        | 19.068  | 5.78       |
hub4 f0 full           | 30,000/1E-60 | 22/1E-30  | 1     | 0             | 2.3       | 7,729         | 20.372  | 5.46       | Smaller footprint linguist
hub4 f0 full           | 30,000/1E-60 | 22/1E-30  | 1     | 0             | 1.7       | 10,168        | 18.611  | 8.25       | Small heap (500 MB), GC time: 16.5%
hub4 f0 full           | 30,000/1E-60 | 22/1E-30  | 1     | 0             | 1.7       | 10,168        | 18.611  | 8.25       | After profiling/optimization
hub4 f0 full           | 30,000/1E-60 | 22/1E-30  | 1     | 0             | 3         | 6,319         | 20.506  | 3.2        | Going for speed
hub4 f0 full mangueira | 30,000/1E-60 | 22/1E-30  | 1     | 0             | 3         | 6,319         | 20.573  | 4.3        | Going for speed
 
sphinx4/largevocabularyperformanceoptimization.txt · Last modified: 2010/07/31 19:20 (external edit)
 