Open Source Toolkit For Speech Recognition Project by Carnegie Mellon University
Performance optimization is broken down into two stages. The first stage is described here:
| Accuracy Goal | 18.845% WER or better for F0 of hub4 |
| Speed Goal | 10X Real time on Mangueira |
We've looked at a number of techniques to improve the speed of the recognizer.
Other things that we could do (but haven't yet):
There is also an S3 vs S4 performance comparison page.
This chart summarizes the effect of these optimizations on the speed and accuracy of hub4 quick (7 short utterances from hub4).
| Test | Beam | Word Beam | Smear | Grow Skipping | Avg Beam Size | WER | Speed (RT) |
| Baseline | 30,000/1E-80 | 2000/1E-40 | 0.0 | 0 | 57,000 | 12.12 | 35 |
| Word Pruning | 3,000/1E-80 | 20/1E-40 | 0.0 | 0 | 42,000 | 8.33 | 28 |
| Tighter(1) | 3,000/1E-60 | 20/1E-35 | 0.0 | 0 | 27,500 | 11.364 | 19 |
| Unigram smear | 30,000/1E-60 | 20/1E-35 | 0.5 | 0 | 24,500 | 9.10 | 17 |
| Unigram smear (full weight) | 30,000/1E-60 | 20/1E-35 | 1.0 | 0 | 20,000 | 12.12 | 14 |
| Tighter(2) | 30,000/1E-60 | 20/1E-32 | 1.0 | 0 | 20,000 | 12.12 | 14 |
| Tighter(3) | 30,000/1E-60 | 20/1E-30 | 1.0 | 0 | 19,700 | 12.12 | 13.5 |
| Tighter(4) | 30,000/1E-60 | 17/1E-30 | 1.0 | 0 | 19,000 | 12.12 | 12.9 |
| Grow Skipping | 30,000/1E-60 | 17/1E-30 | 1.0 | 6 | 16,000 | 9.10 | 9.0 |
| 2-Thread scoring | 30,000/1E-60 | 17/1E-30 | 1.0 | 6 | 16,000 | 9.10 | 8.25 |
| Tighter(5) (too tight) | 30,000/1E-60 | 15/1E-30 | 1.0 | 6 | 15,000 | 11.354 | 8.12 |
| HMM State Reuse (Code change) | 30,000/1E-60 | 17/1E-30 | 1.0 | 6 | 16,000 | 9.10 | 8.25 |
| Tighter(6) | -1/1E-60 | 16/1E-27 | 1.0 | 6 | 16,000 | 9.10 | 8.53 |
| Profiling disabled | -1/1E-60 | 16/1E-27 | 1.0 | 6 | 16,000 | 9.10 | 7.67 |
| Tighter(7) | -1/1E-55 | 17/1E-30 | 1.0 | 6 | 12,000 | 9.85 | 6.23 |
| Tighter(8) | -1/1E-50 | 17/1E-30 | 1.0 | 6 | 8,500 | 12.9 | 4.57 |
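To read the chart as a speed/accuracy trade-off, it can help to express each row as a speedup over the baseline. The following is a minimal sketch using a few (WER, speed) pairs copied from the chart; the helper name `speedup` is illustrative only and is not part of the recognizer.

```python
# (test name, WER %, speed in multiples of real time), copied from the chart above
rows = [
    ("Baseline", 12.12, 35.0),
    ("Word Pruning", 8.33, 28.0),
    ("Grow Skipping", 9.10, 9.0),
    ("Profiling disabled", 9.10, 7.67),
    ("Tighter(8)", 12.9, 4.57),
]

def speedup(baseline_rt, rt):
    """How many times faster a run is than the baseline (lower RT is faster)."""
    return baseline_rt / rt

baseline_wer, baseline_rt = rows[0][1], rows[0][2]
for name, wer, rt in rows:
    # Report the speedup alongside the change in WER relative to the baseline.
    print(f"{name:20s} {speedup(baseline_rt, rt):5.2f}x faster, "
          f"WER change {wer - baseline_wer:+.2f}")
```

Rows with a large speedup and a WER change near or below zero are the optimizations worth carrying into the full runs.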
Test notes:
Based on the tuning experiments we will now try a select set of full runs to see where we stand in terms of our speed and accuracy goals.
| Test | Beam | Word Beam | Smear | Grow Skipping | Avg Beam Size | WER | Speed (RT) | Notes |
| hub4 f0 full glottis | -1/1E-60 | 16/1E-27 | 1.0 | 6 | 21,741 | 19.425 | 13.68 | Unrestricted absolute beam causes slower times. Beam too tight to get acceptable accuracy. Run concurrently with regression tests, so reported RT is higher than actual |
| hub4 f0 full glottis | 30,000/1E-60 | 20/1E-30 | 1.0 | 6 | 21,211 | 18.834 | 9.94 | Accuracy is equivalent to 3.3. Runtime 10X on glottis |
| hub4 f0 full glottis | 30,000/1E-60 | 20/1E-35 | 1.0 | 6 | 21,386 | 18.834 | 13.49 | Looser beam but no improvement in accuracy, probably because the word absolute beam is dominating. Runtime affected by regression tests running concurrently. GC overhead is about 2.5% |
| hub4 f0 full glottis | 30,000/1E-60 | 25/1E-35 | 1.0 | 6 | 22,781 | 18.622 | 13.47 | Slight improvement in accuracy. Runtime affected by regression tests running concurrently |
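The Beam and Word Beam columns appear to pair an absolute limit with a relative threshold, with -1 meaning no absolute limit (the note on the first row calls it an unrestricted absolute beam). As a rough illustration of how such a pair could prune a set of hypothesis scores, here is a minimal sketch. It is not Sphinx-4 code: the function `prune`, its parameters, and the use of linear-domain scores are all assumptions made for the example.

```python
def prune(scores, absolute_beam, relative_beam):
    """Keep scores that survive both an absolute and a relative beam.

    scores        -- hypothesis likelihoods in the linear domain (assumption)
    absolute_beam -- maximum number of hypotheses to keep; -1 means no limit
    relative_beam -- keep only scores >= best * relative_beam (e.g. 1e-60)
    """
    if not scores:
        return []
    ranked = sorted(scores, reverse=True)
    # Relative beam: drop anything too far below the best score.
    threshold = ranked[0] * relative_beam
    survivors = [s for s in ranked if s >= threshold]
    # Absolute beam: cap the number of survivors unless it is unrestricted.
    if absolute_beam >= 0:
        survivors = survivors[:absolute_beam]
    return survivors

scores = [1e-10, 1e-20, 1e-45, 1e-80]
print(prune(scores, 20, 1e-30))   # relative beam drops 1e-45 and 1e-80
print(prune(scores, 1, 1e-30))    # absolute beam keeps only the best score
print(prune(scores, -1, 1e-60))   # no absolute limit; only 1e-80 is dropped
```

Under this reading, a row such as 20/1E-35 that is no more accurate than 20/1E-30 is consistent with the absolute limit, not the relative threshold, being the constraint that actually bites.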
The full test consists of about 51 minutes of audio, so a run at 10X real time takes 510 minutes, or 8.5 hours. The quick tests are a shorter version that runs in about 1 3/4 hours.
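The run-time estimate is simply the audio duration multiplied by the real-time factor. A small sketch of that arithmetic, where the function name `wall_clock_minutes` is made up for illustration:

```python
def wall_clock_minutes(audio_minutes, rt_factor):
    """Processing time for a run at rt_factor times real time."""
    return audio_minutes * rt_factor

full_audio = 51  # minutes of audio in the full test, from the text above
minutes = wall_clock_minutes(full_audio, 10)
print(minutes, "minutes =", minutes / 60, "hours")  # 510 minutes = 8.5 hours
```

The same function gives the expected duration for any other Speed (RT) value in the tables.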
| Test | Beam | Word Beam | Smear | Grow Skipping | Avg Beam Size | WER | Speed (RT) | Notes |
| hub4 f0 full quick glottis (43 utterances) | 30,000/1E-60 | 20/1E-30 | 1.0 | 6 | 21,076 | 19.644 | 10.05 | Accuracy is equivalent to 3.3. Runtime 10X on glottis |
| hub4 f0 full quick glottis (43 utterances) | 30,000/1E-60 | 20/1E-30 | 1.0 | 6 | 11,184 | 20.956 | 5.01 | Using acoustic lookahead |
This is the full f0 test (215 utterances). Tests are on glottis unless otherwise noted.
| Test | Beam | Word Beam | Smear | Grow Skipping | LookAhead | Avg Beam Size | WER | Speed (RT) | Notes |
| hub4 f0 full | 30,000/1E-60 | 20/1E-30 | 1.0 | 0 | 2 | 8,824 | 19.503 | 6.44 | Other tests running simultaneously |
| hub4 f0 full | 30,000/1E-60 | 20/1E-30 | 1.0 | 0 | 1.5 | 11,698 | 18.756 | 6.44 | Other tests running simultaneously |
| hub4 f0 full | 30,000/1E-60 | 20/1E-30 | 0.7 | 0 | 1.5 | 12,854 | 18.733 | 8.81 | Other tests running simultaneously |
| hub4 f0 full | -1/1E-60 | 20/1E-30 | 0.7 | 0 | 1.5 | 13,920 | 18.689 | 7.87 | Other tests running simultaneously |
| hub4 f0 full | -1/1E-60 | 20/1E-30 | 0.7 | 0 | 1.7 | 12,017 | 18.622 | 6.69 | Slightly altered algorithm that gets the previous HMM, even spanning across units and words |
| hub4 f0 full | -1/1E-60 | 20/1E-30 | 0.7 | 0 | 2.0 | 10,196 | 18.918 | 5.50 | |
| hub4 f0 full | 30,000/1E-60 | 15/1E-30 | 1 | 0 | 1.7 | 10,426 | 19.068 | 5.78 | |
| hub4 f0 full | 30,000/1E-60 | 22/1E-30 | 1 | 0 | 2.3 | 7,729 | 20.372 | 5.46 | Smaller footprint linguist |
| hub4 f0 full | 30,000/1E-60 | 22/1E-30 | 1 | 0 | 1.7 | 10,168 | 18.611 | 8.25 | Small heap (500 MB), GC time: 16.5% |
| hub4 f0 full | 30,000/1E-60 | 22/1E-30 | 1 | 0 | 1.7 | 10,168 | 18.611 | 8.25 | After profiling/optimization |
| hub4 f0 full | 30,000/1E-60 | 22/1E-30 | 1 | 0 | 3 | 6,319 | 20.506 | 3.2 | Going for speed |
| hub4 f0 full mangueira | 30,000/1E-60 | 22/1E-30 | 1 | 0 | 3 | 6,319 | 20.573 | 4.3 | Going for speed |
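To see where these runs stand against the stage-one goals (18.845% WER or better, 10X real time), each row can be checked against both thresholds. This is a minimal sketch with a few rows copied from the table; `meets_goals` is a hypothetical helper, and it ignores the fact that the speed goal is stated for Mangueira while most of these runs were on glottis.

```python
WER_GOAL = 18.845    # percent, from the accuracy goal
SPEED_GOAL = 10.0    # multiples of real time, from the speed goal

def meets_goals(wer, speed_rt):
    """True when a run is at or under both the WER and the speed targets."""
    return wer <= WER_GOAL and speed_rt <= SPEED_GOAL

# (LookAhead, WER %, speed RT) for a few rows of the table above
runs = [
    (2, 19.503, 6.44),
    (1.5, 18.756, 6.44),
    (1.7, 18.622, 6.69),
    (3, 20.506, 3.2),
]
for lookahead, wer, rt in runs:
    verdict = "meets both goals" if meets_goals(wer, rt) else "misses a goal"
    print(f"lookahead={lookahead} WER={wer} RT={rt}: {verdict}")
```

In this sample the fastest run (lookahead 3) misses the accuracy goal, while the runs with lookahead 1.5 and 1.7 stay under both thresholds.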