Differences

This shows you the differences between two versions of the page.

tutorialadapt [2012/08/27 18:58]
admin
tutorialadapt [2013/04/09 19:41] (current)
admin updated mixture_weights section
Line 24: Line 24:
If you are at a Linux command line, you can accomplish this in very nerdy style with the following ''bash'' one-liner from the directory in which you downloaded ''arctic20.txt'':
 +Since the one-liner redirects its output to /dev/null, a missing ''sox'' installation would fail without any error message. Check first whether the ''sox'' package is installed, and if it is not, install it with this command:
 +<code>
 +sudo apt-get install sox
 +</code>
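Because those errors are hidden, a quick check before recording can save a silent failure. This is a minimal sketch, assuming (as on Debian and Ubuntu) that the ''sox'' package provides the ''rec'' and ''play'' commands; ''require_cmd'' is a hypothetical helper, not part of sox or CMUSphinx.

```shell
#!/usr/bin/env bash
# require_cmd is a hypothetical helper (not part of sox or CMUSphinx).
# It prints a hint and returns non-zero if command $1 is not on the PATH.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "error: '$1' not found (is the sox package installed?)" >&2
    return 1
  fi
}

# Assumption: the sox package provides 'rec' (recording) and 'play' (playback).
if require_cmd rec && require_cmd play; then
  echo "sox tools found; you can run the recording one-liner"
fi
```

The check uses the POSIX ''command -v'' builtin, so it works in any Bourne-style shell, not only ''bash''.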
 +
 +Now, the one-liner is as follows:
<code>
for i in `seq 1 20`; do
Line 32: Line 38:
done < arctic20.txt
</code>
This will echo each sentence to the screen and start recording immediately.  Hit Control-C to move on to the next sentence. You should see the following files in the current directory afterwards:
<code>
Line 44: Line 50:
arctic20.txt
</code>
 +If you pressed Control-C immediately after you finished speaking a sentence, the recording may have cut off the last word.
You should verify that these recordings sound okay.  To do this, you can play them back with:
Line 88: Line 95:
==== Converting the sendump and mdef files ====
-Some models don't have enough data for adaptation. There is an extra file which you need which was left out of the PocketSphinx distribution in order to save space. You can download the bzip2-compressed version from http://cmusphinx.svn.sourceforge.net/viewvc/cmusphinx/trunk/pocketsphinx-extra/?view=tar and decompress it. Copy the mixture_weights file from the pocketsphinx-extra/model/hmm/en_US/hub4_wsj_sc_3s_8k.cd_semi_5000 folder to your acoustic model folder.
+Some models do not ship with all the files needed for adaptation. One extra file, ''mixture_weights'', was left out of the PocketSphinx distribution in order to save space. You can [[https://sourceforge.net/p/cmusphinx/code/HEAD/tree/trunk/pocketsphinx-extra/model/hmm/en_US/hub4_wsj_sc_3s_8k.cd_semi_5000/mixture_weights?format=raw | download it from the code repository]]; it belongs to the pocketsphinx-extra package and lives in the folder ''pocketsphinx-extra/model/hmm/en_US/hub4_wsj_sc_3s_8k.cd_semi_5000''. You can also check it out from Subversion. Copy the ''mixture_weights'' file to your acoustic model folder.
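The download and copy can be done in one step from the command line. This is a minimal sketch that assumes ''wget'' is installed and that the raw-download link given for ''mixture_weights'' is still valid; ''fetch_mixture_weights'' and ''MIXW_URL'' are hypothetical names, and the folder ''hub4wsj_sc_8k'' in the usage line is only an assumed model folder name.

```shell
#!/usr/bin/env bash
# Raw-download URL for mixture_weights, taken from the repository link.
MIXW_URL='https://sourceforge.net/p/cmusphinx/code/HEAD/tree/trunk/pocketsphinx-extra/model/hmm/en_US/hub4_wsj_sc_3s_8k.cd_semi_5000/mixture_weights?format=raw'

# fetch_mixture_weights MODEL_DIR (hypothetical helper): download the file
# directly into the acoustic model folder MODEL_DIR.
fetch_mixture_weights() {
  local model_dir=$1
  if [ ! -d "$model_dir" ]; then
    echo "error: '$model_dir' is not a directory" >&2
    return 1
  fi
  # -O sets the local file name; otherwise wget keeps "?format=raw" in it.
  if ! wget -q -O "$model_dir/mixture_weights" "$MIXW_URL"; then
    rm -f "$model_dir/mixture_weights"  # -O may leave an empty file behind
    echo "error: download failed" >&2
    return 1
  fi
}

# Usage, assuming your acoustic model folder is hub4wsj_sc_8k:
#   fetch_mixture_weights hub4wsj_sc_8k
```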
-Sometimes sendump file can be converted back to mixture_weights file. This is only possible for an older sendump files. If you have installed the [[InstallingPythonStuff|SphinxTrain Python modules]], you can use [[http://cmusphinx.svn.sourceforge.net/viewvc/cmusphinx/trunk/SphinxTrain/python/cmusphinx/sendump.py?revision=10579&view=markup | sendump.py]] to convert the ''sendump'' file from the acoustic model to a ''mixture_weights'' file. For hub4_wsj acoustic model it will not work.
+In some cases, the ''sendump'' file can be converted back to a ''mixture_weights'' file. This is only possible for older ''sendump'' files; it will not work for the hub4_wsj acoustic model. If you have installed the [[InstallingPythonStuff|SphinxTrain Python modules]], you can use ''SphinxTrain/python/cmusphinx/sendump.py'' to convert the ''sendump'' file from the acoustic model to a ''mixture_weights'' file.
You will also need to convert the ''mdef'' file from the acoustic model to the plain text format used by the SphinxTrain tools.  To do this, use the ''pocketsphinx_mdef_convert'' program:
Line 99: Line 106:
==== Accumulating observation counts ====
-The next step in adaptation is to collect statistics from the adaptation data.  This is done using the ''bw'' program from SphinxTrain.  You should be able to find this in a directory called ''bin.i686-pc-linux-gnu'' or ''bin-x86_64-unknown-linux-gnu'' (on Linux) or in ''bin\Debug'' or ''bin\Release'' (on Windows) inside the SphinxTrain directory.  Copy it to the working directory along with the ''map_adapt'' and ''mk_s2sendump'' programs from the same directory.
+The next step in adaptation is to collect statistics from the adaptation data.  This is done using the ''bw'' program from SphinxTrain.  On Linux, you should find ''bw'' in the SphinxTrain installation folder ''/usr/local/libexec/sphinxtrain'' (or the corresponding folder if you installed under another prefix); on Windows, look in ''bin\Release'' inside the SphinxTrain directory.  Copy it to the working directory along with the ''map_adapt'' and ''mk_s2sendump'' programs.
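On Linux, copying the three tools can be scripted. This sketch assumes the default ''/usr/local/libexec/sphinxtrain'' location and accepts another folder as an optional first argument for a different prefix; ''copy_sphinxtrain_tools'' is a hypothetical helper, and it does not cover the Windows ''bin\Release'' layout.

```shell
#!/usr/bin/env bash
# copy_sphinxtrain_tools [SRC] [DEST] (hypothetical helper)
# SRC defaults to /usr/local/libexec/sphinxtrain; pass the matching folder if
# SphinxTrain was installed under another prefix. DEST defaults to the
# current (working) directory.
copy_sphinxtrain_tools() {
  local src=${1:-/usr/local/libexec/sphinxtrain}
  local dest=${2:-.}
  local tool
  for tool in bw map_adapt mk_s2sendump; do
    if [ ! -x "$src/$tool" ]; then
      echo "error: $src/$tool not found or not executable" >&2
      return 1
    fi
    # cp keeps the executable permission bits (subject to the umask).
    cp "$src/$tool" "$dest/" || return 1
  done
}

# Usage with the default location, copying into the working directory:
#   copy_sphinxtrain_tools
```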
Now, to collect statistics, run:
Line 149: Line 156:
<code> <code>
-map_adapt \
+./map_adapt \
    -meanfn hub4wsj_sc_8k/means \
    -varfn hub4wsj_sc_8k/variances \
Line 182: Line 189:
===== Testing the adaptation =====
-After you have done the adaptation, it's critical to test the adaptation quality. To do that you need to setup the database similar to the one used for adaptation:
+After you have done the adaptation, it is critical to test the adaptation quality. To do that, you need to set up a test database similar to the one used for adaptation. You also need to configure decoding with the required parameters; in particular, you need a language model, ''<your.lm>''. For more details, see [[tutoriallm]].
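A common way to quantify adaptation quality is the word error rate (WER): the word-level edit distance (substitutions, deletions and insertions) between the reference transcription and the decoder output, divided by the number of reference words. The sketch below is an illustrative ''awk'' implementation for a single sentence pair, not a CMUSphinx tool; ''wer'' is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# wer REF HYP: print the word error rate (in percent) of the hypothesis HYP
# against the reference REF; both are plain whitespace-separated sentences.
wer() {
  awk -v ref="$1" -v hyp="$2" 'BEGIN {
    n = split(ref, r, " ")
    m = split(hyp, h, " ")
    # d[i,j] = fewest substitutions, deletions and insertions turning the
    # first i reference words into the first j hypothesis words.
    for (i = 0; i <= n; i++) d[i, 0] = i
    for (j = 0; j <= m; j++) d[0, j] = j
    for (i = 1; i <= n; i++) {
      for (j = 1; j <= m; j++) {
        best = d[i - 1, j - 1] + (r[i] == h[j] ? 0 : 1)     # match or substitution
        if (d[i - 1, j] + 1 < best) best = d[i - 1, j] + 1  # deletion
        if (d[i, j - 1] + 1 < best) best = d[i, j - 1] + 1  # insertion
        d[i, j] = best
      }
    }
    if (n == 0) { print "undefined (empty reference)"; exit 1 }
    printf "%.2f\n", 100 * d[n, m] / n
  }'
}

# Usage: one substituted word out of six prints 16.67
#   wer "the cat sat on the mat" "the cat sat on a mat"
```

Comparing the WER of the original and the adapted model on the same test data shows whether the adaptation helped.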
Create the fileids file ''adaptation-test.fileids'':
 
tutorialadapt.1346093897.txt.gz · Last modified: 2012/08/27 18:58 by admin
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported