The structure of an acoustic model is simple.
An acoustic model is usually a folder with the following contents; some of the files may be missing:
feat.params – feature extraction parameters, a list of options used to configure feature extraction
mdef – the definition of the mapping from triphone contexts to GMM ids (senones)
means – Gaussian codebook means
variances – Gaussian codebook variances
mixture_weights – mixture weights for the Gaussians (may be missing if sendump is present)
sendump – compressed and quantized mixture weights (may replace mixture_weights)
feature_transform – feature transformation matrix
noisedict – the dictionary for filler words
transition_matrices – HMM transition matrices
Binary files usually consist of a header that gives the number of streams and the data dimensions, followed by the raw float data. The last value is usually a checksum. The arrays are stored sequentially, float by float.
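As a rough picture of the "header, raw floats, checksum" idea, the following Python sketch writes and reads a toy file. The layout used here (three little-endian 32-bit integers for the dimensions, 32-bit floats, and a byte-sum checksum) and the function names are assumptions invented for the example. The real Sphinx files have their own header and checksum format, so this is not a parser for them.

```python
import struct

def pack_toy_array(n_streams, n_gauss, veclen, values):
    """Serialize floats in a toy layout: dimensions, raw data, checksum."""
    assert len(values) == n_streams * n_gauss * veclen
    header = struct.pack("<3i", n_streams, n_gauss, veclen)
    payload = struct.pack("<%df" % len(values), *values)
    # Hypothetical checksum: sum of the payload bytes, truncated to 32 bits.
    checksum = sum(payload) & 0xFFFFFFFF
    return header + payload + struct.pack("<I", checksum)

def unpack_toy_array(blob):
    """Read the toy layout back and verify the trailing checksum."""
    n_streams, n_gauss, veclen = struct.unpack_from("<3i", blob, 0)
    count = n_streams * n_gauss * veclen
    payload = blob[12:12 + 4 * count]
    (stored,) = struct.unpack_from("<I", blob, 12 + 4 * count)
    if stored != (sum(payload) & 0xFFFFFFFF):
        raise ValueError("checksum mismatch")
    values = list(struct.unpack("<%df" % count, payload))
    return (n_streams, n_gauss, veclen), values

blob = pack_toy_array(1, 2, 3, [0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
dims, values = unpack_toy_array(blob)
```

The floats come back in exactly the order they were written, which is what "stored sequentially, float by float" means in practice.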
Arrays such as the means, mixture weights, and variances usually have multiple dimensions. For the means, the first index is the feature stream id, then, for example, the Gaussian id, then the vector of means. For the mixture weights, the first index is the stream id, then the senone id, then the mixture weight of each Gaussian for that senone. The sendump file contains quantized and processed mixture weights. The printp tool from sphinxtrain converts these binary files to a human-readable text format.
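To make the index order concrete, here is a small sketch of how a flat, sequentially stored array can be addressed by (stream, Gaussian, component) for the means, following the order described above. The helper names and the example sizes are made up for the illustration, and the sketch assumes every stream has the same vector length, which need not hold for a real model.

```python
def flat_index(shape, index):
    """Row-major offset of a multi-dimensional index in a flat array."""
    offset = 0
    for size, i in zip(shape, index):
        if not 0 <= i < size:
            raise IndexError("index out of range")
        offset = offset * size + i
    return offset

def mean_vector(flat, n_streams, n_gauss, veclen, stream, gauss):
    """Slice out the mean vector of one Gaussian in one stream."""
    start = flat_index((n_streams, n_gauss, veclen), (stream, gauss, 0))
    return flat[start:start + veclen]

# Toy data: 1 stream, 2 Gaussians, 3-dimensional feature vectors.
means = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
second = mean_vector(means, 1, 2, 3, stream=0, gauss=1)  # [3.0, 4.0, 5.0]
```

The same `flat_index` helper works for mixture weights by passing a shape of (streams, senones, Gaussians) instead.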
The mdef file is a text file that lists the mapping from each triphone context to the senone ids of its states.
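A minimal sketch of reading such a mapping is shown below. It assumes each data row has the form `base left right position attrib tmat id id ... N`, with `#` starting a comment line; this column layout, the sample text, and the function name are assumptions for the example and should be checked against an actual mdef file before relying on them.

```python
def parse_mdef_rows(text):
    """Collect (phone context -> senone ids) rows from mdef-like text."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # Assumed data rows end with "N"; shorter lines are header counts.
        if len(fields) < 7 or fields[-1] != "N":
            continue
        base, left, right, position, attrib, tmat = fields[:6]
        rows.append({
            "base": base,
            "left": left,
            "right": right,
            "position": position,
            "attrib": attrib,
            "tmat": int(tmat),
            "senones": [int(x) for x in fields[6:-1]],
        })
    return rows

# Hypothetical sample, not taken from a real model.
sample = """\
0.3
1 n_base
# base lft rt p attrib tmat ... state ids ...
AA - - - n/a 0 0 1 2 N
AA B C i n/a 0 10 11 12 N
"""
rows = parse_mdef_rows(sample)
```

Each resulting row pairs one context (here a context-independent `AA` and a triphone `AA` between `B` and `C`) with the senone id used by each of its states.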