edu.cmu.sphinx.linguist.language.ngram.large
Class LargeTrigramModel

java.lang.Object
  extended by edu.cmu.sphinx.linguist.language.ngram.large.LargeTrigramModel
All Implemented Interfaces:
LanguageModel, Configurable

public class LargeTrigramModel
extends java.lang.Object
implements LanguageModel

Queries a binary language model file generated by the CMU-Cambridge Statistical Language Modeling Toolkit.

Note that all probabilities in the grammar are stored in the LogMath log base. Probabilities in the language model file are stored in log base 10 and are converted to the LogMath log base.
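The conversion described above is a change of logarithm base. The following is a minimal sketch of that arithmetic, not the class's actual code: the class name LogBaseSketch, the method log10ToLogBase, and the base value 1.0001 are assumptions chosen for illustration. The real base is whatever the configured LogMath component uses.

```java
// Hypothetical helper, not part of Sphinx: converts a log10 probability
// into a logarithm of another base using the change-of-base rule.
final class LogBaseSketch {

    // Assumed target base, for illustration only.
    static final double ASSUMED_LOG_BASE = 1.0001;

    // log_b(p) = log10(p) * ln(10) / ln(b)
    static float log10ToLogBase(float log10Value, double base) {
        return (float) (log10Value * Math.log(10.0) / Math.log(base));
    }

    public static void main(String[] args) {
        float log10Prob = -2.0f; // corresponds to a linear probability of 0.01
        float converted = log10ToLogBase(log10Prob, ASSUMED_LOG_BASE);
        // Raise the base to the converted value to recover the linear probability.
        double linear = Math.pow(ASSUMED_LOG_BASE, converted);
        System.out.println("converted = " + converted + ", linear = " + linear);
    }
}
```

Because the factor ln(10) / ln(b) is constant for a given base, it can be computed once and reused for every value read from the file.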


Field Summary
static int BYTES_PER_BIGRAM
          The number of bytes per bigram in the LM file generated by the CMU-Cambridge Statistical Language Modeling Toolkit.
static int BYTES_PER_TRIGRAM
          The number of bytes per trigram in the LM file generated by the CMU-Cambridge Statistical Language Modeling Toolkit.
static java.lang.String PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP
          A property that controls whether or not the language model will apply the language weight and word insertion probability
static java.lang.String PROP_BIGRAM_CACHE_SIZE
          A property that defines the maximum number of bigrams to be cached.
static java.lang.String PROP_CLEAR_CACHES_AFTER_UTTERANCE
          A property that controls whether the bigram and trigram caches are cleared after every utterance
static java.lang.String PROP_FULL_SMEAR
          If true, use full bigram information to determine smear
static java.lang.String PROP_LANGUAGE_WEIGHT
          A property that defines the language weight for the search
static java.lang.String PROP_LOG_MATH
          A property that defines the logMath component.
static java.lang.String PROP_QUERY_LOG_FILE
          A property for the name of the file that logs all the queried N-grams.
static java.lang.String PROP_TRIGRAM_CACHE_SIZE
          A property that defines the maximum number of trigrams to be cached.
static java.lang.String PROP_WORD_INSERTION_PROBABILITY
          Word insertion probability property
 
Fields inherited from interface edu.cmu.sphinx.linguist.language.ngram.LanguageModel
PROP_DICTIONARY, PROP_FORMAT, PROP_LOCATION, PROP_MAX_DEPTH, PROP_UNIGRAM_WEIGHT
 
Constructor Summary
LargeTrigramModel()
           
LargeTrigramModel(java.lang.String format, java.net.URL urlLocation, java.lang.String ngramLogFile, int maxTrigramCacheSize, int maxBigramCacheSize, boolean clearCacheAfterUtterance, int maxDepth, LogMath logMath, Dictionary dictionary, boolean applyLanguageWeightAndWip, float languageWeight, double wip, float unigramWeight, boolean fullSmear)
           
 
Method Summary
 void allocate()
          Create the language model
 void deallocate()
          Deallocate resources allocated to this language model
 float getBackoff(WordSequence wordSequence)
          Returns the backoff probability for the given sequence of words
 int getBigramMisses()
          Returns the number of times a bigram was queried but not found in the LM (in which case the backoff probabilities are used).
 int getMaxDepth()
          Returns the maximum depth of the language model
 java.lang.String getName()
           
 float getProbability(WordSequence wordSequence)
          Gets the N-gram probability of the word sequence represented by the word list
 float getSmear(WordSequence wordSequence)
          Gets the smear term for the given wordSequence
 float getSmearOld(WordSequence wordSequence)
          Gets the smear term for the given wordSequence
 int getTrigramHits()
          Returns the number of trigram hits.
 int getTrigramMisses()
          Returns the number of times a trigram was queried but not found in the LM (in which case the backoff probabilities are used).
 java.util.Set<java.lang.String> getVocabulary()
          Returns the set of words in the language model.
 int getWordID(Word word)
          Returns the ID of the given word.
 void newProperties(PropertySheet ps)
          This method is called when this configurable component needs to be reconfigured.
 void start()
          Called before a recognition
 void stop()
          Called after a recognition
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PROP_QUERY_LOG_FILE

@S4String(mandatory=false)
public static final java.lang.String PROP_QUERY_LOG_FILE
A property for the name of the file that logs all the queried N-grams. If this property is null, the queried N-grams are not logged.

See Also:
Constant Field Values

PROP_TRIGRAM_CACHE_SIZE

@S4Integer(defaultValue=100000)
public static final java.lang.String PROP_TRIGRAM_CACHE_SIZE
A property that defines the maximum number of trigrams to be cached.

See Also:
Constant Field Values

PROP_BIGRAM_CACHE_SIZE

@S4Integer(defaultValue=50000)
public static final java.lang.String PROP_BIGRAM_CACHE_SIZE
A property that defines the maximum number of bigrams to be cached.

See Also:
Constant Field Values

PROP_CLEAR_CACHES_AFTER_UTTERANCE

@S4Boolean(defaultValue=false)
public static final java.lang.String PROP_CLEAR_CACHES_AFTER_UTTERANCE
A property that controls whether the bigram and trigram caches are cleared after every utterance

See Also:
Constant Field Values
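The three cache properties above describe size-bounded caches that can be emptied between utterances. The sketch below shows one plausible way to build such a cache with the standard library, using least-recently-used eviction. This page does not say which eviction policy LargeTrigramModel uses, so the policy and the class name BoundedNGramCacheSketch are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache, not the Sphinx implementation.
// Evicts the least recently used entry once the size limit is exceeded.
final class BoundedNGramCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    BoundedNGramCacheSketch(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after every put; returning true drops the eldest entry.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedNGramCacheSketch<String, Float> cache = new BoundedNGramCacheSketch<>(2);
        cache.put("a b c", -1.0f);
        cache.put("a b d", -2.0f);
        cache.get("a b c");        // marks "a b c" as recently used
        cache.put("a b e", -3.0f); // evicts "a b d"
        System.out.println(cache.keySet());
        cache.clear();             // what clearing after an utterance would amount to
    }
}
```

A larger limit trades memory for fewer lookups in the model file; clearing after each utterance keeps memory use flat across a long session at the cost of refilling the cache.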

PROP_LANGUAGE_WEIGHT

@S4Double(defaultValue=1.0)
public static final java.lang.String PROP_LANGUAGE_WEIGHT
A property that defines the language weight for the search

See Also:
Constant Field Values

PROP_LOG_MATH

@S4Component(type=LogMath.class)
public static final java.lang.String PROP_LOG_MATH
A property that defines the logMath component.

See Also:
Constant Field Values

PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP

@S4Boolean(defaultValue=false)
public static final java.lang.String PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP
A property that controls whether or not the language model will apply the language weight and word insertion probability

See Also:
Constant Field Values
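A common convention in log-domain decoders is to scale the language model score by the language weight and then add the logarithm of the word insertion probability. The sketch below illustrates that convention only. Whether LargeTrigramModel combines the terms in exactly this way is an assumption, and the names LanguageWeightSketch, toLog and applyWeightAndWip, as well as the base 1.0001, are hypothetical.

```java
// Hypothetical illustration of applying a language weight and a word
// insertion probability (WIP) to a log-domain score. Not Sphinx code.
final class LanguageWeightSketch {

    // Assumed log base, for illustration only.
    static final double ASSUMED_LOG_BASE = 1.0001;

    // Converts a linear value such as the WIP into the assumed log base.
    static float toLog(double linear, double base) {
        return (float) (Math.log(linear) / Math.log(base));
    }

    // Assumed convention: weighted score = logProb * languageWeight + log(WIP).
    static float applyWeightAndWip(float logProb, float languageWeight, float logWip) {
        return logProb * languageWeight + logWip;
    }

    public static void main(String[] args) {
        float logProb = -10.0f;
        // With a weight of 1.0 and a WIP of 1.0 the score is unchanged,
        // because log(1.0) is 0 in any base.
        float unchanged = applyWeightAndWip(logProb, 1.0f, toLog(1.0, ASSUMED_LOG_BASE));
        float weighted = applyWeightAndWip(logProb, 2.0f, -3.0f);
        System.out.println(unchanged + " " + weighted);
    }
}
```

Under this convention, the documented defaults of 1.0 for both the language weight and the word insertion probability leave the score unchanged.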

PROP_WORD_INSERTION_PROBABILITY

@S4Double(defaultValue=1.0)
public static final java.lang.String PROP_WORD_INSERTION_PROBABILITY
Word insertion probability property

See Also:
Constant Field Values

PROP_FULL_SMEAR

@S4Boolean(defaultValue=false)
public static final java.lang.String PROP_FULL_SMEAR
If true, use full bigram information to determine smear

See Also:
Constant Field Values

BYTES_PER_BIGRAM

public static final int BYTES_PER_BIGRAM
The number of bytes per bigram in the LM file generated by the CMU-Cambridge Statistical Language Modeling Toolkit.

See Also:
Constant Field Values

BYTES_PER_TRIGRAM

public static final int BYTES_PER_TRIGRAM
The number of bytes per trigram in the LM file generated by the CMU-Cambridge Statistical Language Modeling Toolkit.

See Also:
Constant Field Values
Constructor Detail

LargeTrigramModel

public LargeTrigramModel(java.lang.String format,
                         java.net.URL urlLocation,
                         java.lang.String ngramLogFile,
                         int maxTrigramCacheSize,
                         int maxBigramCacheSize,
                         boolean clearCacheAfterUtterance,
                         int maxDepth,
                         LogMath logMath,
                         Dictionary dictionary,
                         boolean applyLanguageWeightAndWip,
                         float languageWeight,
                         double wip,
                         float unigramWeight,
                         boolean fullSmear)

LargeTrigramModel

public LargeTrigramModel()
Method Detail

newProperties

public void newProperties(PropertySheet ps)
                   throws PropertyException
Description copied from interface: Configurable
This method is called when this configurable component needs to be reconfigured.

Specified by:
newProperties in interface Configurable
Parameters:
ps - a property sheet holding the new data
Throws:
PropertyException - if there is a problem with the properties.

getName

public java.lang.String getName()

allocate

public void allocate()
              throws java.io.IOException
Description copied from interface: LanguageModel
Create the language model

Specified by:
allocate in interface LanguageModel
Throws:
java.io.IOException

deallocate

public void deallocate()
Description copied from interface: LanguageModel
Deallocate resources allocated to this language model

Specified by:
deallocate in interface LanguageModel

start

public void start()
Called before a recognition

Specified by:
start in interface LanguageModel

stop

public void stop()
Called after a recognition

Specified by:
stop in interface LanguageModel

getProbability

public float getProbability(WordSequence wordSequence)
Gets the N-gram probability of the word sequence represented by the word list

Specified by:
getProbability in interface LanguageModel
Parameters:
wordSequence - the word sequence
Returns:
the probability of the word sequence. Probability is in logMath log base

getWordID

public final int getWordID(Word word)
Returns the ID of the given word.

Parameters:
word - the word whose ID is to be found
Returns:
the ID of the word

getSmearOld

public float getSmearOld(WordSequence wordSequence)
Gets the smear term for the given wordSequence

Parameters:
wordSequence - the word sequence
Returns:
the smear term associated with this word sequence

getSmear

public float getSmear(WordSequence wordSequence)
Description copied from interface: LanguageModel
Gets the smear term for the given wordSequence

Specified by:
getSmear in interface LanguageModel
Parameters:
wordSequence - the word sequence
Returns:
the smear term associated with this word sequence

getBackoff

public float getBackoff(WordSequence wordSequence)
Returns the backoff probability for the given sequence of words

Parameters:
wordSequence - the sequence of words
Returns:
the backoff probability in LogMath log base
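In a standard backoff N-gram model, the backoff value is used when a requested N-gram is absent: the score becomes the backoff weight of the history plus the log probability of the shorter N-gram. The sketch below shows that scheme for trigrams with in-memory maps and also counts hits and misses. It is a minimal illustration under these assumptions, not the lookup code of LargeTrigramModel; the class BackoffSketch, its fields, and the floor value LOG_ZERO are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory backoff trigram lookup. Not Sphinx code.
// All values are log-domain, so multiplication becomes addition.
final class BackoffSketch {
    // Assumed floor for bigrams that are missing from the toy model.
    static final float LOG_ZERO = -Float.MAX_VALUE;

    final Map<String, Float> trigramLogProb = new HashMap<>();
    final Map<String, Float> bigramLogProb = new HashMap<>();
    final Map<String, Float> bigramBackoff = new HashMap<>();
    int trigramHits;
    int trigramMisses;

    float trigramScore(String w1, String w2, String w3) {
        Float direct = trigramLogProb.get(w1 + " " + w2 + " " + w3);
        if (direct != null) {
            trigramHits++;
            return direct;
        }
        // Miss: back off to the bigram (w2, w3), weighted by the history (w1, w2).
        trigramMisses++;
        float backoff = bigramBackoff.getOrDefault(w1 + " " + w2, 0.0f);
        float lower = bigramLogProb.getOrDefault(w2 + " " + w3, LOG_ZERO);
        return backoff + lower;
    }

    public static void main(String[] args) {
        BackoffSketch lm = new BackoffSketch();
        lm.trigramLogProb.put("a b c", -1.0f);
        lm.bigramLogProb.put("b d", -2.0f);
        lm.bigramBackoff.put("a b", -0.5f);
        System.out.println(lm.trigramScore("a", "b", "c")); // found directly
        System.out.println(lm.trigramScore("a", "b", "d")); // backed off
        System.out.println(lm.trigramHits + " hits, " + lm.trigramMisses + " misses");
    }
}
```

A high miss count relative to hits suggests that many queried sequences are scored from lower-order statistics rather than from the trigrams themselves.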

getMaxDepth

public int getMaxDepth()
Returns the maximum depth of the language model

Specified by:
getMaxDepth in interface LanguageModel
Returns:
the maximum depth of the language model

getVocabulary

public java.util.Set<java.lang.String> getVocabulary()
Returns the set of words in the language model. The set is unmodifiable.

Specified by:
getVocabulary in interface LanguageModel
Returns:
the unmodifiable set of words

getBigramMisses

public int getBigramMisses()
Returns the number of times a bigram was queried but not found in the LM (in which case the backoff probabilities are used).

Returns:
the number of bigram misses

getTrigramMisses

public int getTrigramMisses()
Returns the number of times a trigram was queried but not found in the LM (in which case the backoff probabilities are used).

Returns:
the number of trigram misses

getTrigramHits

public int getTrigramHits()
Returns the number of trigram hits.

Returns:
the number of trigram hits