java.lang.Object
edu.cmu.sphinx.util.props.ConfigurableAdapter
edu.cmu.sphinx.frontend.BaseDataProcessor
edu.cmu.sphinx.frontend.endpoint.SpeechMarker
public class SpeechMarker
Converts a stream of SpeechClassifiedData objects, each marked as speech or non-speech, into a stream in which the regions considered speech are marked out. This is done by inserting SPEECH_START and SPEECH_END signals into the stream.
The algorithm for inserting the two signals is as follows.
The algorithm is always in one of two states: 'in-speech' and 'out-of-speech'. In the 'out-of-speech' state, it reads audio until it encounters audio that is speech. Once it has read more than 'startSpeech' of continuous speech, speech is considered to have started, and a SPEECH_START is inserted 'speechLeader' time before the point where the speech first began. The state of the algorithm then changes to 'in-speech'.
Now consider the case when the algorithm is in the 'in-speech' state. If it reads audio that is speech, that audio is scheduled for output. If the audio is non-speech, it reads ahead until it has 'endSilence' of continuous non-speech, at which point speech is considered to have ended. A SPEECH_END signal is inserted 'speechTrailer' time after the first non-speech audio, and the algorithm returns to the 'out-of-speech' state. If any speech audio is encountered in between, the accounting starts all over again. While speech audio is being processed, the end-silence delay is lowered toward some minimal amount. This helps to segment both slow speech with noticeable delays and fast speech where delays are minimal.
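The two-state logic described above can be sketched as follows. This is a minimal illustration, not the class's actual implementation: `SpeechMarkerSketch`, `Frame`, and `mark` are hypothetical names, `Frame` is a simplified stand-in for SpeechClassifiedData, events are reported as strings rather than inserted into a data stream, the start time is clamped at zero as an assumption, and the end-silence decay is left out.

```java
import java.util.ArrayList;
import java.util.List;

public class SpeechMarkerSketch {

    /** Hypothetical stand-in for SpeechClassifiedData: a duration and a speech flag. */
    public static final class Frame {
        final int millis;
        final boolean speech;

        public Frame(int millis, boolean speech) {
            this.millis = millis;
            this.speech = speech;
        }
    }

    /** Returns events such as "SPEECH_START@50", with times in milliseconds. */
    public static List<String> mark(List<Frame> frames, int startSpeech,
                                    int endSilence, int speechLeader, int speechTrailer) {
        List<String> events = new ArrayList<>();
        boolean inSpeech = false;
        int now = 0;       // start time of the current frame
        int runStart = 0;  // start time of the run being counted
        int runLength = 0; // continuous speech (out-of-speech) or non-speech (in-speech)

        for (Frame f : frames) {
            // Out-of-speech counts speech frames; in-speech counts non-speech frames.
            boolean counted = inSpeech ? !f.speech : f.speech;
            if (counted) {
                if (runLength == 0) {
                    runStart = now;
                }
                runLength += f.millis;
            } else {
                runLength = 0; // run interrupted: the accounting starts over
            }
            now += f.millis;

            if (!inSpeech && runLength > startSpeech) {
                // Clamping at 0 is an assumption of this sketch.
                events.add("SPEECH_START@" + Math.max(0, runStart - speechLeader));
                inSpeech = true;
                runLength = 0;
            } else if (inSpeech && runLength >= endSilence) {
                events.add("SPEECH_END@" + (runStart + speechTrailer));
                inSpeech = false;
                runLength = 0;
            }
        }
        return events;
    }

    private static void add(List<Frame> frames, int count, boolean speech) {
        for (int i = 0; i < count; i++) {
            frames.add(new Frame(10, speech)); // 10 ms per frame, chosen for illustration
        }
    }

    public static void main(String[] args) {
        List<Frame> frames = new ArrayList<>();
        add(frames, 10, false); // 100 ms of non-speech
        add(frames, 30, true);  // 300 ms of speech, starting at 100 ms
        add(frames, 60, false); // 600 ms of non-speech, starting at 400 ms
        // prints [SPEECH_START@50, SPEECH_END@450]
        System.out.println(mark(frames, 200, 500, 50, 50));
    }
}
```

The example uses the default values listed under Field Detail (200, 500, 50, 50). A burst of speech no longer than 'startSpeech' produces no events, because the speech count resets at the first non-speech frame.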
| Field Summary | |
|---|---|
| static java.lang.String | PROP_END_SILENCE: A property for the amount of time in silence (in milliseconds) to be considered as utterance end. |
| static java.lang.String | PROP_END_SILENCE_DECAY: A property to decrease end silence while we are reading speech. |
| static java.lang.String | PROP_SPEECH_LEADER: A property for the amount of time (in milliseconds) before speech start to be included as speech data. |
| static java.lang.String | PROP_SPEECH_LEADER_FRAMES: A property for number of frames to keep in buffer. |
| static java.lang.String | PROP_SPEECH_TRAILER: A property for the amount of time (in milliseconds) after speech ends to be included as speech data. |
| static java.lang.String | PROP_START_SPEECH: A property for the minimum amount of time in speech (in milliseconds) to be considered as utterance start. |
| Constructor Summary | |
|---|---|
| SpeechMarker() | |
| SpeechMarker(int startSpeechTime, int endSilenceTime, int speechLeader, int speechLeaderFrames, int speechTrailer) | |
| Method Summary | |
|---|---|
| int | getAudioTime(SpeechClassifiedData audio): Returns the amount of audio data in milliseconds in the given SpeechClassifiedData object. |
| Data | getData(): Returns the next Data object. |
| void | initialize(): Initializes this SpeechMarker. |
| boolean | inSpeech() |
| void | newProperties(PropertySheet ps): This method is called when this configurable component needs to be reconfigured. |
| Methods inherited from class edu.cmu.sphinx.frontend.BaseDataProcessor |
|---|
| getPredecessor, getTimer, setPredecessor |

| Methods inherited from class edu.cmu.sphinx.util.props.ConfigurableAdapter |
|---|
| getName, toString |

| Methods inherited from class java.lang.Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
|---|
@S4Integer(defaultValue=200) public static final java.lang.String PROP_START_SPEECH
@S4Integer(defaultValue=500) public static final java.lang.String PROP_END_SILENCE
@S4Integer(defaultValue=50) public static final java.lang.String PROP_SPEECH_LEADER
@S4Integer(defaultValue=30) public static final java.lang.String PROP_SPEECH_LEADER_FRAMES
@S4Integer(defaultValue=50) public static final java.lang.String PROP_SPEECH_TRAILER
@S4Double(defaultValue=15.0) public static final java.lang.String PROP_END_SILENCE_DECAY
| Constructor Detail |
|---|
public SpeechMarker(int startSpeechTime,
int endSilenceTime,
int speechLeader,
int speechLeaderFrames,
int speechTrailer)
public SpeechMarker()
| Method Detail |
|---|
public void newProperties(PropertySheet ps)
throws PropertyException
Description copied from interface: Configurable
This method is called when this configurable component needs to be reconfigured.
Specified by: newProperties in interface Configurable
Overrides: newProperties in class ConfigurableAdapter
Parameters: ps - a property sheet holding the new data
Throws: PropertyException - if there is a problem with the properties.

public void initialize()
Specified by: initialize in interface DataProcessor
Overrides: initialize in class BaseDataProcessor

public Data getData()
throws DataProcessingException
Specified by: getData in interface DataProcessor
Overrides: getData in class BaseDataProcessor
Throws: DataProcessingException - if a data processing error occurs

public int getAudioTime(SpeechClassifiedData audio)
Parameters: audio - the SpeechClassifiedData object
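As a rough illustration of what getAudioTime reports, the duration of a block of audio in milliseconds can be derived from its sample count and sample rate. The sketch below is an assumption about the arithmetic involved, not the method's documented implementation; `AudioTimeSketch` and `audioTimeMillis` are hypothetical names, and the inputs are plain integers rather than a SpeechClassifiedData object.

```java
public class AudioTimeSketch {

    /**
     * Duration in whole milliseconds of sampleCount samples at sampleRate Hz.
     * Fractional milliseconds are truncated.
     */
    public static int audioTimeMillis(int sampleCount, int sampleRate) {
        if (sampleRate <= 0) {
            throw new IllegalArgumentException("sampleRate must be positive");
        }
        // Use long arithmetic so sampleCount * 1000 cannot overflow an int.
        return (int) (sampleCount * 1000L / sampleRate);
    }

    public static void main(String[] args) {
        // 160 samples at 16 kHz is a 10 ms frame.
        System.out.println(audioTimeMillis(160, 16000)); // prints 10
    }
}
```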
public boolean inSpeech()