edu.cmu.sphinx.frontend.util
Class Microphone

java.lang.Object
  extended by edu.cmu.sphinx.util.props.ConfigurableAdapter
      extended by edu.cmu.sphinx.frontend.BaseDataProcessor
          extended by edu.cmu.sphinx.frontend.util.Microphone
All Implemented Interfaces:
DataProcessor, Configurable

public class Microphone
extends BaseDataProcessor

A Microphone captures audio data from the system's underlying audio input systems and converts it into Data objects. When startRecording() is called, a new thread is created to capture audio; it stops when stopRecording() is called. Calling getData() returns the captured audio data as Data objects.

This Microphone will attempt to obtain an audio device with the format specified in the configuration. If a device with that format cannot be obtained, it will try to obtain a device with an audio format that has a higher sample rate than the configured sample rate, while the other parameters of the format (i.e., sample size, endianness, sign, and number of channels) remain the same. If no such device can be obtained either, it flags an error, and a call to startRecording() returns false.
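
The fallback order described above can be sketched with the standard Java Sound API alone. This is an illustration, not the Microphone implementation: the class name FormatFallbackSketch, its methods, and the CANDIDATE_RATES list are hypothetical, and the source does not say which higher sample rates are actually tried.

```java
import java.util.ArrayList;
import java.util.List;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.TargetDataLine;

final class FormatFallbackSketch {

    // Hypothetical list of rates to try; the real candidate set is not documented here.
    static final float[] CANDIDATE_RATES = {16000f, 22050f, 32000f, 44100f, 48000f};

    /** The configured format first, then the same format at each higher candidate rate. */
    static List<AudioFormat> candidates(AudioFormat desired) {
        boolean signed = AudioFormat.Encoding.PCM_SIGNED.equals(desired.getEncoding());
        List<AudioFormat> result = new ArrayList<>();
        result.add(desired);
        for (float rate : CANDIDATE_RATES) {
            if (rate > desired.getSampleRate()) {
                // Only the sample rate changes; size, channels, sign and endianness stay fixed.
                result.add(new AudioFormat(rate, desired.getSampleSizeInBits(),
                        desired.getChannels(), signed, desired.isBigEndian()));
            }
        }
        return result;
    }

    /** Returns the first candidate that a capture line supports, or null if none does. */
    static AudioFormat firstSupported(AudioFormat desired) {
        for (AudioFormat format : candidates(desired)) {
            DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
            if (AudioSystem.isLineSupported(info)) {
                return format;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Matches the documented defaults: 16 kHz, 16 bits, 1 channel, signed, big-endian.
        AudioFormat desired = new AudioFormat(16000f, 16, 1, true, true);
        AudioFormat chosen = firstSupported(desired);
        System.out.println(chosen == null ? "no capture format available" : "using " + chosen);
    }
}
```

A null result here corresponds to the error case in which startRecording() returns false. When a higher-rate format is chosen, getAudioFormat() would report a format that differs from the configured one.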


Nested Class Summary
(package private)  class Microphone.RecordingThread
          This Thread records audio and caches it in an audio buffer.
 
Field Summary
private  int audioBufferSize
           
private  javax.sound.sampled.TargetDataLine audioLine
           
private  java.util.concurrent.BlockingQueue<Data> audioList
           
private  javax.sound.sampled.AudioInputStream audioStream
           
private  boolean bigEndian
           
private  boolean closeBetweenUtterances
           
private  Utterance currentUtterance
           
private  javax.sound.sampled.AudioFormat desiredFormat
           
private  boolean doConversion
           
private  javax.sound.sampled.AudioFormat finalFormat
           
private  int frameSizeInBytes
           
private  boolean keepDataReference
           
private  int msecPerRead
           
static java.lang.String PROP_BIG_ENDIAN
          The property that specifies the endianness of the data.
static java.lang.String PROP_BITS_PER_SAMPLE
          The property for the number of bits per value.
static java.lang.String PROP_BUFFER_SIZE
          The property that specifies the size of the buffer used to store audio samples recorded from the microphone.
static java.lang.String PROP_CHANNELS
          The property specifying the number of channels.
static java.lang.String PROP_CLOSE_BETWEEN_UTTERANCES
          The property that specifies whether or not the microphone will release the audio between utterances.
static java.lang.String PROP_KEEP_LAST_AUDIO
          The property that specifies whether to keep the audio data of an utterance around until the next utterance is recorded.
static java.lang.String PROP_MSEC_PER_READ
          The property that specifies the number of milliseconds of audio data to read each time from the underlying Java Sound audio device.
static java.lang.String PROP_SAMPLE_RATE
          The property for the sample rate of the data.
static java.lang.String PROP_SELECT_CHANNEL
          The property that specifies the channel to use if the audio is stereo.
static java.lang.String PROP_SELECT_MIXER
          The property that specifies the mixer to use.
static java.lang.String PROP_SIGNED
          The property that specifies whether the data is signed.
static java.lang.String PROP_STEREO_TO_MONO
          The property that specifies how to convert stereo audio to mono.
private  Microphone.RecordingThread recorder
           
private  boolean recording
           
private  int sampleRate
           
private  int selectedChannel
           
private  java.lang.String selectedMixerIndex
           
private  boolean signed
           
private  java.lang.String stereoToMono
           
private  boolean utteranceEndReached
           
 
Fields inherited from class edu.cmu.sphinx.util.props.ConfigurableAdapter
logger
 
Constructor Summary
Microphone()
           
Microphone(int sampleRate, int bitsPerSample, int channels, boolean bigEndian, boolean signed, boolean closeBetweenUtterances, int msecPerRead, boolean keepLastAudio, java.lang.String stereoToMono, int selectedChannel, java.lang.String selectedMixerIndex, int audioBufferSize)
           
 
Method Summary
 void clear()
          Clears all cached audio data.
private  double[] convertStereoToMono(double[] samples, int channels)
          Converts stereo audio to mono.
 javax.sound.sampled.AudioFormat getAudioFormat()
          Returns the format of the audio recorded by this Microphone.
private  javax.sound.sampled.TargetDataLine getAudioLine()
          Creates the audioLine if necessary and returns it.
 Data getData()
          Reads and returns the next Data object from this Microphone, or returns null if there is no more audio data.
private  javax.sound.sampled.Mixer getSelectedMixer()
          Gets the Mixer to use.
 Utterance getUtterance()
          Returns the current Utterance.
 boolean hasMoreData()
          Returns true if there is more data in the Microphone.
 void initialize()
          Initializes this Microphone.
 boolean isRecording()
          Returns true if this Microphone is recording.
 void newProperties(PropertySheet ps)
          This method is called when this configurable component needs to be reconfigured.
private  boolean open()
          Opens the audio capturing device so that it will be ready for capturing audio.
 boolean startRecording()
          Starts recording audio.
 void stopRecording()
          Stops recording audio.
 
Methods inherited from class edu.cmu.sphinx.frontend.BaseDataProcessor
getPredecessor, getTimer, setPredecessor
 
Methods inherited from class edu.cmu.sphinx.util.props.ConfigurableAdapter
getName, initLogger, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PROP_SAMPLE_RATE

@S4Integer(defaultValue=16000)
public static final java.lang.String PROP_SAMPLE_RATE
The property for the sample rate of the data.

See Also:
Constant Field Values

PROP_CLOSE_BETWEEN_UTTERANCES

@S4Boolean(defaultValue=true)
public static final java.lang.String PROP_CLOSE_BETWEEN_UTTERANCES
The property that specifies whether or not the microphone will release the audio between utterances. On certain systems (Linux for one), closing and reopening the audio device is unreliable. The default is false for Linux systems, true for others.

See Also:
Constant Field Values

PROP_MSEC_PER_READ

@S4Integer(defaultValue=10)
public static final java.lang.String PROP_MSEC_PER_READ
The property that specifies the number of milliseconds of audio data to read each time from the underlying Java Sound audio device.

See Also:
Constant Field Values

PROP_BITS_PER_SAMPLE

@S4Integer(defaultValue=16)
public static final java.lang.String PROP_BITS_PER_SAMPLE
The property for the number of bits per value.

See Also:
Constant Field Values

PROP_CHANNELS

@S4Integer(defaultValue=1)
public static final java.lang.String PROP_CHANNELS
The property specifying the number of channels.

See Also:
Constant Field Values

PROP_BIG_ENDIAN

@S4Boolean(defaultValue=true)
public static final java.lang.String PROP_BIG_ENDIAN
The property that specifies the endianness of the data.

See Also:
Constant Field Values

PROP_SIGNED

@S4Boolean(defaultValue=true)
public static final java.lang.String PROP_SIGNED
The property that specifies whether the data is signed.

See Also:
Constant Field Values

PROP_KEEP_LAST_AUDIO

@S4Boolean(defaultValue=false)
public static final java.lang.String PROP_KEEP_LAST_AUDIO
The property that specifies whether to keep the audio data of an utterance around until the next utterance is recorded.

See Also:
Constant Field Values

PROP_STEREO_TO_MONO

@S4String(defaultValue="average",
          range={"average","selectChannel"})
public static final java.lang.String PROP_STEREO_TO_MONO
The property that specifies how to convert stereo audio to mono. Currently, the possible values are "average", which averages the samples from each channel, or "selectChannel", which takes audio from a single channel only. If you choose "selectChannel", you should also specify which channel to use with the "selectChannel" property.

See Also:
Constant Field Values

PROP_SELECT_CHANNEL

@S4Integer(defaultValue=0)
public static final java.lang.String PROP_SELECT_CHANNEL
The property that specifies the channel to use if the audio is stereo.

See Also:
Constant Field Values

PROP_SELECT_MIXER

@S4String(defaultValue="default")
public static final java.lang.String PROP_SELECT_MIXER
The property that specifies the mixer to use. The value can be "default" (let the AudioSystem decide), "last" (select the last Mixer supported by the AudioSystem, which appears to be what is often used for USB headsets), or an integer giving the index of the Mixer.Info returned by AudioSystem.getMixerInfo(). To get the list of Mixer.Info objects, run the AudioTool application with a command-line argument of "-dumpMixers".

See Also:
AudioTool, Constant Field Values
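
The three kinds of value accepted by this property can be illustrated with a small resolver over the array returned by AudioSystem.getMixerInfo(). The class MixerSelectionSketch and its resolveIndex method are hypothetical names for this sketch. Returning -1 for "default" and rejecting out-of-range indices are assumptions, not documented Microphone behaviour.

```java
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.Mixer;

final class MixerSelectionSketch {

    /**
     * Maps a property value to an index into AudioSystem.getMixerInfo().
     * Returns -1 to mean "let the AudioSystem decide" (an assumption of this sketch).
     */
    static int resolveIndex(String value, int mixerCount) {
        if ("default".equals(value)) {
            return -1;
        }
        if ("last".equals(value)) {
            // Also yields -1 when no mixers are installed.
            return mixerCount - 1;
        }
        int index = Integer.parseInt(value); // throws NumberFormatException for other text
        if (index < 0 || index >= mixerCount) {
            throw new IllegalArgumentException("no mixer at index " + index);
        }
        return index;
    }

    public static void main(String[] args) {
        // Prints the index of every installed mixer next to its name.
        Mixer.Info[] infos = AudioSystem.getMixerInfo();
        for (int i = 0; i < infos.length; i++) {
            System.out.println(i + ": " + infos[i].getName());
        }
        String value = args.length > 0 ? args[0] : "default";
        System.out.println("resolved index: " + resolveIndex(value, infos.length));
    }
}
```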

PROP_BUFFER_SIZE

@S4Integer(defaultValue=6400)
public static final java.lang.String PROP_BUFFER_SIZE
The property that specifies the size of the buffer used to store audio samples recorded from the microphone. The default value corresponds to 200 ms. Smaller values decrease microphone latency, at the risk of dropping frames if the decoding thread is too slow to keep up.

See Also:
Constant Field Values
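
The 200 ms figure can be checked from the other defaults, assuming the buffer size is counted in bytes (the reading that matches the documented value). The class BufferMathSketch and its method names are hypothetical.

```java
final class BufferMathSketch {

    /** Bytes of PCM audio produced per second for the given format. */
    static int bytesPerSecond(int sampleRate, int bitsPerSample, int channels) {
        return sampleRate * (bitsPerSample / 8) * channels;
    }

    /** Duration, in milliseconds, of audio that fits in a buffer of bufferBytes. */
    static int bufferMillis(int bufferBytes, int sampleRate, int bitsPerSample, int channels) {
        return (int) (1000L * bufferBytes / bytesPerSecond(sampleRate, bitsPerSample, channels));
    }

    /** Bytes covered by one read of msecPerRead milliseconds. */
    static int bytesPerRead(int msecPerRead, int sampleRate, int bitsPerSample, int channels) {
        return bytesPerSecond(sampleRate, bitsPerSample, channels) * msecPerRead / 1000;
    }

    public static void main(String[] args) {
        // Documented defaults: 16000 Hz, 16 bits, 1 channel, 6400-byte buffer, 10 ms reads.
        System.out.println(bufferMillis(6400, 16000, 16, 1) + " ms of audio in the buffer");
        System.out.println(bytesPerRead(10, 16000, 16, 1) + " bytes per read");
    }
}
```

With these defaults the buffer holds twenty 10 ms reads, which gives a rough idea of how far the consumer of getData() may lag behind the capture thread before audio is at risk.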

finalFormat

private javax.sound.sampled.AudioFormat finalFormat

audioStream

private javax.sound.sampled.AudioInputStream audioStream

audioLine

private javax.sound.sampled.TargetDataLine audioLine

audioList

private java.util.concurrent.BlockingQueue<Data> audioList

currentUtterance

private Utterance currentUtterance

doConversion

private boolean doConversion

recording

private volatile boolean recording

utteranceEndReached

private volatile boolean utteranceEndReached

recorder

private Microphone.RecordingThread recorder

desiredFormat

private javax.sound.sampled.AudioFormat desiredFormat

closeBetweenUtterances

private boolean closeBetweenUtterances

keepDataReference

private boolean keepDataReference

signed

private boolean signed

bigEndian

private boolean bigEndian

frameSizeInBytes

private int frameSizeInBytes

msecPerRead

private int msecPerRead

selectedChannel

private int selectedChannel

selectedMixerIndex

private java.lang.String selectedMixerIndex

stereoToMono

private java.lang.String stereoToMono

sampleRate

private int sampleRate

audioBufferSize

private int audioBufferSize
Constructor Detail

Microphone

public Microphone(int sampleRate,
                  int bitsPerSample,
                  int channels,
                  boolean bigEndian,
                  boolean signed,
                  boolean closeBetweenUtterances,
                  int msecPerRead,
                  boolean keepLastAudio,
                  java.lang.String stereoToMono,
                  int selectedChannel,
                  java.lang.String selectedMixerIndex,
                  int audioBufferSize)
Parameters:
sampleRate - sample rate of the data
bitsPerSample - number of bits per value.
channels - number of channels.
bigEndian - the endianness of the data
signed - whether the data is signed.
closeBetweenUtterances - whether or not the microphone will release the audio between utterances. On certain systems (Linux for one), closing and reopening the audio device is unreliable. The default is false for Linux systems, true for others.
msecPerRead - the number of milliseconds of audio data to read each time from the underlying Java Sound audio device.
keepLastAudio - whether to keep the audio data of an utterance around until the next utterance is recorded.
stereoToMono - how to convert stereo audio to mono. Currently, the possible values are "average", which averages the samples from each channel, or "selectChannel", which takes audio from a single channel only. If you choose "selectChannel", you should also specify which channel to use with the "selectChannel" property.
selectedChannel - the channel to use if the audio is stereo
selectedMixerIndex - the mixer to use. The value can be "default" (let the AudioSystem decide), "last" (select the last Mixer supported by the AudioSystem, which appears to be what is often used for USB headsets), or an integer giving the index of the Mixer.Info returned by AudioSystem.getMixerInfo(). To get the list of Mixer.Info objects, run the AudioTool application with a command-line argument of "-dumpMixers".
audioBufferSize - the size of the buffer used to store audio samples recorded from the microphone.

Microphone

public Microphone()
Method Detail

newProperties

public void newProperties(PropertySheet ps)
                   throws PropertyException
Description copied from interface: Configurable
This method is called when this configurable component needs to be reconfigured.

Specified by:
newProperties in interface Configurable
Overrides:
newProperties in class ConfigurableAdapter
Parameters:
ps - a property sheet holding the new data
Throws:
PropertyException - if there is a problem with the properties.

initialize

public void initialize()
Initializes this Microphone.

Specified by:
initialize in interface DataProcessor
Overrides:
initialize in class BaseDataProcessor

getSelectedMixer

private javax.sound.sampled.Mixer getSelectedMixer()
Gets the Mixer to use. Depends upon selectedMixerIndex being defined.

See Also:
newProperties(edu.cmu.sphinx.util.props.PropertySheet)

getAudioLine

private javax.sound.sampled.TargetDataLine getAudioLine()
Creates the audioLine if necessary and returns it.


open

private boolean open()
Opens the audio capturing device so that it will be ready for capturing audio. Attempts to create a converter if the requested audio format is not directly available.

Returns:
true if the audio capturing device is opened successfully; false otherwise

getAudioFormat

public javax.sound.sampled.AudioFormat getAudioFormat()
Returns the format of the audio recorded by this Microphone. Note that this might be different from the configured format.

Returns:
the current AudioFormat

getUtterance

public Utterance getUtterance()
Returns the current Utterance.

Returns:
the current Utterance

isRecording

public boolean isRecording()
Returns true if this Microphone is recording.

Returns:
true if this Microphone is recording, false otherwise

startRecording

public boolean startRecording()
Starts recording audio. This method will return only when a START event is received, meaning that this Microphone has started capturing audio.

Returns:
true if the recording started successfully; false otherwise

stopRecording

public void stopRecording()
Stops recording audio. This method does not return until recording has been stopped and all data has been read from the audio line.


convertStereoToMono

private double[] convertStereoToMono(double[] samples,
                                     int channels)
Converts stereo audio to mono.

Parameters:
samples - the audio samples, each double in the array is one sample
channels - the number of channels in the stereo audio
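
The two conversion modes named by PROP_STEREO_TO_MONO can be sketched as follows. The sketch assumes the samples array is interleaved (one sample per channel, frame after frame), which the source does not state. The class StereoToMonoSketch and its method names are hypothetical.

```java
import java.util.Arrays;

final class StereoToMonoSketch {

    /** "average": each output sample is the mean of one frame's channel samples. */
    static double[] average(double[] samples, int channels) {
        double[] mono = new double[samples.length / channels];
        for (int frame = 0; frame < mono.length; frame++) {
            double sum = 0.0;
            for (int c = 0; c < channels; c++) {
                sum += samples[frame * channels + c];
            }
            mono[frame] = sum / channels;
        }
        return mono;
    }

    /** "selectChannel": each output sample is copied from one chosen channel. */
    static double[] selectChannel(double[] samples, int channels, int selectedChannel) {
        double[] mono = new double[samples.length / channels];
        for (int frame = 0; frame < mono.length; frame++) {
            mono[frame] = samples[frame * channels + selectedChannel];
        }
        return mono;
    }

    public static void main(String[] args) {
        // Two stereo frames, interleaved as left, right, left, right.
        double[] stereo = {1.0, 3.0, 2.0, 4.0};
        System.out.println(Arrays.toString(average(stereo, 2)));
        System.out.println(Arrays.toString(selectChannel(stereo, 2, 1)));
    }
}
```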

clear

public void clear()
Clears all cached audio data.


getData

public Data getData()
             throws DataProcessingException
Reads and returns the next Data object from this Microphone, or returns null if there is no more audio data. All audio data captured between startRecording() and stopRecording() is cached in an Utterance object. Calling this method returns the next chunk of audio data cached in this Utterance.

Specified by:
getData in interface DataProcessor
Specified by:
getData in class BaseDataProcessor
Returns:
the next Data or null if none is available
Throws:
DataProcessingException - if there is a data processing error

hasMoreData

public boolean hasMoreData()
Returns true if there is more data in the Microphone. This is the case either if a DataEndSignal has not yet been taken from the buffer, or if the buffer in the Microphone is not yet empty.

Returns:
true if there is more data in the Microphone
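
The interplay of startRecording(), stopRecording(), getData() and hasMoreData() follows a producer/consumer pattern: a capture thread fills a queue and the caller drains it. The sketch below imitates that pattern with standard library classes only. It is not the Microphone implementation: RecorderSketch is a hypothetical class, an InputStream stands in for the audio line, byte arrays stand in for Data objects, and the END sentinel stands in for a DataEndSignal.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class RecorderSketch {

    /** Sentinel queued after the last chunk; plays the role of an end-of-data signal. */
    static final byte[] END = new byte[0];

    private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
    private final InputStream source;
    private final int bytesPerRead;
    private volatile boolean recording;
    private volatile boolean endTaken;
    private Thread worker;

    RecorderSketch(InputStream source, int bytesPerRead) {
        this.source = source;
        this.bytesPerRead = bytesPerRead;
    }

    /** Starts a thread that reads fixed-size chunks and queues them. */
    void startRecording() {
        recording = true;
        endTaken = false;
        worker = new Thread(() -> {
            byte[] buffer = new byte[bytesPerRead];
            try {
                int n;
                while (recording && (n = source.read(buffer)) > 0) {
                    queue.add(Arrays.copyOf(buffer, n));
                }
            } catch (IOException e) {
                // Treat a read failure as the end of the recording.
            } finally {
                recording = false;
                queue.add(END);
            }
        }, "recorder-sketch");
        worker.start();
    }

    /** Asks the capture thread to stop and waits until it has finished. */
    void stopRecording() {
        recording = false;
        try {
            if (worker != null) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns the next chunk, END once, then null. Blocks while the queue is empty. */
    byte[] getData() {
        if (endTaken) {
            return null;
        }
        try {
            byte[] chunk = queue.take();
            if (chunk == END) {
                endTaken = true;
            }
            return chunk;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    /** True until the END marker has been taken, or while chunks remain queued. */
    boolean hasMoreData() {
        return !endTaken || !queue.isEmpty();
    }
}
```

A caller would typically loop on getData() while hasMoreData() is true. The volatile flag mirrors the volatile recording field listed above, so the stop request is visible to the capture thread without additional locking.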