Tasks for Summer Of Code Projects

This is a preliminary list of tasks for the SummerOfCode project. If you have another idea, feel free to add it. For any questions, contact us on the cmusphinx-devel@lists.sourceforge.net mailing list or on the #cmusphinx IRC channel on freenode. See also Information for Students.

Accurate voice activity detector for speech recognition and training

Acoustic model quality is critical for recognizer accuracy. Very often, acoustic models for speech sounds are trained incorrectly and do not even properly separate silence from speech. Baum-Welch training does not guarantee that the silence model actually corresponds to silence and that the speech models correspond to speech.

The task is to create a reasonably accurate voice activity detector and integrate it into the model training process, so that an additional information source helps to improve acoustic model quality. The silence detector output will simply be included in the training process as a feature and will help to train the models properly. Besides the trainer, you also need to integrate the silence detector into the decoder to evaluate the results.
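
To make the idea concrete, here is a minimal sketch of a frame-level detector. It is a plain energy-threshold detector with hangover, not the subband order-statistics method referenced in the notes below, and the names (`frame_energies_db`, `detect_speech`), the 160-sample frame and the 10 dB margin are all assumptions chosen for illustration. It also assumes that the recording contains at least one silent frame, so that the lowest frame energy can serve as the noise floor.

```python
import math


def frame_energies_db(samples, frame_len=160):
    """Return the log energy (in dB) of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # Small constant avoids log10(0) on all-zero frames.
        energies.append(10.0 * math.log10(power + 1e-10))
    return energies


def detect_speech(samples, frame_len=160, margin_db=10.0, hangover=2):
    """Label every frame True (speech) or False (silence).

    Assumption: the quietest frame of the recording is silence, so its
    energy is used as the noise floor.
    """
    energies = frame_energies_db(samples, frame_len)
    if not energies:
        return []
    noise_floor = min(energies)
    labels = []
    countdown = 0
    for energy in energies:
        if energy > noise_floor + margin_db:
            # Frame is clearly above the noise floor: speech.
            countdown = hangover
            labels.append(True)
        elif countdown > 0:
            # Hangover: keep the speech label for a few trailing frames.
            countdown -= 1
            labels.append(True)
        else:
            labels.append(False)
    return labels
```

The resulting per-frame label (0 or 1) is the kind of value that could be appended to each feature vector, which is one way to read "included in the training process as a feature".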

Expected results

VAD code is implemented and incorporated into training. The improvement on a test database is measured.

The process is well documented on the project wiki

Complexity

Medium

Skills

C, Java, Machine learning

Note

Example of the voice activity detector

A NEW VOICE ACTIVITY DETECTOR USING SUBBAND ORDER-STATISTICS FILTERS FOR ROBUST SPEECH RECOGNITION by Ramirez et al.

Extraction Methods of Voicing Feature for Robust Speech Recognition

Very large scale grid-based model training

Have you heard about the SETI@home project or Folding@home? The idea is to give everyone an easy way to donate CPU power to a project. People could simply install a screensaver on their desktops and help to train a very large acoustic model. That seems like a great opportunity to compete with Google's resources.

The goal is to port the CMUSphinx acoustic model training process to a distributed grid framework so that everyone can donate resources for the best possible speech recognition accuracy. The port should outsource the most computationally expensive tasks to the grid, mainly Baum-Welch estimation (the bw binary in the acoustic model training tutorial).
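
The reason Baum-Welch estimation suits a grid is that its statistics are sums over utterances, so each volunteer machine can accumulate over its own share of the data and the server only has to add the partial results. The sketch below illustrates that shape with a toy model (one mean per state, hard frame-to-state assignments instead of posteriors). All function names and the data layout are hypothetical and are not the interface of the bw binary.

```python
from collections import defaultdict


def make_work_units(utterance_ids, num_units):
    """Split the utterance list into contiguous, roughly equal work units."""
    size, extra = divmod(len(utterance_ids), num_units)
    units, start = [], 0
    for i in range(num_units):
        end = start + size + (1 if i < extra else 0)
        units.append(utterance_ids[start:end])
        start = end
    return units


def accumulate(unit, aligned_frames):
    """Runs on a volunteer machine: sum statistics for one work unit.

    aligned_frames maps an utterance id to a list of (state, value) pairs.
    Real Baum-Welch would weight each frame by a state posterior; this toy
    version uses hard counts.
    """
    stats = defaultdict(lambda: [0.0, 0.0])  # state -> [count, sum]
    for utt in unit:
        for state, value in aligned_frames[utt]:
            stats[state][0] += 1.0
            stats[state][1] += value
    return dict(stats)


def merge_and_normalize(partial_stats):
    """Runs on the server: add up partial statistics, then compute means."""
    total = defaultdict(lambda: [0.0, 0.0])
    for stats in partial_stats:
        for state, (count, value_sum) in stats.items():
            total[state][0] += count
            total[state][1] += value_sum
    return {state: s / c for state, (c, s) in total.items() if c > 0}
```

Because the merge step is a plain sum, the result does not depend on how the utterances were split across work units, which is what makes it safe to hand units to unreliable volunteer machines and resubmit the ones that never return.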

The project can use BOINC as the grid middleware, or a Java analog of BOINC if you prefer to do the project in Java. We prefer Java.

Resources

A server for grid experiments will be provided

Expected results

The training process can submit a bw task and its features to the grid and train the acoustic model this way.

The setup and code are well documented in the project wiki

Complexity

Easy

Skills

C, Java

Note

BOINC Project

Large scale grid-based model optimization

The previous project deals with the existing training framework, but we can also try something entirely new: building the best possible acoustic model by brute force. After all, it's better to spend electricity on speech recognition than on the search for aliens who will enslave us ;)

The acoustic model is just a set of about 1000 parameters. You can apply a simple large-scale optimization algorithm to create the best acoustic model possible. You need to distribute the evaluation data across the grid and select the best-scoring parameters. You can start from an existing model as an initial approximation and try to improve it.
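
One simple instance of such an algorithm is a random local search: perturb the current best parameters, score every candidate independently, and keep the best one. The sketch below is only an illustration under stated assumptions. `evaluate` is a stand-in for a grid job (a real job would decode a test set with the candidate model and return its accuracy), and the step size, batch size and round count are arbitrary.

```python
import random


def evaluate(params, target):
    """Hypothetical stand-in for a grid job; higher scores are better.

    Here the score is the negative squared distance to a known target so
    that the example is self-contained.
    """
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def optimize(initial, score_fn, rounds=200, candidates=8, step=0.1, seed=0):
    """Random local search starting from an existing parameter vector."""
    rng = random.Random(seed)
    best, best_score = list(initial), score_fn(initial)
    for _ in range(rounds):
        # Each candidate is independent, so each could be one work unit.
        batch = [[p + rng.gauss(0.0, step) for p in best]
                 for _ in range(candidates)]
        scores = [score_fn(c) for c in batch]
        top = max(range(candidates), key=scores.__getitem__)
        # Accept only improvements, so the score never gets worse.
        if scores[top] > best_score:
            best, best_score = batch[top], scores[top]
    return best, best_score
```

A call such as `optimize(baseline_params, score_fn)` returns parameters that score at least as well as the baseline, since a candidate is accepted only when it improves on the current best.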

Resources

A server for grid experiments will be provided

Expected results

The setup is in place and the model training is running. The model demonstrates a reasonable improvement over the baseline on our task.

Complexity

Hard

Skills

C, Java, Optimization methods

Note

BOINC Project

Numerical Optimization Textbook

Collect pronunciation dictionaries from Wikipedia

It's critical to reuse existing information sources to improve system performance and support more languages. We also need to quickly incorporate pronunciations for the many new words that appeared just last year and are missing from the common dictionaries. Think about how to pronounce the word “gangnam”. One valuable source of phonetic pronunciations is Wikipedia, in particular the Wiktionary project. It often contains pronunciations in IPA format, created by the dictionary authors, for many words. However, it's not trivial to parse the pages in a uniform way, since the page format differs from language to language.

You need to write code that parses existing Wiktionary pronunciations and creates dictionaries for many languages. The task is less trivial than it might seem, because the code has to work for many Wikipedia languages.
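
As a starting point, the extraction step for a single page might look like the sketch below. It assumes the pronunciation is written as a template of the form `{{IPA|en|/.../}}`, which is how many English Wiktionary entries look. Other language editions use different templates, which is exactly the hard part of the task. The function names are hypothetical, and mapping IPA symbols to a recognizer phoneset is not covered.

```python
import re

# Matches a whole {{IPA|...}} template that has no nested templates.
TEMPLATE_RE = re.compile(r"\{\{IPA\|([^{}]*)\}\}")
# A transcription is an argument wrapped in /slashes/ or [brackets].
TRANSCRIPTION_RE = re.compile(r"^(?:/[^/]+/|\[[^\]]+\])$")


def extract_ipa(wikitext):
    """Return all IPA transcriptions found in one page's wikitext."""
    found = []
    for match in TEMPLATE_RE.finditer(wikitext):
        for arg in match.group(1).split("|"):
            arg = arg.strip()
            if TRANSCRIPTION_RE.match(arg):
                found.append(arg[1:-1])  # drop the surrounding delimiters
    return found


def build_dictionary(pages):
    """Map each word to its transcriptions, skipping pages without any.

    pages maps a word to the raw wikitext of its Wiktionary entry.
    """
    dictionary = {}
    for word, wikitext in pages.items():
        pronunciations = extract_ipa(wikitext)
        if pronunciations:
            dictionary[word] = pronunciations
    return dictionary
```

Keeping the template pattern separate from the dictionary-building loop is a deliberate choice here: supporting another language edition would then mostly mean supplying a different pattern rather than rewriting the tool.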

Expected results

Phonetic dictionaries for 10 languages are collected from Wikipedia. The tool is documented and makes it easy to discover pronunciations for new words.

Complexity

Easy

Skills

Python

Note

http://en.wiktionary.org/wiki/Wiktionary:Main_Page

Rework acoustic model format

The current acoustic model format is not flexible enough to match state-of-the-art recognizers. The issue is that we only support a fixed number of mixtures per state, which limits the achievable accuracy of the speech recognition system. You need to refactor the acoustic model loading code in sphinxtrain, pocketsphinx and sphinx4 to allow more flexible models.
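
To illustrate what "more flexible" could mean, the sketch below stores an explicit component count in front of every state, so that each state may carry a different number of mixture components. The byte layout is purely hypothetical (it is not the existing sphinxtrain format), and the Gaussians are one-dimensional to keep the example short.

```python
import struct


def dump_model(states):
    """Serialize a model to bytes.

    states is a list with one entry per state; each entry is a list of
    (weight, mean, variance) tuples, one per mixture component.
    """
    chunks = [struct.pack("<I", len(states))]
    for components in states:
        # Per-state count: this is what lifts the fixed-size restriction.
        chunks.append(struct.pack("<I", len(components)))
        for weight, mean, variance in components:
            chunks.append(struct.pack("<3f", weight, mean, variance))
    return b"".join(chunks)


def parse_model(data):
    """Inverse of dump_model: rebuild the per-state component lists."""
    offset = 0
    (num_states,) = struct.unpack_from("<I", data, offset)
    offset += 4
    states = []
    for _ in range(num_states):
        (num_components,) = struct.unpack_from("<I", data, offset)
        offset += 4
        components = []
        for _ in range(num_components):
            components.append(struct.unpack_from("<3f", data, offset))
            offset += 12  # three little-endian 32-bit floats
        states.append(components)
    return states
```

A loader written this way never needs to know a global mixture count in advance; it reads as many components as each state declares.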

Expected results

Introduce a new acoustic model format and modify the code to load and store acoustic models in it.

Complexity

Easy

Skills

C, Java, Python

Note

http://en.wikipedia.org/wiki/Mixture_model

Implement Java trainer on Hadoop framework

Hadoop is a framework for distributed computation that enables processing of huge databases. Sphinx4 already implements training in Java, but this training is not parallel. Porting the Java trainer to run on Hadoop would allow us to scale significantly beyond simple training setups.
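
The map/reduce shape of such a trainer can be sketched in a few lines. The example below is written in Python for brevity and is not the Sphinx4 trainer. It follows the key/value convention of Hadoop Streaming (tab-separated records, reducer input sorted by key), while the record layout and the toy statistics (a per-state mean from hard frame assignments) are assumptions made for illustration.

```python
from itertools import groupby


def mapper(lines):
    """Map step: turn 'utt_id state value' frames into keyed statistics.

    Emits 'state<TAB>count<TAB>sum' records, one per input frame.
    """
    for line in lines:
        _utt, state, value = line.split()
        yield f"{state}\t1\t{float(value)}"


def reducer(sorted_lines):
    """Reduce step: input must be sorted by key, as the framework ensures.

    Sums counts and values per state and emits 'state<TAB>mean'.
    """
    parsed = (line.split("\t") for line in sorted_lines)
    for state, group in groupby(parsed, key=lambda record: record[0]):
        count = total = 0.0
        for _state, c, s in group:
            count += float(c)
            total += float(s)
        yield f"{state}\t{total / count}"
```

Locally, the shuffle phase can be imitated with `sorted(mapper(frames))` before the result is passed to `reducer`.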

Expected results

Train Sphinxtrain acoustic models using the Hadoop framework

Complexity

Easy

Skills

Java, Hadoop

Notes

Apache Mahout, machine learning using Hadoop

Apache Hadoop

http://en.wikipedia.org/wiki/Mixture_model

Your project here

We are very interested in new project ideas related to large-scale learning, distributed computing, and human-assisted computation. Feel free to suggest your project here!

 
summerofcodeideas.txt · Last modified: 2013/03/30 09:17 by admin
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported