Open Source Toolkit For Speech Recognition Project by Carnegie Mellon University
This is a preliminary list of tasks for the Summer of Code project. If you have another idea, feel free to add it. For any questions, contact us on the cmusphinx-devel@lists.sourceforge.net mailing list or on the #cmusphinx IRC channel on freenode. See also Information for Students.
Task
Simon (http://simon-listens.org) is a dialog management system for the Linux desktop. It is an open-source speech recognition program that replaces the mouse and keyboard. It is designed to be very flexible and allows customization for any application where speech recognition is needed. It currently runs on HTK, but its flexible architecture makes it possible to plug in other engines. You need to implement a Simon backend that recognizes speech with pocketsphinx.
Complexity
Easy
Mentor
Peter Grasch
Skills
C++, Qt, KDE
Note
Task
Currently sphinx4 can only work with a predefined dictionary. It is possible to build a phonetic dictionary automatically, but this requires applying machine learning for training, developing a decoder module, and testing. Models for various languages need to be trained as well. This work will implement letter-to-sound rules with OpenFST in sphinx4.
Reading
Sittichai Jiampojamarn, Colin Cherry and Grzegorz Kondrak. “Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion”. In Proceedings of the Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-08: HLT), Columbus, OH, June 2008, pp. 905-913.
http://code.google.com/p/directl-p/
M. Bisani and H. Ney. “Joint-Sequence Models for Grapheme-to-Phoneme Conversion”. Speech Communication, Volume 50, Issue 5, May 2008, Pages 434-451
http://www-i6.informatik.rwth-aachen.de/web/Software/g2p.html
Complexity
Medium
Mentor
Nickolay V. Shmyrev
Skills
Java, Python, C++, Machine learning
Note
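To illustrate what letter-to-sound conversion has to produce, here is a minimal Python sketch. It is not the OpenFST-based joint-sequence approach this task asks for: it only applies a small hand-written rule table with greedy longest match. The rule table `RULES`, its phoneme symbols, and the function name `letters_to_phones` are hypothetical and chosen for illustration only.

```python
# Hypothetical toy rule table: grapheme chunk -> phoneme sequence.
# A trained model would learn such correspondences from a dictionary.
RULES = {
    "sh": ["SH"],
    "ch": ["CH"],
    "ee": ["IY"],
    "i": ["IH"],
    "p": ["P"],
    "t": ["T"],
}


def letters_to_phones(word, rules=RULES):
    """Convert a word to phonemes by greedy longest-match rule lookup."""
    word = word.lower()
    max_len = max(len(grapheme) for grapheme in rules)
    phones = []
    i = 0
    while i < len(word):
        # Try the longest grapheme chunk first, then shorter ones.
        for n in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + n]
            if chunk in rules:
                phones.extend(rules[chunk])
                i += n
                break
        else:
            raise KeyError(f"no rule covers {word[i]!r} in {word!r}")
    return phones


print(letters_to_phones("sheep"))
```

A learned model replaces the fixed table with scored, context-dependent correspondences, which is what makes out-of-vocabulary words tractable.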
Task
Implement a simple reading and pronunciation learning system.
More Information
http://cmusphinx.sourceforge.net/wiki/faq#qhow_to_implement_pronunciation_evaluation
Complexity
Easy
Mentor
James Salsman (jpsalsman@talknicer.com)
Skills
C, Perl, JavaScript, ActionScript/Flex4, statistics
Note
Accepted for GSoC 2012: Srikanth Ronanki and Troy Lee
Task
Current language models are very basic: they do not really understand what is being transcribed, which hurts the error rate. Create a decoder over the lattices that selects the semantically correct path and produces a perfectly readable result.
Reading
Guihong Cao, Jian-Yun Nie, Jing Bai. “Integrating Word Relationships into Language Models”.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.94.7443&rep=rep1&type=pdf
Mentor
Bhiksha Raj
Complexity
Hard
Skills
Java, C, Machine learning
Note
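The idea of picking a path through a lattice using word relationships can be sketched in Python as follows. This is only an assumption-laden toy: the lattice format (a dict of outgoing edges), the `related` score table, and the function names are hypothetical, and the paths are enumerated exhaustively, which a real decoder would replace with dynamic programming.

```python
def paths(lattice, node, end):
    """Yield every path from node to end as a list of (word, score) pairs."""
    if node == end:
        yield []
        return
    for next_node, word, score in lattice.get(node, []):
        for rest in paths(lattice, next_node, end):
            yield [(word, score)] + rest


def rescore(lattice, start, end, related, weight=1.0):
    """Return the word sequence with the best combined score.

    The combined score is the sum of edge scores plus a weighted
    relatedness bonus for each pair of adjacent words.
    """
    best_total, best_words = None, None
    for path in paths(lattice, start, end):
        words = [word for word, _ in path]
        total = sum(score for _, score in path)
        total += weight * sum(
            related.get(pair, 0.0) for pair in zip(words, words[1:])
        )
        if best_total is None or total > best_total:
            best_total, best_words = total, words
    return best_words


# Hypothetical lattice: node -> [(next_node, word, log-score)].
lattice = {
    0: [(1, "recognize", -1.0), (1, "wreck a nice", -0.8)],
    1: [(2, "speech", -0.5), (2, "beach", -0.4)],
}
related = {("recognize", "speech"): 1.0}

print(rescore(lattice, 0, 2, related))
print(rescore(lattice, 0, 2, related, weight=0.0))
```

With the relatedness bonus switched off, the path with the best edge scores wins; with it on, the semantically related pair is preferred.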
Task
Create a language-independent postprocessing framework that turns ASR results into readable text with punctuation, abbreviations and capitalization.
Reading
Matthias Paulik, Sharath Rao, Ian Lane, Stephan Vogel and Tanja Schultz. “Sentence Segmentation and Punctuation Recovery for Spoken Language Translation”.
http://www.makapa.de/Paulik_Sent_ICASSP08.pdf
Mentor
Elias Majic
Complexity
Hard
Skills
Java, Machine learning
Note
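As a baseline for comparison, a purely rule-based postprocessor might look like the Python sketch below. It assumes the recognizer reports the pause after each word, ends a sentence wherever the pause exceeds a threshold, capitalizes sentence starts, and substitutes abbreviations from a lookup table. The input format, the threshold, and the name `restore_text` are hypothetical; the trained approach in the reading above is what the task actually targets.

```python
def restore_text(tokens, pause_threshold=0.5, abbreviations=None):
    """Turn (word, pause_after_seconds) pairs into punctuated text."""
    abbreviations = abbreviations or {}
    out = []
    sentence_start = True
    for word, pause in tokens:
        # Replace the spoken form with its abbreviation when one is known.
        word = abbreviations.get(word, word)
        if sentence_start:
            word = word[:1].upper() + word[1:]
        # Assumption: a long pause marks a sentence boundary.
        ends_sentence = pause >= pause_threshold
        out.append(word + ("." if ends_sentence else ""))
        sentence_start = ends_sentence
    # Close the final sentence if the last pause was short.
    if out and not out[-1].endswith("."):
        out[-1] += "."
    return " ".join(out)


tokens = [
    ("we", 0.0), ("drove", 0.0), ("ten", 0.0), ("kilometers", 0.9),
    ("then", 0.0), ("we", 0.0), ("stopped", 0.0),
]
print(restore_text(tokens, abbreviations={"kilometers": "km"}))
```

Rules like these are language-specific and brittle, which is the motivation for a framework that learns them from data.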
Task
Write a crawler that collects text data on a given topic for language model training.
Reading
Web augmentation of language models for continuous speech recognition of SMS text messages
http://www.aclweb.org/anthology/E/E09/E09-1019.pdf
Mentor
Tony Robinson
Complexity
Hard
Skills
Java, Web technologies
Note
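The text-extraction and topic-filtering step of such a crawler can be sketched with Python's standard library alone. The sketch leaves out fetching and link following, and the keyword-ratio test in `is_on_topic` is a deliberately crude, hypothetical stand-in for a real topic model; the class and function names are illustrative.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def page_text(html):
    """Return the visible text of an HTML page as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def is_on_topic(text, keywords, min_ratio=0.05):
    """Keep a page when enough of its words are topic keywords."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    if not words:
        return False
    hits = sum(1 for w in words if w in keywords)
    return hits / len(words) >= min_ratio


html = (
    "<html><head><style>p{}</style></head><body>"
    "<p>Speech recognition decodes audio.</p>"
    "<script>var x;</script></body></html>"
)
text = page_text(html)
print(text, is_on_topic(text, {"speech", "recognition"}))
```

Pages that pass the filter would be appended to the training corpus; the rest are discarded before language model estimation.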
Task
Implement Ephraim-Malah and Kalman noise cancellation filters in pocketsphinx or sphinx4.
Reading
R. Gemello, F. Mana, R. De Mori. “A modified Ephraim-Malah noise suppression rule for automatic speech recognition”.
Mentor
Nickolay V. Shmyrev
Complexity
Easy
Skills
C or Java, Signal processing.
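For orientation, the predict/update recursion at the heart of a Kalman filter can be sketched in a few lines of Python. This is a scalar filter with a random-walk signal model, not the speech-specific formulation a real implementation would use, and it says nothing about Ephraim-Malah. The function name `kalman_smooth` and the variance defaults are hypothetical.

```python
def kalman_smooth(samples, process_var=1e-4, measurement_var=1e-2):
    """Filter a noisy sequence with a scalar random-walk Kalman filter."""
    if not samples:
        return []
    estimate = samples[0]  # initial state guess
    error = 1.0            # initial estimate variance (assumed)
    filtered = []
    for measurement in samples:
        # Predict: the state is assumed constant, uncertainty grows.
        error += process_var
        # Update: blend prediction and measurement by the Kalman gain.
        gain = error / (error + measurement_var)
        estimate += gain * (measurement - estimate)
        error *= 1.0 - gain
        filtered.append(estimate)
    return filtered


# A constant level of 1.0 corrupted by alternating +/-0.1 noise.
noisy = [1.0 + (0.1 if i % 2 else -0.1) for i in range(200)]
smoothed = kalman_smooth(noisy)
print(abs(noisy[-1] - 1.0), abs(smoothed[-1] - 1.0))
```

The ratio of `process_var` to `measurement_var` controls how strongly the filter trusts new samples over its running estimate.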