Improving Open Source Speech Recognition

Speech Recognition Engines require two types of files to recognize speech: an Acoustic Model, created by 'compiling' a large amount of transcribed speech into statistical models, and either a Language Model (for Dictation) or a Grammar file (for Command and Control). Most Acoustic Models used by 'Open Source' Speech Recognition Engines are 'Closed Source': they do not give you access to the speech audio (the 'Source') used to create the Acoustic Model. The reason is that there is no free Speech Corpus in a form that can readily be used to create Acoustic Models for Speech Recognition Engines. Open Source projects must therefore purchase a Speech Corpus with restrictive licensing in order to create their Acoustic Models.

VoxForge was set up to address this problem. The site collects transcribed speech audio from users under the GPL and uses it to create Acoustic Models. These can then be used with Free and Open Source Speech Recognition Engines such as Sphinx, ISIP, Julius and HTK.

