Speech Recognizers/ASRs
Automatic speech recognition (ASR) systems are tools that transcribe spoken words and sentences into text. They can make it easier to communicate with a dialogue system by letting the user speak instead of typing out utterances.
[table sort="asc"]
Name[attr style="width: 250px;"], Level, Location, Cost, License, Overview
Google Speech, High, Cloud, Paid, Proprietary, A powerful neural-network-based cloud service that recognizes over 80 languages
Bing Speech, High, Cloud, Paid, Proprietary, Adds formatting to the transcribed text (e.g. punctuation; capitalization; profanity masking)
API.AI, High, Cloud, Free, CC 4.0, Open-source model based on Kaldi; open-source scripts to run it are also available
Alexa Voice, High, Both, Free, Custom, Amazon’s Alexa Voice Service includes a full set of NLP tools including ASR
Wit Speech, High, Cloud, Free, Custom, Facebook’s wit.ai includes a “Speech to JSON” feature
IBM Watson, High, Cloud, Paid, Proprietary, IBM offers the first thousand minutes of speech-to-text free each month
HTK, Low, Local, Free, Custom, The Hidden Markov Model Toolkit works with HMMs; it is geared toward speech recognition but flexible enough for other uses
CMU Sphinx, Low, Local, Free, BSD-style, Low-resource speech recognition that can run even on mobile devices
Kaldi, Low, Local, Free, Apache 2.0, Open-source project designed to be as flexible as possible
Julius, Low, Local, Free, Custom, High-performance open-source large-vocabulary continuous speech recognition software
Speechmatics, High, Cloud, Paid, Proprietary, Speech recognition based on recurrent neural networks plus text-video time alignment
Vocapia, High, Cloud, Paid, Proprietary, Offers cloud-based speech recognition and other features including language recognition
Simon, High, Local, Free, GNU 1.2, Wrapper around low-level tools including CMU Sphinx; Julius; and HTK
Jasper, High, Local, Free, MIT, Speech recognition designed for easy use with Raspberry Pi
OpenEars (iOS), High, Local, Free, Politepix, Uses CMU Sphinx in a free-to-use mobile framework for iOS
Apple Dictation, High, Local, Free, Proprietary, The built-in dictation feature of OS X
Microsoft Speech Recognition, High, Local, Paid, MIT License, Microsoft’s speech recognizer
[/table]