Speech Recognizers/ASRs

Automatic speech recognition (ASR) systems are tools that transcribe spoken words and sentences into text. This can make it easier to communicate with a dialogue system, because the user can speak instead of typing out utterances.

| Name | Level | Location | Cost | License | Overview |
|---|---|---|---|---|---|
| Google Speech | High | Cloud | Paid | Proprietary | A powerful neural-network-based cloud service that recognizes over 80 languages |
| Bing Speech | High | Cloud | Paid | Proprietary | Also adds formatting to text (e.g., punctuation, capitalization, profanity masking) |
| API.AI | High | Cloud | Free | CC 4.0 | Open-source model based on Kaldi; open-source scripts to run it are also available |
| Alexa Voice | High | Both | Free | Custom | Amazon's Alexa Voice Service includes a full set of NLP tools, including ASR |
| Wit Speech | High | Cloud | Free | Custom | Facebook's wit.ai includes a "Speech to JSON" feature |
| IBM Watson | High | Cloud | Paid | Proprietary | IBM offers the first thousand minutes of speech-to-text free each month |
| HTK | Low | Local | Free | Custom | The Hidden Markov Model Toolkit works with HMMs; geared toward speech recognition but flexible |
| CMU Sphinx | Low | Local | Free | BSD-style | Low-resource speech recognition that can even be used on mobile devices |
| Kaldi | Low | Local | Free | Apache 2.0 | Open-source project designed to be as flexible in its use as possible |
| Julius | Low | Local | Free | Custom | High-performance open-source large-vocabulary continuous speech recognition software |
| Speechmatics | High | Cloud | Paid | Proprietary | Recurrent-neural-network-based speech recognition and text-video time alignment |
| Vocapia | High | Cloud | Paid | Proprietary | Offers cloud-based speech recognition and other features, including language recognition |
| Simon | High | Local | Free | GNU 1.2 | Wrapper around low-level tools, including CMU Sphinx, Julius, and HTK |
| Jasper | High | Local | Free | MIT | Speech recognition designed for easy use with Raspberry Pi |
| OpenEars (iOS) | High | Local | Free | Politepix | Uses CMU Sphinx in a free-to-use mobile framework for iOS |
| Apple Dictation | High | Local | Free | Proprietary | The built-in dictation feature of OS X |
| Microsoft Speech Recognition | High | Local | Paid | MIT License | Microsoft's speech recognizer |