Speech Recognizers/ASRs

Automatic speech recognition (ASR) tools transcribe spoken words and sentences into text. This can make it easier to communicate with a dialogue system, because the user can speak instead of typing out utterances.

[table sort="asc"]
Name[attr style="width: 250px;"], Level, Location, Cost, License, Overview

Google Speech, High, Cloud, Paid, Proprietary, A powerful neural network based cloud service that recognizes over 80 languages

Bing Speech, High, Cloud, Paid, Proprietary, Also adds formatting to text (e.g. punctuation; capitalization; masking profanity; etc.)

API.AI, High, Cloud, Free, CC 4.0, Open-source model based on Kaldi; open-source scripts to run it are also available

Alexa Voice, High, Both, Free, Custom, Amazon’s Alexa Voice Service includes a full set of NLP tools including ASR

Wit Speech, High, Cloud, Free, Custom, Facebook’s wit.ai includes a “Speech to JSON” feature

IBM Watson, High, Cloud, Paid, Proprietary, IBM offers the first thousand minutes of speech-to-text free each month

HTK, Low, Local, Free, Custom, The Hidden Markov Model ToolKit works with HMMs geared toward speech recognition but is flexible

CMU Sphinx, Low, Local, Free, BSD-style, Low resource speech recognition that can even be used on mobile

Kaldi, Low, Local, Free, Apache 2.0, Open source project designed to be as flexible in its use as possible

Julius, Low, Local, Free, Custom, High-performance open source large vocabulary continuous speech recognition software

Speechmatics, High, Cloud, Paid, Proprietary, Recurrent Neural Network based speech recognition and text-video time alignment

Vocapia, High, Cloud, Paid, Proprietary, Offers cloud-based speech recognition and other features including language recognition

Simon, High, Local, Free, GNU 1.2, Wrapper around low-level tools including CMU Sphinx; Julius; and HTK

Jasper, High, Local, Free, MIT, Speech recognition designed for easy use with Raspberry Pi

OpenEars (iOS), High, Local, Free, Politepix, Uses CMU Sphinx in a free-to-use mobile framework for iOS

Apple Dictation, High, Local, Free, Proprietary, The built-in dictation feature of OS X

Microsoft Speech Recognition, High, Local, Paid, MIT License, Microsoft’s speech recognizer

[/table]
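
As a rough illustration of how a recognizer sits in front of a dialogue system, the sketch below accepts either typed text or recorded audio and turns both into the same kind of text utterance. Everything in it is hypothetical: `Recognizer`, `fake_recognizer`, `to_utterance`, and `respond` are names made up for this example, and the stand-in recognizer simply decodes bytes rather than calling any of the tools listed above. A real setup would replace `fake_recognizer` with a call into whichever ASR is chosen.

```python
from typing import Callable, Union

# Hypothetical type: any function that turns raw audio bytes into text.
Recognizer = Callable[[bytes], str]


def fake_recognizer(audio: bytes) -> str:
    """Stand-in for a real ASR backend (assumption: it just decodes UTF-8 bytes)."""
    return audio.decode("utf-8")


def to_utterance(user_input: Union[str, bytes], recognize: Recognizer) -> str:
    """Return a normalized text utterance from typed text or recorded audio."""
    if isinstance(user_input, bytes):
        # Spoken input: hand the audio to the recognizer to get a transcript.
        text = recognize(user_input)
    else:
        # Typed input: already text, no recognition needed.
        text = user_input
    # Collapse whitespace and lowercase so both paths look the same downstream.
    return " ".join(text.split()).lower()


def respond(utterance: str) -> str:
    """Toy dialogue system that only understands a greeting."""
    if utterance in {"hello", "hi"}:
        return "Hello! How can I help?"
    return "Sorry, I did not understand that."


if __name__ == "__main__":
    # Typed and "spoken" input reach the dialogue system in the same form.
    print(respond(to_utterance("Hello", fake_recognizer)))
    print(respond(to_utterance(b"  hello ", fake_recognizer)))
```

Keeping the recognizer behind a plain function type means a cloud service or a local toolkit can be swapped in without touching the dialogue logic.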