Speech Recognizers/ASRs
Automatic Speech Recognition (ASR) systems are tools that transcribe spoken words and sentences into text. They can make it easier to communicate with a dialogue system by letting the user speak utterances aloud instead of typing them.
Name | Level | Location | Cost | License | Overview |
---|---|---|---|---|---|
Google Speech | High | Cloud | Paid | Proprietary | A powerful neural-network-based cloud service that recognizes over 80 languages |
Bing Speech | High | Cloud | Paid | Proprietary | Also adds formatting to text (e.g. punctuation, capitalization, profanity masking) |
API.AI | High | Cloud | Free | CC 4.0 | Open-source model based on Kaldi; open-source scripts to run it are also available |
Alexa Voice | High | Both | Free | Custom | Amazon's Alexa Voice Service includes a full set of NLP tools including ASR |
Wit Speech | High | Cloud | Free | Custom | Facebook's wit.ai includes a "Speech to JSON" feature |
IBM Watson | High | Cloud | Paid | Proprietary | IBM offers the first thousand minutes of speech-to-text free each month |
HTK | Low | Local | Free | Custom | The Hidden Markov Model Toolkit builds and manipulates HMMs; it is geared toward speech recognition but flexible |
CMU Sphinx | Low | Local | Free | BSD-style | Low-resource speech recognition that can even be used on mobile devices |
Kaldi | Low | Local | Free | Apache 2.0 | Open-source project designed to be as flexible in its use as possible |
Julius | Low | Local | Free | Custom | High-performance open source large vocabulary continuous speech recognition software |
Speechmatics | High | Cloud | Paid | Proprietary | Recurrent Neural Network based speech recognition and text-video time alignment |
Vocapia | High | Cloud | Paid | Proprietary | Offers cloud-based speech recognition and other features including language recognition |
Simon | High | Local | Free | GNU 1.2 | Wrapper around low-level tools including CMU Sphinx, Julius, and HTK |
Jasper | High | Local | Free | MIT | Speech recognition designed for easy use with Raspberry Pi |
OpenEars (iOS) | High | Local | Free | Politepix | Uses CMU Sphinx in a free-to-use mobile framework for iOS |
Apple Dictation | High | Local | Free | Proprietary | The built-in dictation feature of OS X |
Microsoft Speech Recognition | High | Local | Paid | MIT License | Microsoft's speech recognizer |
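
The table mixes cloud and local recognizers, so a dialogue system may want to hide that choice behind a small interface and treat typed and spoken input the same way. The following is a minimal sketch of that idea, not the API of any tool listed above: `Recognizer`, `CannedRecognizer`, and `get_utterance` are hypothetical names, and the canned backend simply stands in for a real cloud or local ASR call.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class Recognizer(Protocol):
    """Hypothetical interface: any ASR backend that turns audio bytes into text."""

    def transcribe(self, audio: bytes) -> str: ...


@dataclass
class CannedRecognizer:
    """Stand-in for a real ASR backend; always returns a fixed transcript."""

    transcript: str

    def transcribe(self, audio: bytes) -> str:
        # A real backend would send `audio` to a cloud API or a local decoder here.
        return self.transcript


def get_utterance(recognizer: Recognizer,
                  typed: Optional[str] = None,
                  audio: Optional[bytes] = None) -> str:
    """Return the user's utterance as text, preferring typed input when present."""
    if typed is not None:
        return typed.strip()
    if audio is not None:
        return recognizer.transcribe(audio).strip()
    raise ValueError("need either typed text or audio")


# Hypothetical usage: the dialogue system sees plain text either way.
asr = CannedRecognizer(transcript="book a table for two ")
print(get_utterance(asr, typed="hello"))
print(get_utterance(asr, audio=b"\x00\x01"))
```

Because the dialogue system depends only on `transcribe`, swapping a cloud service for a local toolkit should only require writing a new class with that one method.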