Bing Speech API

Microsoft’s Speech Service transcribes audio streams into text suitable for display to a user. Transcription includes adding appropriate capitalization and punctuation, masking profanity, and normalizing text. For example, if a user says “remind me to buy six pencils,” the Speech Service returns the transcribed text “Remind me to buy 6 pencils.” There are two options for adding speech recognition capabilities to your app:
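To make the four post-processing steps concrete, here is a toy sketch of turning a raw lowercase transcript into display text. It is not how the Speech Service works internally. The function name `toy_display_text`, the tiny number table, and the placeholder masked-word list are all hypothetical and exist only for illustration.

```python
# Hypothetical, deliberately tiny tables for illustration only.
NUMBER_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9", "ten": "10",
}
MASKED_WORDS = {"darn"}  # placeholder stand-in for a real profanity list


def toy_display_text(raw: str) -> str:
    """Turn a raw lowercase transcript into display-style text (toy sketch)."""
    words = raw.strip().split()
    if not words:
        return ""
    # Mask profanity: replace each listed word with asterisks of the same length.
    words = ["*" * len(w) if w.lower() in MASKED_WORDS else w for w in words]
    # Normalize: replace spelled-out single numbers with digits.
    words = [NUMBER_WORDS.get(w.lower(), w) for w in words]
    text = " ".join(words)
    # Capitalize the first character of the sentence.
    text = text[0].upper() + text[1:]
    # Punctuate: add a period if there is no terminal mark already.
    if text[-1] not in ".?!":
        text += "."
    return text


print(toy_display_text("remind me to buy six pencils"))  # Remind me to buy 6 pencils.
```

A production normalizer handles multi-word numbers, dates, currencies, and context-dependent punctuation, which a word-by-word lookup like this cannot.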

The REST API uses chunked transfer encoding to transcribe short spoken commands. It does not stream results in real time or give the user feedback during recognition.
The WebSocket API uses full-duplex communication to transcribe longer audio input and can return intermediate results before the final transcription.

Use this comparison chart to help you choose the API that fits your needs.
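Chunked transfer encoding lets a client start uploading audio before it knows the total length. The sketch below shows what that framing looks like on the wire, following HTTP/1.1 chunked transfer coding (RFC 9112, section 7.1): each chunk is prefixed by its size in hexadecimal, and a zero-size chunk ends the body. The helper names are hypothetical, the audio is placeholder bytes, and the service endpoint and required headers are deliberately left out. In real code, Python's `http.client.HTTPConnection.request(..., encode_chunked=True)` performs this framing for you.

```python
from typing import Iterable, Iterator


def read_in_chunks(data: bytes, chunk_size: int = 1024) -> Iterator[bytes]:
    """Split an audio buffer into fixed-size pieces, as a streaming client might."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def encode_chunked(chunks: Iterable[bytes]) -> bytes:
    """Frame chunks using HTTP/1.1 chunked transfer coding."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # a zero-size chunk would end the body prematurely
        out += f"{len(chunk):X}\r\n".encode("ascii")  # chunk size in hex
        out += chunk + b"\r\n"                        # chunk data, then CRLF
    out += b"0\r\n\r\n"  # last-chunk followed by an empty trailer section
    return bytes(out)


audio = b"\x00\x01" * 1500  # 3000 bytes of placeholder "audio"
body = encode_chunked(read_in_chunks(audio, 1024))
# Chunks of 1024 (0x400), 1024 (0x400), and 952 (0x3B8) bytes, then the 0 terminator.
```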
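With the WebSocket API, the client must tell intermediate results apart from the final one as messages arrive. The sketch below assumes a message format recalled from the Bing Speech WebSocket protocol and worth verifying against its documentation: header lines separated by CRLF, a blank line, then a JSON body, with `Path: speech.hypothesis` marking an intermediate result (text in a `Text` field) and `Path: speech.phrase` marking a final result (text in a `DisplayText` field). The function names are hypothetical, and the WebSocket connection itself is omitted.

```python
import json
from typing import Optional, Tuple


def parse_text_message(message: str) -> Tuple[dict, dict]:
    """Split a text message into lowercase header names and a JSON body (assumed format)."""
    head, _, body = message.partition("\r\n\r\n")
    headers = {}
    for line in head.split("\r\n"):
        name, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers, (json.loads(body) if body.strip() else {})


def describe_result(message: str) -> Optional[str]:
    """Label intermediate and final results; return None for other message types."""
    headers, payload = parse_text_message(message)
    path = headers.get("path")
    if path == "speech.hypothesis":  # assumed path for intermediate results
        return "partial: " + payload.get("Text", "")
    if path == "speech.phrase":      # assumed path for the final result
        return "final: " + payload.get("DisplayText", "")
    return None


msg = 'Path: speech.hypothesis\r\nContent-Type: application/json\r\n\r\n{"Text": "remind me to buy"}'
print(describe_result(msg))  # partial: remind me to buy
```

An app would typically overwrite on-screen text with each partial result and commit it once the final result arrives.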