Google has since launched it's official Google Cloud Speech API. I strongly recommend looking over there.
https://www.google.com/speech-api/v2/recognize
output: json, xml not supported.
lang: any valid locale (en-us, nl-be, fr-fr, etc.)
key: Please get one from the Google Developers Console
Key is not optional.
app: optional
You can specify an optional query string called app
, which returns some extra transcripts for some reason.
client: optional, seems to do nothing in particular
Flac file; 44100Hz 32bit float, exported with Audacity. Check the audio folder in this repository for some hilarious examples.
Channels : 2
Sample Rate : 44100
Precision : 32-bit
Sample Encoding: 32-bit Float
The following audio options are confirmed working for 16-bit PCM sample encoding:
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Sample Encoding: 16-bit Signed Integer PCM
One-line sox recording command:
rec --encoding signed-integer --bits 16 --channels 1 --rate 16000 test.wav
Content-Type:
Content-Type: audio/x-flac; rate=44100;
Set the rate to be equal to the rate of the FLAC file (generally 44100Hz) but it supports different rates.
Content-Type: audio/l16; rate=16000;
is also supported with a rate of 44100Hz or 16000Hz for files encoded with LPCM 16-bit signed-integer.
NOTE: Make sure the rate in your header matches the sample rate you used for your audio capture.
User-Agent:
not required, but for spoofing purposes use one of Chrome’s userAgent strings.
When Google is 100% confident in it's translation, it will return the following object:
{
"result":[
{
"alternative":[
{
"transcript":"good morning Google how are you feeling today"
}
],
"final":true
}
],
"result_index":0
}
When it's doubtful, it adds a confidence parameter for you. It also seems to add multiple transcripts for some reason.
{
"result":[
{
"alternative":[
{
"transcript":"this is a test",
"confidence":0.97321892
},
{
"transcript":"this is a test for"
}
],
"final":true
}
],
"result_index":0
}
On OS X with Homebrew installed:
brew install sox
rec --encoding signed-integer --bits 16 --channels 1 --rate 16000 test.wav
curl -X POST \
--data-binary @'audio/hello (16bit PCM).wav' \
--header 'Content-Type: audio/l16; rate=16000;' \
'https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&key=yourkey'
Or for FLAC encoded audio:
curl -X POST \
--data-binary @audio/good-morning-google.flac \
--header 'Content-Type: audio/x-flac; rate=44100;' \
'https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&key=yourkey'
Here are a few caveats you have to know about, should you decide to use this API in a production environment. (I don't recommend it)
- The API only accepts up to ~10-15 seconds of audio.
- Generating your own Speech API Key, you can only make 50 requests per day.