Add speech api streaming sample. #239
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -37,10 +37,36 @@ See the
[Cloud Platform Auth Guide](https://cloud.google.com/docs/authentication#developer_workflow)
for more information.

### Install the dependencies

* If you're running the `speechrest.py` sample:

```sh
$ pip install -r requirements-speechrest.txt
```

* If you're running the `speech_streaming.py` sample:

```sh
$ pip install -r requirements-speech_streaming.txt
```

## Run the example

```sh
$ python speechrest.py resources/audio.raw
```
* To run the `speechrest.py` sample:

```sh
$ python speechrest.py resources/audio.raw
```
Review comment: nit: speechrest has no underscore, but speech_streaming does.
Reply: Yeah - I was considering adding an underscore to speechrest, but got lazy 'cuz then I'd have to change the references to it -_- Will update in a later PR

You should see a response with the transcription result.

* To run the `speech_streaming.py` sample:

```sh
$ python speech_streaming.py
```

You should see a response with the transcription result.
The sample will run in a continuous loop, printing the data and metadata
it receives from the Speech API, which includes alternative transcriptions
of what it hears, and a confidence score. Say "exit" to exit the loop.
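The exit behaviour the README describes can be tried without a microphone or the Speech API. The following is only a sketch: `Alternative`, `Result`, `EXIT_KEYWORDS` and `should_exit` are hypothetical stand-ins, assuming that only the `transcript` and `confidence` fields matter for the check. The `.lower()` call is an addition that the sample itself does not make.

```python
from collections import namedtuple

# Hypothetical stand-ins for the API's response objects (assumption: only
# `transcript` and `confidence` are needed for the exit check).
Alternative = namedtuple('Alternative', ['transcript', 'confidence'])
Result = namedtuple('Result', ['alternatives'])

EXIT_KEYWORDS = ('exit', 'quit')


def should_exit(results, min_confidence=0.5):
    """Return True if any confident alternative is exactly an exit keyword."""
    return any(
        alt.confidence > min_confidence and
        # Lowercasing is an addition for this sketch, not part of the sample
        alt.transcript.strip().lower() in EXIT_KEYWORDS
        for result in results
        for alt in result.alternatives)


results = [Result(alternatives=[
    Alternative(transcript=' quit ', confidence=0.9),
    Alternative(transcript='quilt', confidence=0.4),
])]
print(should_exit(results))
```

Because the match is against the whole stripped transcript, a phrase such as "exit now" does not end the loop, and neither does a low-confidence "exit".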
@@ -0,0 +1,4 @@
gcloud==0.12.0
grpcio==0.13.1
PyAudio==0.2.9
grpc-google-cloud-speech==1.0.0
@@ -0,0 +1 @@
google-api-python-client==1.5.0
@@ -0,0 +1,120 @@
#!/usr/bin/python

import contextlib
import threading

from gcloud.credentials import get_credentials
from google.cloud.speech.v1.cloud_speech_pb2 import *  # noqa
from google.rpc import code_pb2
Review comment: why the weird newlines?
Reply: I blame flake8! Will fix.
from grpc.beta import implementations
import pyaudio

# Audio recording parameters
RATE = 16000
CHANNELS = 1
CHUNK = RATE // 10  # 100ms

# Keep the request alive for this many seconds
DEADLINE_SECS = 8 * 60 * 60
SPEECH_SCOPE = 'https://www.googleapis.com/auth/cloud-platform'


def make_channel(host, port):
    """Creates an SSL channel with auth credentials from the environment."""
    # In order to make an https call, use an ssl channel with defaults
    ssl_channel = implementations.ssl_channel_credentials(None, None, None)

    # Grab application default credentials from the environment
    creds = get_credentials().create_scoped([SPEECH_SCOPE])
    # Add a plugin to inject the creds into the header
    auth_header = (
        'Authorization',
        'Bearer ' + creds.get_access_token().access_token)
    auth_plugin = implementations.metadata_call_credentials(
        lambda _, cb: cb([auth_header], None),
        name='google_creds')

    # compose the two together for both ssl and google auth
    composite_channel = implementations.composite_channel_credentials(
        ssl_channel, auth_plugin)

    return implementations.secure_channel(host, port, composite_channel)


@contextlib.contextmanager
def record_audio(channels, rate, chunk):
    """Opens a recording stream in a context manager."""
    audio_interface = pyaudio.PyAudio()
    audio_stream = audio_interface.open(
        format=pyaudio.paInt16, channels=channels, rate=rate,
        input=True, frames_per_buffer=chunk,
    )

    try:
        yield audio_stream
    finally:
        # Release the audio device even if the caller raises
        audio_stream.stop_stream()
        audio_stream.close()
        audio_interface.terminate()


def request_stream(stop_audio, channels=CHANNELS, rate=RATE, chunk=CHUNK):
    """Yields `RecognizeRequest`s constructed from a recording audio stream.

    Args:
        stop_audio: A threading.Event object that stops the recording when set.
        channels: How many audio channels to record.
        rate: The sampling rate.
        chunk: Buffer audio into chunks of this size before sending to the api.
    """
    with record_audio(channels, rate, chunk) as audio_stream:
        # The initial request must contain metadata about the stream, so the
        # server knows how to interpret it.
        metadata = InitialRecognizeRequest(
            encoding='LINEAR16', sample_rate=rate)
        audio_request = AudioRequest(content=audio_stream.read(chunk))

        yield RecognizeRequest(
            initial_request=metadata,
            audio_request=audio_request)

        while not stop_audio.is_set():
            # Subsequent requests can all just have the content
            audio_request = AudioRequest(content=audio_stream.read(chunk))

            yield RecognizeRequest(audio_request=audio_request)


def listen_print_loop(recognize_stream):
    for resp in recognize_stream:
        if resp.error.code != code_pb2.OK:
            raise RuntimeError('Server error: ' + resp.error.message)

        # Display the transcriptions & their alternatives
        for result in resp.results:
            print(result.alternatives)

        # Exit recognition if any of the transcribed phrases could be
Review comment: Nice.
        # one of our keywords.
        if any(alt.confidence > .5 and
               (alt.transcript.strip() in ('exit', 'quit'))
               for result in resp.results
               for alt in result.alternatives):
            print('Exiting..')
            return


def main():
    stop_audio = threading.Event()
    with beta_create_Speech_stub(
            make_channel('speech.googleapis.com', 443)) as service:
        try:
            listen_print_loop(
                service.Recognize(request_stream(stop_audio), DEADLINE_SECS))
        finally:
            # Stop the request stream once we're done with the loop - otherwise
            # it'll keep going in the thread that the grpc lib makes for it..
            stop_audio.set()


if __name__ == '__main__':
    main()
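The shutdown handshake between `main()` and `request_stream()` can be seen in isolation with the sketch below. It is an illustration under stated assumptions, not the sample's code: `chunk_stream` and `BYTES_PER_FRAME` are hypothetical names, and an in-memory buffer of silence stands in for the microphone. It also spells out the chunk arithmetic: `RATE // 10` is 1600 frames (100 ms), which is 3200 bytes of 16-bit mono audio.

```python
import io
import threading

RATE = 16000          # samples per second, as in the sample
CHUNK = RATE // 10    # 1600 frames, i.e. 100 ms of audio
BYTES_PER_FRAME = 2   # assumption: 16-bit mono (LINEAR16)


def chunk_stream(source, stop_event, num_bytes=CHUNK * BYTES_PER_FRAME):
    """Yield fixed-size byte chunks until stop_event is set or data runs out."""
    while not stop_event.is_set():
        data = source.read(num_bytes)
        if not data:
            return
        yield data


stop = threading.Event()
# 300 ms of silence stands in for microphone input
source = io.BytesIO(b'\0' * (CHUNK * BYTES_PER_FRAME * 3))

chunks = chunk_stream(source, stop)
first = next(chunks)
stop.set()                # the same signal main() sends in its `finally` block
remaining = list(chunks)  # the generator sees the event and stops
print(len(first), len(remaining))
```

The generator only checks the event between chunks, so a consumer waits at most one chunk (about 100 ms here) before the stream ends.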
@@ -0,0 +1,67 @@
# Copyright 2016, Google, Inc.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import contextlib
import io
import re
import sys

import pytest

import speech_streaming


class MockAudioStream(object):
    def __init__(self, audio_filename, trailing_silence_secs=10):
        self.audio_filename = audio_filename
        # Bytes literal so the silence buffer is also valid on Python 3
        self.silence = io.BytesIO(b'\0\0' * speech_streaming.RATE *
                                  trailing_silence_secs)

    def __enter__(self):
        # Raw audio is binary data, so open the file in binary mode
        self.audio_file = open(self.audio_filename, 'rb')
        return self

    def __exit__(self, *args):
        self.audio_file.close()

    def __call__(self, *args):
        return self

    def read(self, num_frames):
        # audio is 16-bit samples, whereas python byte is 8-bit
        num_bytes = 2 * num_frames
        chunk = self.audio_file.read(num_bytes) or self.silence.read(num_bytes)
        return chunk


def mock_audio_stream(filename):
    @contextlib.contextmanager
    def mock_audio_stream(channels, rate, chunk):
        with open(filename, 'rb') as audio_file:
            yield audio_file

    return mock_audio_stream


@pytest.mark.skipif(
    sys.version_info >= (3, 0), reason="can't get grpc lib to work in python3")
def test_main(resource, monkeypatch, capsys):
    monkeypatch.setattr(
        speech_streaming, 'record_audio',
        mock_audio_stream(resource('quit.raw')))
    monkeypatch.setattr(speech_streaming, 'DEADLINE_SECS', 5)

    speech_streaming.main()
    out, err = capsys.readouterr()

    assert re.search(r'transcript.*"quit"', out, re.DOTALL | re.I)
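The silence-padding idea behind `MockAudioStream` can be exercised on its own, without an audio file. The sketch below is a simplified, hypothetical variant (`PaddedAudio` is not part of this PR) that takes the audio as bytes and assumes 16-bit samples, so each frame is two bytes.

```python
import io


class PaddedAudio(object):
    """Hypothetical in-memory stream that pads with silence once audio ends."""

    def __init__(self, audio_bytes, rate=16000, trailing_silence_secs=1):
        self.audio = io.BytesIO(audio_bytes)
        # Two zero bytes per 16-bit sample of silence
        self.silence = io.BytesIO(b'\0\0' * rate * trailing_silence_secs)

    def read(self, num_frames):
        # Frames are 16-bit samples, so each one takes two bytes
        num_bytes = 2 * num_frames
        # Fall back to the silence buffer once the audio is exhausted
        return self.audio.read(num_bytes) or self.silence.read(num_bytes)


stream = PaddedAudio(b'\x01\x02' * 4, rate=8, trailing_silence_secs=1)
print(stream.read(4))    # the recorded audio
print(stream.read(4))    # first half of the silence padding
print(stream.read(100))  # whatever padding is left
print(stream.read(4))    # empty once both buffers are drained
```

A short read at the boundary between audio and silence is returned as-is rather than topped up with silence, which matches the `or` fallback used in the test helper.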
Review comment: This could cause some weirdness with clashing dependency versions. I'm okay with this for now, but we should be careful.

Reply: Yeah - I think we should move to a world where, instead of having a global requirements-py[23].txt at the top level, each sample module would have its own smaller requirements.txt, and the global one would be either empty, or only include the stuff needed for pytest.

Reply: Oh I see what you mean - the virtualenvs are split by session, not by subdir.
Hrm... that's a puzzle... perhaps we could have a separate virtualenv for each subdir that contains a requirements.txt?
Pros:
Cons:
The con might be mitigated if we could run the tests in parallel..

Reply: I tried that originally, and it's dreadfully slow. :/
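For the per-subdir idea discussed in this thread, the first step would be finding the directories that carry their own `requirements.txt`. A minimal sketch, assuming that file name; `find_requirement_dirs` is a hypothetical helper that does not exist in this repository, and creating the virtualenvs is left out:

```python
import os
import tempfile


def find_requirement_dirs(root):
    """Return sorted directories under root that contain a requirements.txt."""
    return sorted(
        dirpath
        for dirpath, _dirnames, filenames in os.walk(root)
        if 'requirements.txt' in filenames)


# Throwaway tree: only the `speech` directory has its own requirements file
root = tempfile.mkdtemp()
for sub in ('speech', 'storage'):
    os.makedirs(os.path.join(root, sub))
open(os.path.join(root, 'speech', 'requirements.txt'), 'w').close()

found = find_requirement_dirs(root)
print([os.path.relpath(d, root) for d in found])
```

A test runner could then create one environment per returned directory, which is where the slowness mentioned above would come in.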