Currently, the example code sends audio in 64 KB chunks once per second.
However, in real-time audio processing scenarios audio is read at different intervals, e.g. every 20 ms in VoIP. As a user, I would like the code example to let me see and experiment with the speech-to-text feature working the same way it will be integrated into my real-time audio processing (with a particular sampling rate and ptime).

To provide additional context: I work on text/speech processing in VoIP, where the packetization interval is dictated by the packetization time setting (ptime). Most often this is set to 20 ms, so audio is processed in 20 ms packets during a call. The example code
speech/api/streaming_transcribe.cc
on GoogleCloudPlatform sends audio at fixed 1-second intervals. I need to know whether the speech-to-text example will work when I send packets as they arrive on my infrastructure (with a different ptime and packet size), or whether I need to implement buffering so that I send exactly the 1-second 64 KB chunks the example uses. It is understood that the speech-to-text result is driven mostly by the accuracy of the underlying speech-to-text method/solution (the model/AI) applied to the speech, and ideally is not affected by audio packetization; but as a code integrator I need to verify my specific case, and it would be great if the code example let me mirror the audio processing in my environment as closely as possible.

This PR adds support for a ptime command-line argument, so users can experiment with real-time audio at various settings. Now, when ptime is set for a file in RAW or ULAW encoding, packets are sent with a size and a time interval reflecting the ptime and the sampling rate. (I did not apply this to AMR, FLAC, and AMR-WB, because for those codecs the number of bytes to send per ptime depends on additional settings: the encoding mode for AMR/AMR-WB and the compression ratio for FLAC.)
Example 1. Using ptime 20 ms:
Example 2. Using ptime 200 ms: