Add ptime cmdline arg #357

Open
piotrgregor wants to merge 4 commits into main
Conversation

@piotrgregor commented on Jul 9, 2024

Currently, the example code sends audio in 64k chunks every second.
However, in real-time audio processing scenarios audio is read at different intervals, e.g. every 20 ms in VoIP. As a user I would like to use the code example to see and experiment with the speech-to-text feature working the way it will be integrated with my real-time audio processing (with a particular sampling rate and ptime).

To provide additional context: I work on text/speech processing in VoIP, where the interval at which audio is handled is dictated by the packetization time setting (ptime). Most often this is set to 20 ms, so audio is processed in 20 ms packets on a call. The example code speech/api/streaming_transcribe.cc on GoogleCloudPlatform sends audio at fixed 1-second intervals. I need to know whether the speech-to-text example will work when I send packets as they arrive on my infrastructure, with a different ptime and packet size, or whether I need to implement buffering to send them in exactly 1-second 64k chunks as the example does. It is understood that the speech-to-text result is mostly driven by the accuracy of the underlying speech-to-text method (model/AI) applied to the speech, and ideally it is not impacted by audio packetization, but as an integrator I need to verify my particular case, and it would be great if the code example let me mirror the audio processing in my environment as closely as possible.

This PR adds support for a ptime command line argument, so users can experiment with real-time audio at various settings. When ptime is set for a file in RAW or ULAW encoding, packets are now sent with a size and time interval reflecting the ptime and sampling rate, as sketched below. (I did not apply this to AMR, FLAC, and AMR-WB, because the number of bytes to send per ptime with those codecs depends on additional settings: the encoding mode for AMR/AMR-WB and the compression ratio for FLAC.)
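For illustration, here is a minimal sketch of the chunking and pacing logic described above (not the actual diff in this PR), assuming 16-bit LINEAR16 RAW input; kSampleRateHz, kPtimeMs, and SendChunk are hypothetical placeholders:

// Minimal sketch, not the PR's implementation: read a RAW (16-bit LINEAR16)
// file in ptime-sized chunks and pace them in real time.
#include <chrono>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical placeholder for writing one chunk to the streaming request.
void SendChunk(std::string const& chunk) {
  std::cout << "Sending " << chunk.size() << " bytes.\n";
}

int main() {
  int const kSampleRateHz = 16000;  // --bitrate
  int const kPtimeMs = 20;          // --ptime
  int const kBytesPerSample = 2;    // 16-bit linear PCM
  // Bytes per ptime interval: 16000 * 20 / 1000 * 2 = 640.
  std::size_t const chunk_size = kSampleRateHz * kPtimeMs / 1000 * kBytesPerSample;

  std::ifstream file("resources/audio2.raw", std::ios::binary);
  std::vector<char> buffer(chunk_size);
  while (file.read(buffer.data(), static_cast<std::streamsize>(buffer.size())) ||
         file.gcount() > 0) {
    SendChunk(std::string(buffer.data(), file.gcount()));
    // Sleep one ptime interval so chunks are sent at real-time pace.
    std::this_thread::sleep_for(std::chrono::milliseconds(kPtimeMs));
  }
  return 0;
}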

% .build/streaming_transcribe --help       

Standard C++ exception thrown: the option '--path' is required but missing
Usage:
  streaming_transcribe [--bitrate N] [--ptime N] audio.(raw|ulaw|flac|amr|awb)

Example 1. Using ptime 20 ms:

% .build/streaming_transcribe --bitrate 16000 --ptime 20 resources/audio2.raw

Sending 640 bytes.
Sending 640 bytes.
Sending 640 bytes.
(...)
Sending 640 bytes.
Sending 640 bytes.
Sending 640 bytes.

Result stability: 0
0.986006        the rain in Spain stays mainly on the plain
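(Assuming 16-bit linear PCM at 16000 Hz, a 20 ms chunk works out to 16000 samples/s × 0.020 s × 2 bytes/sample = 640 bytes, matching the output above.)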

Example 2. Using ptime 200 ms:

% .build/streaming_transcribe --bitrate 16000 --ptime 200 resources/audio2.raw

Sending 6400 bytes.
Sending 6400 bytes.
Sending 6400 bytes.
(...)
Sending 6400 bytes.
Sending 6400 bytes.
Sending 6400 bytes.

Result stability: 0
0.986006        the rain in Spain stays mainly on the plain
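(Likewise, a 200 ms chunk is 16000 samples/s × 0.200 s × 2 bytes/sample = 6400 bytes, and the transcription result is the same as in Example 1.)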

@piotrgregor requested a review from a team as a code owner on July 9, 2024 10:09
@coryan (Contributor) commented on Jul 9, 2024

/gcbrun

@dbolduc (Member) commented on Jul 9, 2024

/gcbrun

@dbolduc (Member) commented on Jul 9, 2024

/gcbrun
