reintroduce opus on VAD, change frame size according to firmware v1.0, change realtime resolution for transcribe #624

Open
wants to merge 11 commits into base: main

0xzre
Contributor

@0xzre 0xzre commented Aug 19, 2024

#518
The encoder in the Friend firmware v1.0 code uses a frame size of 160 samples (10 ms). I have not tested this on a Friend because I don't have the device.
Changing the real-time resolution to the standard 20 ms should, in theory, reduce server load.
Thank you!
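
The frame-size arithmetic above can be sketched as follows. The 16 kHz sample rate is an assumption inferred from 160 samples per 10 ms, and the function name is hypothetical rather than taken from the firmware or backend.

```python
def samples_per_frame(sample_rate_hz: int, frame_ms: int) -> int:
    """Number of PCM samples in one frame of the given duration."""
    return sample_rate_hz * frame_ms // 1000


SAMPLE_RATE_HZ = 16_000  # assumed: 160 samples per 10 ms implies 16 kHz

print(samples_per_frame(SAMPLE_RATE_HZ, 10))  # 160, the firmware frame size
print(samples_per_frame(SAMPLE_RATE_HZ, 20))  # 320, i.e. two firmware frames
```

Under that assumption, a 20 ms resolution covers two firmware frames per processing step, which is where the expected reduction in per-chunk overhead would come from.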

@mdmohsin7 mdmohsin7 self-requested a review August 19, 2024 15:58
Collaborator

@mdmohsin7 mdmohsin7 left a comment

It still doesn't work, there's no transcript.
Also there's this warning and I am not sure if it is something to be worried about?

backend/routers/transcribe.py:102: UserWarning: The given buffer is not writable, and PyTorch does not support non-writable tensors. This means you can write to the underlying (supposedly non-writable) buffer using the tensor. You may want to copy the buffer to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/utils/tensor_new.cpp:1530.)
  samples = torch.frombuffer(decoded_opus, dtype=torch.int16).float() / 32768.0

@0xzre
Contributor Author

0xzre commented Aug 20, 2024

It still doesn't work, there's no transcript. Also there's this warning and I am not sure if it is something to be worried about?

backend/routers/transcribe.py:102: UserWarning: The given buffer is not writable, and PyTorch does not support non-writable tensors. This means you can write to the underlying (supposedly non-writable) buffer using the tensor. You may want to copy the buffer to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/utils/tensor_new.cpp:1530.)
  samples = torch.frombuffer(decoded_opus, dtype=torch.int16).float() / 32768.0

That warning is related to how I handle the decoded Opus buffer, and I'll fix it soon.
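
For context, the warning appears because a Python `bytes` object is immutable, so any buffer view over it is read-only. The sketch below uses only the standard library to show the difference and to do the same int16-to-float scaling as the line in the warning. The function name is hypothetical, and little-endian 16-bit PCM is an assumption about the decoder output.

```python
import struct
import sys
from array import array


def pcm16_to_float(decoded: bytes) -> list[float]:
    """Convert little-endian 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    # bytes is immutable, so buffers built on it are read-only;
    # bytearray() makes an explicit writable copy.
    writable = bytearray(decoded)
    samples = array("h")  # signed 16-bit integers, native byte order
    samples.frombytes(writable)
    if sys.byteorder == "big":  # input is assumed little-endian
        samples.byteswap()
    return [s / 32768.0 for s in samples]


raw = struct.pack("<3h", 0, 16384, -32768)
print(memoryview(raw).readonly)             # True: the read-only case
print(memoryview(bytearray(raw)).readonly)  # False: writable copy
print(pcm16_to_float(raw))                  # [0.0, 0.5, -1.0]
```

With PyTorch, the analogous change would likely be passing `bytearray(decoded_opus)` to `torch.frombuffer`; that variant is not tested here.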

@0xzre 0xzre changed the title reintroduce opus on VAD, change frame size according to firmware v1.0 reintroduce opus on VAD, change frame size according to firmware v1.0, change realtime resolution for transcribe Aug 20, 2024
@mdmohsin7 mdmohsin7 self-requested a review August 20, 2024 17:28
Collaborator

@mdmohsin7 mdmohsin7 left a comment

  • It does transcribe, but it misses a lot of segments (way more than PCM with VAD).
  • The WebSocket disconnects more frequently.
  • Transcription is also quite slow for both PCM and Opus.

@0xzre
Contributor Author

0xzre commented Aug 21, 2024

Sounds like the server load got heavier.
More missed segments and slower transcription -> incoming bytes take a long time to process in VAD, which increases the delay to DG.
More WebSocket disconnects -> ping/pong doesn't get through or isn't processed in time because of high CPU usage in VAD.

My solution:

  • Use ONNX Runtime for VAD
  • Decrease the VAD window; it is now 4x smaller

Any feedback or opinion is appreciated. Thanks!
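
As a complement to the two changes listed above, a common way to keep ping/pong handling responsive is to move the blocking VAD call off the event loop. This is a minimal sketch of that general technique, not what this PR implements: `run_vad` and `heartbeat` are hypothetical stand-ins, and the 0.2 s sleep only simulates inference time.

```python
import asyncio
import time


def run_vad(chunk: bytes) -> bool:
    """Stand-in for a blocking VAD call (hypothetical, not the PR's code)."""
    time.sleep(0.2)  # simulates inference time; a real model would run here
    return any(chunk)


async def heartbeat(ticks: list[int]) -> None:
    """Stand-in for ping/pong handling: records a tick every 10 ms."""
    while True:
        ticks.append(1)
        await asyncio.sleep(0.01)


async def main() -> tuple[bool, int]:
    ticks: list[int] = []
    beat = asyncio.create_task(heartbeat(ticks))
    # Run the blocking call in a worker thread so the event loop stays free.
    has_speech = await asyncio.to_thread(run_vad, b"\x01\x02")
    beat.cancel()
    return has_speech, len(ticks)


has_speech, tick_count = asyncio.run(main())
print(has_speech, tick_count > 1)
```

The heartbeat keeps ticking while the VAD stand-in is busy. In practice the benefit depends on the inference library releasing the GIL during its native call, which is an assumption to verify for the chosen runtime.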

@0xzre
Contributor Author

0xzre commented Aug 22, 2024

Changes

  • More handling of socket2 data, which is always used when Opus (Friend mic, not phone) is involved. It aims to fix the socket-disconnected error while keeping PCM working.
    Any feedback is welcome, thank you :)

@0xzre
Contributor Author

0xzre commented Aug 25, 2024

@josancamon19 @mdmohsin7 I've merged in the main branch, and it gives better results when a speech profile is used. Please review, thanks!

@josancamon19
Contributor

https://share.icloud.com/photos/06dFrjm9Q_RrsvZO5VLScWGLg

Clearly doesn't work. For the next review, please submit videos of it working through the app.

@0xzre
Contributor Author

0xzre commented Aug 31, 2024

@0xzre
Contributor Author

0xzre commented Sep 1, 2024

I have added more testing, this time with a lecture video (closer to a conversation) in the "test 1" folder. I also provided the PCM transcript from the Play Store app (no VAD) as the ground truth. The result: latency is indistinguishable and accuracy is much improved. VAD with Opus is usable.
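
One way to put a number on accuracy against a ground-truth transcript is word error rate (WER). The sketch below is a plain word-level edit distance and is not part of this PR's tests; the function name and the example strings are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One missing word out of four reference words.
print(word_error_rate("the quick brown fox", "the quick fox"))  # 0.25
```

Comparing the WER of the Opus-with-VAD transcript and the PCM ground truth on the same recording would make the "accuracy improved" claim directly checkable.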

@0xzre
Copy link
Contributor Author

0xzre commented Sep 13, 2024

dude @josancamon19
