[EDGE CASE] Cartoon voice worse performance on v5 than older version #563

Open
George0828Zhang opened this issue Oct 26, 2024 · 2 comments

Comments

@George0828Zhang

🐛 Bug

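V5 fails to detect most of the speech in audio with cartoon voices: it returns a single short segment where the older version returns eight.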

To Reproduce

Steps to reproduce the behavior:

  1. Open the Colab example.
  2. Download this example and run up to the following cell (changing 'en_example.wav' to 'ja_example.wav'):
wav = read_audio('ja_example.wav', sampling_rate=SAMPLING_RATE)
# get speech timestamps from full audio file
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=SAMPLING_RATE)
pprint(speech_timestamps)
  3. The result is:
[{'end': 30464, 'start': 12032}]

With the older version (see SYSTRAN/faster-whisper#934 (comment)), the result is:

[{'end': 40192, 'start': 12032},
 {'end': 179456, 'start': 76544},
 {'end': 379136, 'start': 273152},
 {'end': 457984, 'start': 422656},
 {'end': 630016, 'start': 576256},
 {'end': 669952, 'start': 653056},
 {'end': 863488, 'start': 695040},
 {'end': 950528, 'start': 896768}]
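
To make the gap between the two outputs easier to read, here is a minimal sketch that converts the sample offsets above into seconds and totals the detected speech. It assumes SAMPLING_RATE is 16000, which this report does not state; the helper names total_speech_samples and to_seconds are made up for illustration and are not part of the library.

```python
# Assumption: 16 kHz audio. The report does not state the value of SAMPLING_RATE.
SAMPLING_RATE = 16000

# Timestamps copied from the two outputs above (units: samples).
v5_segments = [{'end': 30464, 'start': 12032}]
old_segments = [
    {'end': 40192, 'start': 12032},
    {'end': 179456, 'start': 76544},
    {'end': 379136, 'start': 273152},
    {'end': 457984, 'start': 422656},
    {'end': 630016, 'start': 576256},
    {'end': 669952, 'start': 653056},
    {'end': 863488, 'start': 695040},
    {'end': 950528, 'start': 896768},
]


def total_speech_samples(segments):
    """Sum the length of every detected segment, in samples."""
    return sum(seg['end'] - seg['start'] for seg in segments)


def to_seconds(segments, sampling_rate=SAMPLING_RATE):
    """Convert each segment to a (start, end) pair in seconds."""
    return [(seg['start'] / sampling_rate, seg['end'] / sampling_rate)
            for seg in segments]


for name, segments in (('v5', v5_segments), ('older', old_segments)):
    seconds = total_speech_samples(segments) / SAMPLING_RATE
    print(f'{name}: {len(segments)} segment(s), {seconds:.3f} s of speech')
    for start, end in to_seconds(segments):
        print(f'  {start:7.3f} s -> {end:7.3f} s')
```

Under that assumption, V5 reports about 1.2 s of speech in total, while the older version reports about 35.3 s across eight segments.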

Expected behavior

V5 should detect speech in this audio at least as well as the older version.

Environment

Please copy and paste the output from this
environment collection script
(or fill out the checklist below manually).

You can get the script and run it with:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
  • PyTorch Version (e.g., 1.0):
  • OS (e.g., Linux):
  • How you installed PyTorch (conda, pip, source):
  • Build command you used (if compiling from source):
  • Python version:
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • Any other relevant information:

Additional context

@George0828Zhang George0828Zhang added the bug Something isn't working label Oct 26, 2024
@George0828Zhang George0828Zhang changed the title Bug report - Cartoon voice bad performance on v5, normal on v3 Bug report - Cartoon voice worse performance on v5 than older version Oct 26, 2024
@snakers4
Owner

Can you send an audio sample?

@George0828Zhang
Author

Here:
https://drive.google.com/file/d/1NPvEybP0VU1dFmd6neH6JJRW_Qm2MXdk/view

thanks for looking into this!

@snakers4 snakers4 changed the title Bug report - Cartoon voice worse performance on v5 than older version [EDGE CASE] Cartoon voice worse performance on v5 than older version Nov 13, 2024