[EDGE CASE] Cartoon voice worse performance on v5 than older version #563

George0828Zhang · 2024-10-26T05:00:22Z

🐛 Bug

V5 misses most of the speech in cartoon voices that the older version detects.

To Reproduce

Steps to reproduce the behavior:

Use the Colab example.
Download this example and run the notebook up to the following cell (change 'en_example.wav' to 'ja_example.wav'):

wav = read_audio('ja_example.wav', sampling_rate=SAMPLING_RATE)
# get speech timestamps from full audio file
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=SAMPLING_RATE)
pprint(speech_timestamps)

The result is:

[{'end': 30464, 'start': 12032}]

With the old version (see SYSTRAN/faster-whisper#934 (comment)), the result is:

[{'end': 40192, 'start': 12032},
 {'end': 179456, 'start': 76544},
 {'end': 379136, 'start': 273152},
 {'end': 457984, 'start': 422656},
 {'end': 630016, 'start': 576256},
 {'end': 669952, 'start': 653056},
 {'end': 863488, 'start': 695040},
 {'end': 950528, 'start': 896768}]
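To put a rough number on the gap, the two outputs above can be compared by total detected speech. This is a minimal sketch, not part of the original report: it assumes the timestamps are sample indices and that SAMPLING_RATE is 16000 (the value is not stated here), and the helper name total_speech_samples is hypothetical.

```python
# Assumption: the Colab example uses a 16 kHz sampling rate.
ASSUMED_SAMPLING_RATE = 16000

# Output reported for v5.
v5_segments = [{'end': 30464, 'start': 12032}]

# Output reported for the older version.
old_segments = [
    {'end': 40192, 'start': 12032},
    {'end': 179456, 'start': 76544},
    {'end': 379136, 'start': 273152},
    {'end': 457984, 'start': 422656},
    {'end': 630016, 'start': 576256},
    {'end': 669952, 'start': 653056},
    {'end': 863488, 'start': 695040},
    {'end': 950528, 'start': 896768},
]


def total_speech_samples(segments):
    """Sum the length, in samples, of every detected speech segment."""
    return sum(seg['end'] - seg['start'] for seg in segments)


for name, segments in (('v5', v5_segments), ('old', old_segments)):
    samples = total_speech_samples(segments)
    seconds = samples / ASSUMED_SAMPLING_RATE
    print(f"{name}: {len(segments)} segment(s), {samples} samples, {seconds:.3f} s")
```

Under the 16 kHz assumption, v5 reports about 1.15 s of speech in one segment, while the older version reports about 35.33 s across eight segments.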

Expected behavior

V5 should be better than the older version.

Environment

Please copy and paste the output from this
environment collection script
(or fill out the checklist below manually).

You can get the script and run it with:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py

PyTorch Version (e.g., 1.0):
OS (e.g., Linux):
How you installed PyTorch (conda, pip, source):
Build command you used (if compiling from source):
Python version:
CUDA/cuDNN version:
GPU models and configuration:
Any other relevant information:

Additional context

snakers4 · 2024-10-26T05:03:46Z

Can you send an audio sample?

George0828Zhang · 2024-10-26T05:13:25Z

Here:
https://drive.google.com/file/d/1NPvEybP0VU1dFmd6neH6JJRW_Qm2MXdk/view

Thanks for looking into this!

George0828Zhang added the bug label Oct 26, 2024

George0828Zhang assigned snakers4 Oct 26, 2024

George0828Zhang changed the title ~~Bug report - Cartoon voice bad performance on v5, normal on v3~~ Bug report - Cartoon voice worse performance on v5 than older version Oct 26, 2024

snakers4 changed the title ~~Bug report - Cartoon voice worse performance on v5 than older version~~ [EDGE CASE] Cartoon voice worse performance on v5 than older version Nov 13, 2024
