Disable all `threshold` parameters for stable transcription when using `vad_filter=True`. #349

zh-plus · 2023-07-12T10:21:53Z

Recently I found that the large-v2 model often drops some transcribed segments for Chinese and Japanese because of `compression_ratio_threshold`, `log_prob_threshold`, and `no_speech_threshold`, even though the transcriptions it drops are actually correct.

Is it good practice to disable these three parameters entirely, like this:

```python
default_asr_options = {
    "compression_ratio_threshold": None,
    "log_prob_threshold": None,
    "no_speech_threshold": None,
}
```

when `vad_filter=True` is set? It might be more appropriate to let a standalone VAD model decide which segments should be discarded.

The reliability of `log_prob_threshold` has often been questioned, for example in this comment on openai/whisper#29.
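A minimal sketch of the proposed setup, assuming faster-whisper's `WhisperModel.transcribe()` accepts these keyword arguments as the options above imply. The helper `build_transcribe_options` and the constant `THRESHOLD_KEYS` are hypothetical names; the helper only assembles keyword arguments, passing the three thresholds as `None` whenever `vad_filter` is on so that the VAD model alone decides what gets skipped.

```python
from typing import Any, Dict

# The three thresholds discussed in this thread (hypothetical constant).
THRESHOLD_KEYS = (
    "compression_ratio_threshold",
    "log_prob_threshold",
    "no_speech_threshold",
)


def build_transcribe_options(vad_filter: bool = True, **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for WhisperModel.transcribe().

    When vad_filter is True, all three thresholds are set to None so that only
    the VAD model decides which audio is discarded. Explicit overrides win.
    """
    options: Dict[str, Any] = {"vad_filter": vad_filter}
    if vad_filter:
        options.update({key: None for key in THRESHOLD_KEYS})
    # Caller-supplied values (language, beam_size, a re-enabled threshold...) take precedence.
    options.update(overrides)
    return options
```

With faster-whisper installed, this would be used as `segments, info = model.transcribe("audio.wav", **build_transcribe_options(language="zh"))`, where `model` is a `WhisperModel("large-v2")` instance and `"audio.wav"` is a placeholder path. Keeping the overrides last makes it easy to re-enable a single threshold for comparison.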

dgoryeo · 2023-08-29T16:41:07Z

Hi @zh-plus, I was wondering: does the code above disable the threshold parameters, or does it reset them to their defaults? Thanks for sharing your insights.

zh-plus (Author) · Aug 30, 2023

It disables the thresholds, according to https://github.com/guillaumekln/faster-whisper/blob/7b271da0351e4f81f80e8bb4d2c21c9406475aa9/faster_whisper/transcribe.py#L658
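To make the effect of `None` concrete, here is a simplified, standalone approximation of the two checks as they appear in faster-whisper's `transcribe.py` around the linked commit. It is an illustration under that assumption, not the library's code, and the function names are hypothetical. In this logic the compression-ratio and log-prob thresholds trigger a temperature fallback (re-decoding), while a segment is skipped only when `no_speech_prob` exceeds `no_speech_threshold` and the average log-probability is not above `log_prob_threshold`.

```python
from typing import Optional


def needs_fallback(
    compression_ratio: float,
    avg_logprob: float,
    compression_ratio_threshold: Optional[float],
    log_prob_threshold: Optional[float],
) -> bool:
    """Approximate temperature-fallback check; a None threshold disables its test."""
    if (
        compression_ratio_threshold is not None
        and compression_ratio > compression_ratio_threshold
    ):
        return True  # output looks too repetitive
    if log_prob_threshold is not None and avg_logprob < log_prob_threshold:
        return True  # average log probability is too low
    return False


def should_skip_segment(
    no_speech_prob: float,
    avg_logprob: float,
    no_speech_threshold: Optional[float],
    log_prob_threshold: Optional[float],
) -> bool:
    """Approximate no-speech check; a None threshold disables its test."""
    if no_speech_threshold is None:
        return False  # silence detection turned off: never skip
    skip = no_speech_prob > no_speech_threshold
    if log_prob_threshold is not None and avg_logprob > log_prob_threshold:
        skip = False  # confident decoding overrides the no-speech probability
    return skip
```

Under this approximation, setting only `log_prob_threshold` to `None` while keeping `no_speech_threshold` removes the override, so more segments could be skipped rather than fewer; disabling the thresholds together avoids that interaction.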

dgoryeo · 2023-10-20T11:12:32Z

Hi @zh-plus, I get an error when I try to set `log_prob_threshold` to `None`. Is there anything I am doing wrong?


```python
options2 = {
    "language": language,
    "task": task,
    "word_timestamps": True,
    "temperature": (0.0, 0.2),
    "best_of": 5,
    "beam_size": 5,
    "patience": 2,
    "suppress_tokens": "",
    "initial_prompt": None,
    "condition_on_previous_text": False,
    "compression_ratio_threshold": None,
    "no_speech_threshold": None,
    "log_prob_threshold": None,
    "vad_filter": True,
    "vad_parameters": dict(threshold=vad_threshold, max_speech_duration_s=chunk_duration),
}

segments, info = WHISPER_MODEL.transcribe(
    audio_path,
    **options2,
)
```

The error message is:

TypeError: generate(): incompatible function arguments.

Thanks.

zh-plus (Author) · Oct 23, 2023

I don't see any issues with the code snippet and error message you posted. Could you please provide the full exception stack?

dgoryeo · 2023-10-23T08:47:35Z

Thanks for looking into it, @zh-plus. I'm afraid I didn't save the entire error message. However, I turned to GPT-4, and "his" recommendation was to change it to `"suppress_tokens": []`. Syntactically it went through with no error, but I'm not sure whether `[]` behaves the same as `""`. It seems the culprit was `suppress_tokens` rather than `log_prob_threshold`.

zh-plus (Author) · Oct 24, 2023

Yes, it's the same. You can check https://github.com/guillaumekln/faster-whisper/blob/7b271da0351e4f81f80e8bb4d2c21c9406475aa9/faster_whisper/transcribe.py#L60C12-L60C12 for all default values.
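For readers porting settings from openai-whisper, where `suppress_tokens` can be given as a comma-separated string such as `"-1"`, a small helper can normalize it to the list of integers that faster-whisper's `transcribe()` takes, mapping the empty string to an empty list. This is a sketch under that assumption; `normalize_suppress_tokens` is a hypothetical name, not part of either library.

```python
from typing import Iterable, List, Optional, Union


def normalize_suppress_tokens(value: Optional[Union[str, Iterable[int]]]) -> List[int]:
    """Hypothetical helper: convert a suppress_tokens value to a list of ints.

    "" and None become [], "-1, 50" becomes [-1, 50], and an iterable of
    ints is copied into a new list.
    """
    if value is None:
        return []
    if isinstance(value, str):
        # Split on commas and drop empty pieces, so "" maps to [].
        return [int(part) for part in value.split(",") if part.strip()]
    return [int(token) for token in value]
```

For example, `model.transcribe(audio_path, suppress_tokens=normalize_suppress_tokens(""))` would pass `[]`, the list form of "suppress nothing".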

toanhuynhnguyen · 2024-10-07T11:08:41Z

Does it actually work?
