WhisperX returns translated output instead of normal transcription #849

BankNatchapol · 2024-08-05T10:40:27Z

I tried to use a fine-tuned model with WhisperX, so I first converted the model to CTranslate2 format using this code:

import ctranslate2
from transformers import AutoTokenizer, AutoProcessor

# converting model to CTranslate2

model_path = "biodatlab/whisper-th-large-v3"
output_dir = ""

converter = ctranslate2.converters.TransformersConverter(
    model_name_or_path=model_path,
    load_as_float16=None
)

converter.convert(output_dir=output_dir, quantization="float16", force=True)
print(f"Model successfully converted to CTranslate2 format at {output_dir}")

Then I ran the transcription:

import whisperx
lang = "th"
device = 'cuda'

## WhisperX
batch_size_x = 8 # reduce if low on GPU mem
compute_type_x = "float16"

asr_options = {
    "max_new_tokens": None,
    "clip_timestamps": None,
    "hallucination_silence_threshold": None,
}
model_x = whisperx.load_model(
    "",
    'cuda',
    compute_type=compute_type_x,
    language='th',
    asr_options=asr_options,
)

print(model_x.transcribe('test.wav', language='th', task='transcribe'))

The output should be in Thai ('th'), but instead it is mostly English ('en').

..., {'text': ' When you try, you will find a gap in between. When you miss, you will sit on the hard floor. It makes you want to get up and fight. But when you fight and you start to get something, you will find a gap. This gap is a trap. Some people are good at it, but they miss it. Some people are good at it, but they miss it. Some people are good at it, but they miss it. Some people are good at it, but they miss it. Some people are good at it, but they miss it.', 'start': 1120.026, 'end': 1140.435}, {'text': '       ', 'start': 1140.435, 'end': 1159.053}, {'text': '          ', 'start': 1159.053, 'end': 1185.282}, {'text': ' The best in Thailand, the first in the Olympic life, has been reading books all the time. He has won the Olympic gold medal.', 'start': 1185.282, 'end': 1203.08}], 'language': 'th'}

In the wav file, the speaker is speaking Thai ('th'), but the transcription comes out as an English translation of the speech rather than the Thai text.
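To put a number on "mostly 'en'", a rough script check along these lines could be run over the returned segments. This is only a minimal sketch: `thai_ratio` and `non_thai_segments` are hypothetical helper names (not part of WhisperX), the 0.5 threshold is an arbitrary assumption, and the input is assumed to be a list of dicts with a 'text' key like the entries in the output above.

```python
def thai_ratio(text):
    """Fraction of alphabetic characters in `text` that fall in the Thai Unicode block.

    Only characters accepted by str.isalpha() are counted, so digits,
    punctuation, spaces and combining marks are ignored on both sides.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    thai = sum(1 for ch in letters if "\u0e00" <= ch <= "\u0e7f")
    return thai / len(letters)


def non_thai_segments(segments, threshold=0.5):
    """Return the segments whose text is non-empty and mostly not Thai script.

    `threshold` is an assumed cut-off, not a value taken from WhisperX.
    """
    flagged = []
    for seg in segments:
        text = seg["text"].strip()
        # Whitespace-only segments are skipped rather than flagged.
        if text and thai_ratio(text) < threshold:
            flagged.append(seg)
    return flagged


# Hypothetical sample shaped like the segments in the output above.
sample = [
    {"text": " He has won the Olympic gold medal.", "start": 1185.282, "end": 1203.08},
    {"text": "       ", "start": 1140.435, "end": 1159.053},
]
for seg in non_thai_segments(sample):
    print(seg["start"], seg["end"], seg["text"])
```

Segments flagged this way are the ones that came back in Latin script even though language='th' and task='transcribe' were requested.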

Is there any way to fix this?
Thank you in advance.
