WhisperX returns translated output instead of normal transcription #849

Open
BankNatchapol opened this issue Aug 5, 2024 · 0 comments

I tried to use a fine-tuned model with WhisperX, so I first converted the model to CTranslate2 format using this code:

```python
import ctranslate2

# Convert the fine-tuned Hugging Face Whisper model to CTranslate2 format
model_path = "biodatlab/whisper-th-large-v3"
output_dir = ""

converter = ctranslate2.converters.TransformersConverter(
    model_name_or_path=model_path,
    load_as_float16=None,
)

# Write the converted weights in float16, overwriting any existing output
converter.convert(output_dir=output_dir, quantization="float16", force=True)
print(f"Model successfully converted to CTranslate2 format at {output_dir}")
```
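A check worth running on the conversion output (a sketch under stated assumptions, not a confirmed fix): faster-whisper, which WhisperX builds on, loads `tokenizer.json` and `preprocessor_config.json` from the model directory when they are present, and the conversion command in the faster-whisper README copies them with `--copy_files tokenizer.json preprocessor_config.json` (the Python `TransformersConverter` accepts a matching `copy_files` list). The conversion above does not request them. The helper `missing_model_files` is hypothetical, and the list of expected files is an assumption.

```python
# Hypothetical helper, not part of WhisperX or CTranslate2: report which files
# are missing from a converted Whisper model directory. The expected file list
# is an assumption based on the faster-whisper README conversion command.
import os
import sys

EXPECTED_FILES = ("model.bin", "tokenizer.json", "preprocessor_config.json")


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from model_dir, in order."""
    return [
        name
        for name in expected
        if not os.path.isfile(os.path.join(model_dir, name))
    ]


if __name__ == "__main__":
    # Usage: python check_model_dir.py <converted_model_dir>
    model_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_model_files(model_dir)
    if missing:
        print(f"Missing from {model_dir}: {', '.join(missing)}")
    else:
        print(f"All expected files present in {model_dir}")
```

Running it against the conversion output directory (the script name `check_model_dir.py` is hypothetical) lists any of these files that the conversion did not write.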

Then I ran the transcription:

```python
import whisperx

lang = "th"
device = "cuda"

# WhisperX settings
batch_size_x = 8  # reduce if low on GPU mem
compute_type_x = "float16"

# Extra ASR options forwarded by whisperx.load_model
asr_options = {
    "max_new_tokens": None,
    "clip_timestamps": None,
    "hallucination_silence_threshold": None,
}

# Load the converted model (path elided) on the GPU
model_x = whisperx.load_model(
    "",
    device,
    compute_type=compute_type_x,
    language=lang,
    asr_options=asr_options,
)

print(model_x.transcribe("test.wav", language=lang, task="transcribe"))
```

The output should be in Thai ('th'), but instead it is mostly in English ('en'):

```
..., {'text': ' When you try, you will find a gap in between. When you miss, you will sit on the hard floor. It makes you want to get up and fight. But when you fight and you start to get something, you will find a gap. This gap is a trap. Some people are good at it, but they miss it. Some people are good at it, but they miss it. Some people are good at it, but they miss it. Some people are good at it, but they miss it. Some people are good at it, but they miss it.', 'start': 1120.026, 'end': 1140.435}, {'text': '       ', 'start': 1140.435, 'end': 1159.053}, {'text': '          ', 'start': 1159.053, 'end': 1185.282}, {'text': ' The best in Thailand, the first in the Olympic life, has been reading books all the time. He has won the Olympic gold medal.', 'start': 1185.282, 'end': 1203.08}], 'language': 'th'}
```

The speaker in the WAV file is speaking Thai, yet the returned text is an English translation of his speech rather than a Thai transcription, even though the result reports `'language': 'th'`.
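One hypothesis that would fit this symptom (unconfirmed in this issue): if the converted model directory has no `tokenizer.json`, faster-whisper may fall back to the tokenizer from `openai/whisper-tiny`, whose special-token layout predates large-v3. Large-v3 appended one language token (`<|yue|>`), which shifts `<|translate|>` and `<|transcribe|>` up by one ID, so the ID an older tokenizer sends for "transcribe" would be read by a large-v3 model as "translate". The arithmetic below illustrates this; the token layout in its comments is the assumption it rests on, and `task_token_ids` is a hypothetical helper.

```python
# Illustrative arithmetic only. Assumed layout of Whisper's multilingual
# special tokens: <|startoftranscript|> = 50258, then one token per language,
# then <|translate|>, then <|transcribe|>. Older multilingual tokenizers
# (e.g. large-v2, tiny) have 99 language tokens; large-v3 has 100 (adds <|yue|>).
SOT_ID = 50258


def task_token_ids(num_languages):
    """Return the task-token IDs for a vocabulary with num_languages language tokens."""
    translate_id = SOT_ID + 1 + num_languages
    return {"translate": translate_id, "transcribe": translate_id + 1}


old_layout = task_token_ids(99)   # fallback / pre-v3 tokenizer
v3_layout = task_token_ids(100)   # what a large-v3 model expects

print("pre-v3 tokenizer:", old_layout)
print("large-v3 model:  ", v3_layout)
# The "transcribe" ID from the older layout is the "translate" ID in large-v3.
print(old_layout["transcribe"] == v3_layout["translate"])
```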

Is there any way to fix this?
Thank you in advance.
