Distil Whisper models: Missing words and repetitions in transcription #59
Comments
I've tested the distil-small model. It works without issues for me. Granted, I mostly use the "hold to record" mode, and therefore dictate one sentence at a time. I tried dictating a couple of sentences, and still didn't notice any issues. However, it feels weird not seeing what you say for a long time. Anyway, can you suggest a phrase that often fails to transcribe properly for you?
NOTE: I'm using a heavily edited fork, so consider my observations as related to the underlying libraries rather than WhisperWriter. You can try my fork if you feel like it: https://github.com/dariox1337/whisper-writer To use distil models with this fork, simply download a faster-distil-whisper model from HF and set its folder in "model path".
@dariox1337 I have identified what is responsible for the decreased quality of the distil Whisper models. They seem to be more susceptible to problems in the original audio. On my Linux machine, the audio file produced by the sounddevice library plays faster than real time, skips, and has some flapping noises on top. After I replaced sounddevice with pyaudio, the quality of the distil Whisper models is just what you would expect. No issues.
@dariox1337 Actually, it seems that this behavior only happens in your fork, the one you suggest merging in #61. Why the distil models don't work for me in the current state of the software remains unclear.
@go-run-jump as I said in the PR, the faster-than-real-time audio might be because the sample rate isn't set correctly somewhere (I don't know where). The skipping and crackling are a mystery to me. I couldn't reproduce either of the issues. Anyway, even though SoundDevice worked without issues for me, I rewrote the audio recording code to use PyAudio as well, since it was already used for "beep on completion." The code is in the main branch of my fork.
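For anyone who wants to check the sample-rate theory on their own recordings, here is a minimal sketch using only the Python standard library. It is not code from WhisperWriter or from the fork. The helper names (`wav_duration_seconds`, `playback_speed_factor`, `write_silence`) and the 16000 Hz rate are assumptions made for illustration. The idea is to compare the duration implied by the WAV header with the wall-clock time you actually spent recording: a ratio well above 1.0 would suggest the file declares a higher sample rate than the one the audio was captured at, which makes it play faster than real time.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path):
    """Duration implied by the WAV header: frame count / declared sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def playback_speed_factor(path, recorded_seconds):
    """Ratio of wall-clock recording time to the duration the file claims.

    recorded_seconds must be measured by the caller (e.g. with a stopwatch).
    About 1.0 means the header matches the capture; about 2.0 means the file
    plays back roughly twice as fast as it was spoken.
    """
    return recorded_seconds / wav_duration_seconds(path)


def write_silence(path, n_frames, declared_rate):
    """Hypothetical demo helper: write n_frames of 16-bit mono silence."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(declared_rate)
        wav.writeframes(b"\x00\x00" * n_frames)


if __name__ == "__main__":
    tmp_dir = tempfile.mkdtemp()
    # Simulate one second captured at 16000 Hz but saved with a 32000 Hz header.
    mislabeled = os.path.join(tmp_dir, "mislabeled.wav")
    write_silence(mislabeled, n_frames=16000, declared_rate=32000)
    print("claimed duration:", wav_duration_seconds(mislabeled))
    print("speed factor:", playback_speed_factor(mislabeled, recorded_seconds=1.0))
```

This only detects a mismatch between the declared and the actual rate. It would not explain skipping or extra noise, which could have a different cause such as dropped buffers.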
I've been experimenting with the distil whisper models as an alternative to the standard whisper models. While I was able to successfully integrate the distil models, I'm experiencing some issues with the transcription quality.
Current Behavior:
Expected Behavior:
Additional Information:
config_schema.yaml
Questions:
Any input or suggestions would be greatly appreciated, as the speed improvements of the distil models are significant.
Environment:
Steps to Reproduce:
config_schema.yaml
as mentioned above