Thoughts on the new release with batched inference and some refactoring #927
Replies: 9 comments 7 replies
-
I agree with these sentiments. A benefit of relying on ...
-
Tagging in some contributors for visibility (@jordimas @trungkienbkhn @Jiltseb).
-
You do have a point, but those are two completely different directions, and to maintain both we would have to implement most functions twice: a version with torch and a version without, and the same goes for av and torchaudio. BTW, some of the extra code will be removed once #921 is merged, and I'm thinking about more ideas to simplify the codebase.
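To make that duplication concrete, here is a rough sketch of what keeping both an `av` and a `torchaudio` code path for the same operation tends to look like. This is not faster-whisper's actual code; the function name, signature, and backend switch are hypothetical, and it assumes a recent PyAV where `resample()` returns a list of frames.

```python
import numpy as np


def decode_audio(path, sampling_rate=16000, backend="av"):
    """Hypothetical helper showing two decoding backends kept side by side."""
    if backend == "av":
        # PyAV path: no torch/torchaudio dependency needed.
        import av

        resampler = av.audio.resampler.AudioResampler(
            format="s16", layout="mono", rate=sampling_rate
        )
        chunks = []
        with av.open(path) as container:
            for frame in container.decode(audio=0):
                for resampled in resampler.resample(frame):
                    chunks.append(resampled.to_ndarray().reshape(-1))
        # A real implementation would also flush the resampler for the tail samples.
        return np.concatenate(chunks).astype(np.float32) / 32768.0

    if backend == "torchaudio":
        # Torch path: pulls in torch + torchaudio just to read and resample a file.
        import torchaudio

        waveform, sr = torchaudio.load(path)
        if sr != sampling_rate:
            waveform = torchaudio.functional.resample(waveform, sr, sampling_rate)
        return waveform.mean(dim=0).numpy()

    raise ValueError(f"unknown backend: {backend}")
```

Every helper like this ends up with two branches that have to be tested, documented, and kept behaviourally identical.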
-
Hi @ozancaglayan. In my view, "I've been internally using a slightly-improved numpy-based feature extraction" ...
-
To clarify, 1.0.3 doesn't have the batch changes; it's just the current main branch. I'm a bit torn, as I'm mostly using faster-whisper CPU-only, where it still outperformed the alternatives in my last test. But I appreciate most users aren't, and want to pursue faster options with GPUs. I'm not sure what the right compromise is, but as you note, the code is pretty mature at this point, so I'm not too concerned about needing to stay on 1.0.3 or fork.
-
Hi all, thanks for the feedback.

@ooobo sorry, I meant the upcoming release.

On my side, I'm not super happy with the ...

@jordimas You're right that I should have contributed those changes back to this repo, but that comment of mine was somewhat tangential to this discussion. My point there was that the feature extraction bits in this repo were slower than ideal, but the issue was not the use of numpy.

I'm also curious about the future and the plans for this repository from SYSTRAN's point of view. Although it's now under the umbrella of SYSTRAN, does that mean they reserve active resources for maintaining this repo, or will it still be mostly community-maintained?
-
Given that master is still heavily changing through dense PRs, can we at least make sure that we branch off the latest commit before the batched inference changes?
-
Hello. I have been on holidays and could not look at this in detail, but here is my feedback. My suggested approach:

With more time, changes to the stack can be discussed, but I agree that this is a very large change.
-
Can we please pull in all the relevant and concerned people here and discuss the future of this toolkit? I can see that there are/were some harsh comments & discussions in some PRs regarding the latest merged large commits. Can we please define a set of contributors whom we consider meaningful reviewers, change the repo settings, and set up a CODEOWNERS file so that all PRs get reviewed properly before being merged?

Although I strongly disagree with the tone and rudeness of the comments & PRs written by the OP in the following PR, I believe the way to go is to revert all the commits in question. We shouldn't normally merge such large API- and/or behaviour-changing updates to a repository with 10.6K stars and a large user base and then iteratively work through them to change them further. Things should be cooked through PRs & branches and only merged when we believe they are mature enough.

For the PRs at hand, there is for example CTranslate2's random-seed setting, which is completely unrelated to the other changes. That one should have been merged individually, and without always setting the seed: adding an argument to the class to optionally enable it would have preserved API consistency. Also, it's not clear why we added another huge dependency chain through pyannote for VAD. Is it better than Silero? Has that been measured? When I install pyannote, it pulls more than 50 packages into my environment even though I'm not using it. And so on.

I can see that the majority of people are upset and concerned about this large PR getting merged, and we still have time to fix it and iteratively merge things in small, atomic chunks with community review. Thanks
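For what it's worth, a minimal sketch of what such a CODEOWNERS file could look like. The reviewer handles below are placeholders, not a proposal for specific people; the paths follow the repo's layout but should be adjusted to whatever ownership split the maintainers agree on.

```
# .github/CODEOWNERS (illustrative only; handles are placeholders)
# The last matching pattern takes precedence, so the default owners come first.

# Default owners for everything in the repo
*                                    @maintainer-a @maintainer-b

# Core transcription pipeline
faster_whisper/transcribe.py         @maintainer-a @maintainer-b @maintainer-c

# Audio decoding and feature extraction
faster_whisper/audio.py              @maintainer-b
faster_whisper/feature_extractor.py  @maintainer-c
```

Combined with branch protection requiring a code-owner review, this would ensure no PR touching these paths lands without sign-off from the listed owners.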
-
Hello folks!

Appreciate the community work done in this repository over the past months. I have some comments and remarks on the 1.0.3 release. I think this release made far too major changes to this package, such as switching the audio loading from `av` to `torchaudio` and replacing the numpy-based feature extraction with `torch`. I think these changes could have been optionally triggered via some `extras` dependencies. These parts of the code were mature enough and widely used by the community. I've been internally using a slightly improved numpy-based feature extraction, borrowed from the huggingface/whisper repos, for more than a year, and it's fast enough, as it's not the main bottleneck when doing ASR. For loading audio files, `torchaudio` and its resampling facilities may not be the fastest out there.

So I recommend sticking to the previous versions and also putting the `batched` functionality in a new file, e.g. `batched_transcribe.py`, as `transcribe.py` is now over 2K lines of code and very hard to navigate.
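To make the discussion concrete, here is a rough, self-contained sketch of what a numpy-only log-mel feature extractor can look like. This is not the code referred to above and not faster-whisper's or Whisper's implementation: it uses HTK-style mel filters and no reflection padding, whereas Whisper's reference extractor uses slaney-normalized filters, so treat it as illustrative only.

```python
import numpy as np


def hz_to_mel(f):
    # HTK mel scale; Whisper/librosa default to the slaney variant instead.
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr=16000, n_fft=400, n_mels=80):
    # Triangular filters spaced evenly on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lower, center, upper = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        left = (fft_freqs - lower) / (center - lower)
        right = (upper - fft_freqs) / (upper - center)
        fb[i] = np.maximum(0.0, np.minimum(left, right))
    return fb


def log_mel_spectrogram(audio, sr=16000, n_fft=400, hop=160, n_mels=80):
    # Frame the signal, apply a Hann window, and take the power spectrum.
    audio = np.ascontiguousarray(audio, dtype=np.float32)
    window = np.hanning(n_fft + 1)[:-1].astype(np.float32)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.lib.stride_tricks.as_strided(
        audio,
        shape=(n_frames, n_fft),
        strides=(audio.strides[0] * hop, audio.strides[0]),
    )
    power = np.abs(np.fft.rfft(frames * window, axis=-1)) ** 2

    # Project onto the mel filterbank, then apply Whisper-style log compression.
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T
    log_spec = np.log10(np.maximum(mel, 1e-10))
    log_spec = np.maximum(log_spec, log_spec.max() - 8.0)
    return (log_spec + 4.0) / 4.0  # shape: (n_mels, n_frames)
```

The point is simply that this step is a handful of FFTs and a matrix multiply, which supports the argument that it is rarely the ASR bottleneck and does not by itself require a torch dependency.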