Thoughts on the new release with batched inference and some refactoring #927
Replies: 9 comments 7 replies
-
I agree with these sentiments. A benefit of relying on ...
-
Tagging in some contributors for visibility (@jordimas @trungkienbkhn @Jiltseb).
-
You do have a point, but those are two completely different directions, and to maintain both we would have to implement most functions twice: a version with torch and a version without, and the same goes for av and torchaudio. BTW, some of the extra code will be removed once #921 is merged, and I'm thinking about more ideas to simplify the codebase.
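To make that duplication concrete, here is a rough sketch of what keeping both an `av` and a `torchaudio` code path for the same operation tends to look like. This is not faster-whisper's actual code; the function name, signature, and backend switch are hypothetical, and it assumes a recent PyAV where `resample()` returns a list of frames.

```python
import numpy as np


def decode_audio(path, sampling_rate=16000, backend="av"):
    """Hypothetical helper showing two decoding backends kept side by side."""
    if backend == "av":
        # PyAV path: no torch/torchaudio dependency needed.
        import av

        resampler = av.audio.resampler.AudioResampler(
            format="s16", layout="mono", rate=sampling_rate
        )
        chunks = []
        with av.open(path) as container:
            for frame in container.decode(audio=0):
                for resampled in resampler.resample(frame):
                    chunks.append(resampled.to_ndarray().reshape(-1))
        # A real implementation would also flush the resampler for the tail samples.
        return np.concatenate(chunks).astype(np.float32) / 32768.0

    if backend == "torchaudio":
        # Torch path: pulls in torch + torchaudio just to read and resample a file.
        import torchaudio

        waveform, sr = torchaudio.load(path)
        if sr != sampling_rate:
            waveform = torchaudio.functional.resample(waveform, sr, sampling_rate)
        return waveform.mean(dim=0).numpy()

    raise ValueError(f"unknown backend: {backend}")
```

Every helper like this ends up with two branches that have to be tested, documented, and kept behaviourally identical.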
-
Hi @ozancaglayan. In my view, "I've been internally using a slightly-improved numpy-based feature extraction" ...
-
To clarify, 1.0.3 doesn't have the batch changes; it's just the current main branch. I'm a bit torn, as I'm mostly using faster-whisper CPU-only, where it still outperformed the alternatives in my last test. But I appreciate most users aren't, and want to pursue faster options with GPUs. I'm not sure what the right compromise is, but as you note, the code is pretty mature at this point, so I'm not too concerned about needing to stay on 1.0.3 or fork.
-
Hi all, thanks for the feedback.

@ooobo sorry, I meant the upcoming release.

On my side, I'm not super happy with the ...

@jordimas You're right that I should have contributed those changes back to this repo, but that comment of mine was somewhat tangential to this discussion. My point there was that the feature extraction bits in this repo were slower than ideal, but the issue was not the use of numpy.

I'm also curious about the future and the plans for this repository from SYSTRAN's point of view. Although it's now under the umbrella of SYSTRAN, does that mean they reserve active resources for maintaining this repo, or will it still be mostly community-maintained?
-
Given that master is still heavily changing through dense PRs, can we at least make sure that we branch off the latest commit before the batched inference changes?
-
Hello. I have been on holidays and could not look at this in detail, but here is my feedback. My suggested approach:

With more time, changes to the stack can be discussed, but I agree that this is a very large change.
-
Can we please pull in all the relevant and concerned people here and discuss the future of this toolkit? I can see that there are/were some harsh comments & discussions in some PRs regarding the latest merged large commits. Can we please define a set of contributors whom we consider meaningful reviewers, change the repo settings, and set up a CODEOWNERS file so that all PRs get reviewed properly before being merged?

Although I strongly disagree with the tone and rudeness of the comments & PRs written by the OP in the following PR, I believe the way to go is to revert all the commits in question. We shouldn't normally merge such large API- and/or behaviour-changing updates to a repository with 10.6K stars and a large user base and then iteratively work through them to change them further. Things should be cooked through PRs & branches and only merged when we believe they are mature enough.

For the PRs at hand, there is for example CTranslate2's random-seed setting, which is completely unrelated to the other changes. That one should have been merged individually, and without always setting the seed: adding an argument to the class to optionally enable it would have preserved API consistency. Also, it's not clear why we added another huge dependency chain through pyannote for VAD. Is it better than Silero? Has that been measured? When I install pyannote, it pulls more than 50 packages into my environment even though I'm not using it. And so on.

I can see that the majority of people are upset and concerned about this large PR getting merged, and we still have time to fix it and iteratively merge things in small, atomic chunks with community review. Thanks
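For what it's worth, a minimal sketch of what such a CODEOWNERS file could look like. The reviewer handles below are placeholders, not a proposal for specific people; the paths follow the repo's layout but should be adjusted to whatever ownership split the maintainers agree on.

```
# .github/CODEOWNERS (illustrative only; handles are placeholders)
# The last matching pattern takes precedence, so the default owners come first.

# Default owners for everything in the repo
*                                    @maintainer-a @maintainer-b

# Core transcription pipeline
faster_whisper/transcribe.py         @maintainer-a @maintainer-b @maintainer-c

# Audio decoding and feature extraction
faster_whisper/audio.py              @maintainer-b
faster_whisper/feature_extractor.py  @maintainer-c
```

Combined with branch protection requiring a code-owner review, this would ensure no PR touching these paths lands without sign-off from the listed owners.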
-
Hello folks!

Appreciate the community work done in this repository over the past months. I have some comments and remarks on the 1.0.3 release. I think this release made far too major changes to this package, such as switching the audio loading from `av` to `torchaudio` and replacing the numpy-based feature extraction with `torch`. I think these changes could have been optionally triggered via some `extras` dependencies. These parts of the code were mature enough and widely used by the community. I've been internally using a slightly improved numpy-based feature extraction, borrowed from the huggingface/whisper repos, for more than a year, and it's fast enough, as it's not the main bottleneck when doing ASR. For loading audio files, `torchaudio` and its resampling facilities may not be the fastest out there.

So I recommend sticking to the previous versions and also putting the `batched` functionality in a new file, e.g. `batched_transcribe.py`, as `transcribe.py` is now over 2K lines of code and very hard to navigate.
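To make the discussion concrete, here is a rough, self-contained sketch of what a numpy-only log-mel feature extractor can look like. This is not the code referred to above and not faster-whisper's or Whisper's implementation: it uses HTK-style mel filters and no reflection padding, whereas Whisper's reference extractor uses slaney-normalized filters, so treat it as illustrative only.

```python
import numpy as np


def hz_to_mel(f):
    # HTK mel scale; Whisper/librosa default to the slaney variant instead.
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr=16000, n_fft=400, n_mels=80):
    # Triangular filters spaced evenly on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lower, center, upper = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        left = (fft_freqs - lower) / (center - lower)
        right = (upper - fft_freqs) / (upper - center)
        fb[i] = np.maximum(0.0, np.minimum(left, right))
    return fb


def log_mel_spectrogram(audio, sr=16000, n_fft=400, hop=160, n_mels=80):
    # Frame the signal, apply a Hann window, and take the power spectrum.
    audio = np.ascontiguousarray(audio, dtype=np.float32)
    window = np.hanning(n_fft + 1)[:-1].astype(np.float32)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.lib.stride_tricks.as_strided(
        audio,
        shape=(n_frames, n_fft),
        strides=(audio.strides[0] * hop, audio.strides[0]),
    )
    power = np.abs(np.fft.rfft(frames * window, axis=-1)) ** 2

    # Project onto the mel filterbank, then apply Whisper-style log compression.
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T
    log_spec = np.log10(np.maximum(mel, 1e-10))
    log_spec = np.maximum(log_spec, log_spec.max() - 8.0)
    return (log_spec + 4.0) / 4.0  # shape: (n_mels, n_frames)
```

The point is simply that this step is a handful of FFTs and a matrix multiply, which supports the argument that it is rarely the ASR bottleneck and does not by itself require a torch dependency.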