I'm slowly familiarizing myself with the architecture, but I have to ask: has the CPU code been profiled for speed? I feel that if it were 2x-10x faster, it would be usable for many more people, since it takes several seconds to generate each word on one of my machines.
I've also noticed that documentation for a lot of the parameters seems to be lacking. For example: what is n_vocab? What does -i do? What is repeat_penalty? I've managed to figure out most of it with some digging, but it seems to me there should be clearer documentation.
So to reiterate, I'm mainly asking:
- Has the main program been profiled for speed optimization (particularly CPU speed optimization), or is that worth doing?
- Where is the documentation for the parameters, or is it nonexistent? Perhaps this needs to be fixed.
On performance: it is already well profiled, and it has several profiling tools built in. It is currently heavily optimized for the available CPU acceleration: AVX2, AVX512, NEON, Apple Accelerate, and so on. If you look at recent commits and pull requests, many smart people are trying to squeeze out the last bit of performance as well.
I don't think an overall 2x speedup on CPU will be easy in the near term. My understanding is that the main bottleneck is memory bandwidth rather than computation. Some recent optimizations improved compute throughput quite a bit, but overall speed did not improve as drastically because memory bandwidth is limited. We may be able to gain some speed if GPU- or NPU-based acceleration is implemented, thanks to better compute and higher memory bandwidth.
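To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from the project. It assumes a simple roofline-style model in which every weight is read from RAM once per generated token and the time per token is whichever is larger, the compute time or the memory time. The function names and all numbers (model size, bandwidth, FLOP counts) are hypothetical, chosen only for illustration.

```python
def max_tokens_per_second(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on tokens/s if each weight is read from memory once per token."""
    return bandwidth_bytes_per_s / model_bytes


def seconds_per_token(flops_per_token: float, flops_per_s: float,
                      model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Roofline-style estimate: the slower of compute and memory traffic dominates."""
    compute_time = flops_per_token / flops_per_s
    memory_time = model_bytes / bandwidth_bytes_per_s
    return max(compute_time, memory_time)


# Hypothetical machine and model, for illustration only.
MODEL_BYTES = 4e9        # assumed: ~4 GB of quantized weights
BANDWIDTH = 20e9         # assumed: 20 GB/s sustained memory bandwidth
FLOPS_PER_TOKEN = 8e9    # assumed: compute needed per generated token
CPU_FLOPS = 100e9        # assumed: sustained CPU throughput in FLOP/s

baseline = seconds_per_token(FLOPS_PER_TOKEN, CPU_FLOPS, MODEL_BYTES, BANDWIDTH)
faster_math = seconds_per_token(FLOPS_PER_TOKEN, 2 * CPU_FLOPS, MODEL_BYTES, BANDWIDTH)
faster_memory = seconds_per_token(FLOPS_PER_TOKEN, CPU_FLOPS, MODEL_BYTES, 2 * BANDWIDTH)

print("bandwidth-bound ceiling (tokens/s):", max_tokens_per_second(MODEL_BYTES, BANDWIDTH))
print("baseline s/token:", baseline)
print("2x compute s/token:", faster_math)      # unchanged: memory is still the limit
print("2x bandwidth s/token:", faster_memory)  # improves
```

Under these assumed numbers, doubling compute throughput leaves the time per token unchanged, while doubling memory bandwidth shortens it. That is the pattern described above; real behavior also depends on caches, threading, and quantization format, which this sketch ignores.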
Repository owner
locked and limited conversation to collaborators
Apr 16, 2023