Documentation & Optimization Questions #1012
Replies: 1 comment
As for performance, the code has already been well profiled, and it has several profiling tools built in. It is currently heavily optimized for the available CPU acceleration: AVX2, AVX512, NEON, Apple Accelerate, and so on. If you look at recent commits and pull requests, many smart people are trying to squeeze out the last bit of performance as well. I don't think an overall 2x speedup will be easy on CPU in the near term. My understanding is that the main bottleneck is memory bandwidth rather than computation. Some recent optimizations increased compute throughput quite a bit, but overall speed did not improve as drastically because memory bandwidth is limited. We may be able to gain some speed if GPU- or NPU-based acceleration is implemented, thanks to better compute and higher memory bandwidth.
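To make the memory-bandwidth point concrete, here is a rough back-of-the-envelope sketch. It assumes that generating one token requires reading every model weight from memory once, and it ignores caches, the KV cache, and compute time, so it gives only an upper bound. The model size and bandwidth figures are hypothetical round numbers chosen for illustration, not measurements of any particular machine.

```python
def max_tokens_per_second(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on generation speed if every weight is read once per token.

    Simplifying assumption: memory traffic per token equals the model size.
    """
    if model_bytes <= 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("model size and bandwidth must be positive")
    return bandwidth_bytes_per_s / model_bytes


GB = 1e9

# Hypothetical figures, for illustration only.
model_size = 4 * GB        # weights occupying about 4 GB
cpu_bandwidth = 40 * GB    # assumed system RAM bandwidth, bytes per second
gpu_bandwidth = 400 * GB   # assumed GPU memory bandwidth, bytes per second

cpu_bound = max_tokens_per_second(model_size, cpu_bandwidth)
gpu_bound = max_tokens_per_second(model_size, gpu_bandwidth)

print(f"CPU upper bound: {cpu_bound:.1f} tokens/s")
print(f"GPU upper bound: {gpu_bound:.1f} tokens/s")
```

Under these assumptions, faster arithmetic alone does not raise the bound; only more bandwidth or a smaller model (for example, through quantization) does.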
I'm slowly familiarizing myself with the architecture, but I have to ask: has the CPU version of the code been profiled for speed? I feel that if it were 2x-10x faster, it would be usable for many more people, as it takes several seconds to generate each word on one of my machines.
I've also noticed that documentation for a lot of the parameters seems to be lacking. For example: What is n_vocab? What does -i do? What is repeat_penalty? I've managed to figure out most of it with some digging, but it seems to me there should be clearer documentation.
So to reiterate, I'm mainly asking:
- Has the main program been profiled for speed optimization (particularly CPU speed optimization), or is that worth doing?
- Where is the documentation for the parameters, or is it nonexistent? If it is, perhaps that needs fixing.