This issue was moved to a discussion.

Documentation & Optimization Questions #1006

Closed
kahootbird opened this issue Apr 15, 2023 · 1 comment

Comments

@kahootbird

I'm slowly familiarizing myself with the architecture, but I have to ask: has the CPU code been profiled for speed? I feel that if it were 2x-10x faster, it would be usable for many more people, since it takes several seconds to generate each word on one of my machines.

I've also noticed that documentation for a lot of the parameters seems to be lacking. For example: what is n_vocab? What does -i do? What is repeat_penalty? I've managed to figure out most of it with some digging, but it seems to me there should be clearer documentation.

So to reiterate, I'm mainly asking:

- Has the main program been profiled for speed optimization (particularly CPU speed optimization), or is that worth doing?
- Where is the documentation for the parameters, or is it nonexistent? Perhaps this needs to be fixed.

@neurostar

neurostar commented Apr 16, 2023

As for performance, it is already well profiled, and it has many internal profiling tools built in. It is currently heavily optimized for the available CPU acceleration: AVX2, AVX512, NEON, Apple Accelerate, and so on. If you look at recent commits and pull requests, many smart people are trying to squeeze out the last bit of performance as well.

I don't think an overall 2x speedup will be easy on CPU in the near term. My understanding is that the main bottleneck is not computation but memory bandwidth. Some recent optimizations have increased compute throughput quite a bit, but overall speed has not improved as drastically because memory bandwidth is limited. We may be able to gain some speed if GPU- or NPU-based acceleration is implemented, thanks to better computation and higher memory bandwidth.
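To make the memory-bandwidth point concrete, here is a rough back-of-the-envelope sketch. It assumes that generating one token requires reading every model weight from memory once, so the token rate cannot exceed bandwidth divided by model size. The function name, the model size, and the bandwidth figures are all illustrative assumptions, not measurements from this project.

```python
def max_tokens_per_second(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on tokens/s, assuming all weights are read once per token."""
    return bandwidth_bytes_per_s / model_bytes


GB = 1e9

# Hypothetical model: about 4 GB of quantized weights.
model_bytes = 4 * GB

# Hypothetical memory systems; the bandwidth numbers are assumed round figures.
systems = [
    ("desktop RAM (assumed 40 GB/s)", 40 * GB),
    ("GPU memory (assumed 400 GB/s)", 400 * GB),
]

for label, bandwidth in systems:
    bound = max_tokens_per_second(model_bytes, bandwidth)
    print(f"{label}: at most {bound:.0f} tokens/s")
```

Under these assumptions, faster arithmetic (for example, wider SIMD) does not raise the bound at all; only a smaller model or a faster memory system does, which is consistent with the expectation that GPU-style memory bandwidth could help.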

Repository owner locked and limited conversation to collaborators Apr 16, 2023
@prusnak prusnak converted this issue into discussion #1012 Apr 16, 2023
