split: include the option in ./convert.py and quantize #6260
Comments
@ggerganov not urgent at all, but we might keep this in mind. I have added labels.
Yes, creating good first issues is encouraged so more people can get involved in the project.
I'd like to work on this as a first issue; can I be assigned? And how much has been implemented already in resolving #6548? It looks like that only adds support for writing to shards when quantizing already-sharded models, rather than writing to shards in general, but even so, some of the implementation could probably be reused.
Hello, I believe that for `quantize` this is handled by #6548. But yes, it would be nice to generate shards at convert time. Feel free to submit a PR.
The implementation of
Preliminary observations after some attempts: this is considerably harder to implement for `convert*.py`. I figure I'll need to write the conversion method such that it writes the tensors to shards as it converts them, but to do that I need to know how to format those shards, and it's faster to ask than to try to parse the sparsely commented code. It appears that the naive implementation, in which every shard other than the first is purely comprised of tensors, isn't what's going on, so I'd like some clarification. @phymbert, I believe you wrote the split code: when the files are split, how does llama.cpp expect to see the data formatted within the shards? Apologies for the questions, just getting up to speed.
Each shard is a valid GGUF. The first approach is to create a GGUF per batch of tensors.
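
To make that concrete, here is a minimal sketch of the "one GGUF per batch of tensors" approach using the gguf-py package bundled with llama.cpp. The shard naming follows gguf-split's `<base>-00001-of-00003.gguf` convention; the batch size, the helper name, and the exact metadata value types are assumptions for illustration, not the canonical implementation.

```python
import numpy as np
import gguf  # llama.cpp's gguf-py package

# Split metadata keys as used by the gguf-split tool.
KEY_SPLIT_NO = "split.no"
KEY_SPLIT_COUNT = "split.count"
KEY_SPLIT_TENSORS_COUNT = "split.tensors.count"

def write_shards(tensors: dict[str, np.ndarray], arch: str, base: str,
                 tensors_per_shard: int = 128) -> None:
    """Write one valid GGUF file per batch of tensors (hypothetical helper)."""
    names = list(tensors)
    batches = [names[i:i + tensors_per_shard]
               for i in range(0, len(names), tensors_per_shard)]
    for no, batch in enumerate(batches):
        # gguf-split naming convention: <base>-00001-of-00003.gguf
        path = f"{base}-{no + 1:05d}-of-{len(batches):05d}.gguf"
        writer = gguf.GGUFWriter(path, arch)
        # Each shard records its index, the shard count, and the
        # total tensor count across all shards.
        writer.add_uint16(KEY_SPLIT_NO, no)
        writer.add_uint16(KEY_SPLIT_COUNT, len(batches))
        writer.add_int32(KEY_SPLIT_TENSORS_COUNT, len(names))
        for name in batch:
            writer.add_tensor(name, tensors[name])
        writer.write_header_to_file()
        writer.write_kv_data_to_file()
        writer.write_tensors_to_file()
        writer.close()
```

llama.cpp can then load such a model from the first shard, using the split metadata to locate the rest.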
I see, thanks. (I had actually solved my own problem not long after posting the question, and now I feel foolish. PR forthcoming.)
No worries, keep trying.
Context

At the moment it is only possible to split after conversion or quantization, as mentioned by @Artefact2 in this [comment](https://github.com/ggerganov/llama.cpp/pull/6135#issuecomment-2003942162).

Proposition

Include split options in `convert*.py`; support splits in `quantize`.
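
One possible shape for these options, mirroring the flags already exposed by the gguf-split tool (`--split-max-tensors`, `--split-max-size`); the option names and defaults below are assumptions rather than a settled interface:

```python
import argparse

parser = argparse.ArgumentParser(description="convert a model to GGUF")
# Hypothetical split options, named after the existing gguf-split flags.
parser.add_argument("--split", action="store_true",
                    help="write the converted model as multiple GGUF shards")
parser.add_argument("--split-max-tensors", type=int, default=128,
                    help="maximum number of tensors per shard")
parser.add_argument("--split-max-size", type=str, default=None,
                    help="maximum shard size, e.g. 2G")
args = parser.parse_args()
```

Wiring `--split` to a shard writer like the sketch above would let the convert scripts emit shards directly, instead of requiring a separate gguf-split pass after conversion.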