split: include the option in ./convert.py and quantize #6260

Open
phymbert opened this issue Mar 23, 2024 · 9 comments
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed split GGUF split model sharding

Comments

@phymbert
Collaborator

phymbert commented Mar 23, 2024

Context

At the moment it is only possible to split after conversion or quantization. Mentioned by @Artefact2 in this [comment](https://github.com/ggerganov/llama.cpp/pull/6135#issuecomment-2003942162):

as an alternative, add the splitting logic directly to tools that produce ggufs, like convert.py and quantize.

Proposition

Include split options in convert*.py, and support splits in quantize.

@phymbert phymbert added enhancement New feature or request help wanted Extra attention is needed good first issue Good for newcomers need feedback Testing and feedback with results are needed split GGUF split model sharding labels Mar 23, 2024
@phymbert
Collaborator Author

@ggerganov not urgent at all, but we might keep this in mind. I have added the good first issue label; feel free to remove it.

@phymbert phymbert removed the need feedback Testing and feedback with results are needed label Mar 23, 2024
@ggerganov
Owner

Yes, creating good first issues is encouraged so more people can get involved in the project.

@christianazinn
Contributor

I'd like to work on this as a first issue; can I be assigned? And how much has already been implemented in resolving #6548? It looks like that's just adding support for writing to shards when quantizing existing shards, rather than writing to shards in general, but even so, some of that implementation could probably be reused.

@phymbert
Collaborator Author

Hello, I believe that for quantize, the new --keep-split option is enough, thanks to @zj040045.

But yes, it would be nice to generate shards at convert time.

Feel free to submit a PR.
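
For context, a minimal sketch of how --keep-split might be driven from Python; the binary path, file names, and quantization type are placeholders, and the exact flag behavior should be checked against the quantize --help output:

```python
# Sketch: quantize a split F16 model while preserving its shard layout.
# Assumes a local llama.cpp build with the new --keep-split flag; the
# file names and the Q4_K_M type below are illustrative placeholders.
import subprocess

subprocess.run(
    [
        "./quantize",
        "--keep-split",                   # keep the input's number of shards
        "model-00001-of-00003-f16.gguf",  # first shard of the split input
        "model-q4_k_m.gguf",              # output name; per-shard names derive from it
        "Q4_K_M",                         # target quantization type
    ],
    check=True,
)
```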

@christianazinn
Contributor

The implementation of --keep-split currently keeps the number of shards constant, but I imagine there's a use case for quantizing an unsplit high-precision file into multiple shards. Once splitting is implemented at convert time this will be less of an issue, but it is perhaps still desirable. Thoughts?

@christianazinn
Contributor

Preliminary observations after some attempts: This is considerably harder to implement for convert*.py than for quantize, since the conversion scripts are in Python, not C++. I've gotten at least a dozen different errors so far and can conclude that using the GGUFWriter class is not likely to work.

I figure I'll need to write the conversion method so that it writes the tensors to shards as it converts them, but to do that I need to know how to format those shards, and it's faster to ask than to parse the sparsely commented code. It appears that the naive implementation, where every shard after the first is composed purely of tensors, isn't what's going on, so I'd like some clarification. @phymbert, I believe you wrote the gguf-split code?

When the files are split, does llama.cpp expect each shard to:

  • have its own copy of the header, or should that all be in just the first shard? What about other metadata (kv entries)?
  • have gguf_tensor_info only for the tensors it contains, or should the first shard contain all tensors' info?
  • have any kv entries other than the default? (e.g. I see LLM_KV_SPLIT_NO and so on.)

In general, how does llama.cpp expect to see the data formatted within the shards?

Apologies for the questions, just catching up to speed.
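
One way to answer these questions empirically is to open shards produced by ./gguf-split with the gguf Python package and dump what each one actually contains. A minimal inspection sketch, assuming a three-way split with placeholder file names:

```python
# Dump the kv entries and tensor infos of each shard to see how
# ./gguf-split lays them out. File names are placeholders.
from gguf import GGUFReader

for i in range(1, 4):
    path = f"model-{i:05d}-of-00003.gguf"
    reader = GGUFReader(path)
    print(path)
    for key in reader.fields:        # kv entries present in this shard
        print("  kv:", key)
    for tensor in reader.tensors:    # tensor infos stored in this shard
        print("  tensor:", tensor.name, tensor.shape)
```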

@phymbert
Collaborator Author

In general, how does llama.cpp expect to see the data formatted within the shards?

Each shard is a valid GGUF. The first approach is to create a GGUF per batch of tensors.
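
A minimal sketch of that batch-per-shard approach using the gguf Python package. The split.* key names mirror the LLM_KV_SPLIT_* constants mentioned above, but the value types and the zero-based shard index are assumptions, and a real converter would also write the full model kv metadata (at least into the first shard):

```python
# Write one complete, valid GGUF per batch of tensors. `tensors` is a list
# of (name, numpy array) pairs; arch and batch_size are placeholders.
from gguf import GGUFWriter

def write_shards(tensors, arch="llama", batch_size=128):
    batches = [tensors[i:i + batch_size] for i in range(0, len(tensors), batch_size)]
    for no, batch in enumerate(batches):
        path = f"model-{no + 1:05d}-of-{len(batches):05d}.gguf"
        writer = GGUFWriter(path, arch)
        # Split bookkeeping, named after LLM_KV_SPLIT_NO / _COUNT /
        # _TENSORS_COUNT; exact value types are assumptions.
        writer.add_uint16("split.no", no)
        writer.add_uint16("split.count", len(batches))
        writer.add_int32("split.tensors.count", len(tensors))
        for name, data in batch:     # each shard only describes its own tensors
            writer.add_tensor(name, data)
        writer.write_header_to_file()
        writer.write_kv_data_to_file()
        writer.write_tensors_to_file()
        writer.close()
```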

@christianazinn
Contributor

I see, thanks. (I had actually solved my own problem not long after posting the question and now I feel foolish. PR forthcoming.)

@phymbert
Collaborator Author

No worries, keep trying.
