
Fix --split-max-size #6655

Merged · 7 commits merged into ggerganov:master from no-max-size-overflow on Apr 14, 2024

Conversation

@CISC (Contributor) commented Apr 13, 2024

Byte size calculation was done on int and overflowed.

Fixes #6634
Fixes #6654
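
For illustration only, here is a minimal sketch of the failure mode (not the actual gguf-split.cpp code): a value like --split-max-size 2G works out to 2 * 1024^3 bytes, which exceeds INT_MAX, so computing the limit in a 32-bit int overflows, while carrying the size in a 64-bit type keeps it intact.

```cpp
#include <cstddef>
#include <cstdio>
#include <limits>

int main() {
    // "--split-max-size 2G" means 2 * 1024 * 1024 * 1024 = 2147483648 bytes.
    const long long two_gib = 2LL * 1024 * 1024 * 1024;
    std::printf("2 GiB     = %lld bytes\n", two_gib);
    std::printf("INT_MAX   = %d\n", std::numeric_limits<int>::max());

    // The same multiplication done in plain `int` would overflow (the result
    // exceeds INT_MAX), which is the kind of overflow this PR fixes.
    // Doing the arithmetic in a 64-bit type keeps the value intact:
    const std::size_t limit = static_cast<std::size_t>(2) * 1024 * 1024 * 1024;
    std::printf("as size_t = %zu bytes\n", limit);
    return 0;
}
```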

@phymbert (Collaborator) left a comment

Thanks, but we discussed in #6343 that, given the wide usage of this feature, we should cover it with tests to avoid reintroducing the issue in the future.

I do not want to block this critical fix, but could you please include a tests.sh and add it to the CI?

If you don't have the time, I can do it later today.

@phymbert added the split (GGUF split model sharding) label on Apr 13, 2024
@CISC (Contributor, Author) commented Apr 13, 2024

> I do not want to block this critical fix, but could you please include a tests.sh and add it to the CI?

Sure, I'll look into it now.

CISC added a commit — add examples test scripts to ci run: "Will autodiscover examples/*/tests.sh scripts and run them."
@phymbert (Collaborator) left a comment

Thanks. FYI @ngxson

@CISC (Contributor, Author) commented Apr 13, 2024

Ok, done. :)

I made some minor changes to tests.sh:

  • Added a second optional parameter to set the output directory
  • Disabled the --no-tensor-in-metadata related tests (not working)
  • Made it download a larger model and test --split-max-size 2G instead of 40M

I've added autodiscovery of examples/*/tests.sh scripts (at that level only, so as not to include, e.g., examples/server/tests/tests.sh) to ci/run.sh. It will pass the bin and mnt/models directories as arguments. If you want to change how this works, I'm open to suggestions.
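
Roughly, the autodiscovery could look like the sketch below (illustrative only, not the exact ci/run.sh change; the directory paths are hypothetical and just follow the description above):

```bash
#!/bin/bash
# Illustrative sketch: run every examples/*/tests.sh (that level only, so
# nested scripts like examples/server/tests/tests.sh are not picked up),
# passing the build bin directory and the models directory as arguments.
BIN_DIR=./build/bin        # hypothetical path to the built binaries
MNT_MODELS=./mnt/models    # hypothetical path to the downloaded models

for tests_script in examples/*/tests.sh; do
    [ -f "$tests_script" ] || continue
    echo "Running $tests_script"
    bash "$tests_script" "$BIN_DIR" "$MNT_MODELS" || exit 1
done
```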

@phymbert (Collaborator) commented
Cool, the --no-tensor-in-metadata was for:

But it is yet to be supported.

@ngxson (Collaborator) commented Apr 13, 2024

The change to gguf-split.cpp looks good. Thanks for taking time on this.

However, for the test, I'm wondering:

  • I couldn't find any workflow file that refers to either ci/run.sh or ci-run.sh, nor could I find any log for it in the workflows of this PR (or maybe I'm missing something). Can you confirm whether the test has really been run on CI? @CISC
  • Having a complete test is always a good thing, but what holds me back at the moment is that we may accidentally make the workflow slower and take more time to finish. Depending on how long the gguf-split test takes, we may need to decide whether we can keep using the current model (gemma Q8_0) or should use a smaller one.

@CISC (Contributor, Author) commented Apr 13, 2024

> I couldn't find any workflow file that refers to either `ci/run.sh` or `ci-run.sh`, nor could I find any log for it in the workflows of this PR (or maybe I'm missing something). Can you confirm whether the test has really been run on CI? @CISC

To quote ci/README.md:

> In addition to GitHub Actions, llama.cpp uses a custom CI framework:
>
> https://github.com/ggml-org/ci
>
> It monitors the master branch for new commits and runs the ci/run.sh script on dedicated cloud instances. This allows us to execute heavier workloads compared to just using GitHub Actions. Also with time, the cloud instances will be scaled to cover various hardware architectures, including GPU and Apple Silicon instances.
>
> Collaborators can optionally trigger the CI run by adding the ggml-ci keyword to their commit message. Only the branches of this repo are monitored for this keyword.
>
> It is good practice, before publishing changes, to execute the full CI locally on your machine.

Which is what I did. :)
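
For reference, a local run looks roughly like this (per ci/README.md; the tmp paths are just examples):

```bash
mkdir tmp

# CPU-only run
bash ./ci/run.sh ./tmp/results ./tmp/mnt

# with CUDA support (if available)
GG_BUILD_CUDA=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt
```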

> Having a complete test is always a good thing, but what holds me back at the moment is that we may accidentally make the workflow slower and take more time to finish. Depending on how long the `gguf-split` test takes, we may need to decide whether we can keep using the current model (gemma Q8_0) or should use a smaller one.

The longest part of the test is downloading the model the first time; the next longest is running the model tests. I tested it with a CPU build, and it's not really that slow.

@ngxson (Collaborator) commented Apr 13, 2024

> To quote ci/README.md

IMO the goal of this test script is to test whether your changes work before merging. However, what ci/run.sh does is test after it's merged into the master branch, so it sounds a bit strange to me.

@phymbert It's up to you to decide if this behavior is expected or not.

> The longest part of the test is downloading the model the first time; the next longest is running the model tests. I tested it with a CPU build, and it's not really that slow.

CPUs in CI machines can be different from personal machines. For that reason, the workflow of the server example uses the small tinyllama model (less than 100MB in size).

What I wanted to do earlier was to add the test and see whether the GitHub Actions workflows take significantly longer to finish or not. But I couldn't find the output of the test from this PR, so I couldn't conclude.

@CISC (Contributor, Author) commented Apr 13, 2024

@ngxson Yeah, it is indeed a little strange, but according to the readme it's for exactly that purpose (not slowing down GHA)...

Just had a cursory glance at the CI repo, and it looks like it preloads its image with ggml-org/models:phi-2/ggml-model-f16.gguf, which is 5GB; perhaps that would be better to test against?

@phymbert (Collaborator) commented Apr 13, 2024

The ci/run.sh script is launched by ggml/ci manually by @ggerganov on Azure nodes; those tests are allowed to be slow, so adding a split test there is fine IMHO.

It is OK if it is not run on all commits; we might ask any future contributor to run it locally before merging.

But it is better to request an approval.

@phymbert requested a review from ggerganov on April 13, 2024 at 15:32
@ngxson (Collaborator) commented Apr 13, 2024

I think the main problem is that we didn't have the notion of running CI tests for examples/* before this PR (except for the server). One idea I had in mind was to add a dedicated workflow for testing multiple examples. But for now I'm OK with simply running tests.sh locally - I actually prefer this way to keep things simple.

That's just my personal thought, maybe @ggerganov will have other ideas.

@ggerganov (Owner) left a comment

For ggml-ci we currently aim for the runs to not take longer than 30 minutes:

https://github.com/ggml-org/ci/blob/b2856375b21aa7d96bcc1d3a25d70f4446a0610d/env.sh#L113

Technically, this particular test does not need to run on more than one node since it does not reach any code that is hardware-dependent. But we can improve this in the future - let's merge it like this for now.

> I think the main problem is that we didn't have the notion of running CI tests for examples/* before this PR (except for the server).

We can add instructions when opening a PR that remind contributors to run the ggml-ci CI locally.

ci/run.sh: review comment (outdated, resolved)
@phymbert merged commit 8800226 into ggerganov:master on Apr 14, 2024
56 checks passed
@CISC deleted the no-max-size-overflow branch on April 14, 2024 at 17:01
tybalex pushed a commit to rubra-ai/tools.cpp that referenced this pull request Apr 17, 2024
* Fix --split-max-size

Byte size calculation was done on int and overflowed.

* add tests.sh

* add examples test scripts to ci run

Will autodiscover examples/*/tests.sh scripts and run them.

* move WORK_PATH to a subdirectory

* clean up before and after test

* explicitly define which scripts to run

* add --split-max-size to readme
Labels: split (GGUF split model sharding)
4 participants