
Fix --split-max-size #6655

Merged · 7 commits merged into ggerganov:master from no-max-size-overflow on Apr 14, 2024

Conversation

@CISC (Contributor) commented Apr 13, 2024

Byte size calculation was done on int and overflowed.

Fixes #6634
Fixes #6654
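
For illustration only, here is a minimal sketch of the failure mode (not the actual gguf-split.cpp code): a value like --split-max-size 2G works out to 2 * 1024^3 bytes, which exceeds INT_MAX, so computing the limit in a 32-bit int overflows, while carrying the size in a 64-bit type keeps it intact.

```cpp
#include <cstddef>
#include <cstdio>
#include <limits>

int main() {
    // "--split-max-size 2G" means 2 * 1024 * 1024 * 1024 = 2147483648 bytes.
    const long long two_gib = 2LL * 1024 * 1024 * 1024;
    std::printf("2 GiB     = %lld bytes\n", two_gib);
    std::printf("INT_MAX   = %d\n", std::numeric_limits<int>::max());

    // The same multiplication done in plain `int` would overflow (the result
    // exceeds INT_MAX), which is the kind of overflow this PR fixes.
    // Doing the arithmetic in a 64-bit type keeps the value intact:
    const std::size_t limit = static_cast<std::size_t>(2) * 1024 * 1024 * 1024;
    std::printf("as size_t = %zu bytes\n", limit);
    return 0;
}
```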

@phymbert (Collaborator) left a comment

Thanks, but we discussed in #6343 that, given the wide usage of this feature, we should cover it with tests to avoid reintroducing the issue in the future.

I do not want to block this critical fix, but could you please include a tests.sh and add it to the CI?

If you don't have the time, I can do it later today.

@phymbert added the split (GGUF split model sharding) label on Apr 13, 2024
@CISC (Contributor, Author) commented Apr 13, 2024

> I do not want to block this critical fix, but could you please include a tests.sh and add it to the CI?

Sure, I'll look into it now.

CISC added a commit — add examples test scripts to ci run: "Will autodiscover examples/*/tests.sh scripts and run them."
@phymbert (Collaborator) left a comment

Thanks. FYI @ngxson

@CISC (Contributor, Author) commented Apr 13, 2024

Ok, done. :)

I made some minor changes to tests.sh:

  • Added a second optional parameter to set the output directory
  • Disabled the --no-tensor-in-metadata related tests (not working)
  • Made it download a larger model and test --split-max-size 2G instead of 40M

I've added autodiscovery of examples/*/tests.sh scripts (at that level only, so as not to include, e.g., examples/server/tests/tests.sh) to ci/run.sh. It will pass the bin and mnt/models directories as arguments. If you want to change how this works, I'm open to suggestions.
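
Roughly, the autodiscovery could look like the sketch below (illustrative only, not the exact ci/run.sh change; the directory paths are hypothetical and just follow the description above):

```bash
#!/bin/bash
# Illustrative sketch: run every examples/*/tests.sh (that level only, so
# nested scripts like examples/server/tests/tests.sh are not picked up),
# passing the build bin directory and the models directory as arguments.
BIN_DIR=./build/bin        # hypothetical path to the built binaries
MNT_MODELS=./mnt/models    # hypothetical path to the downloaded models

for tests_script in examples/*/tests.sh; do
    [ -f "$tests_script" ] || continue
    echo "Running $tests_script"
    bash "$tests_script" "$BIN_DIR" "$MNT_MODELS" || exit 1
done
```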

@phymbert (Collaborator) commented
Cool, the --no-tensor-in-metadata was for:

But it is yet to be supported.

@ngxson (Collaborator) commented Apr 13, 2024

The change to gguf-split.cpp looks good. Thanks for taking time on this.

However, for the test, I'm wondering:

  • I couldn't find any workflow file that refers to either ci/run.sh or ci-run.sh, nor could I find any log for it in the workflows of this PR (or maybe I'm missing something). Can you confirm whether the test has really been run on CI? @CISC
  • Having a complete test is always a good thing, but what holds me back at the moment is that we may accidentally make the workflow slower and take more time to finish. Depending on how long the gguf-split test takes, we may need to decide whether we can keep using the current model (gemma Q8_0) or should use a smaller one.

@CISC (Contributor, Author) commented Apr 13, 2024

> I couldn't find any workflow file that refers to either `ci/run.sh` or `ci-run.sh`, nor could I find any log for it in the workflows of this PR (or maybe I'm missing something). Can you confirm whether the test has really been run on CI? @CISC

To quote ci/README.md:

> In addition to GitHub Actions, llama.cpp uses a custom CI framework:
>
> https://github.com/ggml-org/ci
>
> It monitors the master branch for new commits and runs the ci/run.sh script on dedicated cloud instances. This allows us to execute heavier workloads compared to just using GitHub Actions. Also with time, the cloud instances will be scaled to cover various hardware architectures, including GPU and Apple Silicon instances.
>
> Collaborators can optionally trigger the CI run by adding the ggml-ci keyword to their commit message. Only the branches of this repo are monitored for this keyword.
>
> It is good practice, before publishing changes, to execute the full CI locally on your machine.

Which is what I did. :)
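
For reference, a local run looks roughly like this (per ci/README.md; the tmp paths are just examples):

```bash
mkdir tmp

# CPU-only run
bash ./ci/run.sh ./tmp/results ./tmp/mnt

# with CUDA support (if available)
GG_BUILD_CUDA=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt
```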

> Having a complete test is always a good thing, but what holds me back at the moment is that we may accidentally make the workflow slower and take more time to finish. Depending on how long the `gguf-split` test takes, we may need to decide whether we can keep using the current model (gemma Q8_0) or should use a smaller one.

The longest part of the test is downloading the model the first time; the next longest is running the model tests. I tested it with a CPU build, and it's not really that slow.

@ngxson (Collaborator) commented Apr 13, 2024

> To quote ci/README.md

IMO the goal of this test script is to test whether your changes work before merging. However, what ci/run.sh does is test after it's merged into the master branch, so it sounds a bit strange to me.

@phymbert It's up to you to decide if this behavior is expected or not.

> The longest part of the test is downloading the model the first time; the next longest is running the model tests. I tested it with a CPU build, and it's not really that slow.

CPUs in CI machines can be different from personal machines. For that reason, the workflow of the server example uses the small tinyllama model (less than 100MB in size).

What I wanted to do earlier was to add the test and see whether the GitHub Actions workflows take significantly longer to finish or not. But I couldn't find the output of the test from this PR, so I couldn't conclude.

@CISC (Contributor, Author) commented Apr 13, 2024

@ngxson Yeah, it is indeed a little strange, but according to the readme it's for exactly that purpose (not slowing down GHA)...

Just had a cursory glance at the CI repo, and it looks like it preloads its image with ggml-org/models:phi-2/ggml-model-f16.gguf, which is 5GB; perhaps that would be better to test against?

@phymbert (Collaborator) commented Apr 13, 2024

The ci/run.sh script is launched by ggml/ci manually by @ggerganov on Azure nodes; those tests are allowed to be slow, so adding a split test there is fine IMHO.

It is OK if it is not run on all commits; we might ask any future contributor to run it locally before merging.

But it is better to request an approval.

@phymbert requested a review from ggerganov on April 13, 2024 at 15:32
@ngxson (Collaborator) commented Apr 13, 2024

I think the main problem is that we didn't have the notion of running CI tests for examples/* before this PR (except for the server). One idea I had in mind was to add a dedicated workflow for testing multiple examples. But for now I'm OK with simply running tests.sh locally - I actually prefer this way to keep things simple.

That's just my personal thought, maybe @ggerganov will have other ideas.

@ggerganov (Owner) left a comment

For ggml-ci we currently aim for the runs to not take longer than 30 minutes:

https://github.com/ggml-org/ci/blob/b2856375b21aa7d96bcc1d3a25d70f4446a0610d/env.sh#L113

Technically, this particular test does not need to run on more than one node since it does not reach any code that is hardware-dependent. But we can improve this in the future - let's merge it like this for now.

> I think the main problem is that we didn't have the notion of running CI tests for examples/* before this PR (except for the server).

We can add instructions when opening a PR that remind contributors to run the ggml-ci CI locally.

ci/run.sh: review comment (outdated, resolved)
@phymbert merged commit 8800226 into ggerganov:master on Apr 14, 2024
56 checks passed
@CISC deleted the no-max-size-overflow branch on April 14, 2024 at 17:01
tybalex pushed a commit to rubra-ai/tools.cpp that referenced this pull request Apr 17, 2024
* Fix --split-max-size

Byte size calculation was done on int and overflowed.

* add tests.sh

* add examples test scripts to ci run

Will autodiscover examples/*/tests.sh scripts and run them.

* move WORK_PATH to a subdirectory

* clean up before and after test

* explicitly define which scripts to run

* add --split-max-size to readme
Labels: split (GGUF split model sharding)
4 participants