Introduce structs for the q4 data blocks #356

Merged: 2 commits merged into ggerganov:master from q4-struct on Mar 28, 2023

Conversation

@sw (Collaborator) commented Mar 21, 2023

This came up with the AVX512 support in #320.

The code handling the q4 data blocks involves a lot of pointer arithmetic and sizeof expressions. I understand this may have been done out of concern for how well the compiler can optimize it, but I think the code could be made more readable by introducing structs for the q4_0 and q4_1 data blocks, without changing the overall structure too much.
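For illustration, a rough sketch of what such block structs might look like (the block size QK and the field names here are assumptions, not necessarily the exact layout this PR ends up with):

```c
#include <stdint.h>

#define QK 32  // assumed number of weights per quantization block

// q4_0: one scale factor per block, 4-bit quants packed two per byte
typedef struct {
    float   d;           // scale
    uint8_t qs[QK / 2];  // packed 4-bit quantized values
} block_q4_0;

// q4_1: scale plus minimum offset per block
typedef struct {
    float   d;           // scale
    float   m;           // minimum
    uint8_t qs[QK / 2];  // packed 4-bit quantized values
} block_q4_1;
```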

I have only tested my changes on Linux, with the AVX2 and scalar implementations, and it's easy to mix up an x and a y... so beware ;-)

I have also updated utils.cpp to use the quantize_row_q4_0/1 functions, but this isn't really clean at the moment. It will also cause a difference in the generated q4_0 model file for people using one of the SIMD-optimized variants. The model should work fine, but it may be confusing for people who want to check the file's hash, so I'm open to reverting that.
q4_1 is not affected as that has no processor-specific optimizations in the quantization function.

I'm open to suggestions, also on naming and code style, which I tried to keep locally consistent. Should the block structs be made public in ggml.h?

@gjmulder added the enhancement (New feature or request) label on Mar 21, 2023
@ggerganov (Owner) commented:

With the latest changes in #370, the quantize methods from quantize.cpp have been moved into ggml.c.
I think it would be a good first step to avoid the code duplication between:

  • quantize_row_q4_0() and ggml_quantize_q4_0()
  • quantize_row_q4_1() and ggml_quantize_q4_1()

I.e. make the latter call the former.
This will reduce duplicated code and also drastically improve the performance of quantize.cpp.
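For illustration, a rough sketch of the "latter calls the former" idea; the signatures are simplified and assumed here, and may not match ggml.c exactly:

```c
#include <stddef.h>
#include <stdint.h>

#define QK 32  // assumed block size

typedef struct {
    float   d;
    uint8_t qs[QK / 2];
} block_q4_0;

// Assumed to exist already: the per-row routine (scalar or SIMD-optimized).
void quantize_row_q4_0(const float * x, block_q4_0 * y, int k);

// Instead of duplicating the quantization loop, ggml_quantize_q4_0() just
// iterates over the rows and delegates to quantize_row_q4_0().
// n = total number of values, k = values per row; both assumed multiples of QK.
size_t ggml_quantize_q4_0(const float * src, void * dst, int n, int k) {
    block_q4_0 * blocks = (block_q4_0 *) dst;
    for (int i = 0; i < n; i += k) {
        quantize_row_q4_0(src + i, blocks, k);
        blocks += k / QK;  // advance by the number of blocks in one row
    }
    return (size_t)(n / QK) * sizeof(block_q4_0);  // bytes written
}
```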

After this change has been merged, we can think about wrapping things into structs for better readability.

@sw (Collaborator, Author) commented Mar 22, 2023

Rebased and verified identical model files and prediction output

@ggerganov (Owner) commented:

Ok, thanks. I know @blackhole89 is working on quantization improvements on a branch - would this change be too big of a conflict for you? We can postpone it for later if necessary.

I think we should also test it on ARM to make sure nothing is broken. I will try to find time in the coming days.

@sw force-pushed the q4-struct branch 2 times, most recently from 9221414 to 63ffb1a on March 23, 2023, 18:29
@sw marked this pull request as ready for review on March 23, 2023, 18:32
@sw (Collaborator, Author) commented Mar 23, 2023

I consider it ready now but I'm open to revising it or waiting for other changes. I agree that it should be tested on ARM and AVX512.

@blackhole89 (Collaborator) commented Mar 23, 2023

@ggerganov Thanks for paging me in! As far as I can see, there would be no conflict with anything I'm doing. I think this is a good change in terms of code readability - just need to make sure we don't run into weird platform-specific problems due to struct alignment or something.
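One way to catch such layout surprises at compile time would be a size check on the block structs; a minimal sketch, assuming the q4_0 layout sketched earlier and C11's static_assert:

```c
#include <assert.h>   // static_assert (C11)
#include <stdint.h>

#define QK 32  // assumed block size

typedef struct {
    float   d;
    uint8_t qs[QK / 2];
} block_q4_0;

// Fail the build if the compiler inserts padding that would change the
// in-memory (and on-disk) layout on some platform.
static_assert(sizeof(block_q4_0) == sizeof(float) + QK / 2,
              "unexpected q4_0 block size/padding");
```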

@ggerganov (Owner) commented:

@sw I changed the variable names and fixed the ARM_NEON and WASM builds.
We haven't tested the WASM and POWER9 paths, but hopefully they still work.

Give it one more try after my changes to make sure that AVX is fine. I'll merge this now.

@ggerganov merged commit c1f8850 into ggerganov:master on Mar 28, 2023
@sw deleted the q4-struct branch on March 28, 2023, 19:13
@sw added a commit to sw/llama.cpp that referenced this pull request on Apr 8, 2023