Unit test for quantization functions #953
Conversation
Please fix build failures. If the build failures turn out to be more problematic, we can extract the first commit and submit it as a separate pull request, which can be reviewed and merged pretty quickly. Then we can rebase this branch/PR and try to figure out why the quantization tests fail.
I went ahead and extracted the first commit (including my suggestion from above, #953 (comment)) as Pull Request #970.
You might remove …
#970 has been merged. Please rebase the branch on top of current master, or rebase interactively.
The branch was force-pushed from 2a0ffeb to cd3bc37.
// Generate synthetic data
void generate_data(float offset, size_t n, float * dst) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = 0.1 + 2*cosf(i + offset);
    }
}
I think this (or the maximum errors) needs improvement.
I tried varying this slightly, and with -0.2 + 2*cosf(i + offset) the q4_0 dot product fails.
We should try to create data that matches the distribution in the actual model, maybe using std::normal_distribution. @prusnak made some histograms of the models: #397 (comment)
Since Q4_0 and Q4_1 effectively differ in how they handle a bias in the data (0.1 in your case), we might want to test separately with and without bias.
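For illustration, a minimal sketch of such a generator, assuming a fixed seed to keep the data deterministic; the helper name and its parameters are hypothetical, not part of this PR:

#include <cstddef>
#include <random>

// Hypothetical helper: normally distributed data with an optional constant
// bias, so Q4_0 and Q4_1 can be exercised both with and without a bias.
void generate_normal_data(float bias, float stddev, size_t n, float * dst) {
    std::mt19937 rng(1234);                              // fixed seed: deterministic test data
    std::normal_distribution<float> dist(0.0f, stddev);
    for (size_t i = 0; i < n; i++) {
        dst[i] = bias + dist(rng);
    }
}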
I can try to match the distribution better, but I somewhat disagree with the reasoning here: it doesn't matter whether the data matches the model, as long as the test fails when an implementation is broken.
If anything, it might be good to add some "unusual" patterns like all zeroes, all negative, or all positive values.
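A rough sketch of the kind of edge-case fillers meant here (both helpers are hypothetical, not part of the PR):

#include <cstddef>

// All zeroes, all positive or all negative: pass 0.0f, a positive or a negative value.
void fill_constant(float value, size_t n, float * dst) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = value;
    }
}

// Alternating sign with a fixed magnitude.
void fill_alternating_sign(float magnitude, size_t n, float * dst) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = (i % 2 == 0) ? magnitude : -magnitude;
    }
}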
Yes, maybe it's better to have deterministic test data. So it's just a matter of the thresholds being set too tight?
Edit: I can't seem to reproduce the problem right now. So I guess the maximum errors are okay as they are.
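For context, the kind of threshold check being discussed could look roughly like the following, assuming the test compares the original data against its quantize/dequantize round trip; the helper name and the threshold value in the comment are illustrative, not the PR's actual constants:

#include <cmath>
#include <cstddef>

// Root-mean-square error between the original data and its round-tripped copy.
float array_rmse(const float * a, const float * b, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        const double d = (double) a[i] - (double) b[i];
        sum += d * d;
    }
    return (float) std::sqrt(sum / (double) n);
}

// Usage sketch: fail when the error exceeds a per-format threshold,
// e.g. array_rmse(src, dst, n) < 0.002f for a 4-bit format (threshold illustrative).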
Use the ggml_internal_get_quantize_fn function to loop through all quantization formats and run a sanity check on the result. Also add a microbenchmark that times these functions directly without running the rest of the GGML graph.
Fix issues uncovered in CI: use sizes divisible by 32*8 for loop unrolling, and use an intrinsics header that should work on Mac.
Per PR comment, subsumed by test-quantize-fns
The branch was force-pushed from cd3bc37 to e95a833.
Somehow I've lost track of this PR, sorry. What is left to be done before merge?
Use ggml_internal_get_quantize_fn to loop through all quantization formats and run sanity checks on the implemented functions. They are run by ctest, but also accept a few command-line parameters for more output.
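Roughly, such a loop can be structured as in the sketch below. The quantize_fns_t field names and the function signatures are written from memory of the ggml.h of that era and should be treated as assumptions rather than as the PR's actual test code:

#include "ggml.h"

#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    const int n = 32 * 128;                              // must be a multiple of the block size
    std::vector<float>   src(n), dst(n);
    std::vector<uint8_t> quantized(n * sizeof(float));   // generous scratch buffer

    // Fill src with synthetic data here (see generate_data above).

    for (int i = 0; i < GGML_TYPE_COUNT; i++) {
        const quantize_fns_t qfns = ggml_internal_get_quantize_fn(i);

        // Not every ggml type has quantization functions; skip the rest.
        if (!qfns.quantize_row_q || !qfns.dequantize_row_q) {
            continue;
        }

        qfns.quantize_row_q(src.data(), quantized.data(), n);
        qfns.dequantize_row_q(quantized.data(), dst.data(), n);

        // Compare src and dst against a per-format error threshold here.
        printf("type %d: quantize/dequantize round trip completed\n", i);
    }

    return 0;
}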
This is a quick test with generated data, so the measurements are not very useful for guiding perplexity work, but they might surface issues like #876.
Also add a microbenchmark that times these functions directly without running the rest of the GGML graph.
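A direct timing harness for this can be very small; the sketch below shows one way to do it with std::chrono. It is only an illustration of timing a function pointer in a loop, not the PR's actual benchmark:

#include <chrono>

// Hypothetical helper: average microseconds per call of an arbitrary callable.
template <typename F>
double benchmark_us(F && fn, int iterations) {
    const auto t0 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < iterations; i++) {
        fn();
    }
    const auto t1 = std::chrono::high_resolution_clock::now();
    return std::chrono::duration<double, std::micro>(t1 - t0).count() / iterations;
}

// Example: benchmark_us([&] { qfns.quantize_row_q(src.data(), quantized.data(), n); }, 1000);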
There is some similarity with #653, but I think there is value both in tests that run the full GGML graph and in tests that target specific issues in the SIMD implementations.
Example output:
test-quantize-fns -v
test-quantize-perf -3 --op vec_dot_q