ggml : alternative Q4_3 implementation using modified Q8_0 #1109

Merged Apr 22, 2023 (5 commits)

@ggerganov ggerganov commented Apr 21, 2023

This one looks promising: it does not change the Q4_3 format from master and only slightly modifies Q8_0 by adding the low and high sums. The results should be identical, but the Q4_3 dot product now evaluates much faster:

#define QK8_0 32
typedef struct {
    float   d;          // delta
    float   s0;         // d * sum(qs[i]) low
    float   s1;         // d * sum(qs[i]) high
    int8_t  qs[QK8_0];  // quants
} block_q8_0;

llama_print_timings:      sample time =    47.11 ms /    64 runs   (    0.74 ms per run)
llama_print_timings: prompt eval time =   482.44 ms /     8 tokens (   60.30 ms per token)
llama_print_timings:        eval time =  3419.36 ms /    63 runs   (   54.28 ms per run)
llama_print_timings:       total time =  3959.05 ms

I think this is the way to go, but let's first see the ppl results from the Q4_3a approach (#1108).
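To illustrate why the precomputed sums can help, here is a minimal sketch, not the actual ggml.c kernel. It assumes a Q4_3 value dequantizes as `d * q + m`, that a Q4_3 block holds 16 values (so one Q8_0 block pairs with two Q4_3 blocks), and that `s0`/`s1` cover the first and second 16 quants of the Q8_0 block. Under those assumptions the `m` term of each half collapses to `m * s0` or `m * s1`, leaving only an integer dot product in the inner loop. The type `block_q4_3_sketch` and the function names are hypothetical; the sketch uses plain floats and unpacked quants for readability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QK4_3 16   // assumed Q4_3 block size for this sketch
#define QK8_0 32

// Hypothetical, simplified stand-in for a Q4_3 block (unpacked quants, float scale/min).
typedef struct {
    float   d;         // delta
    float   m;         // min
    uint8_t q[QK4_3];  // 4-bit quants stored one per byte, values 0..15
} block_q4_3_sketch;

typedef struct {
    float   d;          // delta
    float   s0;         // d * sum(qs[i]) low
    float   s1;         // d * sum(qs[i]) high
    int8_t  qs[QK8_0];  // quants
} block_q8_0;

// Fill s0/s1 from d and qs (assumes low = first half, high = second half).
static void q8_0_set_sums(block_q8_0 *y) {
    int sum0 = 0, sum1 = 0;
    for (int i = 0; i < QK8_0 / 2; i++) {
        sum0 += y->qs[i];
        sum1 += y->qs[QK8_0 / 2 + i];
    }
    y->s0 = y->d * (float) sum0;
    y->s1 = y->d * (float) sum1;
}

// Reference: dequantize both operands and multiply element by element.
static float dot_reference(const block_q4_3_sketch x[2], const block_q8_0 *y) {
    float acc = 0.0f;
    for (int h = 0; h < 2; h++) {
        for (int i = 0; i < QK4_3; i++) {
            const float xv = x[h].d * (float) x[h].q[i] + x[h].m;
            const float yv = y->d * (float) y->qs[h * QK4_3 + i];
            acc += xv * yv;
        }
    }
    return acc;
}

// Same result using the sums: sum((d*q + m) * (dy*p)) = d*dy*sum(q*p) + m*s.
static float dot_with_sums(const block_q4_3_sketch x[2], const block_q8_0 *y) {
    assert(x != NULL && y != NULL);
    const float s[2] = { y->s0, y->s1 };
    float acc = 0.0f;
    for (int h = 0; h < 2; h++) {
        int isum = 0;  // integer-only inner loop
        for (int i = 0; i < QK4_3; i++) {
            isum += (int) x[h].q[i] * (int) y->qs[h * QK4_3 + i];
        }
        acc += x[h].d * y->d * (float) isum + x[h].m * s[h];
    }
    return acc;
}
```

The two functions should agree up to floating-point rounding; the second one simply moves the per-element `m` handling out of the inner loop and into the quantization step.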

@ggerganov ggerganov marked this pull request as ready for review April 21, 2023 20:14
@ggerganov

Will fix the AVX2 implementation tomorrow and merge it

ggml.c Outdated
@@ -1469,10 +1499,16 @@ static void quantize_row_q8_0(const float * restrict x, void * restrict vy, int
#endif
#if defined __AVX__

As mentioned in #1099, where I intend to fix this, the #if condition is wrong here: __AVX__ is also defined when building for AVX2, so the code below is executed for AVX2 as well, essentially duplicating the work. Just something to keep in mind, or fix, when measuring performance.
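The general pitfall can be sketched as follows. This is an illustration of the preprocessor behaviour, not the actual ggml.c code, and the function names are hypothetical. Because compilers define `__AVX__` whenever AVX2 is enabled, two independent `#if` blocks both compile in on an AVX2 build, whereas an `#if`/`#elif` chain that tests `__AVX2__` first selects at most one path.

```c
// Counts how many SIMD paths are compiled in when the guards are independent.
static int paths_run_independent(void) {
    int n = 0;
#if defined(__AVX2__)
    n++;  // AVX2 path
#endif
#if defined(__AVX__)
    n++;  // AVX path: also compiled on AVX2 builds, so the work is done twice
#endif
    return n;
}

// Counts how many SIMD paths are compiled in when the guards are chained.
static int paths_run_chained(void) {
    int n = 0;
#if defined(__AVX2__)
    n++;  // AVX2 path
#elif defined(__AVX__)
    n++;  // AVX path: only reached when AVX2 is not available
#endif
    return n;
}
```

On an AVX2 build the first function returns 2 and the second returns 1; on builds without AVX2 the two agree.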

@ggerganov ggerganov merged commit 955ef9a into master Apr 22, 2023
@ggerganov ggerganov deleted the q4_3b branch April 22, 2023 07:55