
AVX implementations for remove-vzip #1370

Merged: 1 commit into ggerganov:remove-vzip on May 8, 2023
Conversation

sw (Contributor) commented May 8, 2023

This adds AVX/AVX2 optimizations for PR #1305. It also includes some improvements to the scalar implementations, the README, and SHA256SUMS (incomplete).

Since x86 doesn't seem to gain from this change, which breaks file compatibility, I think quite a few people will not be very happy with it. But maybe someone can find an improvement to my AVX code.

The address sanitizer builds are failing because of some problem with the CI machines; I don't think it's caused by our code.

@sw sw requested a review from ggerganov May 8, 2023 19:07
@ggerganov ggerganov merged commit 948d124 into ggerganov:remove-vzip May 8, 2023
@sw sw deleted the shuffle-avx branch May 9, 2023 17:31
ggerganov added a commit that referenced this pull request May 11, 2023
* ggml : remove Q4_0 bit shuffling (ARM NEON)

* ggml : remove Q4_1 bit shuffling (ARM NEON + reference)

* ggml : nibbles_from_floats() + bytes_from_nibbles() (ARM NEON)

* ggml : remove Q4_2 bit shuffling (WIP, BROKEN)

* ggml : remove Q5_0 bit shuffling (ARM NEON)

* ggml : 2x faster scalar implementations

* ggml : remove Q5_1 bit shuffling (ARM NEON + scalar)

* ggml : simplify scalar dot

* ggml : remove WASM SIMD bit shuffling + remove vzip for ARM 32-bit

* ggml : fix Q4_1 quantization

* ggml : update cuBLAS + normalize variable names

* ggml : remove Q4_2 mode

* ggml : minor formatting

* ggml : fix Q5_0 quantization

* scripts : add script for measuring the time per token

* AVX implementations (#1370)

* ggml : uniform 5th bit extraction

* llama : produce error upon loading old model files

* llama : fix model magic/version write

* ggml : speed-up Q5_0 + Q5_1 at 4 threads

* ggml : preserve old Q4 and Q5 formats

* ggml : simplify Q8_1 - no need for low / high sums anymore

* ggml : fix Q8_0 and Q8_1 rounding

* Revert "AVX implementations (#1370)"

This reverts commit 948d124.

* ggml : fix AVX2 implementation

* sha : update hashes for 7B and 13B

* readme : update timings + remove warning banner

* llama : update v2 PR number to 1405

* ggml : fix WASM comments

* ggml : back to original bit order

* readme : add note that Q4 and Q5 have been changed

* llama : fix return for unknown version

---------

Co-authored-by: Stephan Walter <[email protected]>