sync : ggml #1717

ggerganov · 2024-01-03T09:49:19Z

No description provided.

* scripts : add sync-ggml-am.sh
* sync : ggml (VMM, ARM dot prod fix, etc.)
* build : fix CUDA build
* ggml : fix some mul mat cases + add tests for src1 F16 ggerganov/ggml@dbd0295

* add more int ops
* ggml_compute_forward_dup_bytes
* add tests
* PR comments
* tests : minor indentations

---------

Co-authored-by: Georgi Gerganov <[email protected]>

Signed-off-by: hydai <[email protected]>

* feat: add avx_vnni based on intel documents
* ggml: add avx vnni based on intel document
* llama: add avx vnni information display
* docs: add more details about using oneMKL and oneAPI for intel processors
* docs: add more details about using oneMKL and oneAPI for intel processors
* docs: add more details about using oneMKL and oneAPI for intel processors
* docs: add more details about using oneMKL and oneAPI for intel processors
* docs: add more details about using oneMKL and oneAPI for intel processors
* Update ggml.c Fix indentation upgate Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>

ggml-ci

* ggml : disable fast-math for Metal (cmake build only)

ggml-ci

* metal : fix Metal API debug warnings
* cmake : add -fno-inline for Metal build (llama/4545)
* metal : fix API debug warnings
* metal : fix compile warnings
* metal : use uint64_t for strides
* cmake : rename option to LLAMA_METAL_SHADER_DEBUG
* metal : fix mat-vec Q8_0 kernel for BS > 1
* metal : normalize mat-vec kernel signatures
* cmake : respect LLAMA_QKK_64 option
* metal : fix mat-vec Q4_K kernel for QK_K == 64

ggml-ci

* ggml : disable fast-math for Metal (cmake build only)

ggml-ci

* metal : fix Metal API debug warnings
* cmake : add -fno-inline for Metal build (llama/4545)
* metal : fix API debug warnings
* metal : fix compile warnings
* metal : use uint64_t for strides
* cmake : rename option to LLAMA_METAL_SHADER_DEBUG
* metal : fix mat-vec Q8_0 kernel for BS > 1
* metal : normalize mat-vec kernel signatures
* cmake : respect LLAMA_QKK_64 option
* metal : fix mat-vec Q4_K kernel for QK_K == 64
* metal : optimizing ggml_mul_mat_id (wip)
* metal : minor fix
* metal : opt mul_mm_id

ggml-ci

Co-authored-by: slaren <[email protected]>

scripts : fix sync order + metal sed

494fdb4

gwenzek and others added 11 commits January 3, 2024 14:41

ggml : extend ggml_get_rows, ggml_repeat, ggml_concat (ggml/639)

1a525a6

* add more int ops
* ggml_compute_forward_dup_bytes
* add tests
* PR comments
* tests : minor indentations

---------

Co-authored-by: Georgi Gerganov <[email protected]>

cuda: fix vmm oom issue on NVIDIA AGX Orin (llama/4687)

6eee41d

Signed-off-by: hydai <[email protected]>

CUDA: fix tensor core logic for Pascal and HIP (llama/4682)

03b2e39

CUDA: fixed tensor cores not being used on RDNA3 (llama/4697)

355a89a

ggml : add ggml_vdotq_s32 alias (llama/4715)

4877bb2

ggml-ci

metal : add kernel_get_rows_i32

007c68e

ggml-ci

cuda : mark I16 and I32 ops as unsupported

a1d98e6

ggml-ci

cuda : simplify expression

e1928a3

Co-authored-by: slaren <[email protected]>

ggerganov force-pushed the sync branch from 81701e2 to e1928a3 Compare January 3, 2024 12:42

ggerganov merged commit 14c5795 into master Jan 3, 2024
74 checks passed

ggerganov deleted the sync branch January 3, 2024 12:43
