sync : llama.cpp #965

ggerganov · 2024-09-20T18:27:40Z

No description provided.

…by submitting smaller cmdbuffers early. (llama/9118)

* Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early.
* fix compile issues
* Fix issues where the last submit wasn't executed or handled properly.
* remove trailing whitespace
* Repair GGML_VULKAN_CHECK_RESULTS
* Increase submit counter only if actual work has been submitted and increase submit count to 100.
* Fix some nodes not being checked with GGML_VULKAN_CHECK_RESULTS enabled.
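The batching idea described in this commit message can be sketched in plain C, without any Vulkan calls. The function below is a hypothetical illustration, not the backend's code: `sketch_count_submits`, `has_work`, and `nodes_per_submit` are names made up here, and reading the "submit count" as a per-batch node threshold is an assumption. It only counts how many submissions a graph walk would produce when nodes that record no work do not advance the counter and a non-empty final batch is still flushed.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical sketch: count how many early submissions a graph walk would make.
// has_work[i] != 0 means node i recorded commands into the current batch.
int sketch_count_submits(const int *has_work, size_t n_nodes, int nodes_per_submit) {
    assert(nodes_per_submit > 0);

    int submits = 0; // batches handed off so far
    int pending = 0; // nodes with real work recorded since the last submit

    for (size_t i = 0; i < n_nodes; i++) {
        // Only nodes that actually recorded work advance the counter.
        if (has_work[i]) {
            pending++;
        }

        const int is_last = (i + 1 == n_nodes);

        // Submit when the batch is full, and always flush a non-empty tail.
        if (pending >= nodes_per_submit || (is_last && pending > 0)) {
            submits++;
            pending = 0;
        }
    }
    return submits;
}
```

With a small threshold the work is split into several short batches, so execution of the first batch can start while later ones are still being recorded; a graph with no real work produces no submission at all.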

* Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths
* Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths
* Removed WhiteSpaces
* ggml : style changes + fix 512-bit nb loop check
  - fix local scope in switch cases
  - consistent predicate names
  - empty lines when necessary
  - opening braces, spaces
  - const-correctness
  - add asserts
* Update ggml/src/ggml-quants.c

Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>

…463)

* cmake : use list(APPEND ...) instead of set() + dedup linker ggml-ci
* cmake : try fix sycl
* cmake : try to fix sycl 2
* cmake : fix sycl build (llama/9469)
* try fix sycl build
* use CMAKE_CXX_FLAGS as a string variable

---------

Co-authored-by: Georgi Gerganov <[email protected]>

* one more CMAKE_CXX_FLAGS fix (llama/9471)

---------

Co-authored-by: Michael Podvitskiy <[email protected]>

* squashed readd my iq4_nl sgemm PR ggerganov/llama.cpp#8049

  have ggml_vec_dot_q4_0 do two blocks per loop for avx

  try out f16c ggml_vec_dot_iq4_nl, but it's not really faster. as per ggerganov/llama.cpp#8549 we can calculate several blocks at a time with no issue
* shuffle
* remove f16c iq4_nl as i can't make it faster than before

* threadpool: skip polling for unused threads

  Currently all threads do N polling rounds even if only 1 thread is active (n_threads_cur == 1). This commit adds a check to skip the polling for unused threads (ith >= n_threads_cur).

  n_threads_cur is now an atomic_int to explicitly tell thread sanitizer that it is written from one thread and read from other threads (not a race condition).
* threadpool: further simplify and improve ggml_barrier

  Avoid using strict memory order while polling, yet make sure that all threads go through a full memory barrier (memory fence) on ggml_barrier entrance and exit.
* threads: add simple barrier test

  This test does lots of small, parallel matmul ops where the barriers in between dominate the overhead.
* threadpool: improve thread sync for new-graphs

  Using the same tricks as ggml_barrier. All the polling is done with relaxed memory order to keep it efficient; once the new graph is detected we do a full fence using read-modify-write with strict memory order.
* threadpool: improve abort handling

  Do not use threadpool->ec (exit code) to decide whether to exit the compute loop. threadpool->ec is not atomic, which makes thread-sanitizer rightfully unhappy about it.

  Instead introduce an atomic threadpool->abort flag used for this. This is consistent with how we handle threadpool->stop or pause.

  While at it add an explicit atomic_load for n_threads_cur for consistency.
* test-barrier: release threadpool before releasing the context

  Fixes a use-after-free detected by gcc thread-sanitizer on x86-64; for some reason the llvm sanitizer is not detecting this issue.
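The synchronization pattern these messages describe (relaxed polling, full fences on barrier entry and exit, an atomic abort flag, and a cheap check for unused threads) can be sketched with C11 atomics. This is a minimal sketch under stated assumptions, not the ggml implementation: `struct sketch_pool`, its fields, and every `sketch_*` function are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

// Hypothetical stand-in for the threadpool state; all names are illustrative.
struct sketch_pool {
    atomic_int n_threads_cur;    // threads taking part in the current graph
    atomic_int n_barrier;        // threads that have arrived at the barrier
    atomic_int n_barrier_passed; // generation counter, bumped once per barrier
    atomic_int abort_flag;       // set by any thread to stop the compute loop
};

void sketch_pool_init(struct sketch_pool *p, int n_threads) {
    assert(n_threads >= 1);
    atomic_init(&p->n_threads_cur, n_threads);
    atomic_init(&p->n_barrier, 0);
    atomic_init(&p->n_barrier_passed, 0);
    atomic_init(&p->abort_flag, 0);
}

// Threads with an index beyond the active count have nothing to poll for.
int sketch_thread_is_active(struct sketch_pool *p, int ith) {
    return ith < atomic_load_explicit(&p->n_threads_cur, memory_order_relaxed);
}

// The abort decision reads an atomic flag rather than a plain exit code.
int sketch_should_abort(struct sketch_pool *p) {
    return atomic_load_explicit(&p->abort_flag, memory_order_relaxed) != 0;
}

void sketch_barrier(struct sketch_pool *p) {
    const int n_threads = atomic_load_explicit(&p->n_threads_cur, memory_order_relaxed);
    if (n_threads == 1) {
        return; // nothing to synchronize with
    }

    const int n_passed = atomic_load_explicit(&p->n_barrier_passed, memory_order_relaxed);

    // Entry: a sequentially consistent read-modify-write serves as the fence.
    const int arrived = atomic_fetch_add_explicit(&p->n_barrier, 1, memory_order_seq_cst);

    if (arrived == n_threads - 1) {
        // Last thread in: reset the arrival count, then release the waiters.
        atomic_store_explicit(&p->n_barrier, 0, memory_order_relaxed);
        atomic_fetch_add_explicit(&p->n_barrier_passed, 1, memory_order_seq_cst);
        return;
    }

    // Poll with relaxed loads only; ordering is restored by the fence below.
    while (atomic_load_explicit(&p->n_barrier_passed, memory_order_relaxed) == n_passed) {
        // spin
    }

    // Exit: full fence so writes made before the barrier are visible after it.
    atomic_thread_fence(memory_order_seq_cst);
}
```

In this sketch the spin loop never uses a strict memory order, so the hot path stays cheap, while each thread still passes through one sequentially consistent operation when it enters and one when it leaves.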

ggerganov and others added 30 commits September 20, 2024 21:19

scripts : add context to sync-llama-am.sh

6f8c166

add check malloc result on device (llama/9346)

f3068aa

* add check malloc result on device
* update for review comments, check all malloc_device() result

---------

Co-authored-by: arthw <[email protected]>

cuda : fix FA Q src index (1 -> 0) (llama/9374)

58ccd65

CUDA: fix variable name conflict for Windows build (llama/9382)

ab7c211

rpc : fix segfault with nkvo (llama/9389)

9f9246d

* rpc : fix nkvo
* rpc : buf_size must not be static

ref: #9337

---------

Co-authored-by: slaren <[email protected]>

metal : fix compile warning with GGML_METAL_NDEBUG (llama/0)

bf778d0

sycl : update support conditions (llama/9394)

98d9ff3

* sycl : update support condition to im2col

  Signed-off-by: Alberto Cabrera <[email protected]>
* Added TODO to remind supporting FP32 im2col

---------

Signed-off-by: Alberto Cabrera <[email protected]>

musa: remove Clang builtins mapping (llama/9421)

6c6800c

Signed-off-by: Xiaodong Ye <[email protected]>

CUDA: fix --split-mode row race condition (llama/9413)

ed50f6e

cann: Fix error when running a non-exist op (llama/9424)

f24368f

riscv : modify Makefile and add a RISCV_VECT to print log info (llama…

4fcc15a

…/9442)

- Added ggml_cpu_has_riscv_v() in GGML to print system info in log
- Modified Makefile to only use flag when cross compiling for RISC-V

cann: Add host buffer type for Ascend NPU (llama/9406)

bab17a2

* feat: Add host buffer type for Ascend NPU(CANN backend)
* fix some checking errors
* Add a few comments

ggml : ggml_type_name return "NONE" for invalid values (llama/9458)

4de945c

When running on Windows, the quantization utility attempts to print the types that are not set which leads to a crash.
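The kind of guard this commit title describes can be sketched as a bounds-checked name lookup. The enum, the table, and `sketch_type_name` below are hypothetical and only illustrate the pattern; they are not the ggml definitions.

```c
#include <assert.h>

// Hypothetical type enum and name table, for illustration only.
enum sketch_type {
    SKETCH_TYPE_F32,
    SKETCH_TYPE_F16,
    SKETCH_TYPE_Q4_0,
    SKETCH_TYPE_COUNT
};

static const char *const sketch_type_names[] = { "f32", "f16", "q4_0" };

static_assert(sizeof(sketch_type_names) / sizeof(sketch_type_names[0]) == SKETCH_TYPE_COUNT,
              "name table must cover every type");

// Return a printable name, falling back to "NONE" for out-of-range values
// so callers can log unset types without indexing past the table.
const char *sketch_type_name(int type) {
    if (type < 0 || type >= SKETCH_TYPE_COUNT) {
        return "NONE";
    }
    return sketch_type_names[type];
}
```

Returning a fixed string instead of a null or out-of-bounds pointer keeps a later `printf("%s", ...)` safe even when the caller passes a type that was never set.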

cmake : try to fix sycl+intel build (llama/9487)

6d239fb

cmake : correct order of sycl flags (llama/9497)

6f523e2

common : reimplement logging (llama/9418)

6f8cf41

ggerganov/llama.cpp#9418

metal : handle zero-sized allocs (llama/9466)

18ecce4

cmake : do not hide GGML options + rename option (llama/9465)

0192921

* cmake : do not hide GGML options

  ggml-ci
* build : rename flag GGML_CUDA_USE_GRAPHS -> GGML_CUDA_GRAPHS for consistency

  ggml-ci

ggml : link MATH_LIBRARY not by its full path (llama/9339)

1df27fc

ggml : fix n_threads_cur initialization with one thread (llama/9538)

4344c2d

* ggml : fix n_threads_cur initialization with one thread
* Update ggml/src/ggml.c

---------

Co-authored-by: Max Krasnyansky <[email protected]>

CUDA: fix sum.cu compilation for CUDA < 11.7 (llama/9562)

68ad0d0

ggml : fix trailing whitespace (llama/0)

23188a3

ggml-ci

ggml : fix builds (llama/0)

eea09cf

ggml-ci

ggml : refactoring (llama/#0)

cd7d18e

- d6a04f87
- 23e0d70b

sync : llama.cpp

242ae95

examples : adapt to ggml.h changes (#0)

a146842

ggml-ci

ggerganov marked this pull request as ready for review September 20, 2024 18:59

ggerganov merged commit 336c10a into master Sep 20, 2024
8 checks passed

ggerganov deleted the sync branch September 20, 2024 19:03
