
llama.cpp: update submodule for CPU fallback fix #2640

Merged: 1 commit into main on Jul 10, 2024

Conversation

cebtenzzre (Member) commented on Jul 10, 2024:

There was a small error in #2409 that caused the Kompute build of llama.cpp to attempt to allocate tensors on the GPU even when we intended to fall back to the CPU due to missing features.

This fixes the crash when attempting to load a model that is not supported by Kompute:

```
ggml_backend_sched_backend_from_buffer: error: no backend supports buffer type Kompute1 used in tensor blk.0.attn_q.weight
GGML_ASSERT: /home/jared/src/forks/gpt4all/gpt4all-backend/llama.cpp-mainline/ggml-backend.c:1115: false
```
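
For context, here is a minimal C++ sketch of the intended fallback pattern, not the actual llama.cpp code: all names (`buft`, `device_supports_model`, `pick_weight_buffer_type`) are hypothetical stand-ins for the real ggml-backend API. The point is that the GPU (Kompute) buffer type must only be chosen when the device-support check passes; otherwise weight tensors land in Kompute buffers that no usable backend supports, which is what triggers the assertion above.

```cpp
// Hypothetical sketch of CPU-fallback buffer-type selection.
// Not the real ggml-backend API; names are illustrative only.
#include <cstdio>

enum class buft { kompute, cpu }; // stand-in for ggml_backend_buffer_type

// Hypothetical check: does the Kompute device implement every feature
// (ops, quantization types, etc.) that this model requires?
static bool device_supports_model(bool has_required_features) {
    return has_required_features;
}

// The fix, conceptually: pick the Kompute buffer type only when the
// support check passes; otherwise use the CPU buffer type so that the
// backend scheduler can find a backend for every weight tensor.
static buft pick_weight_buffer_type(bool has_required_features) {
    if (!device_supports_model(has_required_features)) {
        // Before the fix, this branch was effectively skipped, so weights
        // were still allocated in Kompute buffers even though only the
        // CPU backend would be used at inference time.
        std::fprintf(stderr, "model not supported by Kompute, falling back to CPU\n");
        return buft::cpu;
    }
    return buft::kompute;
}

int main() {
    // Simulate loading a model the Kompute backend cannot run.
    buft b = pick_weight_buffer_type(/*has_required_features=*/false);
    std::printf("chosen buffer type: %s\n", b == buft::cpu ? "cpu" : "kompute");
    return 0;
}
```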

cebtenzzre requested a review from manyoso on Jul 10, 2024 at 21:45.
manyoso merged commit 6cb3dda into main on Jul 10, 2024 (6 of 20 checks passed).