
llama.cpp: update submodule for CPU fallback fix #2640

Merged: 1 commit into main on Jul 10, 2024

Conversation

cebtenzzre (Member) commented on Jul 10, 2024:

There was a small error in #2409 that caused the Kompute build of llama.cpp to attempt to allocate tensors on the GPU even when we intended to fall back to the CPU due to missing features.

This fixes the crash when attempting to load a model that is not supported by Kompute:

```
ggml_backend_sched_backend_from_buffer: error: no backend supports buffer type Kompute1 used in tensor blk.0.attn_q.weight
GGML_ASSERT: /home/jared/src/forks/gpt4all/gpt4all-backend/llama.cpp-mainline/ggml-backend.c:1115: false
```
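
For context, here is a minimal C++ sketch of the intended fallback pattern, not the actual llama.cpp code: all names (`buft`, `device_supports_model`, `pick_weight_buffer_type`) are hypothetical stand-ins for the real ggml-backend API. The point is that the GPU (Kompute) buffer type must only be chosen when the device-support check passes; otherwise weight tensors land in Kompute buffers that no usable backend supports, which is what triggers the assertion above.

```cpp
// Hypothetical sketch of CPU-fallback buffer-type selection.
// Not the real ggml-backend API; names are illustrative only.
#include <cstdio>

enum class buft { kompute, cpu }; // stand-in for ggml_backend_buffer_type

// Hypothetical check: does the Kompute device implement every feature
// (ops, quantization types, etc.) that this model requires?
static bool device_supports_model(bool has_required_features) {
    return has_required_features;
}

// The fix, conceptually: pick the Kompute buffer type only when the
// support check passes; otherwise use the CPU buffer type so that the
// backend scheduler can find a backend for every weight tensor.
static buft pick_weight_buffer_type(bool has_required_features) {
    if (!device_supports_model(has_required_features)) {
        // Before the fix, this branch was effectively skipped, so weights
        // were still allocated in Kompute buffers even though only the
        // CPU backend would be used at inference time.
        std::fprintf(stderr, "model not supported by Kompute, falling back to CPU\n");
        return buft::cpu;
    }
    return buft::kompute;
}

int main() {
    // Simulate loading a model the Kompute backend cannot run.
    buft b = pick_weight_buffer_type(/*has_required_features=*/false);
    std::printf("chosen buffer type: %s\n", b == buft::cpu ? "cpu" : "kompute");
    return 0;
}
```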

cebtenzzre requested a review from manyoso on Jul 10, 2024 at 21:45.
manyoso merged commit 6cb3dda into main on Jul 10, 2024 (6 of 20 checks passed).