
Bug: ggml_metal_init error: zero-length arrays are not permitted in C++ float4x4 lo[D16/NW4]; #10208

Closed
a1ix2 opened this issue Nov 7, 2024 · 9 comments · Fixed by #10229
Labels: bug-unconfirmed, critical severity

Comments

a1ix2 commented Nov 7, 2024

What happened?

Trying to run llama-server on an Apple Silicon M2 running Ventura. The same error occurs with both the latest release and a build from source. I'm trying to load Meta's Llama-3.2-3B-Instruct in F16; I created the GGUF with convert_hf_to_gguf.py.

$ ./llama-server -m Llama-3.2-3B-Instruct-F16.gguf --verbose

Name and Version

From source

./llama-cli --version
version: 4048 (a71d81c)
built with Apple clang version 15.0.0 (clang-1500.1.0.2.5) for arm64-apple-darwin22.6.0

From the release

$ ./llama-cli --version
version: 4044 (97404c4)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.6.0

What operating system are you seeing the problem on?

Mac

Relevant log output

ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2
ggml_metal_init: picking default device: Apple M2
ggml_metal_init: using embedded metal library
ggml_metal_init: error: Error Domain=MTLLibraryErrorDomain Code=3 "program_source:5134:17: error: zero-length arrays are not permitted in C++
    float4x4 lo[D16/NW4];
                ^~~~~~~
program_source:5391:18: note: in instantiation of function template specialization 'kernel_flash_attn_ext_vec<metal::matrix<half, 4, 4, void>, 1, &dequantize_f16, 64, 1, 32>' requested here
typedef decltype(kernel_flash_attn_ext_vec<half4x4, 1, dequantize_f16, 64>) flash_attn_ext_vec_t;
                 ^
program_source:5186:21: error: zero-length arrays are not permitted in C++
        float4x4 mq[D16/NW4];
                    ^~~~~~~
" UserInfo={NSLocalizedDescription=program_source:5134:17: error: zero-length arrays are not permitted in C++
    float4x4 lo[D16/NW4];
                ^~~~~~~
program_source:5391:18: note: in instantiation of function template specialization 'kernel_flash_attn_ext_vec<metal::matrix<half, 4, 4, void>, 1, &dequantize_f16, 64, 1, 32>' requested here
typedef decltype(kernel_flash_attn_ext_vec<half4x4, 1, dequantize_f16, 64>) flash_attn_ext_vec_t;
                 ^
program_source:5186:21: error: zero-length arrays are not permitted in C++
        float4x4 mq[D16/NW4];
                    ^~~~~~~
}
ggml_backend_metal_device_init: error: failed to allocate context
llama_new_context_with_model: failed to initialize Metal backend
common_init_from_params: failed to create context with model '/Users/username/dev/RAG/Llama-3.2-3B-Instruct-F16.gguf'
srv    load_model: failed to load model, '/Users/username/dev/RAG/Llama-3.2-3B-Instruct-F16.gguf'
main: exiting due to model loading error
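The diagnostic itself is worth unpacking. In the failing instantiation, `kernel_flash_attn_ext_vec<half4x4, 1, dequantize_f16, 64, 1, 32>`, the array lengths are compile-time constants, and for a head size of 64 the integer division `D16/NW4` truncates to zero, producing a zero-length array that C++ (and Metal's C++ dialect) rejects. Below is a minimal standalone sketch of the arithmetic, assuming `D16 = D/16` and `NW4 = NW/4` as the names suggest; it is illustrative only, not the actual kernel code:

```cpp
#include <cstdio>

int main() {
    // Values from the failing instantiation in the log:
    // kernel_flash_attn_ext_vec<half4x4, 1, dequantize_f16, 64, 1, 32>
    const int D   = 64;      // head size (4th template argument)
    const int NW  = 32;      // SIMD group width (last template argument)
    const int D16 = D / 16;  // = 4  (assumed meaning of D16)
    const int NW4 = NW / 4;  // = 8  (assumed meaning of NW4)

    // Integer division truncates: 4 / 8 == 0, so a declaration like
    // "float4x4 lo[D16/NW4];" becomes a zero-length array, which the
    // Metal compiler rejects when it builds the library.
    printf("lo array length = %d\n", D16 / NW4);  // prints 0
    return 0;
}
```

Because the embedded Metal library compiles every instantiation of the kernel up front, the error surfaces during ggml_metal_init, before any model-specific kernel is selected.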
a1ix2 added the bug-unconfirmed and critical severity labels on Nov 7, 2024

stefanb commented Nov 8, 2024

According to the llama.cpp pull requests in Homebrew, the problem started appearing with Homebrew/homebrew-core#196827, between tags b4034 and b4038.

Diff: b4034...b4038


a1ix2 commented Nov 8, 2024

I can confirm that b4034 works, but b4036 throws the same error.


stefanb commented Nov 8, 2024

> I can confirm that b4034 works, but b4036 throws the same error.

That narrows the problematic diff down to b4034...b4036.


a1ix2 commented Nov 8, 2024

The bug was introduced in a1eaf6; the previous commit, b8deef, works.


stefanb commented Nov 9, 2024

Commit a1eaf6a came from a merge.

cc @ggerganov, any clues?

ggerganov (Owner) commented

@stefanb Should be fixed now. Let me know if the issue persists.


stefanb commented Nov 9, 2024

Thanks @ggerganov, waiting for the next tag (>b4056) containing the fix.


stefanb commented Nov 9, 2024

@ggerganov, thanks, it seems to be fixed 🎉


a1ix2 commented Nov 9, 2024

Can confirm, it works on my M2 Air! Thank you so much! I'm still impressed by how fast Metal is, even for reasonably sized models on a rather low-end laptop.
