Invalid device function on ROCm #1008

Closed
DutchEllie opened this issue Dec 14, 2023 · 11 comments
Labels: bug (Something isn't working)

DutchEllie commented Dec 14, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

I tried to compile llama-cpp-python to run on my local machine, which has an RX 7900 XTX in it and runs Arch Linux. I have installed (to the best of my knowledge) all of the required packages for compiling this software. In the past, I've run Oobabooga's default setup (using pyenv instead of Conda) with ROCm-based GPU acceleration (verified: it uses ROCm hipBLAS and not OpenCL CLBlast, as the logs show). That still works now, but I wanted to compile the package myself so I could use the latest version, which supports Mixtral. That's when I discovered it doesn't work.

I expected the package to just compile, install and run without issues.

Current Behavior

I compile the package according to the instructions in the README, running CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python and also setting CC=/opt/rocm/llvm/bin/clang and CXX=/opt/rocm/llvm/bin/clang++ to compile and install it. When I then run it through Oobabooga's software, it doesn't work: neither Mixtral nor any other gguf model works at that point.
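For reference, this is roughly the full invocation I used (assuming ROCm is installed under /opt/rocm, as on my machine):

# Build llama-cpp-python against ROCm's hipBLAS backend, using ROCm's clang
export CC=/opt/rocm/llvm/bin/clang
export CXX=/opt/rocm/llvm/bin/clang++
CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python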

What happens is that the webui starts, and when I try to load a model using llama.cpp it works; even offloading some layers to the GPU works. However, when I try to generate anything, I get the following error:

llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: offloading v cache to GPU
llama_kv_cache_init: offloading k cache to GPU
llama_kv_cache_init: VRAM kv self = 44.00 MiB
llama_new_context_with_model: kv self size  =   44.00 MiB
llama_build_graph: non-view tensors processed: 510/510
llama_new_context_with_model: compute buffer total size = 147.07 MiB
llama_new_context_with_model: VRAM scratch buffer: 144.00 MiB
llama_new_context_with_model: total VRAM used: 1236.51 MiB (model: 1048.51 MiB, context: 188.00 MiB)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
2023-12-14 11:23:49 INFO:LOADER: llama.cpp
2023-12-14 11:23:49 INFO:TRUNCATION LENGTH: 2048
2023-12-14 11:23:49 INFO:INSTRUCTION TEMPLATE: Alpaca
2023-12-14 11:23:49 INFO:Loaded the model in 0.35 seconds.

CUDA error 98 at /tmp/pip-install-7tnzxpkb/llama-cpp-python_65cef98c297d463c84d58f07be341594/vendor/llama.cpp/ggml-cuda.cu:6951: invalid device function
current device: 0
/arrow/cpp/src/arrow/filesystem/s3fs.cc:2904:  arrow::fs::FinalizeS3 was not called even though S3 was initialized.  This could lead to a segmentation fault at exit

I am pretty sure that last line can be ignored: it doesn't always show up, and I don't believe it is related to this project. What concerns me more is the line before it.

I have tried to compile both the latest version, v0.2.23, and the version currently used by Oobabooga, v0.2.19. Both result in exactly the same "invalid device function" error.

Environment and Context

Sorry, I am remote right now and the machine isn't coming back up after a reboot. I will add a fully detailed spec list later, but here is as much as I can provide right now.

  • Physical (or virtual) hardware you are using, e.g. for Linux:

CPU

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         43 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  48
  On-line CPU(s) list:   0-47
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen Threadripper 3960X 24-Core Processor
    CPU family:          23
    Model:               49
    Thread(s) per core:  2
    Core(s) per socket:  24
    Socket(s):           1
    Stepping:            0
    Frequency boost:     enabled
    CPU(s) scaling MHz:  65%
    CPU max MHz:         3800.0000
    CPU min MHz:         2200.0000
    BogoMIPS:            7603.96
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc
                          cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalig
                         nsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2
                          cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lb
                         rv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
Virtualization features:
  Virtualization:        AMD-V
Caches (sum of all):
  L1d:                   768 KiB (24 instances)
  L1i:                   768 KiB (24 instances)
  L2:                    12 MiB (24 instances)
  L3:                    128 MiB (8 instances)
NUMA:
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-47
Vulnerabilities:
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Mitigation; untrained return thunk; SMT enabled with STIBP protection
  Spec rstack overflow:  Mitigation; Safe RET
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected
  • Operating System, e.g. for Linux:

Arch Linux. Kernel version 6.6.6 (Maybe the devil is my issue 😆)

  • SDK version, e.g. for Linux:
python version 3.11.5
make 4.4.1
/opt/rocm/llvm/bin/clang++ --version --> clang version 17.0.0

Failure Information (for bugs)

Fails quite spectacularly sometimes. It doesn't always happen, but sometimes the screen of the PC will flash black for a minute and nvtop keeps reporting 100% GPU usage indefinitely. This also breaks any future runs until the machine is rebooted.

Steps to Reproduce

  1. Clone oobabooga text-generation-webui and follow the installation instructions for AMD with AVX2.
  2. Run pip uninstall llama-cpp-python and pip uninstall llama-cpp-python-cuda to uninstall the versions that came with Ooba.
  3. Run CC="/opt/rocm/llvm/bin/clang" CXX="/opt/rocm/llvm/bin/clang++" CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python to install the latest version, or install v0.2.19, which is verified to work with Ooba.
  4. Run Ooba normally, load a gguf model with some layers offloaded to the GPU.
  5. Try to generate anything.
  6. Check the logs, because it just broke.

It should be noted that this exact thing also happens with the original llama.cpp program, although that one breaks my computer even worse: llama.cpp always messes up the entire GPU driver, preventing even loading any models until a reboot is issued.

Failure Logs

Already provided above.

@SuperPou1

You need to specify your GPU architecture while building. See the llama.cpp README; under the hipBLAS section it tells you how to select your specific GPU architecture.
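Roughly, it boils down to passing your gfx target to CMake when building llama.cpp; from memory (not copied verbatim from the README), something like this for a 7900 XTX (gfx1100):

# Native llama.cpp build with hipBLAS, pinned to a single GPU architecture
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ \
  cmake -B build -DLLAMA_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1100
cmake --build build --config Release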

@DutchEllie
Author

You need to specify your GPU architecture while building. See the llama.cpp README; under the hipBLAS section it tells you how to select your specific GPU architecture.

Tried that already; it doesn't work. I ran export AMDGPU_TARGETS=gfx1100 before running CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python --upgrade --no-cache-dir --force-reinstall, and it produced about the same result:

llama_new_context_with_model: n_ctx      = 32768
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: KV self size  = 4096.00 MiB, K (f16): 2048.00 MiB, V (f16): 2048.00 MiB
llama_build_graph: non-view tensors processed: 676/676
llama_new_context_with_model: compute buffer total size = 2139.32 MiB
llama_new_context_with_model: VRAM scratch buffer: 2136.00 MiB
llama_new_context_with_model: total VRAM used: 6943.06 MiB (model: 4807.05 MiB, context: 2136.00 MiB)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
2023-12-14 19:31:01 INFO:LOADER: llama.cpp
2023-12-14 19:31:01 INFO:TRUNCATION LENGTH: 32768
2023-12-14 19:31:01 INFO:INSTRUCTION TEMPLATE: Mistral
2023-12-14 19:31:01 INFO:Loaded the model in 3.41 seconds.

CUDA error 98 at /tmp/pip-install-d9j8vt1_/llama-cpp-python_f434b6fed78941cbb1ce642d6a511ebb/vendor/llama.cpp/ggml-cuda.cu:7788: invalid device function
current device: 0
GGML_ASSERT: /tmp/pip-install-d9j8vt1_/llama-cpp-python_f434b6fed78941cbb1ce642d6a511ebb/vendor/llama.cpp/ggml-cuda.cu:7788: !"CUDA error"
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)

@SuperPou1

Try this:
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DAMDGPU_TARGETS=gfx1100" pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
I think AMDGPU_TARGETS is actually a CMake argument, since it worked like that for me.
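If you're not sure which gfx target your card is, rocminfo should report it (an RDNA3 card like the 7900 XTX shows up as gfx1100):

# List the gfx architecture names ROCm reports for the installed GPUs
rocminfo | grep gfx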

@DutchEllie
Author

Oh shit! It generates stuff now. However, it doesn't generate all that nicely. When it has to reload the prompt data, it seems to mess up and just generates completely random output. This happens with any model; I tested both the older Mistral 7B Instruct v0.1 and the new Mixtral.
[screenshot: garbled generation output]

I have to unload and reload the model for it to work again. This is a huge step forward, though.

@SuperPou1

I have the same issue; I thought it was just that my GPU isn't supported.
I only get this in chat mode, though.

@DutchEllie
Author

I have the same issue; I thought it was just that my GPU isn't supported. I only get this in chat mode, though.

I notice this happens only when I give it a prompt that is too long. It happens in both chat and the default/notebook modes when the prompt is too long. Very weird, actually.

@SuperPou1

Does this happen with regular llama.cpp?

@DutchEllie
Author

I get normal working results when I set the context size to anything <32768!

@SuperPou1

I think this is a bug in llama.cpp itself. I made an issue there.

abetlen (Owner) commented Dec 22, 2023

@SuperPou1 is this still an issue with the latest release? I saw a few HIPBLAS/ROCm-related fix commits in llama.cpp recently.

abetlen added the bug label Dec 22, 2023
@DutchEllie
Author

I don't think this still happens. I was able to compile and run now.
