Release Discussion: v1.73.1.yr1 | ROCm v6.2.0 #64

YellowRoseCx · 2024-08-24T16:28:05Z

YellowRoseCx
Aug 24, 2024
Maintainer

https://github.com/YellowRoseCx/koboldcpp-rocm/releases/tag/v1.73.1.yr1-ROCm

Danik-droid · 2024-08-25T18:49:30Z

Danik-droid
Aug 25, 2024

On my radeon 6900xt works well. I tested different language models and I don't see any problems. Dry works as it should. Performance is slightly better than on the previous version of rocm - example: old 35.77T/s vs new 38.43T/s. In the newer version, slightly faster processing (visible only with large amounts of text). The same vram usage in LLM. Slightly weaker performance in generating images (Stable UI) - for 1024x1024 -> 1.28s/it vs 1.34s/it (the same model and settings). Lower memory usage during image generation (older version vs newer) -> example: 9.3 vs 8.8 and in the final phase 15 vs 14.6

Windows 10 Pro, 22H2. AMD drivers: 24.7.1

0 replies

Andyholm · 2024-08-26T06:50:17Z

Andyholm
Aug 26, 2024

I get this error whenever I try to load any model, not sure why.

All I did was download koboldcpp_rocm.exe and run it.
I tried installing the hipsdk as well if that was needed, but still the same error.

Windows 11 Pro, 23H2, AMD drivers: 24.7.1, GPU: 7900 XTX

2 replies

YellowRoseCx Aug 26, 2024
Maintainer Author

I believe it's because of your AMD CPU having integrated graphics, try changing the "GPU ID" to a different one or disabling integrated graphics in your motherboard's BIOS

Andyholm Aug 26, 2024

Yep, that was the issue! I only used --usecublas mmq and never set the GPU ID :P

mpx14 · 2024-08-26T08:24:31Z

mpx14
Aug 26, 2024

On a gfx906 (Radeon VII) I get the following error: "rocBLAS error: Could not initialize Tensile host: No devices found". Had to roll back to the old working 1.72 build.

13 replies

YellowRoseCx Sep 27, 2024
Maintainer Author

Still no dice. I get the same error message as before. I tried running it with --debugmode but that does not seem to give me anything more useful. Let me know if there is any way to provide more info to toroubleshoot the issue.

I did just realize that you're not getting the "missing tensile_library.dat" error, you're actually getting an error saying it doesnt see your graphics cards at all

Are you using an AMD CPU by chance?

try changing the GPU number in the koboldcpp GUI to a different number or 'all'

mpx14 Sep 27, 2024

Yes the CPU is an AMD Ryzen 5800X. I don't think it is the slot # as it matches the number when in vulkan Preset which works as before. Also setting it to all renders the same result/error.

cb88 Oct 15, 2024

I get the same error on a Vega FE as well as a WX3200 (which is polaris). "rocBLAS error: Could not initialize Tensile host: No devices found"

I have ran it successfully on 6800M 12GB though (same chip as 6700) and W7800 but results were not expected the I was getting fuzzy results).

YellowRoseCx Oct 15, 2024
Maintainer Author

I get the same error on a Vega FE as well as a WX3200 (which is polaris). "rocBLAS error: Could not initialize Tensile host: No devices found"

I have ran it successfully on 6800M 12GB though (same chip as 6700) and W7800 but results were not expected the I was getting fuzzy results).

Which version is that on? Have you tried version 1.76.yr1 (latest) and make sure flash-attention and MMQ is disabled?

Are you able to try turning off the integrated graphics from your CPU in the BIOS? When you start KCPP, does the GPU that's found say 7800 or does it say something else? If it defaults to GPU number 1, try changing it to GPU number 2

cb88 Oct 17, 2024

I'll get around to the other checks later but the W7800 workstation does have the iGPU disabled (its AM5 so does have one) we disable it as its normally a solidworks workstation and we didn't want it ever defaulting to the iGPU.

The PC with the Vega FE does not have an iGPU its AM4 5800x3d.

The laptop with the 6800M works fine with ROCm and is reasonably fast even with a good chunk of slices on the CPU (I forget if the iGPU is enabled/disabled there). I believe the iGPU on it is a Vega 8.

Update: W7800 works as expected it I was conflating stablediffusion.cpp outputting fuzzy images there so ignore that.

Psyko38 · 2024-08-28T16:01:27Z

Psyko38
Aug 28, 2024

I've encountered this error but I don't know what it is except that it comes from rocM

Device 0: AMD Radeon RX 6600, compute capability 10.3, VMM: no
Traceback (most recent call last):
File "koboldcpp.py", line 4881, in
File "koboldcpp.py", line 4526, in main
File "koboldcpp.py", line 894, in load_model
OSError: exception: access violation reading 0x0000000000000000
[19384] Failed to execute script 'koboldcpp' due to unhandled exception!

0 replies

LogDifferent · 2024-08-28T20:09:56Z

LogDifferent
Aug 28, 2024

I did some speed comparisons, my testing environment wasn't too well controlled so take these with a grain of salt, but I'll drop these here if anyone's interested.

Ryzen 7 5800X3D
6950 XT 16GB
64GB DDR4 3600CL16

ROCm 5.7 = KoboldCpp v1.71.1.yr0-ROCm
ROCm 6.2 = KoboldCpp v1.73.1.yr1-ROCm

Model 1: Meta-Llama-3.1-8B-Instruct-Q6_K (Plenty of free VRAM)

ROCm 5.7
ProcessingTime: 3.563s
ProcessingSpeed: 1121.53T/s
GenerationTime: 3.976s
GenerationSpeed: 25.15T/s
TotalTime: 7.539s

ROCm 6.2
ProcessingTime: 3.399s
ProcessingSpeed: 1175.64T/s
GenerationTime: 3.774s
GenerationSpeed: 26.50T/s
TotalTime: 7.173s (faster)

Model 2: Lumimaid-v0.2-12B-Q4_K_M-imat (Good amount of free VRAM)

ROCm 5.7
ProcessingTime: 4.614s
ProcessingSpeed: 866.06T/s
GenerationTime: 4.874s
GenerationSpeed: 20.52T/s
TotalTime: 9.488s

ROCm 6.2
ProcessingTime: 4.776s
ProcessingSpeed: 836.68T/s
GenerationTime: 4.764s
GenerationSpeed: 20.99T/s
TotalTime: 9.540s (slower, margin of error)

Model 3: DaringMaid-20B-V1.1-Q4_K_M (VRAM limited)

ROCm 5.7
ProcessingTime: 53.566s
ProcessingSpeed: 74.60T/s
GenerationTime: 21.378s
GenerationSpeed: 4.68T/s
TotalTime: 74.944s

ROCm 6.2
ProcessingTime: 54.639s
ProcessingSpeed: 73.13T/s
GenerationTime: 17.889s
GenerationSpeed: 5.59T/s
TotalTime: 72.528s (faster)

Model 4: Lumimaid-v0.2-70B.q4_k_m (Very VRAM limited)

ROCm 5.7
ProcessingTime: 200.629s
ProcessingSpeed: 19.92T/s
GenerationTime: 150.338s
GenerationSpeed: 0.67T/s
TotalTime: 350.967s

ROCm 6.2
ProcessingTime: 192.231s
ProcessingSpeed: 20.79T/s
GenerationTime: 121.321s
GenerationSpeed: 0.82T/s
TotalTime: 313.552s (noticeably faster)

0 replies

Krxon · 2024-08-29T07:03:55Z

Krxon
Aug 29, 2024

1 reply

sinni800 Aug 29, 2024

I have the same thing still

sinni800 · 2024-08-29T18:06:56Z

sinni800
Aug 29, 2024

If you have both a Nvidia and an AMD GPU in your computer, it visually shows the NVidia GPU in the GUI.

After hitting launch, it recognizes it correctly, but it still breaks with a null pointer excpeption for me

ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 6950 XT, compute capability 10.3, VMM: no
Traceback (most recent call last):
  File "koboldcpp.py", line 4881, in <module>
  File "koboldcpp.py", line 4526, in main
  File "koboldcpp.py", line 894, in load_model
OSError: exception: access violation reading 0x0000000000000000
[363000] Failed to execute script 'koboldcpp' due to unhandled exception!

0 replies

AGx3x2x1 · 2024-09-01T09:24:17Z

AGx3x2x1
Sep 1, 2024

how do i use the image analyzing option

1 reply

Danik-droid Sep 1, 2024

how do i use the image analyzing option

You need the right model with a vision adapter. You could start with this:

https://huggingface.co/MoMonir/llava-llama-3-8b-v1_1-GGUF

You need llava-llama-3-8b-v1_1-mmproj-f16.gguf this is the vision module. For this download e.g. Q4 (llava-llama-3-8b-v1_1.Q4_K_M.gguf). It is best to put it in the same folder.

The image will appear in the chat and you can ask about it (choose a small resolution, I'm not sure what resolution this model can read images to...test it, start with very small ones - for example 512x512) If necessary, load it again, sometimes it happens to me that the model stops seeing it.

Have fun.

YW5555 · 2024-09-07T15:49:23Z

YW5555
Sep 7, 2024

Seems like generation speed is still behind Vulkan. Processing speed is much faster though. For RP I think Vulkan is still going to be better with the faster generation speed.

Model: daybreak-kunoichi-2dpo-7b-q6_k.gguf
GPU: 6900XT
Driver: 24.8.1

Vulkan:
ProcessingTime: 18.347s
ProcessingSpeed: 441.05T/s
GenerationTime: 3.509s
GenerationSpeed: 28.50T/s
TotalTime: 21.856s

v1.74.yr0-ROCm:
ProcessingTime: 9.862s
ProcessingSpeed: 820.52T/s
GenerationTime: 5.182s
GenerationSpeed: 19.30T/s
TotalTime: 15.044s

1 reply

YW5555 Sep 14, 2024

I would recommend sticking to the non-ROCm Branch and just use Vulkan. There's a no_CUDA.exe that's only 60MB.

sinni800 · 2024-09-26T16:59:35Z

sinni800
Sep 26, 2024

Since ROCM afair supports compiling to a CUDA card (?? does it still, that was advertised at some point), would it be possible to also target CUDA cards at the same time, like for example if you wanted to do multi-gpu across AMD and Nvidia using ROCM.

Would that even be feasible?

2 replies

YellowRoseCx Sep 27, 2024
Maintainer Author

I'm not entirely sure; while I'm not entirely familiar with HIP, my understanding is with ROCm you have "hipBLAS" and "rocBLAS"; hipBLAS is technically a "wrapper" for both rocBLAS and CUBLAS, meaning you could write your program in "HIP" language code and use hipBLAS which would then make calls to either rocBLAS or CUBLAS depending on if you have AMD or Nvidia.

I'm not sure if that means it can be used at the same time or not though

Given that simply running different types of AMD cards together can sometimes cause hiccups, I'd bet getting both nvidia and amd to work at the same time would be even more difficult, and it might also confuse the underlying "llama.cpp" code which has different code routes depending on if the HIPBLAS or CUBLAS backend is in use

It may be possible, but it would take someone with both and AMD and NVIDIA card to experiment and try developing a working install/compilation procedure

sinni800 Sep 27, 2024

Hmm, so currently, you have it set up in such a way that different cards need different backends compiled in, right? Is one of those backends that you can choose to compile in some sort of general cuda backend?

But what you say is probably why it would be absolute madness to try to get it to work across the GPUs

Release Discussion: v1.73.1.yr1 | ROCm v6.2.0 #64

YellowRoseCx Aug 24, 2024 Maintainer

Replies: 10 comments · 20 replies

YellowRoseCx Aug 26, 2024 Maintainer Author

YellowRoseCx Sep 27, 2024 Maintainer Author

YellowRoseCx Oct 15, 2024 Maintainer Author

Model 1: Meta-Llama-3.1-8B-Instruct-Q6_K (Plenty of free VRAM)

Model 2: Lumimaid-v0.2-12B-Q4_K_M-imat (Good amount of free VRAM)

Model 3: DaringMaid-20B-V1.1-Q4_K_M (VRAM limited)

Model 4: Lumimaid-v0.2-70B.q4_k_m (Very VRAM limited)

YellowRoseCx Sep 27, 2024 Maintainer Author

YellowRoseCx
Aug 24, 2024
Maintainer

Replies: 10 comments 20 replies

YellowRoseCx Aug 26, 2024
Maintainer Author

YellowRoseCx Sep 27, 2024
Maintainer Author

YellowRoseCx Oct 15, 2024
Maintainer Author

YellowRoseCx Sep 27, 2024
Maintainer Author