Releases: YellowRoseCx/koboldcpp-rocm
KoboldCPP-v1.77.yr1-ROCm
- Bring Speed Back
Upstream llama.cpp introduced a change that calculates certain values in full 32-bit precision by default, which caused a major slowdown for some users with AMD GPUs. This release reverts that change until improvements are made.
KoboldCPP-v1.77.yr0-ROCm
Update dependencies in cmake-rocm-windows.yml
KoboldCPP-v1.76.yr1-ROCm
version bump
KoboldCPP-v1.76.yr0-ROCm
Upstream changes and a reconstruction of the rocBLAS GPU files to try to fix the issue that some RX 7000 GPU users were experiencing.
Oct/14/2024 2:36PM CST - This build may be broken for some users.
If it is, try https://github.com/YellowRoseCx/koboldcpp-rocm/releases/tag/v1.76.yr1-ROCm
I'm sorry for the inconvenience. Please work with me as I try to solve any errors 🙏
KoboldCPP-v1.75.2.yr1-ROCm
- Recompiled with gfx906 support
- disable mmq by default
- update make_pyinstaller.sh (used to create a single linux executable)
KoboldCPP-v1.75.2.yr0-ROCm
Update cmake-rocm-windows.yml remove openblas
KoboldCPP-v1.74.yr0-ROCm
Merge remote-tracking branch 'upstream/concedo'
v1.73.1.yr1-ROCm v6.2.0
KoboldCPP-ROCm v1.73.yr1
KoboldCPP-ROCm v1.73.yr1 with rocBLAS from ROCm v6.2.0 (the latest, newer than official Windows version)
I built rocBLAS and the tensile library files for the following GPU architectures, using the code from the ROCm 6.2.0 release: gfx803;gfx900;gfx1010;gfx1030;gfx1031;gfx1032;gfx1100;gfx1101;gfx1102
I was able to test gfx1010 (5600xt) and gfx1030 (6800xt), and they worked both separately and together (multi-GPU seems to require the Low VRAM setting).
- NEW: Added dual-stack (IPv6) network support. KoboldCpp now properly runs on IPv6 networks, the same instance can serve both IPv4 and IPv6 addresses automatically on the same port. This should also fix problems with resolving localhost on some systems. Please report any issues you face.
- NEW: Pure CLI Mode - Added --prompt, allowing KoboldCpp to be used entirely from command-line alone. When running with --prompt, all other console outputs are suppressed, except for that prompt's response which is piped directly to stdout. You can control the output length with --promptlimit. These 2 flags can also be combined with --benchmark, allowing benchmarking with a custom prompt and returning the response. Note that this mode is only intended for quick testing and simple usage, no sampler settings will be configurable.
- Changed the default benchmark prompt to prevent stack overflow on old bpe tokenizer.
- Pre-filter to the top 5000 token candidates before sampling, this greatly improves sampling speed on models with massive vocab sizes with negligible response changes.
- Moved chat completions adapter selection to Model Files tab.
- Improve GPU layer estimation by accounting for in-use VRAM.
- --multiuser now defaults to true. Set --multiuser 0 to disable it.
- Updated Kobold Lite, multiple fixes and improvements
- Merged fixes and improvements from upstream, including Minitron and MiniCPM features (note: there are some broken minitron models floating around - if stuck, try this one first!)
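The top-5000 pre-filter mentioned in the list above can be illustrated with a small sketch. This is not KoboldCpp's actual code: the function names `prefilter_top_k` and `sample` are hypothetical, and plain softmax sampling stands in for whatever samplers run afterwards. It only shows why trimming the candidate list first makes sampling cheaper on very large vocabularies.

```python
import heapq
import math
import random


def prefilter_top_k(logits, k=5000):
    """Keep only the k highest-scoring (token_id, logit) pairs.

    Hypothetical helper, not taken from KoboldCpp.
    """
    # heapq.nlargest avoids sorting the whole vocabulary when k is much smaller.
    # The result is ordered from highest to lowest logit.
    return heapq.nlargest(k, enumerate(logits), key=lambda pair: pair[1])


def sample(logits, k=5000, rng=random):
    """Sample one token id from a softmax over the pre-filtered candidates."""
    candidates = prefilter_top_k(logits, k)
    top = candidates[0][1]  # largest logit, subtracted for numerical stability
    weights = [math.exp(logit - top) for _, logit in candidates]
    token_ids = [token_id for token_id, _ in candidates]
    return rng.choices(token_ids, weights=weights, k=1)[0]
```

Tokens outside the top k normally carry almost no probability mass, which is consistent with the "negligible response changes" noted above.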
Hotfix 1.73.1 - Fixed the broken DRY sampler, fixed sporadic streaming issues, and added a letterboxing mode for images in Lite. The previous v1.73 release was buggy, so upgrading to this patch release is strongly recommended.
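As a rough sketch of how the Pure CLI Mode from the notes above could be scripted: the helpers `build_cli_command` and `run_prompt` are hypothetical, and the `--model` flag is an assumption not mentioned in these notes (check --help for your build). Only `--prompt`, `--promptlimit` and `--benchmark` come from the release notes.

```python
import subprocess


def build_cli_command(executable, model_path, prompt, prompt_limit=None, benchmark=False):
    """Build an argv list for a one-shot --prompt run (hypothetical helper)."""
    # --model is assumed here; it is not named in the release notes.
    argv = [executable, "--model", model_path, "--prompt", prompt]
    if prompt_limit is not None:
        # --promptlimit controls the output length.
        argv += ["--promptlimit", str(prompt_limit)]
    if benchmark:
        # Combining with --benchmark benchmarks using the custom prompt.
        argv.append("--benchmark")
    return argv


def run_prompt(argv):
    """Run the command and return whatever it wrote to stdout."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


# Example (needs a real executable and model file on disk):
# print(run_prompt(build_cli_command("koboldcpp_rocm.exe", "model.gguf", "Hello", 64)))
```

Because all other console output is suppressed in this mode, the captured stdout should contain only the response to the prompt.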
To use minicpm:
- download gguf model and mmproj file here https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf/tree/main
- launch kobold, loading BOTH the main model file as the model, and the mmproj file as mmproj
- upload images and talk to model
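For command-line launches, the MiniCPM steps above might be sketched like this. The helper name `build_minicpm_command` and the file names in the example are hypothetical, and the `--model` and `--mmproj` flags are assumptions that are not spelled out in these notes, so confirm them with --help.

```python
def build_minicpm_command(executable, model_path, mmproj_path):
    """Build an argv list that loads BOTH the main model and the mmproj file."""
    if model_path == mmproj_path:
        # The two files are distinct downloads; passing one file twice is a mistake.
        raise ValueError("model and mmproj must be two different files")
    # Flag names are assumed, not taken from the release notes.
    return [executable, "--model", model_path, "--mmproj", mmproj_path]


# Example with placeholder file names:
# build_minicpm_command("koboldcpp_rocm.exe", "main-model.gguf", "mmproj.gguf")
```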
To use, download and run the koboldcpp_rocm.exe, which is a one-file pyinstaller.
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model has loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
For more information, be sure to run the program from the command line with the --help flag.
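Besides opening http://localhost:5001 in a browser, a script can talk to the running instance. The sketch below assumes a KoboldAI-style HTTP API: the `/api/v1/generate` path, the `prompt` and `max_length` fields, and the `results[0].text` response shape are assumptions not stated in these notes, and the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"  # address given in the notes above


def build_generate_request(prompt, max_length=80, base_url=BASE_URL):
    """Build a POST request for the assumed /api/v1/generate endpoint."""
    payload = json.dumps({"prompt": prompt, "max_length": max_length}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/api/v1/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, max_length=80, base_url=BASE_URL):
    """Send the request and return the generated text (needs a running server)."""
    request = build_generate_request(prompt, max_length, base_url)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumed response shape: {"results": [{"text": "..."}]}
    return body["results"][0]["text"]
```

Building the request separately from sending it makes the payload easy to inspect before a model is loaded.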
Discussion: KoboldCPP-ROCm v1.73.1.yr1-ROCm v6.2.0 Discussion #64
rocBLAS 4.2.0 for ROCm 6.2.0 for Windows
GPU tensile library files for gfx803;gfx900;gfx1010;gfx1030;gfx1031;gfx1032;gfx1100;gfx1101;gfx1102
and rocBLAS.dll built with ROCm 6.2.0 code
KoboldCPP-v1.72.yr0-ROCm
Merge remote-tracking branch 'upstream/concedo'