Releases: henk717/koboldcpp
v1.57 - Vulkan Only Pre-release
This is a Vulkan-only build of the upcoming v1.57; please check https://koboldai.org/cpp to see whether v1.57 has already been released.
If v1.57 has had a formal release, this build offers no advantages for you.
v1.59-Ofast
v1.59, but the Makefile is changed to use -Ofast, for comparative testing.
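For context, -Ofast is essentially -O3 plus -ffast-math, which allows the compiler to reorder floating-point operations in ways that can subtly change results; that is why this build is for comparative testing rather than a straight upgrade. A minimal sketch of the kind of difference involved (my own illustration, not koboldcpp code):

```c
/* fastmath_probe.c — hypothetical probe, not from the koboldcpp sources.
 * Build it twice and compare the output:
 *   gcc -O3    fastmath_probe.c -o probe_o3
 *   gcc -Ofast fastmath_probe.c -o probe_ofast
 */
#include <stdio.h>

int main(void) {
    float big = 1e8f, tiny = 1.0f, sum = 0.0f;
    for (int i = 0; i < 10000; i++) {
        /* Under strict IEEE rules the tiny increment is rounded away when
         * big is added; under -Ofast the compiler may reassociate so that
         * big cancels out, and the two binaries can disagree. */
        sum += tiny;
        sum += big;
        sum -= big;
    }
    printf("sum = %f\n", sum);
    return 0;
}
```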
1.52 - Linux
v1.52, with a placebo commit to maybe fix CI.
1.51.1 - Linux Binary Test
This is a special test release for Linux; for other builds, check https://koboldai.org/cpp
1.35
This repository is only used on special occasions for compiled builds; get the latest from https://koboldai.org/cpp
Koboldcpp 1.35 build with sched_yield enabled and CUDA 11.4 for better GPU compatibility
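For reference, the sched_yield setting controls what the worker threads do while they spin-wait for work: instead of burning a full core polling, each failed poll hands the CPU back to the scheduler. A minimal sketch of that pattern on a POSIX system (my own illustration, not the actual ggml threading code):

```c
/* spin_yield.c — illustrative spin-wait, not koboldcpp source.
 * Build: gcc -pthread -DUSE_SCHED_YIELD spin_yield.c
 */
#include <pthread.h>
#include <sched.h>      /* sched_yield (POSIX) */
#include <stdatomic.h>
#include <stdio.h>

static atomic_int ready = 0;

static void *worker(void *arg) {
    (void)arg;
    /* Poll a shared flag until the main thread releases us. */
    while (!atomic_load_explicit(&ready, memory_order_acquire)) {
#ifdef USE_SCHED_YIELD
        sched_yield();  /* cooperative: let other threads run meanwhile */
#endif                  /* otherwise: pure busy-wait, lowest latency    */
    }
    puts("worker released");
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    atomic_store_explicit(&ready, 1, memory_order_release);
    pthread_join(t, NULL);
    return 0;
}
```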
H2 update: Compiled in a VM for better dependency stability and CUDA 11.4 support. (Still shows H as the version, but it is newer than the henk_cuda build from concedo's repository.)
H3 update: Same source code as the previous versions apart from the version name change; recompiled with a different psutil (from conda instead of pip) to make high priority mode work again.
Win7 build: Compiled without PrefetchVirtualMemory; normally Windows 7 is only supported on the Fallback backend. This is a limited-edition build that supports Windows 7 on hopefully all backends (CUDA not tested), at the expense of model loading speed.
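For background, PrefetchVirtualMemory only exists on Windows 8 and later, so an exe that imports it directly will refuse to load on Windows 7. The usual compatibility pattern, sketched below (my own illustration; this build instead compiles the call out, which has the same fallback effect), is to resolve the symbol at runtime and skip the prefetch when it is missing:

```c
/* prefetch_compat.c — illustrative only, not the koboldcpp loader. */
#include <windows.h>
#include <stdio.h>

/* Local mirror of WIN32_MEMORY_RANGE_ENTRY so this compiles on pre-Win8 SDKs. */
typedef struct { PVOID VirtualAddress; SIZE_T NumberOfBytes; } MemRange;

typedef BOOL (WINAPI *PrefetchFn)(HANDLE, ULONG_PTR, MemRange *, ULONG);

static void prefetch_if_available(void *addr, SIZE_T len) {
    /* Look the symbol up at runtime: NULL on Windows 7, valid on 8+. */
    PrefetchFn pfn = (PrefetchFn)GetProcAddress(
        GetModuleHandleA("kernel32.dll"), "PrefetchVirtualMemory");
    if (pfn) {
        MemRange range = { addr, len };
        pfn(GetCurrentProcess(), 1, &range, 0); /* hint: page it in now */
    }
    /* else: pages fault in on first access, so model loading is slower */
}

int main(void) {
    SIZE_T len = 1 << 20;
    void *buf = VirtualAlloc(NULL, len, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    prefetch_if_available(buf, len);
    printf("prefetch attempted on %lu bytes\n", (unsigned long)len);
    return 0;
}
```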
Tools: Compilation of all the GGML conversion tools (make tools)
v1.0.3 - Windows
llamacpp-for-kobold-1.0.3
- Applied the massive refactor from the parent repo. It was a huge pain, but I managed to keep the old tokenizer untouched and retained full support for the original model formats.
- Greatly reduced the default batch sizes, as large batches were causing bad output and high memory usage.
- Supports dynamic context lengths sent from the client (see the sketch after this list).
- TavernAI is working, although I wouldn't recommend it: it spams the server with multiple huge-context requests, so you're going to have a very painful time getting responses.
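The dynamic context length arrives on the generate request itself, so the client decides per call how much context the server should use. A rough sketch of such a request, assuming the KoboldAI-style /api/v1/generate endpoint, the default port 5001, and the max_context_length field (all three are assumptions about this early build, not verified against it):

```c
/* client_ctx.c — hypothetical client, not part of this release.
 * Build: gcc client_ctx.c -lcurl
 */
#include <curl/curl.h>
#include <stdio.h>

int main(void) {
    CURL *curl = curl_easy_init();
    if (!curl) return 1;

    /* The client, not the server, picks the context size per request. */
    const char *body =
        "{\"prompt\": \"Once upon a time\","
        " \"max_context_length\": 1024,"  /* assumed field name */
        " \"max_length\": 80}";

    struct curl_slist *hdrs =
        curl_slist_append(NULL, "Content-Type: application/json");

    curl_easy_setopt(curl, CURLOPT_URL,
                     "http://localhost:5001/api/v1/generate");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);
    CURLcode rc = curl_easy_perform(curl); /* response prints to stdout */

    curl_slist_free_all(hdrs);
    curl_easy_cleanup(curl);
    return rc == CURLE_OK ? 0 : 1;
}
```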
To use, drag and drop a compatible quantized model for llamacpp onto the exe.
v1.0.2 - 2048 Context - Windows
The original release was limited to 512 tokens of context; this release raises that to 2048 and uses all cores available on the system.
1.0.2 - Windows
A standalone LLaMAcpp server for KoboldAI; includes KoboldAI Lite.
To use, drag and drop a compatible quantized model for llamacpp onto the exe.