April 2024 Binary Update #662
Conversation
…aSharp/actions/runs/8654672719/job/23733195669) for llama.cpp commit `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`.
- Added all new functions.
- Moved some functions (e.g. `SafeLlamaModelHandle` specific functions) into `SafeLlamaModelHandle.cs`.
- Exposed tokens on `SafeLlamaModelHandle` and `LLamaWeights` through a `Tokens` property. As new special tokens are added in the future they can be added here.
- Changed all token properties to return nullable tokens, to handle some models not having some tokens.
- Fixed `DefaultSamplingPipeline` to handle no newline token in some models.
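For illustration, a minimal C# sketch of how the nullable token properties described in the commit above might be consumed. Only the `Tokens` property itself is named in this PR; the `Newline` member and the model-loading boilerplate are assumptions for the example.

```csharp
// Hypothetical usage sketch; member names other than `Tokens` are assumed.
using System;
using LLama;
using LLama.Common;
using LLama.Native;

var parameters = new ModelParams("model.gguf");
using var weights = LLamaWeights.LoadFromFile(parameters);

// Token properties are nullable because some models do not define every special token.
LLamaToken? newline = weights.Tokens.Newline;
if (newline is null)
{
    // e.g. a sampling pipeline has to skip newline-specific logic for such models.
    Console.WriteLine("This model has no newline token.");
}
else
{
    Console.WriteLine($"Newline token: {newline.Value}");
}
```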
- Context specific things have been moved into `SafeLLamaContextHandle.cs` and made private - they're exposed through C# properties and methods already.
- Checking that GPU layer count is zero if GPU offload is not supported.
- Moved methods for creating default structs (`llama_model_quantize_default_params` and `llama_context_default_params`) into relevant structs.
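A short sketch of what moving the default-parameter factories onto the structs could look like for callers. The `Default()` method names are an assumption based on the native functions mentioned above (`llama_context_default_params`, `llama_model_quantize_default_params`), not a confirmed API.

```csharp
// Hypothetical sketch; factory method names are assumed, wrapping the native
// llama_context_default_params / llama_model_quantize_default_params calls.
using LLama.Native;

// Start from the llama.cpp defaults, then override individual fields.
LLamaContextParams ctxParams = LLamaContextParams.Default();
ctxParams.n_ctx = 4096; // field names mirror the native llama.cpp struct

LLamaModelQuantizeParams quantParams = LLamaModelQuantizeParams.Default();
```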
…e in `SafeLLamaContextHandle`
- Added high level wrapper methods (save/load with `State` object or memory mapped file) in `LLamaContext`
- Moved native methods for per-sequence state load/save into `SafeLLamaContextHandle`
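A hedged sketch of the high-level state save/load wrappers described in this commit. The `GetState`/`LoadState`/`SaveState` names and overloads are assumptions based on the description (save/load with a `State` object or a memory mapped file).

```csharp
// Hypothetical usage sketch based on the commit description; method names assumed.
using LLama;
using LLama.Common;

var parameters = new ModelParams("model.gguf");
using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);

// Option 1: capture the state as an in-memory State object and restore it later.
using (var state = context.GetState())
{
    context.LoadState(state);
}

// Option 2: save/load the state through a (memory mapped) file on disk.
context.SaveState("context_state.bin");
context.LoadState("context_state.bin");
```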
Force-pushed from `b24581d` to `869f389`
It is not clear which llama.cpp version this is. Please update the llama.cpp submodule.
@zsogitbe I've updated the submodule to `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`.
Not sure what's wrong with the OSX CI; I think it's just the normal OSX flakiness.
@martindevans, the basic test works on macOS.
Thank you Martin! It is a bit confusing because it is impossible to find a version called f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7 on llama.cpp; without your link I could not find it. I am not sure what is happening here, but I trust that it is a recent version and that we have the latest important updates from llama.cpp.
I think it would be better if you used an official release from here: https://github.com/ggerganov/llama.cpp/releases.
That's not specific GitHub behaviour, it's just the commit ID you'd use if you wanted to check out llama.cpp at the right version. Normally almost every commit in llama.cpp is associated with a "release", since the entire process is automated; we got unlucky with this one because their CI failed (looks like they have issues with flaky macOS CI too), so the final release step got cancelled. Next time I'll have a look at the releases as well as the commits. I can always pick a slightly older commit, to line up with a valid release, if there isn't one for the latest commit at the time I started the work (normally I just take whatever is the very latest commit).
Unit tests for CPU AVX2 and CUDA 12 both passed on my Windows 10 x64 system.
Windows CUDA11 works fine for me.
Updated binaries, using this build for llama.cpp commit `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`.

- Added all new functions.
- Moved some functions (e.g. `SafeLlamaModelHandle` specific functions) into `SafeLlamaModelHandle.cs`.
- Exposed tokens on `SafeLlamaModelHandle` and `LLamaWeights` through a `Tokens` property. As new special tokens are added in the future they can be added here.
- Changed all token properties to return nullable tokens, to handle some models not having some tokens.
- Fixed `DefaultSamplingPipeline` to handle no newline token in some models.
- Tests now use a smaller model (`all-MiniLM-L12-v2.Q8_0`). This model is tiny (<100MB) so it should speed up tests slightly.

Testing: