Bump llama-cpp-python to 0.2.23 (NVIDIA & CPU-only, no AMD, no Metal) #4924

Merged (1 commit) on Dec 14, 2023

Conversation

oobabooga (Owner) commented on Dec 14, 2023:

Adds Mixtral support.

Compiled using GitHub Actions workflows at https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels

The AMD and Metal workflows are failing, so I only have the NVIDIA and CPU wheels for now.

@mjameson commented:
Awesome, many thanks!!

oobabooga deleted the bump-llamacpp-mixtral branch on December 15, 2023 at 01:07.
@Fastmedic commented:
After this update, my token generation speed seems to be about 10x slower on my 3090 when running regular Llama models.

Also: https://www.reddit.com/r/Oobabooga/s/XqGCaA1Rtm

Ph0rk0z (Contributor) commented on Dec 15, 2023:

It has a problem offloading the KV cache. I forced it on and speed is back to normal, but I don't think most users can do that. All other (non-Mixtral) models will run at half speed.
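For context, a minimal sketch of what "forcing it on" could look like when loading a model directly with llama-cpp-python, assuming the installed build exposes the `offload_kqv` flag on `Llama` (newer releases do); the model path and layer count are placeholders:

```python
# Hypothetical example: keep the KV cache on the GPU when loading a model.
# Assumes the installed llama-cpp-python exposes `offload_kqv`; the model
# path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-13b.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,    # offload all layers to the GPU
    offload_kqv=True,   # also keep the KV cache on the GPU
)
```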

oobabooga (Owner, Author) commented:
That's a bit of a conundrum, because the previous version does not support Mixtral. @Ph0rk0z, is it necessary to recompile llama-cpp-python to apply this fix, or can it be monkeypatched?

Ph0rk0z (Contributor) commented on Dec 15, 2023:

It's not the compiled lib, just the Python files. You can edit them under site-packages, I think. It's not a big patch; it's posted in the reddit thread.
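For illustration, a sketch of one way such a monkeypatch could look, assuming the installed llama_cpp accepts `offload_kqv` as a keyword argument; the actual patch in the reddit thread may differ:

```python
# Hypothetical monkeypatch: force offload_kqv=True for every Llama instance,
# applied before any model is loaded. Assumes the installed llama_cpp build
# accepts `offload_kqv` as a keyword argument; the real patch may differ.
import functools
import llama_cpp

_original_init = llama_cpp.Llama.__init__

@functools.wraps(_original_init)
def _patched_init(self, *args, **kwargs):
    kwargs.setdefault("offload_kqv", True)  # keep the KV cache on the GPU
    _original_init(self, *args, **kwargs)

llama_cpp.Llama.__init__ = _patched_init
```

Because it only wraps the Python-level constructor, this can be applied at runtime without rebuilding the CUDA wheels.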
