I deployed the model vicuna-v1.5-13b-Q5_K_M.gguf using the LocalAI v2.0 core Docker image. I am sure the GPU setup in my k8s cluster is correct, and I have enabled the GPU arguments for the LocalAI image, but GPU offloading is not working. Can anyone tell me what's wrong with my configuration? Below is the debug output:
11:50AM DBG GRPC(vicuna-13b-v1.5-16k.Q5_K_M.gguf-127.0.0.1:44091): stderr llm_load_tensors: using CUDA for GPU acceleration
11:50AM DBG GRPC(vicuna-13b-v1.5-16k.Q5_K_M.gguf-127.0.0.1:44091): stderr llm_load_tensors: mem required = 8801.75 MB (+ 12800.00 MB per state)
11:50AM DBG GRPC(vicuna-13b-v1.5-16k.Q5_K_M.gguf-127.0.0.1:44091): stderr llm_load_tensors: offloading 0 repeating layers to GPU
11:50AM DBG GRPC(vicuna-13b-v1.5-16k.Q5_K_M.gguf-127.0.0.1:44091): stderr llm_load_tensors: offloaded 0/43 layers to GPU
11:50AM DBG GRPC(vicuna-13b-v1.5-16k.Q5_K_M.gguf-127.0.0.1:44091): stderr llm_load_tensors: VRAM used: 0 MB
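For context, the line "offloaded 0/43 layers to GPU" means the llama.cpp backend detected CUDA but was asked to put zero of the model's 43 layers into VRAM, which typically happens when the model's YAML definition does not request any offloading. Below is a minimal sketch of what such a definition could look like, assuming the `gpu_layers` and `f16` options described in LocalAI's GPU acceleration docs; the model name and file path are placeholders, not taken from the post:

```yaml
# Hypothetical model definition, e.g. models/vicuna.yaml (illustrative only).
name: vicuna-13b-v1.5-16k
parameters:
  model: vicuna-13b-v1.5-16k.Q5_K_M.gguf
# f16 enables half-precision for the GPU path.
f16: true
# gpu_layers asks llama.cpp to offload this many layers to VRAM;
# 43 would cover all layers reported in the "0/43" log line above.
gpu_layers: 43
```

If `gpu_layers` is missing or set to 0, the debug output would look exactly like the one above: CUDA is detected, but VRAM used stays at 0 MB.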