I deployed the model vicuna-v1.5-13b-Q5_K_M.gguf using the LocalAI v2.0 core Docker image. I am sure the GPU setup in my k8s cluster is correct, and I have enabled the GPU arguments for the LocalAI image, but GPU offloading is not working. Can anyone tell me what's wrong with my configuration? Below is the debug output:
11:50AM DBG GRPC(vicuna-13b-v1.5-16k.Q5_K_M.gguf-127.0.0.1:44091): stderr llm_load_tensors: using CUDA for GPU acceleration
11:50AM DBG GRPC(vicuna-13b-v1.5-16k.Q5_K_M.gguf-127.0.0.1:44091): stderr llm_load_tensors: mem required = 8801.75 MB (+ 12800.00 MB per state)
11:50AM DBG GRPC(vicuna-13b-v1.5-16k.Q5_K_M.gguf-127.0.0.1:44091): stderr llm_load_tensors: offloading 0 repeating layers to GPU
11:50AM DBG GRPC(vicuna-13b-v1.5-16k.Q5_K_M.gguf-127.0.0.1:44091): stderr llm_load_tensors: offloaded 0/43 layers to GPU
11:50AM DBG GRPC(vicuna-13b-v1.5-16k.Q5_K_M.gguf-127.0.0.1:44091): stderr llm_load_tensors: VRAM used: 0 MB
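For context, the line "offloaded 0/43 layers to GPU" means the llama.cpp backend detected CUDA but was asked to put zero of the model's 43 layers into VRAM, which typically happens when the model's YAML definition does not request any offloading. Below is a minimal sketch of what such a definition could look like, assuming the `gpu_layers` and `f16` options described in LocalAI's GPU acceleration docs; the model name and file path are placeholders, not taken from the post:

```yaml
# Hypothetical model definition, e.g. models/vicuna.yaml (illustrative only).
name: vicuna-13b-v1.5-16k
parameters:
  model: vicuna-13b-v1.5-16k.Q5_K_M.gguf
# f16 enables half-precision for the GPU path.
f16: true
# gpu_layers asks llama.cpp to offload this many layers to VRAM;
# 43 would cover all layers reported in the "0/43" log line above.
gpu_layers: 43
```

If `gpu_layers` is missing or set to 0, the debug output would look exactly like the one above: CUDA is detected, but VRAM used stays at 0 MB.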