
How to Deploy Ollama LLM on Cloud-Managed Kubernetes (OCI) (pvc/init container) #90

Closed
brokedba opened this issue Aug 28, 2024 · 7 comments · Fixed by #94

Comments

@brokedba

How to Deploy Ollama LLM on Cloud-Managed Kubernetes (OCI)

I'm looking for guidance on deploying Ollama LLM using Helm charts on a cloud-managed Kubernetes service, specifically Oracle Cloud Infrastructure (OCI). I have a few questions regarding the deployment process:

  1. Persistent Volume and Data Volume Mounting:

    • From the values in the Helm chart, how does the ollama-data volume mountPath: "" match the persistentVolume if it's enabled? It's unclear how these values are connected.
    • Do we need to create the storage class or PersistentVolumeClaim (PVC) manually for the persistentVolume values to be effective? There isn't much clarity on this in the documentation, and it would be helpful to have an example.
  2. Loading Models with Init Containers:

    • Is there a way to load the models using an init container into the mountPath before the main pod is spun up? This feature would be useful for preloading models and ensuring they're ready when the main container starts.

The documentation seems limited, making it challenging to proceed. Any examples or additional guidance would be greatly appreciated.
Thank you

@jdetroyes
Contributor

Hello @brokedba,

Here's an explanation:

From the values in the Helm chart, how does the ollama-data volume mountPath: "" match the persistentVolume if it's enabled? It's unclear how these values are connected.

First, if ollama.mountPath is set, it overrides the default mount path of /root/.ollama. In most cases, this value doesn't need to be changed. When persistentVolume is enabled, there are two scenarios:

  • If persistentVolume.existingClaim is set:
    • The volume will be attached to the container.
  • If persistentVolume.existingClaim is not set:
    • If persistentVolume.storageClass is specified (or left empty), a PVC will be created by the provisioner and attached to the container (See pvc.yaml).

deployment.yaml

volumes:
  - name: ollama-data
    {{- if .Values.persistentVolume.enabled }}
    persistentVolumeClaim:
      claimName: {{ .Values.persistentVolume.existingClaim | default (printf "%s" (include "ollama.fullname" .)) }}
    {{- else }}
    emptyDir: { }
    {{- end }}

Do we need to create the storage class or PersistentVolumeClaim (PVC) manually for the persistentVolume.values to be effective? There isn't much clarity on this in the documentation, and it would be helpful to have an example.

You can specify a StorageClass that is already configured in your infrastructure to automatically create a PVC. Alternatively, if you already have a PVC configured, you can set the persistentVolume.existingClaim field. To disable automatic provisioning, set persistentVolume.storageClass: "-".
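
For example, a minimal values sketch that reuses a PVC you created yourself (the claim name here is only an illustration):

persistentVolume:
  # -- Enable persistence using PVC
  enabled: true

  # -- Reuse a pre-created PVC instead of letting the chart provision one
  # (replace with the name of your own claim)
  existingClaim: "ollama-pvc"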

Example using longhorn as provisioner:

# Enable persistence using Persistent Volume Claims
# ref: https://kubernetes.io/docs/concepts/storage/persistent-volumes/
persistentVolume:
  # -- Enable persistence using PVC
  enabled: true


  # -- Ollama server data Persistent Volume Storage Class
  # If defined, storageClassName: <storageClass>
  # If set to "-", storageClassName: "", which disables dynamic provisioning
  # If undefined (the default) or set to null, no storageClassName spec is
  # set, choosing the default provisioner.  (gp2 on AWS, standard on
  # GKE, AWS & OpenStack)
  storageClass: "longhorn"
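
Installing or upgrading with those values is then the usual Helm workflow; a sketch, assuming the repository alias and URL below (check the chart's README for the canonical ones):

# Add the chart repository (URL assumed) and deploy with your values file
helm repo add ollama-helm https://otwld.github.io/ollama-helm/
helm upgrade --install ollama ollama-helm/ollama --namespace ollama --create-namespace -f values.yaml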

Is there a way to load the models using an init container into the mountPath before the main pod is spun up? This feature would be useful for preloading models and ensuring they're ready when the main container starts.

To preload models at startup, simply populate the ollama.models array with the list of models you want to pull. If you're using a PVC, models that have already been pulled won't be downloaded again. The chart uses a postStart lifecycle hook to pull models, which are stored in the mountPath.

deployment.yaml

{{- if or .Values.ollama.models .Values.ollama.defaultModel }}
  lifecycle:
    postStart:
      exec:
        command: [ "/bin/sh", "-c", "{{- printf "echo %s | xargs -n1 /bin/ollama pull %s" (include "ollama.modelList" .) (ternary "--insecure" "" .Values.ollama.insecure)}}" ]
{{- end }}
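
For instance, a minimal values sketch (the model names are only illustrative):

ollama:
  # -- Models pulled by the postStart hook at container startup;
  # models already present on the PVC are not downloaded again
  models:
    - llama3
    - gemma:2b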

Let me know if you need more details!


@brokedba
Author

brokedba commented Aug 28, 2024

@jdetroyes thank you so much for the answers!!

First, if ollama.mountPath is set, it overrides the default mount path of /root/.ollama. In most cases, this value doesn't need to be changed. When persistentVolume is enabled, there are two scenarios:

  • If persistentVolume.existingClaim is set:
    • The volume will be attached to the container.
  • If persistentVolume.existingClaim is not set:
    • If persistentVolume.storageClass is specified (or left empty), a PVC will be created by the provisioner and attached to the container (See pvc.yaml).

You can specify a StorageClass that is already configured in your infrastructure to automatically create a PVC. Alternatively, if you already have a PVC configured, you can set the persistentVolume.existingClaim field. To disable automatic provisioning, set persistentVolume.storageClass: "-".

If I understand correctly, the volume mounted at ollama.mountPath is backed by either:

  1. The existing claim, if specified
  2. Otherwise, a new PVC created by the chart through dynamic provisioning via the storageClass

But if the StorageClass doesn't exist, option 2 will not really work, am I right?
For now I tried creating a local PV and PVC and specified the existing claim, but I got the error below.

  • manifest:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ollama-pv
spec:
  capacity:
    storage: 15Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/data/ollama
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-pvc
spec:
  volumeName: ollama-pv
  storageClassName: ""
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 15Gi
Stream closed EOF for ollama/ollama-685b9f59df-qw98v (ollama)                                                                                                   
Stream closed EOF for ollama/ollama-685b9f59df-qw98v (install-and-setup-model) 

EDIT: it only worked after I hardcoded the storageClassName: "oci-bv" (the default in Oracle Cloud).
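
For reference, a rough sketch of the claim that worked (assuming the change was made on the PVC itself, letting OCI dynamically provision a block volume instead of using the hostPath PV):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-pvc
spec:
  # OCI Block Volume CSI storage class, the default on OKE
  storageClassName: "oci-bv"
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 15Gi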

To preload models at startup, simply populate the ollama.models array with the list of models you want to pull.

Here's the thing: my Kubernetes cluster is CPU-only, so I need GGUF models to be loaded, not GPU ones. Hence my initContainers section (see the gist below).

  • Here are the chart values I used:
    ollama_values.yml
    I also wonder whether the keepalive value could also be set in the Helm values.

The purpose of the init container:

  1. Install the huggingface CLI
  2. Download a GGUF model into the mountPath using the HF CLI:
huggingface-cli download bartowski/Meta-Llama-3.1-8B-Instruct-GGUF Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf \
  --local-dir ai_models \
  --local-dir-use-symlinks False
  3. Edit a Modelfile
  4. Load the model: run ollama create llama3 -f llama3.loc

@jdetroyes jdetroyes reopened this Aug 29, 2024
@jdetroyes
Contributor

Hello @clouddude

Based on your scenario, here is an example with initContainers and a custom lifecycle to download and create a model from Hugging Face.

initContainers:
  - name: install-and-setup-model
    image: python:3.9  # Use an image with Python and pip pre-installed
    command: [sh, -c]
    args:
      - |
        pip install -U "huggingface_hub[cli]";
        mkdir -p /root/.ollama/download;
        huggingface-cli download bartowski/Meta-Llama-3.1-8B-Instruct-GGUF Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf \
          --local-dir /root/.ollama/download \
          --local-dir-use-symlinks False;
        echo 'FROM /root/.ollama/download/Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf' > /root/.ollama/download/llama3.loc;
    volumeMounts:
      - name: ollama-data # Use the same name as defined in the volumes section of deployment.yaml
        mountPath: /root/.ollama  # Use same as default

# -- Lifecycle for pod assignment (override ollama.models startup pulling)
lifecycle:
  postStart:
    exec:
      command: [ "/bin/sh", "-c", "ollama create llama3 -f /root/.ollama/download/llama3.loc" ]


persistentVolume:
  # Enable PVC for Ollama
  enabled: true

  # Use default storage class
  storageClass: ""

@jdetroyes jdetroyes linked a pull request Aug 29, 2024 that will close this issue
@brokedba
Author

I hit Docker Hub rate limits, so I needed to add a Docker Hub secret, but it's complaining.

W0829 17:43:50.092448 9912 warnings.go:70] unknown field "spec.template.spec.initContainers[0].imagePullSecrets"
Release "ollama" has been upgraded. Happy Helming!

Am I missing something? Is it included in the template?

initContainers:
  - name: install-and-setup-model
    image: python:3.9  # Use an image with Python and pip pre-installed
    imagePullSecrets:
      - name: dockerhub-sec
    command: [sh, -c]
    args:

@jdetroyes
Contributor

jdetroyes commented Aug 29, 2024

Hey @brokedba

Image pull secrets are defined at the pod level and shared with all containers in the deployment.

You don't have to add anything in the initContainers entry; you just have to populate it in values.yaml:

# -- Docker registry secret names as an array

imagePullSecrets: []
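
For instance, with the secret name from your snippet (it must already exist in the release namespace):

# -- Docker registry secret names as an array
imagePullSecrets:
  - name: dockerhub-sec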

@brokedba
Author

brokedba commented Aug 29, 2024

My bad, I had corrected it before I could update the post.
The container is still stuck in Pending after that change:

Events:
  Warning  FailedScheduling  9m32s  default-scheduler  0/3 nodes are available: persistentvolumeclaim "ollama" is being deleted. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling

Edit: the initContainer phase worked, but only partially.
The tasks done:

  1. Install the HF CLI
  2. Download the model
  3. Edit the Modelfile
  • But the ollama command didn't/couldn't work. (I tried checking /var/log/syslog in this container, but it doesn't seem to have the usual log files.)

What do you think could have caused the postStart (ollama create) not to work?

# -- Init containers to add to the pod
initContainers:
  - name: install-and-setup-model
    image: python:3.9  # Use an image with Python and pip pre-installed
    command: [sh, -c]
    args:
      - |
        pip install -U "huggingface_hub[cli]";
        mkdir -p /root/.ollama/download;
        huggingface-cli download bartowski/Meta-Llama-3.1-8B-Instruct-GGUF Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf \
          --local-dir /root/.ollama/download \
          --local-dir-use-symlinks False;
        echo 'FROM /root/.ollama/download/Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf
        # Set custom parameter values
        PARAMETER temperature 1.0
        PARAMETER stop "<|start_header_id|>"
        PARAMETER stop "<|end_header_id|>"
        PARAMETER stop "<|eot_id|>"

        # Define the model template
         TEMPLATE """
        {{ if .System }}<|start_header_id|>system<|end_header_id|>
        {{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
        {{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
        {{ .Response }}<|eot_id|>
        """
        # Set the system message
        SYSTEM You are a helpful AI assistant named e-llmo Assistant.' > /root/.ollama/download/llama3.loc;
    volumeMounts:
      - name: ollama-data
        mountPath: /root/.ollama  # Use same as default
# -- Lifecycle for pod assignment (override ollama.models startup pulling)
lifecycle:
  postStart:
    exec:
      command: [ "/bin/sh", "-c", "ollama create llama3 -f /root/.ollama/download/llama3.loc" ]

@brokedba
Author

brokedba commented Aug 30, 2024

I also noted that the resulting Modelfile is truncated: any line containing curly braces ({{ ... }}) was ignored,
although the same echo or printf commands work fine when run manually after the pod is ready.

Below is the final version I found after logging in to the container. I think it might be why the create command isn't working, who knows.
Any idea how to escape curly braces in the YAML values?

FROM /root/.ollama/download/Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf

# Set custom parameter values
PARAMETER temperature 1.0
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"

# Define the model template
TEMPLATE """<|start_header_id|>assistant<|end_header_id|>       <------lines below were all ignored
<|eot_id|>
"""

# Set the system message
SYSTEM You are a helpful AI assistant named e-llmo Assistant. 

I found online that {{ "{{" }} ... {{ "}}" }} could be the fix. Will try.
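
If the chart does render these values through Helm templating (an assumption on my side; otherwise the escaping is a no-op), the escaped lines inside the echo block would look roughly like this sketch:

        # each Go-template brace pair is wrapped so Helm renders it literally
        TEMPLATE """
        {{ "{{" }} if .System {{ "}}" }}<|start_header_id|>system<|end_header_id|>
        {{ "{{" }} .System {{ "}}" }}<|eot_id|>{{ "{{" }} end {{ "}}" }}
        """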
