
Charmed Kubeflow 1.7 Airgapped deployment - user namespace repository #851

Closed
Barteus opened this issue Mar 22, 2024 · 3 comments · Fixed by canonical/kfp-operators#416
Labels: bug (Something isn't working)


Barteus commented Mar 22, 2024

Bug Description

During an offline/airgapped deployment of CKF 1.7, how can I set custom images for the ui-artifact and visualizationserver Pods in a user namespace?

The images are in the local registry and are used for the kfp-ui charm workload, but this configuration is not propagated when the above-mentioned Pods are created in the user namespace.

To Reproduce

  1. Follow the guide: https://discourse.charmhub.io/t/install-charmed-kubeflow-in-an-air-gapped-environment/11733
  2. Log into the Kubeflow UI
  3. List the Pods in the user namespace (i.e. admin).
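
A scripted version of step 3, as a minimal sketch using the official kubernetes Python client (kubeconfig access and the admin namespace name are assumptions), surfaces the affected containers:

# List pods in a user namespace and flag containers stuck pulling images.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

for pod in v1.list_namespaced_pod("admin").items:
    for cs in pod.status.container_statuses or []:
        waiting = cs.state.waiting
        if waiting and waiting.reason in ("ImagePullBackOff", "ErrImagePull"):
            print(f"{pod.metadata.name}: {cs.name} -> {cs.image} ({waiting.reason})")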

Environment

microk8s: 1.24 (and 1.28)
juju: 3.1.7 (on LXD)
Bundle: 1.7/stable (downloaded both charms and OCI images)

Relevant Log Output

$ k get po -n admin
NAME                                              READY   STATUS             RESTARTS   AGE
bpk-test-e2e-0                                    2/2     Running            0          22h
ml-pipeline-visualizationserver-9d8bfdc78-szjz7   1/2     ImagePullBackOff   0          44h
ml-pipeline-ui-artifact-6cfb9c4886-d57mf          1/2     ImagePullBackOff   0          44h

$ k describe po ml-pipeline-visualizationserver-9d8bfdc78-szjz7 -n admin
Name:             ml-pipeline-visualizationserver-9d8bfdc78-szjz7
Namespace:        admin
Priority:         0
Service Account:  default-editor
Node:             mlin01/10.10.11.25
Start Time:       Wed, 20 Mar 2024 14:22:52 +0000
Labels:           app=ml-pipeline-visualizationserver
                  pod-template-hash=9d8bfdc78
                  security.istio.io/tlsMode=istio
                  service.istio.io/canonical-name=ml-pipeline-visualizationserver
                  service.istio.io/canonical-revision=latest
Annotations:      cni.projectcalico.org/containerID: 134a431520c6c35e52c2ea1085455eba18a3eaa998db0a3769b43a43fe3b5890
                  cni.projectcalico.org/podIP: 10.1.29.34/32
                  cni.projectcalico.org/podIPs: 10.1.29.34/32
                  kubectl.kubernetes.io/default-container: ml-pipeline-visualizationserver
                  kubectl.kubernetes.io/default-logs-container: ml-pipeline-visualizationserver
                  prometheus.io/path: /stats/prometheus
                  prometheus.io/port: 15020
                  prometheus.io/scrape: true
                  sidecar.istio.io/status:
                    {"initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["workload-socket","credential-socket","workload-certs","istio-env...
Status:           Pending
IP:               10.1.29.34
IPs:
  IP:           10.1.29.34
Controlled By:  ReplicaSet/ml-pipeline-visualizationserver-9d8bfdc78
Init Containers:
  istio-init:
    Container ID:  containerd://0acc85e350627257411d7a55939a4a030aba44b33c96b80d46bf3d457e687c00
    Image:         10.10.11.39:32000/docker.io/istio/proxyv2:1.16.2
    Image ID:      10.10.11.39:32000/docker.io/istio/proxyv2@sha256:2edfbdc4a4500175524480742c4cfebedb7017db20a73a2adef46b10050a9acb
    Port:          <none>
    Host Port:     <none>
    Args:
      istio-iptables
      -p
      15001
      -z
      15006
      -u
      1337
      -m
      REDIRECT
      -i
      *
      -x
      
      -b
      *
      -d
      15090,15021,15020
      --log_output_level=default:info
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 20 Mar 2024 14:22:52 +0000
      Finished:     Wed, 20 Mar 2024 14:22:52 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  1Gi
    Requests:
      cpu:        100m
      memory:     128Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-75z6s (ro)
Containers:
  ml-pipeline-visualizationserver:
    Container ID:   
    Image:          gcr.io/ml-pipeline/visualization-server:2.0.0-alpha.7
    Image ID:       
    Port:           8888/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  1Gi
    Requests:
      cpu:        50m
      memory:     200Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-75z6s (ro)
  istio-proxy:
    Container ID:  containerd://152ff09373dc9d880ec9db6de089ca55615c5f8ae7a292c9d0f6617f0a17a892
    Image:         10.10.11.39:32000/docker.io/istio/proxyv2:1.16.2
    Image ID:      10.10.11.39:32000/docker.io/istio/proxyv2@sha256:2edfbdc4a4500175524480742c4cfebedb7017db20a73a2adef46b10050a9acb
    Port:          15090/TCP
    Host Port:     0/TCP
    Args:
      proxy
      sidecar
      --domain
      $(POD_NAMESPACE).svc.cluster.local
      --proxyLogLevel=warning
      --proxyComponentLogLevel=misc:error
      --log_output_level=default:info
      --concurrency
      2
    State:          Running
      Started:      Wed, 20 Mar 2024 14:23:23 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  1Gi
    Requests:
      cpu:      100m
      memory:   128Mi
    Readiness:  http-get http://:15021/healthz/ready delay=1s timeout=3s period=2s #success=1 #failure=30
    Environment:
      JWT_POLICY:                    third-party-jwt
      PILOT_CERT_PROVIDER:           istiod
      CA_ADDR:                       istiod.kubeflow.svc:15012
      POD_NAME:                      ml-pipeline-visualizationserver-9d8bfdc78-szjz7 (v1:metadata.name)
      POD_NAMESPACE:                 admin (v1:metadata.namespace)
      INSTANCE_IP:                    (v1:status.podIP)
      SERVICE_ACCOUNT:                (v1:spec.serviceAccountName)
      HOST_IP:                        (v1:status.hostIP)
      PROXY_CONFIG:                  {"discoveryAddress":"istiod.kubeflow.svc:15012","tracing":{"zipkin":{"address":"zipkin.kubeflow:9411"}}}
                                     
      ISTIO_META_POD_PORTS:          [
                                         {"containerPort":8888,"protocol":"TCP"}
                                     ]
      ISTIO_META_APP_CONTAINERS:     ml-pipeline-visualizationserver
      ISTIO_META_CLUSTER_ID:         Kubernetes
      ISTIO_META_INTERCEPTION_MODE:  REDIRECT
      ISTIO_META_WORKLOAD_NAME:      ml-pipeline-visualizationserver
      ISTIO_META_OWNER:              kubernetes://apis/apps/v1/namespaces/admin/deployments/ml-pipeline-visualizationserver
      ISTIO_META_MESH_ID:            cluster.local
      TRUST_DOMAIN:                  cluster.local
    Mounts:
      /etc/istio/pod from istio-podinfo (rw)
      /etc/istio/proxy from istio-envoy (rw)
      /var/lib/istio/data from istio-data (rw)
      /var/run/secrets/credential-uds from credential-socket (rw)
      /var/run/secrets/istio from istiod-ca-cert (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-75z6s (ro)
      /var/run/secrets/tokens from istio-token (rw)
      /var/run/secrets/workload-spiffe-credentials from workload-certs (rw)
      /var/run/secrets/workload-spiffe-uds from workload-socket (rw)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  workload-socket:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  credential-socket:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  workload-certs:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  istio-envoy:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  istio-data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  istio-podinfo:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.labels -> labels
      metadata.annotations -> annotations
  istio-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  43200
  istiod-ca-cert:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      istio-ca-root-cert
    Optional:  false
  kube-api-access-75z6s:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason   Age                    From     Message
  ----    ------   ----                   ----     -------
  Normal  BackOff  77s (x10826 over 44h)  kubelet  Back-off pulling image "gcr.io/ml-pipeline/visualization-server:2.0.0-alpha.7"

Additional Context

The juju status is green.

Barteus added the bug label on Mar 22, 2024

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5476.

This message was autogenerated

ca-scribner (Contributor) commented:

Since the admin namespace is a user's namespace, I'm pretty sure this is coming from the kfp-profile-controller. What I think we've missed is that we need to set the environment variables FRONTEND_IMAGE and VISUALIZATION_SERVER_IMAGE here when creating the deployment for the kfp-profile-controller's sync.py (or, in @Barteus's case, since I think CKF 1.7 uses a podspec version of the charm, wherever the equivalent place was back then).

misohu (Member) commented Apr 3, 2024

@ca-scribner is right, I did some digging ... both the podspec and the sidecar rewrite set those images in sync.py here and here. The value is propagated from the env variables VISUALIZATION_SERVER_IMAGE and FRONTEND_IMAGE ... it's also important to note that the tags are taken from the variables VISUALIZATION_SERVER_TAG and FRONTEND_TAG, which by default are set to the value of KFP_VERSION (example here). We were never setting those variables, and because of that the images were using the default values.
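
For reference, a rough sketch of the resolution logic described above (variable names follow the comment; the default values are illustrative and the real sync.py is structured differently):

import os

# Without these env variables set by the charm, the per-profile Deployments
# fall back to the upstream gcr.io images, which are unreachable when airgapped.
kfp_version = os.environ.get("KFP_VERSION", "2.0.0-alpha.7")  # illustrative default

visualization_server_image = os.environ.get(
    "VISUALIZATION_SERVER_IMAGE", "gcr.io/ml-pipeline/visualization-server")
visualization_server_tag = os.environ.get("VISUALIZATION_SERVER_TAG", kfp_version)

frontend_image = os.environ.get("FRONTEND_IMAGE", "gcr.io/ml-pipeline/frontend")
frontend_tag = os.environ.get("FRONTEND_TAG", kfp_version)

# The user-namespace Deployments are then rendered with:
#   f"{visualization_server_image}:{visualization_server_tag}"
#   f"{frontend_image}:{frontend_tag}"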

In the charm code we need to create default-custom-images.json with both images set and then correctly set the environment variables.
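
For illustration only, a minimal sketch of that charm-side change (the file layout, key names and the _sync_environment helper are hypothetical; the actual fix landed in canonical/kfp-operators#416):

import json

# Hypothetical helper: load the image overrides shipped with the charm and
# expose them to sync.py as the environment variables it expects.
def _sync_environment(images_file: str = "default-custom-images.json") -> dict:
    with open(images_file) as f:
        # e.g. {"visualization_server": {"image": "...", "tag": "..."},
        #       "frontend": {"image": "...", "tag": "..."}}
        images = json.load(f)
    return {
        "VISUALIZATION_SERVER_IMAGE": images["visualization_server"]["image"],
        "VISUALIZATION_SERVER_TAG": images["visualization_server"]["tag"],
        "FRONTEND_IMAGE": images["frontend"]["image"],
        "FRONTEND_TAG": images["frontend"]["tag"],
    }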
