
Fail to upgrade from 0.32.3 to 0.33.0 #5366

Closed
mikkoc opened this issue Dec 19, 2023 · 7 comments

mikkoc commented Dec 19, 2023

> Here is the command I ran per the instructions

@mikkoc @sde-melo Can you share the actual deployment output from kubectl get deploy -n $KARPENTER_NAMESPACE karpenter? That should show the environment variables in use, so we can tell whether the problem is at the binary level or at the Helm level.

Originally posted by @jonathan-innis in #5340 (comment)

$ kubectl get deploy karpenter -n kube-system -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "21"
    meta.helm.sh/release-name: karpenter
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2023-06-20T15:07:25Z"
  generation: 21
  labels:
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/version: 0.33.0
    helm.sh/chart: karpenter-v0.33.0
  name: karpenter
  namespace: kube-system
  resourceVersion: "132103369"
  uid: 197f4eee-dd10-46be-ab93-fa225d115bd6
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: karpenter
      app.kubernetes.io/name: karpenter
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: karpenter
        app.kubernetes.io/name: karpenter
        needs.aws-apis: "true"
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: karpenter.sh/nodepool
                operator: DoesNotExist
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app.kubernetes.io/instance: karpenter
                app.kubernetes.io/name: karpenter
            topologyKey: kubernetes.io/hostname
      containers:
      - env:
        - name: KUBERNETES_MIN_VERSION
          value: 1.19.0-0
        - name: KARPENTER_SERVICE
          value: karpenter
        - name: LOG_LEVEL
          value: info
        - name: METRICS_PORT
          value: "8000"
        - name: HEALTH_PROBE_PORT
          value: "8081"
        - name: SYSTEM_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: MEMORY_LIMIT
          valueFrom:
            resourceFieldRef:
              containerName: controller
              divisor: "0"
              resource: limits.memory
        - name: FEATURE_GATES
          value: Drift=true
        - name: BATCH_MAX_DURATION
          value: 10s
        - name: BATCH_IDLE_DURATION
          value: 1s
        - name: ASSUME_ROLE_DURATION
          value: 15m
        - name: VM_MEMORY_OVERHEAD_PERCENT
          value: "0.075"
        - name: RESERVED_ENIS
          value: "0"
        image: public.ecr.aws/karpenter/controller:v0.33.0@sha256:5e5f59f74d86ff7f13d7d80b89afff8c661cb4e3265f2fdda95b76dd9c838cc1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: http
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        name: controller
        ports:
        - containerPort: 8000
          name: http-metrics
          protocol: TCP
        - containerPort: 8081
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: http
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        resources: {}
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsGroup: 65536
          runAsNonRoot: true
          runAsUser: 65536
          seccompProfile:
            type: RuntimeDefault
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: Default
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: karpenter
      serviceAccountName: karpenter
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/instance: karpenter
            app.kubernetes.io/name: karpenter
        maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2023-06-20T15:07:31Z"
    lastUpdateTime: "2023-06-20T15:07:31Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2023-12-11T08:40:48Z"
    lastUpdateTime: "2023-12-19T08:34:42Z"
    message: ReplicaSet "karpenter-6656759fb9" is progressing.
    reason: ReplicaSetUpdated
    status: "True"
    type: Progressing
  observedGeneration: 21
  readyReplicas: 1
  replicas: 3
  unavailableReplicas: 2
  updatedReplicas: 2

We use Terraform to manage this deployment and the karpenter-crd Helm chart.


mikkoc commented Dec 19, 2023

 $ helm get values karpenter -n kube-system
USER-SUPPLIED VALUES:
podLabels:
  needs.aws-apis: "true"
serviceAccount:
  create: false
  name: karpenter
settings:
  aws:
    clusterName: devops-cluster
    enablePodENI: true
tolerations:
- key: CriticalAddonsOnly
  operator: Exists


mikkoc commented Dec 19, 2023

The pod crashes on start:

panic: validating options, missing field, cluster-name

goroutine 1 [running]:
github.com/samber/lo.must({0x25cae80, 0xc000586e00}, {0x0, 0x0, 0x0})
    github.com/samber/[email protected]/errors.go:53 +0x1e9
github.com/samber/lo.Must0(...)
    github.com/samber/[email protected]/errors.go:72
sigs.k8s.io/karpenter/pkg/operator/injection.WithOptionsOrDie({0x30bf5f8, 0xc0007fe5a0}, {0xc0005214e0, 0x2, 0x2332e20?})
    sigs.k8s.io/[email protected]/pkg/operator/injection/injection.go:51 +0x138
sigs.k8s.io/karpenter/pkg/operator.NewOperator()
    sigs.k8s.io/[email protected]/pkg/operator/operator.go:84 +0xb7
main.main()
    github.com/aws/karpenter/cmd/controller/main.go:33 +0x25
Stream closed EOF for kube-system/karpenter-6656759fb9-jhcjr (controller)

@jls-appfire
@mikkoc I think you noticed this, but cluster-name is missing; the same thing happened to me. Per the upgrade notes, the Helm chart values paths changed (which is a breaking change...).

Your chart uses settings.aws.clusterName, but the new path is just settings.clusterName, hence you're missing all the AWS values. My pod definition includes seven additional env vars that yours lacks, including CLUSTER_NAME.

Double-check the release notes, but you should just need to update your TF code to replace all settings.aws paths with just settings.

@jls-appfire

Helm command from the documentation:

helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter --version "${KARPENTER_VERSION}" --namespace "${KARPENTER_NAMESPACE}" --create-namespace \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait

Note these values:

--set "settings.clusterName=${CLUSTER_NAME}"
--set "settings.interruptionQueue=${CLUSTER_NAME}"

And note your values:

settings:
  aws:
    clusterName: devops-cluster
    enablePodENI: true
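
Applied to the values shown in this thread, the migrated file would look roughly like this. This is a sketch: only the settings.aws.clusterName → settings.clusterName move is confirmed above; whether enablePodENI survives under the new layout, and what other settings.aws.* keys were renamed or removed, should be checked against the v1beta1 migration guide.

```yaml
# Sketch of the reporter's user-supplied values migrated to the v0.33 layout.
# Only the clusterName move is confirmed in this thread; other settings.aws.*
# keys may have been renamed or removed in the new chart.
podLabels:
  needs.aws-apis: "true"
serviceAccount:
  create: false
  name: karpenter
settings:
  clusterName: devops-cluster   # was settings.aws.clusterName before v0.33
tolerations:
- key: CriticalAddonsOnly
  operator: Exists
```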


mikkoc commented Dec 19, 2023

> Double check the release notes, but you should just need to update your TF code to replace all settings.aws with just settings.

You're right, I must have missed this. It works now, thanks @jls-appfire! I'll close this issue.

@sigurjonviktorsson

@jls-appfire would it be possible to add a warning about this to the 0.33 upgrade guide, here: https://karpenter.sh/docs/upgrading/upgrade-guide/#upgrading-to-v0330?

This is a breaking change that's not documented in the upgrade guide.


jmdeal commented Feb 21, 2024

The helm chart value migration is part of the v1beta1 migration guide.
