[Bug] EKS 1.28 with Karpenter v0.33.1 not working #7481

Closed
ensean opened this issue Jan 16, 2024 · 3 comments · Fixed by #7503
Labels
area/karpenter kind/bug priority/important-soon Ideally to be resolved in time for the next release

Comments

ensean commented Jan 16, 2024

What were you trying to accomplish?

Trying to create an EKS 1.28 cluster with Karpenter v0.33.1.

What happened?

The cluster was created, but the Karpenter installation hung and then timed out:

2024-01-16 07:49:37 [ℹ]  waiting for CloudFormation stack "eksctl-eks-kptv3-karpenter"
2024-01-16 07:50:07 [ℹ]  waiting for CloudFormation stack "eksctl-eks-kptv3-karpenter"
2024-01-16 07:50:46 [ℹ]  waiting for CloudFormation stack "eksctl-eks-kptv3-karpenter"
2024-01-16 07:51:26 [ℹ]  waiting for CloudFormation stack "eksctl-eks-kptv3-karpenter"
2024-01-16 07:52:27 [ℹ]  waiting for CloudFormation stack "eksctl-eks-kptv3-karpenter"
2024-01-16 07:52:27 [ℹ]  1 task: { create IAM role for serviceaccount "karpenter/karpenter" }
2024-01-16 07:52:27 [ℹ]  1 task: { create IAM role for serviceaccount "karpenter/karpenter" }
2024-01-16 07:52:27 [ℹ]  building iamserviceaccount stack "eksctl-eks-kptv3-addon-iamserviceaccount-karpenter-karpenter"
2024-01-16 07:52:27 [ℹ]  deploying stack "eksctl-eks-kptv3-addon-iamserviceaccount-karpenter-karpenter"
2024-01-16 07:52:27 [ℹ]  waiting for CloudFormation stack "eksctl-eks-kptv3-addon-iamserviceaccount-karpenter-karpenter"
2024-01-16 07:52:57 [ℹ]  waiting for CloudFormation stack "eksctl-eks-kptv3-addon-iamserviceaccount-karpenter-karpenter"
2024-01-16 07:52:57 [ℹ]  adding identity "arn:aws:iam::xxxxxxxxxx:role/eksctl-KarpenterNodeRole-eks-kptv3" to auth ConfigMap
2024-01-16 07:52:57 [ℹ]  adding Karpenter to cluster eks-kptv3
E0116 07:52:59.856684    3205 memcache.go:206] couldn't get resource list for karpenter.k8s.aws/v1beta1: the server could not find the requested resource
E0116 07:52:59.942104    3205 memcache.go:206] couldn't get resource list for karpenter.sh/v1beta1: the server could not find the requested resource
Error: failed to install Karpenter: failed to install Karpenter chart: failed to install chart: timed out waiting for the condition

How to reproduce it?

eksctl create cluster -f ./cluster.yaml
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: eks-kptv3
  region: ap-northeast-1
  version: "1.28"
  tags:
    karpenter.sh/discovery: eks-kptv3 # here, it is set to the cluster name
iam:
  withOIDC: true # required

karpenter:
  version: 'v0.33.1' # Exact version must be specified
  createServiceAccount: true # default is false
  withSpotInterruptionQueue: true

vpc:
  cidr: 10.10.0.0/16
  hostnameType: resource-name
  # disable public access to endpoint and only allow private access
  clusterEndpoints:
    publicAccess: true

managedNodeGroups:
  - name: ng-apps
    instanceType: t3.medium
    minSize: 2
    maxSize: 4
    desiredCapacity: 2
    volumeSize: 20

Logs

Anything else we need to know?

Versions

$ eksctl info
eksctl version: 0.167.0
kubectl version: v1.29.0
OS: linux

$ helm version
version.BuildInfo{Version:"v3.13.3", GitCommit:"c8b948945e52abba22ff885446a1486cb5fd3474", GitTreeState:"clean", GoVersion:"go1.20.11"}

myspotontheweb commented Jan 17, 2024

Perhaps related to these closed issues?

Issues installing Karpenter v0.33.0

I have reproduced this problem:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: cluster-with-karpenter-1
  region: eu-west-1
  version: '1.28'
  tags:
    karpenter.sh/discovery: cluster-with-karpenter-1 # here, it is set to the cluster name
iam:
  withOIDC: true # required

karpenter:
  version: 'v0.33.1' # Exact version must be specified

managedNodeGroups:
  - name: managed-ng-1
    minSize: 2
    maxSize: 2
    desiredCapacity: 2

The Helm chart failed to install:

$ helm -n karpenter history karpenter
REVISION        UPDATED                         STATUS  CHART                   APP VERSION     DESCRIPTION                                                    
1               Wed Jan 17 12:11:51 2024        failed  karpenter-v0.33.1       0.33.1          Release "karpenter" failed: timed out waiting for the condition

The pods are crash-looping:

$ kubectl get pods -n karpenter
NAME                        READY   STATUS             RESTARTS         AGE
karpenter-b5475c9cc-jsgns   0/1     CrashLoopBackOff   19 (2m41s ago)   74m
karpenter-b5475c9cc-l9jrr   0/1     CrashLoopBackOff   19 (2m32s ago)   74m

Log

$ kubectl logs karpenter-b5475c9cc-jsgns -n karpenter
panic: validating options, missing field, cluster-name

goroutine 1 [running]:
github.com/samber/lo.must({0x25cbe60, 0xc000606c80}, {0x0, 0x0, 0x0})
        github.com/samber/[email protected]/errors.go:53 +0x1e9
github.com/samber/lo.Must0(...)
        github.com/samber/[email protected]/errors.go:72
sigs.k8s.io/karpenter/pkg/operator/injection.WithOptionsOrDie({0x30c0598, 0xc00027b9e0}, {0xc0005ac9c0, 0x2, 0x2333e00?})
        sigs.k8s.io/[email protected]/pkg/operator/injection/injection.go:51 +0x138
sigs.k8s.io/karpenter/pkg/operator.NewOperator()
        sigs.k8s.io/[email protected]/pkg/operator/operator.go:84 +0xb7
main.main()
        github.com/aws/karpenter/cmd/controller/main.go:33 +0x25

Helm chart values (redacted)

$ helm get values karpenter
USER-SUPPLIED VALUES:
aws:
  defaultInstanceProfile: eksctl-KarpenterNodeInstanceProfile-cluster-with-karpenter-1
clusterEndpoint: https://XXXXXXXXXXXXXXX.gr7.eu-west-1.eks.amazonaws.com
clusterName: cluster-with-karpenter-1
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::XXXXXXXXXXX:role/eksctl-cluster-with-karpenter-1-iamservice-role
  create: false
  name: karpenter
settings:
  aws:
    clusterEndpoint: https://XXXXXXXXXXXXXXX.gr7.eu-west-1.eks.amazonaws.com
    clusterName: cluster-with-karpenter-1
    defaultInstanceProfile: eksctl-KarpenterNodeInstanceProfile-cluster-with-karpenter-1
    interruptionQueueName: cluster-with-karpenter-1

Hope this helps

Info

$ eksctl info
eksctl version: 0.167.0
kubectl version: v1.28.2
OS: linux

$ helm version
version.BuildInfo{Version:"v3.12.3", GitCommit:"3a31588ad33fe3b89af5a2a54ee1d25bfe6eaa5e", GitTreeState:"clean", GoVersion:"go1.20.7"}


myspotontheweb commented Jan 17, 2024

It appears there is a breaking change in the Helm chart, introduced in v0.33.0.

Details

Explained in this issue:

The format of the values needs to be:

settings:
  clusterEndpoint: XXXXXXXXXXXXXXXXXXXXX
  clusterName: YYYYYYYYY
  ..

and not:

settings:
  aws:
    clusterEndpoint: XXXXXXXXXXXXXXXXXXXXX
    clusterName: YYYYYYYYY
    ..
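
To make the difference between the two layouts concrete, here is a minimal sketch that moves everything under `settings.aws` up one level into `settings`. It works on a plain dict of Helm values; the function name `flatten_legacy_settings` and the sample endpoint are hypothetical, and this is not taken from eksctl or the Karpenter chart. It also does not check whether each legacy key is still accepted by the v0.33.x chart.

```python
from copy import deepcopy


def flatten_legacy_settings(values: dict) -> dict:
    """Return a copy of Helm values with settings.aws.* moved up to settings.*.

    Hypothetical helper for illustration only; it is not part of eksctl
    or of the Karpenter chart.
    """
    result = deepcopy(values)
    settings = result.get("settings")
    if not isinstance(settings, dict):
        return result

    legacy = settings.get("aws")
    if isinstance(legacy, dict):
        del settings["aws"]
        for key, value in legacy.items():
            # A value already present at the new location wins.
            settings.setdefault(key, value)
    return result


legacy_values = {
    "settings": {
        "aws": {
            "clusterEndpoint": "https://example.invalid",  # placeholder endpoint
            "clusterName": "cluster-with-karpenter-1",
            "interruptionQueueName": "cluster-with-karpenter-1",
        }
    }
}

print(flatten_legacy_settings(legacy_values))
```

The input dict is left untouched, so the old and new layouts can be compared side by side before anything is passed to `helm`.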

Root cause

This is a breaking change in v0.33.0 of the Helm chart. I couldn't find it in the release notes, but the pull request that introduced it says:

This PR also collapses all of the controller-wide settings in the settings.aws values block into settings. The old values will be supported up to v0.33.0, at which point they will be dropped.
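
One way a tool that generates these values could cope with the cutover is to pick the layout from the chart version. The sketch below assumes that v0.33.0 is the first version requiring the flat `settings` block, as described in this thread. The names `parse_version`, `requires_flat_settings` and `karpenter_settings` are hypothetical; this is not the change made in the linked fix.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'v0.33.1' or '0.33.1' into (0, 33, 1)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        # Pre-release or otherwise unusual strings are rejected, not guessed at.
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


def requires_flat_settings(chart_version: str) -> bool:
    """True when the chart no longer reads settings.aws (assumed: >= v0.33.0)."""
    return parse_version(chart_version) >= (0, 33, 0)


def karpenter_settings(chart_version: str, cluster_name: str, cluster_endpoint: str) -> dict:
    """Build the settings block in the layout the given chart version expects."""
    core = {"clusterName": cluster_name, "clusterEndpoint": cluster_endpoint}
    if requires_flat_settings(chart_version):
        return {"settings": core}
    return {"settings": {"aws": core}}


print(karpenter_settings("v0.33.1", "eks-kptv3", "https://example.invalid"))
print(karpenter_settings("v0.31.0", "eks-kptv3", "https://example.invalid"))
```

Comparing parsed tuples rather than raw strings avoids ordering mistakes such as treating "0.9.0" as newer than "0.33.0".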

yuxiang-zhang (Member) commented

Thank you @myspotontheweb for the detailed analysis. I made the changes, please feel free to review 🙇
