
[Bug] User defined multipart User Data scripts do not work properly - breaks nodeadm functionality #7895

Open
bradwatsonaws opened this issue Jul 18, 2024 · 5 comments
Labels
kind/bug priority/important-longterm Important over the long term, but may not be currently staffed and/or may require multiple releases

Comments

bradwatsonaws commented Jul 18, 2024

What were you trying to accomplish?

I am trying to create a managed node group with my own multipart user data script supplied through overrideBootstrapCommand. This multipart user data script should run a mix of bash commands and also satisfy the requirements for nodeadm node initialization.

What happened?

When eksctl creates the launch template and takes the user data supplied by the user, it appears to add its own multipart boundaries, which prevents the user-defined multipart user data script from working as expected. The node group is created with a launch template as usual, but the nodes are unable to join the cluster: nodeadm defaults to reading its configuration from IMDS, and the eksctl-generated boundaries around the multipart user data prevent nodeadm from finding a NodeConfig there.

Example user defined multipart user data script passed into overrideBootstrapCommand:

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: application/node.eks.aws

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: rhel-eks
    apiServerEndpoint: https://myclusterapi.gr7.us-gov-east-1.eks.amazonaws.com
    certificateAuthority: mysuperlongcertificatexyzabc
    cidr: 10.100.0.0/16

--BOUNDARY
Content-Type: text/x-shellscript;

#!/bin/bash
set -ex
systemctl enable kubelet.service
systemctl disable nm-cloud-setup.timer
systemctl disable nm-cloud-setup.service
reboot

--BOUNDARY--

Resulting user data script created by eksctl in the node group launch template:

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=478b56b7f407b2f8102862b68821d558cacbdf7575b0163bf3b5b98566a8

--478b56b7f407b2f8102862b68821d558cacbdf7575b0163bf3b5b98566a8
Content-Type: text/x-shellscript
Content-Type: charset="us-ascii"

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: application/node.eks.aws

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: rhel-eks
    apiServerEndpoint: https://myclusterapi.gr7.us-gov-east-1.eks.amazonaws.com
    certificateAuthority: mysuperlongcertificatexyzabc
    cidr: 10.100.0.0/16

--BOUNDARY
Content-Type: text/x-shellscript;

#!/bin/bash
set -ex
systemctl enable kubelet.service
systemctl disable nm-cloud-setup.timer
systemctl disable nm-cloud-setup.service
reboot

--BOUNDARY--

--478b56b7f407b2f8102862b68821d558cacbdf7575b0163bf3b5b98566a8--

As you can see, eksctl generates its own multipart document with its own uniquely generated boundary, so the user-defined boundaries are never respected.
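
To confirm what an instance actually received, the fully rendered user data can be pulled from IMDS on an affected node. A minimal check, assuming IMDSv2 is enforced (as it is in the launch templates eksctl generates):

TOKEN=$(curl -sX PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/user-data
# Only the outer eksctl-generated boundary defines real MIME parts; the
# application/node.eks.aws NodeConfig is buried inside a text/x-shellscript
# part, which is why nodeadm cannot find a NodeConfig when it reads IMDS.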

How to reproduce it?

A zsh script that takes, as positional arguments, the parameters defined at the top of the script:

#!/bin/zsh

EKS_CLUSTER=$1
AMI_ID=$2
MANAGED_NODE_GROUP=$3
AWS_REGION=$4
KEY_PAIR=$5
INSTANCE_TYPE=$6
MIN_SIZE=$7
DESIRED_SIZE=$8
MAX_SIZE=$9
API_ENDPOINT=${10}
CIDR=${11}
CERTIFICATE=${12}
DATE_TIME=$(date +'%Y%m%d%H%M')

cat > managednodegroup-$DATE_TIME.yaml << EOF
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: $EKS_CLUSTER
  region: $AWS_REGION

managedNodeGroups:
  - name: $MANAGED_NODE_GROUP
    minSize: $MIN_SIZE
    desiredCapacity: $DESIRED_SIZE
    maxSize: $MAX_SIZE
    ami: $AMI_ID
    amiFamily: AmazonLinux2023
    instanceType: $INSTANCE_TYPE
    labels:
      role: worker
    tags:
      nodegroup-name: $MANAGED_NODE_GROUP
    privateNetworking: true

    overrideBootstrapCommand: |
      MIME-Version: 1.0
      Content-Type: multipart/mixed; boundary="BOUNDARY"

      --BOUNDARY
      Content-Type: application/node.eks.aws

      ---
      apiVersion: node.eks.aws/v1alpha1
      kind: NodeConfig
      spec:
        cluster:
          name: $EKS_CLUSTER
          apiServerEndpoint: $API_ENDPOINT
          certificateAuthority: $CERTIFICATE
          cidr: $CIDR

      --BOUNDARY
      Content-Type: text/x-shellscript;

      #!/bin/bash
      set -ex
      systemctl enable kubelet.service
      systemctl disable nm-cloud-setup.timer
      systemctl disable nm-cloud-setup.service
      reboot

      --BOUNDARY--
EOF

eksctl create nodegroup --config-file=managednodegroup-$DATE_TIME.yaml --cfn-disable-rollback
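
An example invocation, with the positional arguments in the order the script expects (the script name and every value below are placeholders):

./create-nodegroup.zsh rhel-eks ami-095c7b500f70da3d0 rhel-eks-nodeadmn-new us-gov-east-1 \
  my-key-pair t3.medium 2 2 2 \
  https://myclusterapi.gr7.us-gov-east-1.eks.amazonaws.com 10.100.0.0/16 mysuperlongcertificatexyzabc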

Logs
2024-07-18 08:51:16 [ℹ] will use version 1.29 for new nodegroup(s) based on control plane version
2024-07-18 08:51:18 [ℹ] nodegroup "rhel-eks-nodeadmn-new" will use "ami-095c7b500f70da3d0" [AmazonLinux2/1.29]
2024-07-18 08:51:18 [ℹ] 2 existing nodegroup(s) (rhel-eks-github,rhel-eks-nodeadm) will be excluded
2024-07-18 08:51:18 [ℹ] 1 nodegroup (rhel-eks-nodeadmn-new) was included (based on the include/exclude rules)
2024-07-18 08:51:18 [ℹ] will create a CloudFormation stack for each of 1 managed nodegroups in cluster "rhel-eks"
2024-07-18 08:51:19 [ℹ]
2 sequential tasks: { fix cluster compatibility, 1 task: { 1 task: { create managed nodegroup "rhel-eks-nodeadmn-new" } }
}
2024-07-18 08:51:19 [ℹ] checking cluster stack for missing resources
2024-07-18 08:51:19 [ℹ] cluster stack has all required resources
2024-07-18 08:51:19 [ℹ] building managed nodegroup stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 08:51:20 [ℹ] deploying stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 08:51:20 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 08:51:50 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 08:52:42 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 08:54:03 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 08:55:08 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 08:56:09 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 08:56:59 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 08:58:12 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 08:59:49 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:00:26 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:01:30 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:02:30 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:03:54 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:05:07 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:06:54 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:08:02 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:09:12 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:10:27 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:12:07 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:13:38 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:14:53 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:14:53 [ℹ] 1 error(s) occurred and nodegroups haven't been created properly, you may wish to check CloudFormation console
2024-07-18 09:14:53 [ℹ] to cleanup resources, run 'eksctl delete nodegroup --region=us-gov-east-1 --cluster=rhel-eks --name=' for each of the failed nodegroup
2024-07-18 09:14:53 [✖] waiter state transitioned to Failure
Error: failed to create nodegroups for cluster "rhel-eks"
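
The underlying CloudFormation failure can also be inspected from the CLI rather than the console; a sketch using the stack name from the logs above:

aws cloudformation describe-stack-events \
  --region us-gov-east-1 \
  --stack-name eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new \
  --query "StackEvents[?ResourceStatus=='CREATE_FAILED'].[LogicalResourceId,ResourceStatusReason]" \
  --output table

The ManagedNodeGroup resource is the one expected to fail here, since the nodes never register with the cluster.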

Anything else we need to know?
OS: macOS
Authentication: SSO through AWS CLI and Okta

Versions
eksctl 0.187.0

$ eksctl info
Contributor

Hello bradwatsonaws 👋 Thank you for opening an issue in eksctl project. The team will review the issue and aim to respond within 1-5 business days. Meanwhile, please read about the Contribution and Code of Conduct guidelines here. You can find out more information about eksctl on our website

@bradwatsonaws
Author

Starting from the CloudFormation template that eksctl generates, I was able to deploy a node group successfully with user data that is not wrapped in an extra eksctl-generated boundary. All unique values below were scrubbed.

AWSTemplateFormatVersion: '2010-09-09'
Description: 'EKS Managed Nodes (SSH access: false)'
Mappings:
  ServicePrincipalPartitionMap:
    aws:
      EC2: ec2.amazonaws.com
      EKS: eks.amazonaws.com
      EKSFargatePods: eks-fargate-pods.amazonaws.com
    aws-cn:
      EC2: ec2.amazonaws.com.cn
      EKS: eks.amazonaws.com
      EKSFargatePods: eks-fargate-pods.amazonaws.com
    aws-iso:
      EC2: ec2.c2s.ic.gov
      EKS: eks.amazonaws.com
      EKSFargatePods: eks-fargate-pods.amazonaws.com
    aws-iso-b:
      EC2: ec2.sc2s.sgov.gov
      EKS: eks.amazonaws.com
      EKSFargatePods: eks-fargate-pods.amazonaws.com
    aws-us-gov:
      EC2: ec2.amazonaws.com
      EKS: eks.amazonaws.com
      EKSFargatePods: eks-fargate-pods.amazonaws.com
Resources:
  LaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        BlockDeviceMappings:
          - DeviceName: /dev/sda1
            Ebs:
              Encrypted: false
              Iops: 3000
              Throughput: 125
              VolumeSize: 80
              VolumeType: gp3
        ImageId: ami-0b2e91234574a54c0
        MetadataOptions:
          HttpPutResponseHopLimit: 2
          HttpTokens: required
        SecurityGroupIds:
          - !ImportValue 'eksctl-rhel-eks-cluster::ClusterSecurityGroupId'
        TagSpecifications:
          - ResourceType: instance
            Tags:
              - Key: Name
                Value: rhel-eks-rhel-eks-cfn-Node
              - Key: alpha.eksctl.io/nodegroup-type
                Value: managed
              - Key: nodegroup-name
                Value: rhel-eks-cfn
              - Key: alpha.eksctl.io/nodegroup-name
                Value: rhel-eks-cfn
          - ResourceType: volume
            Tags:
              - Key: Name
                Value: rhel-eks-rhel-eks-cfn-Node
              - Key: alpha.eksctl.io/nodegroup-type
                Value: managed
              - Key: nodegroup-name
                Value: rhel-eks-cfn
              - Key: alpha.eksctl.io/nodegroup-name
                Value: rhel-eks-cfn
          - ResourceType: network-interface
            Tags:
              - Key: Name
                Value: rhel-eks-rhel-eks-cfn-Node
              - Key: alpha.eksctl.io/nodegroup-type
                Value: managed
              - Key: nodegroup-name
                Value: rhel-eks-cfn
              - Key: alpha.eksctl.io/nodegroup-name
                Value: rhel-eks-cfn
        UserData:
          Fn::Base64: !Sub |
            MIME-Version: 1.0
            Content-Type: multipart/mixed; boundary="BOUNDARY"

            --BOUNDARY
            Content-Type: application/node.eks.aws

            ---
            apiVersion: node.eks.aws/v1alpha1
            kind: NodeConfig
            spec:
              cluster:
                name: rhel-eks
                apiServerEndpoint: https://5B3FABCDE05F2D983E65079309B80C06.gr7.us-gov-east-1.eks.amazonaws.com
                certificateAuthority: LS0tLS1CRULS0tLS0K
                cidr: 10.100.0.0/16

            --BOUNDARY
            Content-Type: text/x-shellscript;

            #!/bin/bash
            set -ex
            systemctl enable kubelet.service
            systemctl disable nm-cloud-setup.timer
            systemctl disable nm-cloud-setup.service
            reboot

            --BOUNDARY--
      LaunchTemplateName: !Sub '${AWS::StackName}'
  ManagedNodeGroup:
    Type: AWS::EKS::Nodegroup
    Properties:
      ClusterName: rhel-eks
      InstanceTypes:
        - t3.medium
      Labels:
        alpha.eksctl.io/cluster-name: rhel-eks
        alpha.eksctl.io/nodegroup-name: rhel-eks-cfn
        role: worker
      LaunchTemplate:
        Id: !Ref 'LaunchTemplate'
      NodeRole: !GetAtt 'NodeInstanceRole.Arn'
      NodegroupName: rhel-eks-cfn
      ScalingConfig:
        DesiredSize: 2
        MaxSize: 2
        MinSize: 2
      Subnets:
        - subnet-0f034415c5b1237f0
        - subnet-0bdba07340be1232f
        - subnet-05c651fa62a123b2c
      Tags:
        alpha.eksctl.io/nodegroup-name: rhel-eks-cfn
        alpha.eksctl.io/nodegroup-type: managed
        nodegroup-name: rhel-eks-cfn
  NodeInstanceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action:
              - sts:AssumeRole
            Effect: Allow
            Principal:
              Service:
                - !FindInMap
                  - ServicePrincipalPartitionMap
                  - !Ref 'AWS::Partition'
                  - EC2
        Version: '2012-10-17'
      ManagedPolicyArns:
        - !Sub 'arn:${AWS::Partition}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly'
        - !Sub 'arn:${AWS::Partition}:iam::aws:policy/AmazonEKSWorkerNodePolicy'
        - !Sub 'arn:${AWS::Partition}:iam::aws:policy/AmazonEKS_CNI_Policy'
        - !Sub 'arn:${AWS::Partition}:iam::aws:policy/AmazonSSMManagedInstanceCore'
      Path: /
      Tags:
        - Key: Name
          Value: !Sub '${AWS::StackName}/NodeInstanceRole'
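
To reproduce the workaround, the template above can be deployed directly. A hedged example (the file name and stack name are placeholders; CAPABILITY_IAM is required because the template creates the node instance role):

aws cloudformation deploy \
  --region us-gov-east-1 \
  --template-file nodegroup-workaround.yaml \
  --stack-name rhel-eks-nodegroup-workaround \
  --capabilities CAPABILITY_IAM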

Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the stale label Aug 22, 2024
Contributor

This issue was closed because it has been stalled for 5 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 27, 2024
@cPu1 cPu1 reopened this Aug 27, 2024
@cPu1 cPu1 added priority/important-longterm Important over the long term, but may not be currently staffed and/or may require multiple releases and removed stale labels Aug 27, 2024
@TiberiuGC
Collaborator

Related to #7903
