
AWS EKS infra: transition to k8s 1.24 and add required storage addon #2056

Conversation


@consideRatio commented Jan 18, 2023

This adds the required changes to the eksctl cluster configuration files, as generated from a template.jsonnet file (depending on a libsonnet/nodegroup.jsonnet file).

It will influence only newly created clusters, since that is when template.jsonnet is used.

Closes #2054, but the work to migrate existing hubs remains; that is covered by #2057.

@consideRatio force-pushed the pr/update-eksctl-cluster-config-template branch from 5480f8f to 8b43192 on January 18, 2023 18:30
@yuvipanda left a comment

NAICE!

@consideRatio

Thanks for reviewing @yuvipanda! I just updated the inline comments, force-pushed, and updated the PR description to include a question:

Should we add iam.withAddonPolicies.ebs=true to the default node group object so all node groups get support for mounting PVCs backed by EBS storage, or should we add it only to the core node group that is known to need it?
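For context, the option in question looks roughly like this in a rendered eksctl cluster config (a sketch only; the actual config is generated from template.jsonnet, and the cluster/nodegroup names below are illustrative):

```yaml
# Sketch of a rendered eksctl ClusterConfig (names are illustrative).
# iam.withAddonPolicies.ebs grants the nodegroup's instance role the IAM
# permissions needed to attach/detach EBS volumes.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-cluster
  region: us-west-2
nodeGroups:
  - name: core-a
    instanceType: m5.large
    iam:
      withAddonPolicies:
        ebs: true
```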

@consideRatio added the Engineering:SRE label (Cloud infrastructure operations and development) on Jan 18, 2023
@consideRatio self-assigned this on Jan 18, 2023
@yuvipanda

@consideRatio the question to me is: where does the actual controller run? Does it run only on the core nodes (as the autoscaler does, for example)? In that case, letting it be there just for the core nodes is the way to go. Or is it a daemonset that needs to run on every node? In that case we should still probably set it up just for the core nodes (principle of least privilege and all that) but document that it needs to be explicitly enabled.

My intuition is that the actual controller is a deployment/statefulset that runs only on the core nodes, and so having the permissions just on the core nodes is fine.

@yuvipanda

However, this intuition has no evidence :D Should be easy enough to figure out, though.

@consideRatio

Hmmm hmmm. Observations...

  • the ebs-csi-node pod has serviceAccountName: ebs-csi-node-sa
  • the ebs-csi-node pods, created from a daemonset, tolerate our taints on the user nodes by default, so they show up on all nodes, and they have permission to read the host filesystem.
  • I think the ebs-csi-controller pods have been scheduled on the core node by chance so far.
kube-system    ebs-csi-controller-774b7fc5b-ljfb5                  6/6     Running   1 (6d9h ago)   7d1h
kube-system    ebs-csi-controller-774b7fc5b-vt4g4                  6/6     Running   0              7d1h
kube-system    ebs-csi-node-4r78z                                  3/3     Running   0              7d1h

I don't understand the coupling of AWS credentials to k8s ServiceAccounts so well, hmm....
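For reference, on EKS that coupling is typically done via IAM Roles for Service Accounts (IRSA): the ServiceAccount carries an annotation mapping it to an IAM role, and pods using that ServiceAccount get AWS credentials scoped to the role. A sketch (the role ARN and account ID below are illustrative, not from this cluster):

```yaml
# IRSA sketch: the EBS CSI controller's ServiceAccount is annotated with
# an IAM role ARN; pods using this ServiceAccount assume that role.
# The ARN is a made-up example.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ebs-csi-controller-sa
  namespace: kube-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/example-ebs-csi-role
```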

What test do you suggest? Make the controller schedule on a user node, and then see if it still provisions and binds a PV to a PVC?

@yuvipanda

@consideRatio I think the controller creates PVs, but the daemonset is what 'binds' them. I would suggest testing this by launching a pod with a PVC that will run on a non-core nodepool. I think the controller will always run in the core nodepool; the question is, what permissions does the daemonset need to do the attach? So we want to test: 'will EBS volumes mount on arbitrary nodes even if they don't have this permission explicitly defined?'

@consideRatio

Thanks for your help thinking about this @yuvipanda!!!

Test: add a new PVC and a new Pod to mount it, and force the pod to run on a user node without the IAM EBS permissions under consideration.
Test situation 1: Run the test as defined above.
Test situation 2: Run the test as defined above, but let the EBS controller run on the user node.
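The test could be sketched as a manifest like this (a hypothetical illustration, not the actual manifests used; the storage class, node label, and toleration values are assumptions):

```yaml
# Hypothetical test: a PVC plus a pod forced onto a user node.
# Node selector and toleration values are illustrative.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-test-pvc
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: ebs-test-pod
spec:
  nodeSelector:
    hub.jupyter.org/node-purpose: user
  tolerations:
    - key: hub.jupyter.org/dedicated
      operator: Equal
      value: user
      effect: NoSchedule
  containers:
    - name: test
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: ebs-test-pvc
```

If the pod reaches Running and the PVC binds, EBS volumes attach on nodes without the explicit IAM addon policy.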

Both worked out. Given this, I figure we go with removing the nodeGroup's IAM EBS additions entirely and see if it works; if not, we add them back. This would be a win for us, as it's easier to just add the addon to the EKS-based k8s cluster than to manage node pool permissions via delete/create operations.

In 8fe4009 I removed the nodeGroup's IAM EBS additions, and this is ready to merge. In case things don't work, let's add them back to the core node pool.

@yuvipanda

@consideRatio sounds good to me.

@consideRatio merged commit 393ac93 into 2i2c-org:master on Jan 18, 2023
@consideRatio commented Jan 18, 2023

@GeorgianaElena when deploying the NASA VEDA hub, be on the lookout for, for example, a failure to schedule the prometheus pod of the support chart, together with an associated PVC resource stuck pending. That would indicate that I didn't get everything right in this PR - write to me on Slack if that happens and we'll resolve it.

@GeorgianaElena

Will do! Thanks @consideRatio!

Successfully merging this pull request may close these issues.

Update AWS EKS terraform jinja templates so we can use k8s 1.24