Upgrade to Kubernetes 1.27 #3948

Closed
8 tasks done
smerle33 opened this issue Feb 12, 2024 · 12 comments

smerle33 (Contributor) commented Feb 12, 2024

Service(s)

Other

Summary

Previous upgrade (1.26): #3683

Deprecation timelines for 1.26 (justifying the upgrade to 1.27):


Task list:

Reproduction steps

No response

smerle33 added the triage (Incoming issues that need review) label Feb 12, 2024

smerle33 added a commit to smerle33/docker-helmfile that referenced this issue Feb 12, 2024

smerle33 (Contributor, Author) commented Feb 12, 2024

Digital Ocean Clusters (as per 1.26 here #3683 (comment)):

Post Mortem

The upgrade as code failed with the following error: Error: Unable to upgrade cluster version: POST https://api.digitalocean.com/v2/kubernetes/clusters/<redacted>/upgrade: 422 (request "<redacted>") invalid upgrade path]

The DigitalOcean UI reported a problem blocking the upgrade to the latest patch version (patch upgrades are handled directly by DigitalOcean on Sundays): #3948 (comment)

We had to correct the webhook timeout to proceed with the patch upgrade (through the DigitalOcean UI), and then remove a "hack" we used (using a data source to configure the Kubernetes provider and decouple it from the cluster resource itself).

The upgrade as code then went smoothly.

As such, the upgrade procedure is amended to the following workflow:

  • Disable Kubernetes-Management for Digital Ocean clusters
  • Keep the Terraform PRs (one or two clusters at a time; both can be done)
  • Check the plans
  • Merge the PRs and check the output
  • Enable Kubernetes-Management for Digital Ocean clusters

smerle33 added a commit to jenkins-infra/packer-images that referenced this issue Feb 13, 2024
dduportal added this to the infra-team-sync-2024-02-20 milestone Feb 13, 2024
dduportal removed the triage (Incoming issues that need review) label Feb 13, 2024
smerle33 self-assigned this Feb 14, 2024
smerle33 added a commit to jenkins-infra/kubernetes-management that referenced this issue Feb 15, 2024
dduportal pushed a commit to jenkins-infra/kubernetes-management that referenced this issue Feb 15, 2024

smerle33 (Contributor, Author) commented Feb 15, 2024

For DigitalOcean, running terraform plan on a local computer requires providing a token explicitly (no doctl auth):
export DIGITALOCEAN_ACCESS_TOKEN="$(security find-generic-password -a doctl-token-infra -w)";

dduportal (Contributor) commented

Update for DigitalOcean (on behalf of @smerle33):

  • Upgrading through Terraform was not possible during previous upgrades because of a bug when combining Terraform 1.1, the DigitalOcean Terraform provider, and the Kubernetes provider in the same project. Now that we have Terraform 1.6+, the "hack" we used (using a data source to configure the Kubernetes provider and decouple it from the cluster resource itself) can be removed. See "feat(doks-public) bump Kubernetes version from 1.26.to 1.27" (digitalocean#179).

  • The upgrade of doks-public from 1.26.12-do.0 to 1.27.10-do.0 looked good on "feat(doks-public) bump Kubernetes version from 1.26.to 1.27" (digitalocean#179) but failed with the following error:

    Error: Unable to upgrade cluster version: POST https://api.digitalocean.com/v2/kubernetes/clusters/<redacted>/upgrade: 422 (request "<redacted>") invalid upgrade path]
    
  • The DigitalOcean console showed a pending patch upgrade to 1.26.13-do.0. This update should have been applied automatically last Sunday (as per https://github.com/jenkins-infra/digitalocean/blob/41f59716187b14fcc14ba4d4ff14eb8015db2c6f/doks-public-cluster.tf#L21-L24), but it was not.

    • Trying to apply the patch manually from the DigitalOcean cloud console failed with the following errors (along with the 30-ish usual warnings related to hostpath and resource requests/limits due to datadog and certmanager):

      [Screenshot 2024-02-16 at 11.43.49]
      Validating webhook with a TimeoutSeconds value smaller than 1 second or greater than 29 seconds will block upgrades.
      Resources:
          validating webhook configuration: cert-manager-webhook
      
      Mutating webhook with a TimeoutSeconds value smaller than 1 second or greater than 29 seconds will block upgrades.
      Resources:
          mutating webhook configuration: cert-manager-webhook
      
  • Short-term action: we decided to start by upgrading to the latest patch, so that we can retry the 1.26.x to 1.27.x upgrade with Terraform.
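
The two webhook checks reported by the DigitalOcean console can be reproduced before attempting an upgrade. The following is a minimal sketch, not the check DigitalOcean actually runs: it assumes the input is the parsed JSON from `kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations -o json`, takes the 1 to 29 second bounds from the messages above, and assumes the Kubernetes default of 10 seconds when timeoutSeconds is unset. The function name `find_blocking_webhooks` and the sample values are hypothetical.

```python
# Bounds quoted in the DigitalOcean console messages: a timeout smaller than
# 1 second or greater than 29 seconds blocks upgrades.
MIN_TIMEOUT_SECONDS = 1
MAX_TIMEOUT_SECONDS = 29
# Assumption: admissionregistration.k8s.io/v1 defaults timeoutSeconds to 10.
DEFAULT_TIMEOUT_SECONDS = 10


def find_blocking_webhooks(config_list):
    """Return (kind, configuration name, webhook name, timeout) per offender."""
    offenders = []
    for config in config_list.get("items", []):
        kind = config.get("kind", "")
        config_name = config["metadata"]["name"]
        for webhook in config.get("webhooks", []):
            timeout = webhook.get("timeoutSeconds", DEFAULT_TIMEOUT_SECONDS)
            if not MIN_TIMEOUT_SECONDS <= timeout <= MAX_TIMEOUT_SECONDS:
                offenders.append((kind, config_name, webhook["name"], timeout))
    return offenders


# Hypothetical sample shaped like the kubectl JSON output; the timeout values
# are illustrative, not the ones observed on doks-public.
sample = {
    "items": [
        {
            "kind": "ValidatingWebhookConfiguration",
            "metadata": {"name": "cert-manager-webhook"},
            "webhooks": [{"name": "webhook.cert-manager.io", "timeoutSeconds": 30}],
        },
        {
            "kind": "MutatingWebhookConfiguration",
            "metadata": {"name": "example-webhook"},
            "webhooks": [{"name": "example.invalid"}],  # falls back to the default
        },
    ]
}

for offender in find_blocking_webhooks(sample):
    print(offender)
```

Keeping the check as a pure function over the JSON makes it easy to run against a saved kubectl output before opening the Terraform PR.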

dduportal (Contributor) commented Feb 16, 2024

Next steps:

smerle33 (Contributor, Author) commented

doks-public seems OK for the upgrade:

[Screenshot 2024-02-19 at 14.43.48]

smerle33 (Contributor, Author) commented

We need to create an updatecli manifest to track kubectl on the trusted agent, and add it to the upgrade "process" along with the kubectl CLI update for packer-images.

smerle33 (Contributor, Author) commented Mar 7, 2024

To get the latest addon versions:

aws eks describe-addon-versions --profile prod  --kubernetes-version 1.27 --addon-name coredns --region us-east-2
aws eks describe-addon-versions --profile prod  --kubernetes-version 1.27 --addon-name kube-proxy --region us-east-2
aws eks describe-addon-versions --profile prod  --kubernetes-version 1.27 --addon-name vpc-cni --region us-east-2
aws eks describe-addon-versions --profile prod  --kubernetes-version 1.27 --addon-name aws-ebs-csi-driver --region us-east-2

coredns: v1.10.1-eksbuild.7
kube-proxy: v1.27.10-eksbuild.2
vpc-cni: v1.16.4-eksbuild.2
aws-ebs-csi-driver: v1.28.0-eksbuild.1
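
Picking the most recent version out of each command's output can be scripted. The sketch below assumes the JSON layout returned by aws eks describe-addon-versions (an addons list whose entries carry addonVersions[].addonVersion) and assumes version strings of the form vX.Y.Z-eksbuild.N, as in the results above. The helper names and the sample data are hypothetical.

```python
import re

# Assumed version format, matching the results listed above.
VERSION_PATTERN = re.compile(r"^v(\d+)\.(\d+)\.(\d+)-eksbuild\.(\d+)$")


def version_key(addon_version):
    """Turn 'v1.10.1-eksbuild.7' into (1, 10, 1, 7) for numeric comparison."""
    match = VERSION_PATTERN.match(addon_version)
    if match is None:
        raise ValueError(f"unexpected addon version format: {addon_version!r}")
    return tuple(int(part) for part in match.groups())


def latest_addon_version(describe_output):
    """Return the highest addonVersion found in the parsed CLI output, or None."""
    versions = [
        entry["addonVersion"]
        for addon in describe_output.get("addons", [])
        for entry in addon.get("addonVersions", [])
    ]
    if not versions:
        return None
    return max(versions, key=version_key)


# Hypothetical sample shaped like the CLI output (version list is illustrative).
sample = {
    "addons": [
        {
            "addonName": "coredns",
            "addonVersions": [
                {"addonVersion": "v1.9.3-eksbuild.11"},
                {"addonVersion": "v1.10.1-eksbuild.6"},
                {"addonVersion": "v1.10.1-eksbuild.7"},
            ],
        }
    ]
}

print(latest_addon_version(sample))
```

Comparing numeric tuples rather than raw strings matters here: a plain string sort would rank v1.9.3 above v1.10.1. The input could be produced by adding --output json to the commands above and parsing the result with json.loads.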

smerle33 (Contributor, Author) commented Mar 12, 2024

AZURE

Update: AKS Upgrade plan privatek8s

  • Check changelog:

Current changelog notable elements for Kubernetes 1.27:

privatek8s

smerle33 (Contributor, Author) commented Mar 12, 2024

AZURE

Update: AKS Upgrade plan publick8s

publick8s

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

We took the opportunity to deploy the nginx-ingress (jenkins-infra/kubernetes-management#5032) and falco (jenkins-infra/kubernetes-management#4997) upgrades.
This step should be added to the preparation for each Kubernetes upgrade, to take advantage of the downtime for important upgrades like these.

