Kubernetes upgrade #1829

Open
jefflill opened this issue Jul 25, 2023 · 3 comments
Labels
cluster-setup neonKUBE cluster setup neon-kube Related to our Kubernetes distribution

Comments

@jefflill
Collaborator

jefflill commented Jul 25, 2023

Kubernetes v1.24 reaches end-of-life on Friday (July 28, 2023), so it's time to upgrade to the latest v1.27 patch release, and it looks like v1.28 is coming soon too. Now that the first friends/family beta release is (almost) ready, it's time to start planning this upgrade.

I think a reasonable approach would be to do this upgrade in 6 steps, ultimately upgrading to v1.27.4:

1.24.0  --> 1.25.0 --> 1.25.12
1.25.12 --> 1.26.0 --> 1.26.7
1.26.7  --> 1.27.0 --> 1.27.4

I think this will be easier than doing it all in one go, and I believe we can skip the intermediate patch releases without too much drama. We'll track each of these steps in individual comments below.
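kubeadm doesn't support skipping minor versions, so a quick sanity check for a plan like this is to confirm that no hop jumps more than one minor version. Here's a rough Python sketch that builds the path above from the latest patch number for each minor version and then validates every hop. The helpers are hypothetical and aren't part of neonKUBE.

```python
def parse(version):
    """Split "1.25.12" into the integer tuple (1, 25, 12)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def build_path(start, latest_patches):
    """Build a path that lands on x.y.0 and then the latest x.y.z for each minor.

    latest_patches maps each target minor version to its latest patch number,
    e.g. {25: 12} yields 1.25.0 then 1.25.12. Minors are visited in ascending order.
    """
    major = parse(start)[0]
    path = [start]
    for minor in sorted(latest_patches):
        path.append(f"{major}.{minor}.0")
        if latest_patches[minor] > 0:
            path.append(f"{major}.{minor}.{latest_patches[minor]}")
    return path


def validate_upgrade_path(path):
    """Return a list of problems; an empty list means every hop is a supported upgrade."""
    problems = []
    for src, dst in zip(path, path[1:]):
        s, d = parse(src), parse(dst)
        if d <= s:
            problems.append(f"{src} -> {dst}: not an upgrade")
        elif d[0] != s[0] or d[1] - s[1] > 1:
            # kubeadm only upgrades one minor version at a time.
            problems.append(f"{src} -> {dst}: skips a minor version")
    return problems


if __name__ == "__main__":
    plan = build_path("1.24.0", {25: 12, 26: 7, 27: 4})
    print(" --> ".join(plan))
    print(validate_upgrade_path(plan) or "plan looks OK")
```

The patch numbers here are just the ones from this plan. They'd need to be refreshed from the actual Kubernetes release list before use.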

GitHub Branches and Versioning

The 0.10.0-beta.0 release is still on master right now, and we're going to keep it there until we finalize the release in a day or two. After that, we'll patch that release as necessary by incrementing the beta number: 0.10.0-beta.1, 0.10.0-beta.2, ...

I figure the next friends/family release version will be 0.11.0-beta.0 and the first GA preview release will be 0.11.0-preview.0. I'm going to do some scouting on a new 0.11.0-beta.0 branch for now, and we'll merge that into master after we formally release 0.10.0-beta.0 into its own release branch. I figure we'll ship GA as 0.11.0 and then keep incrementing the minor version number. Once things have stabilized and we see no breaking changes on the horizon (and perhaps after upgrading to Kubernetes v1.28), we'll bump our version to 1.0.0.
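This numbering sorts the way we want under SemVer 2.0.0 precedence rules. Pre-release identifiers are compared field by field: numeric fields compare numerically and text fields compare in ASCII order. A release sorts after all of its pre-releases. Because "beta" < "preview" in ASCII order, 0.11.0-beta.N lands before 0.11.0-preview.N, which in turn lands before 0.11.0. Here's a minimal Python sketch of that comparison. `semver_key()` is a hypothetical helper, and it assumes well-formed version strings.

```python
def semver_key(version):
    """Sort key implementing SemVer 2.0.0 precedence (assumes a well-formed version)."""
    version = version.split("+", 1)[0]  # build metadata doesn't affect precedence
    core, _, prerelease = version.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    if not prerelease:
        # A release sorts after every pre-release of the same version.
        return (major, minor, patch, (1,))
    identifiers = []
    for ident in prerelease.split("."):
        if ident.isdigit():
            # Numeric identifiers compare numerically and sort before alphanumeric ones.
            identifiers.append((0, int(ident), ""))
        else:
            # Alphanumeric identifiers compare in ASCII order.
            identifiers.append((1, 0, ident))
    # Tuple comparison makes a shorter identifier list sort first when it's a prefix.
    return (major, minor, patch, (0, tuple(identifiers)))


if __name__ == "__main__":
    releases = ["0.11.0", "0.11.0-preview.0", "0.10.0-beta.10", "0.11.0-beta.0", "0.10.0-beta.2", "1.0.0"]
    print(sorted(releases, key=semver_key))
```

Because the numeric field compares numerically, 0.10.0-beta.10 sorts after 0.10.0-beta.2, so the beta numbers can safely go past 9.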

@jefflill jefflill self-assigned this Jul 25, 2023
@jefflill jefflill added neon-kube Related to our Kubernetes distribution cluster-setup neonKUBE cluster setup debt Engineering debt labels Jul 25, 2023
@jefflill
Collaborator Author

jefflill commented Jul 25, 2023

I'm going to go ahead and do the Kubernetes and any necessary container upgrades all at once.

Upgrade Tasks

  • kubernetes v1.24 --> v1.28
  • neonkube v0.10.0-beta.3 --> v0.11.0-beta.0
  • podman v3.4.2 --> 3.4.4+ds1-1ubuntu1.22.04.2
  • CRI-O v1.24.0 --> v1.28.0
  • etcd v3.5.3-0 --> v3.5.9-0
  • neon cluster setup --debug mode working?
  • neon (cli) tool upgrade
  • pause v3.7 --> v3.9
     
  • calico --> cilium v1.14.4
    • on first control node
    • values.yaml: mode: "cluster-pool" --> "kubernetes"
    • convert cilium-operator into a daemon set running on the control plane
    • restart CRI-O to pick up new CNI (on other control nodes too?)
    • VXLAN mode for Azure? (add IHostingManager.VxLANMode?)
    • enable mTLS (with Wireguard video)
    • use cilium-proxy (with eBPF) instead of kube-proxy
    • istio modifications: link
      • disable mTLS
      • install istio-cli on cluster nodes
         
  • Upgrade the KubernetesClient NuGet package to v12.1.1 and remove KubernetesClient.Basic and KubernetesClient.Models, since that functionality looks like it's included in the main package now: info
     
    • KubernetesClient v12.1.1 is no longer compatible with netstandard2.0. Change these projects to target .NET 7 instead:
      • Neon.Operator.Analyzers
      • Neon.Operator.Core
      • Neon.Kube.Resources
      • Probably a good time to consolidate our Kubernetes utilities into NEONSDK: link
         
  • We also need to stop hardcoding component versions in our Helm charts. The easiest solution may be to use PreprocessReader to replace variable references in the charts with these constant values. This will make future component upgrades much easier, since we won't have to manually chase down and modify version references.
    • Add a KubeVersionAttribute that identifies constants in KubeVersions.cs that will be included in the PreprocessReader variable definitions.
    • Add a KubeVersion.CreatePreprocessor() method that returns a PreprocessReader that replaces version variable references in a string with the version values.
    • Modify helm chart uploads to do this preprocessing first:
      • node image builds
      • --upload-charts
      • neon-desktop
    • Update all base and setup container image publish/build scripts to reference versions from KubeVersions.cs
    • Add all component versions to the release note templates
    • Add these versions to KubeVersions.cs:
      • bitnami-kubectl
      • bitnami-memcached-exporter
      • busybox
      • calico-cni
      • calico-kube-controllers
      • calico-node
      • calico-pod2daemon-flexvol
      • coredns
      • coredns-plugin
      • coreos-etcd
      • dexidp-dex
      • etcd
      • glauth-glauth
      • goharbor-harbor-operator
      • grafana-agent
      • grafana-agent-operator
      • grafana-grafana
      • grafana-loki
      • grafana-mimir
      • grafana-operator-grafana-operator
      • grafana-operator-grafana_plugins_init
      • grafana-tempo
      • grafana-tempo-query
      • haproxy
      • harbor-chartmuseum-photon
      • harbor-core
      • harbor-exporter
      • harbor-jobservice
      • harbor-notary-server-photon
      • harbor-notary-signer-photon
      • harbor-portal
      • harbor-registry-photon
      • harbor-registryctl
      • harbor-trivy-adapter-photon
      • install-cni
      • jetstack-cert-manager-cainjector
      • jetstack-cert-manager-controller
      • jetstack-cert-manager-webhook
      • jettech-kube-webhook-certgen
      • k8scsi-csi-attacher
      • k8scsi-csi-node-driver-registrar
      • k8scsi-csi-provisioner
      • k8scsi-csi-resizer
      • k8scsi-csi-snapshotter
      • k8scsi-livenessprobe
      • k8scsi-snapshot-controller
      • kiali-kiali
      • kiali-kiali-operator
      • kube-apiserver
      • kube-controller-manager
      • kube-proxy
      • kube-scheduler
      • kube-state-metrics
      • kubernetes-e2e-test-images-dnsutils
      • kubernetesui-dashboard
      • kubernetesui-metrics-scraper
      • memcached
      • metrics-server
      • minio-console
      • minio-minio
      • minio-operator
      • neon-acme
      • neon-cluster-operator
      • neon-node-agent
      • neon-sso-session-proxy
      • node-problem-detector
      • oauth2-proxy-oauth2-proxy
      • oliver006-redis_exporter
      • openebs-cspc-operator
      • openebs-cstor-csi-driver
      • openebs-cstor-istgt
      • openebs-cstor-pool
      • openebs-cstor-pool-manager
      • openebs-cstor-volume-manager
      • openebs-cstor-webhook
      • openebs-cvc-operator
      • openebs-jiva
      • openebs-jiva-csi
      • openebs-jiva-operator
      • openebs-linux-utils
      • openebs-m-exporter
      • openebs-nfs-server-alpine
      • openebs-node-disk-exporter
      • openebs-node-disk-manager
      • openebs-node-disk-operator
      • openebs-provisioner-localpv
      • openebs-provisioner-nfs
      • operator
      • pause
      • pilot
      • prom-blackbox-exporter
      • prometheus-operator-prometheus-config-reloader
      • prometheuscommunity-postgres-exporter
      • proxyv2
      • redis
      • stakater-reloader
      • zalan-acid-pgbouncer
      • zalan-postgres-health-check
      • zalan-postgres-operator
      • zalan-spilo-14
         
  • Rebuild all base images to pick up a few changes:
    • AWS
    • Azure
    • Hyper-V
    • XenServer
       
  • kubelet: move these deprecated command-line options into the kubelet config file:
    Nov 15 21:52:59 control-0 kubelet[7497]: Flag --container-runtime-endpoint has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-c>
    Nov 15 21:52:59 control-0 kubelet[7497]: Flag --pod-infra-container-image has been deprecated, will be removed in a future release. Image garbage collector will get sandbox image information from CRI.
    Nov 15 21:52:59 control-0 kubelet[7497]: Flag --feature-gates has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubele>
    Nov 15 21:52:59 control-0 kubelet[7497]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubele>
    Nov 15 21:52:59 control-0 kubelet[7497]: Flag --container-runtime-endpoint has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-c>
    Nov 15 21:52:59 control-0 kubelet[7497]: Flag --runtime-request-timeout has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-clus>
    Nov 15 21:52:59 control-0 kubelet[7497]: Flag --resolv-conf has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet->
    Nov 15 21:52:59 control-0 kubelet[7497]: Flag --image-gc-low-threshold has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-clust>
    Nov 15 21:52:59 control-0 kubelet[7497]: Flag --image-gc-high-threshold has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-clus>
    
    It appears that these changes may be required to get cluster setup working: info
     
  • Change the cluster's default editor to nano (KUBE_EDITOR=nano). This isn't strictly an upgrade task, but now's a good time for it. We'll set this when creating node images.
     
  • Installer changes:
    • install cilium-cli
    • install istio-cli
    • install hubble-cli?
       
  • Verify that we can deploy a cluster with one control-plane node and one worker node (this didn't work with Calico).
     
  • Add a neon cluster delete-evicted-pods command that removes evicted pods from the current namespace, a specified namespace, or all namespaces?
  • Grafana: migrate from the Grafana Agent Operator to Grafana Alloy: https://grafana.com/docs/alloy/latest/tasks/migrate/from-operator/
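Here's a rough Python sketch of the version-substitution idea from the Helm chart item above. It isn't PreprocessReader itself. The `$<name>` reference syntax, the `KUBE_VERSIONS` table (standing in for the constants in KubeVersions.cs) and `preprocess()` are all assumptions for illustration. The version values are just the upgrade targets listed in this comment.

```python
import re

# Hypothetical stand-in for the constants in KubeVersions.cs. The names are made up;
# the values are the upgrade targets listed in this issue.
KUBE_VERSIONS = {
    "cilium": "1.14.4",
    "crio": "1.28.0",
    "etcd": "3.5.9-0",
    "pause": "3.9",
}

# Assumed reference syntax: $<name>
_VARIABLE = re.compile(r"\$<([A-Za-z0-9_.-]+)>")


def preprocess(text, variables):
    """Replace every $<name> reference in text with variables[name].

    Raises KeyError for an undefined name, so a typo fails the chart upload
    instead of shipping a literal "$<...>" to the cluster.
    """
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined version variable: {name}")
        return variables[name]

    return _VARIABLE.sub(substitute, text)


if __name__ == "__main__":
    values_yaml = 'image:\n  repository: registry.k8s.io/pause\n  tag: "$<pause>"\n'
    print(preprocess(values_yaml, KUBE_VERSIONS))
```

It's worth quoting references in the charts (`tag: "$<pause>"`), as the example does. Otherwise YAML will read a version like `3.10` as the number 3.1.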
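For the kubelet item, most of the deprecated flags in the log map directly onto `KubeletConfiguration` (`kubelet.config.k8s.io/v1beta1`) fields. `containerRuntimeEndpoint` became a config-file field in v1.27. `--pod-infra-container-image` has no config-file equivalent. As the warning says, the sandbox image comes from the CRI, which for CRI-O is the `pause_image` setting in `crio.conf`. Here's a rough Python sketch of the conversion. `flags_to_config()` and `to_yaml()` are hypothetical helpers that assume the `--flag=value` form. The example values are illustrative, not what neonKUBE actually passes.

```python
import json

# Deprecated kubelet flags from the log above and their KubeletConfiguration field names.
FLAG_TO_FIELD = {
    "--container-runtime-endpoint": "containerRuntimeEndpoint",
    "--cgroup-driver": "cgroupDriver",
    "--runtime-request-timeout": "runtimeRequestTimeout",
    "--resolv-conf": "resolvConf",
    "--image-gc-low-threshold": "imageGCLowThresholdPercent",
    "--image-gc-high-threshold": "imageGCHighThresholdPercent",
}
INT_FIELDS = {"imageGCLowThresholdPercent", "imageGCHighThresholdPercent"}


def flags_to_config(args):
    """Split kubelet args into a KubeletConfiguration dict plus the leftover args.

    Assumes the --flag=value form. Flags with no config-file field (such as
    --pod-infra-container-image) are returned unchanged in the leftover list.
    """
    config = {"apiVersion": "kubelet.config.k8s.io/v1beta1", "kind": "KubeletConfiguration"}
    leftover = []
    for arg in args:
        flag, sep, value = arg.partition("=")
        if flag == "--feature-gates" and sep:
            # "A=true,B=false" becomes {"A": True, "B": False}
            gates = {}
            for pair in value.split(","):
                name, _, enabled = pair.partition("=")
                gates[name] = enabled.strip().lower() == "true"
            config["featureGates"] = gates
        elif flag in FLAG_TO_FIELD and sep:
            field = FLAG_TO_FIELD[flag]
            config[field] = int(value) if field in INT_FIELDS else value
        else:
            leftover.append(arg)
    return config, leftover


def to_yaml(config):
    """Render the config (top level plus one nested map) as YAML.

    Scalars are written with json.dumps, which produces valid YAML scalars.
    """
    lines = []
    for key, value in config.items():
        if isinstance(value, dict):
            lines.append(f"{key}:")
            lines.extend(f"  {name}: {json.dumps(item)}" for name, item in value.items())
        else:
            lines.append(f"{key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    config, leftover = flags_to_config([
        "--container-runtime-endpoint=unix:///var/run/crio/crio.sock",
        "--cgroup-driver=systemd",
        "--image-gc-high-threshold=85",
        "--feature-gates=ExampleGate=true",  # placeholder gate name
        "--pod-infra-container-image=registry.k8s.io/pause:3.9",
    ])
    print(to_yaml(config))
    print("still on the command line:", leftover)
```

Anything in the leftover list still has to be handled some other way (or dropped) before the flags come off the kubelet command line.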
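For the proposed `neon cluster delete-evicted-pods` command, note that evicted pods are left behind with `status.phase` set to `Failed` and `status.reason` set to `Evicted`. Here's a minimal Python sketch that filters `kubectl get pods -A -o json` output and builds one `kubectl delete pod` command per match. `evicted_pods()` and `delete_commands()` are hypothetical helpers, not the neon CLI.

```python
import json


def evicted_pods(pod_list_json, namespace=None):
    """Return (namespace, name) pairs for evicted pods in `kubectl get pods -A -o json` output.

    namespace=None means all namespaces.
    """
    evicted = []
    for pod in json.loads(pod_list_json).get("items", []):
        metadata = pod.get("metadata", {})
        status = pod.get("status", {})
        if status.get("phase") != "Failed" or status.get("reason") != "Evicted":
            continue
        if namespace is not None and metadata.get("namespace") != namespace:
            continue
        evicted.append((metadata.get("namespace"), metadata.get("name")))
    return evicted


def delete_commands(pods):
    """Build one `kubectl delete pod` command (as an argv list) per evicted pod."""
    return [["kubectl", "delete", "pod", name, "--namespace", ns] for ns, name in pods]


if __name__ == "__main__":
    sample = json.dumps({"items": [
        {"metadata": {"namespace": "default", "name": "web-1"},
         "status": {"phase": "Failed", "reason": "Evicted"}},
        {"metadata": {"namespace": "default", "name": "web-2"},
         "status": {"phase": "Running"}},
    ]})
    for command in delete_commands(evicted_pods(sample)):
        print(" ".join(command))
```

Checking `reason` as well as `phase` avoids deleting pods that failed for other reasons. A blanket `kubectl delete pods --field-selector=status.phase=Failed` would remove those too. To restrict deletion to the current namespace, pass it as `namespace`.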

@jefflill
Collaborator Author

jefflill commented Jul 25, 2023

1.25.12 --> 1.26.0 --> 1.26.7

@jefflill
Collaborator Author

jefflill commented Jul 25, 2023

1.26.7 --> 1.27.0 --> 1.27.4

@jefflill jefflill removed the debt Engineering debt label Nov 27, 2023