
Implement high availability control plane #1

Open
gothub opened this issue Jun 2, 2021 · 6 comments
@gothub
Contributor

gothub commented Jun 2, 2021

Maintenance tasks such as k8s upgrades, OS upgrades, and reconfigurations (disk, etc.) can require k8s nodes to be taken offline and rebooted.

Minimize k8s service disruptions when these maintenance tasks are performed by:

  • configuring a multi-master (multiple control plane node) k8s cluster
  • implementing high availability services where possible
    • currently only one pod or service instance of each of these items is running at any time for metadig:
      • metadig-controller
      • rabbitmq
      • metadig-nginx-controller
      • metadig-scheduler
      • metadig Postgres server
      • metadig-scorer
  • using appropriate k8s management tools to aid this process, such as draining worker nodes to prepare them for maintenance (see the sketch below)
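
For reference, a rough sketch of the drain/uncordon cycle for taking a worker node offline for maintenance; the node name is a placeholder and the exact eviction flags depend on the workloads running on that node:

```
# Cordon the node and evict its pods ahead of maintenance.
# --ignore-daemonsets leaves DaemonSet-managed pods in place;
# --delete-emptydir-data allows evicting pods that use emptyDir volumes.
kubectl drain k8s-node-1 --ignore-daemonsets --delete-emptydir-data

# ... perform the OS/k8s upgrade, disk reconfiguration, reboot, etc. ...

# Allow the scheduler to place pods on the node again.
kubectl uncordon k8s-node-1
```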

This issue supersedes NCEAS/metadig-engine#287

@gothub gothub self-assigned this Jun 2, 2021
@gothub gothub changed the title Develop a process to reboot k8s nodes with no service downtime Implement high availability control plane Jun 14, 2021
@gothub
Contributor Author

gothub commented Jun 14, 2021

Some approaches to implementing a high availability control plane are detailed here

This document discusses both external load balancers (e.g. HAProxy on external nodes) and software load balancing. For the latter configuration, keepalived and haproxy run on the control plane nodes themselves, so an external load balancer is not required to switch control to a new active control plane node if the current primary becomes unavailable.

With either configuration (external or internal load balancing), extra nodes would need to be added to the cluster to act as the standby control plane nodes.
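
To illustrate the internal option, the keepalived setup in that document hinges on a small health-check script along these lines, which keepalived runs to decide whether the node should keep the virtual IP; this is only a sketch and the VIP/port values are placeholders, not our actual configuration:

```
#!/bin/sh
# check_apiserver.sh (sketch): keepalived invokes this periodically; a non-zero
# exit tells keepalived to release the virtual IP so a standby node can take over.

APISERVER_VIP=192.168.0.100   # placeholder virtual IP
APISERVER_DEST_PORT=6443      # placeholder API server port

errorExit() {
    echo "*** $*" 1>&2
    exit 1
}

# Fail if the local kube-apiserver is not responding...
curl --silent --max-time 2 --insecure "https://localhost:${APISERVER_DEST_PORT}/" -o /dev/null \
    || errorExit "Error GET https://localhost:${APISERVER_DEST_PORT}/"

# ...and, if this node currently holds the VIP, fail if the VIP endpoint is down too.
if ip addr | grep -q "${APISERVER_VIP}"; then
    curl --silent --max-time 2 --insecure "https://${APISERVER_VIP}:${APISERVER_DEST_PORT}/" -o /dev/null \
        || errorExit "Error GET https://${APISERVER_VIP}:${APISERVER_DEST_PORT}/"
fi
```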

@gothub
Contributor Author

gothub commented Jul 8, 2021

BTW - the link shown above (https://github.com/kubernetes/kubeadm/blob/master/docs/ha-considerations.md) uses kubeadm to implement a 3-control-node HA k8s cluster, with a 'stacked' etcd cluster or, optionally, with the etcd nodes external to the cluster.
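
For context, the kubeadm flow described there boils down to initializing the first control plane node against the load-balanced endpoint and then joining the others as control plane members; the DNS name, token, hash, and certificate key below are placeholders:

```
# On the first control plane node: advertise the load balancer / virtual IP
# endpoint rather than this node's own address, and upload the control plane
# certificates as a secret so the other control plane nodes can fetch them.
kubeadm init --control-plane-endpoint "k8s-lb.example.org:6443" --upload-certs

# On each additional control plane node: join as a control plane member using
# the token, CA cert hash, and certificate key printed by the init step.
kubeadm join k8s-lb.example.org:6443 \
    --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane \
    --certificate-key <certificate-key>
```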

@nickatnceas
Contributor

Two VMs, k8s-ctrl-2 and k8s-ctrl-3, have been provisioned for K8s over in https://github.nceas.ucsb.edu/NCEAS/Computing/issues/98

The physical-to-virtual layout of the control plane VMs is:

host-ucsb-6: k8s-ctrl-1
host-ucsb-7: k8s-ctrl-2
host-ucsb-8: k8s-ctrl-3

@nickatnceas
Contributor

In a Slack discussion we decided to set up backups for K8s and K8s-dev before converting our install to HA.

We may need to upgrade K8s before the HA changes, which in turn may require an OS upgrade on the existing controllers.
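
For the K8s side of the backups, one common approach on a kubeadm-built cluster is a periodic etcd snapshot; a minimal sketch, assuming the default kubeadm certificate paths on the current controller and an example output path:

```
# Take a point-in-time snapshot of etcd (run on the control plane node, where
# the kubeadm-managed etcd certs live under /etc/kubernetes/pki/etcd).
ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    snapshot save "/var/backups/etcd-snapshot-$(date +%F).db"

# Sanity-check that the snapshot file is readable.
ETCDCTL_API=3 etcdctl snapshot status "/var/backups/etcd-snapshot-$(date +%F).db"
```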

@nickatnceas
Contributor

I'm starting a migration checklist to deploy HA on k8s-dev:

  • Deploy a load balancer on hosts k8s-dev-ctrl-1, k8s-dev-ctrl-2, and k8s-dev-ctrl-3
    • Choose between keepalived + haproxy vs kube-vip
      • keepalived + haproxy: choose between running them as OS services vs static pods
      • kube-vip: choose between ARP vs BGP
    • Deploy $choices to ctrl hosts 1-3
      • Install kubeadm and kubelet on k8s-dev-ctrl-2 and k8s-dev-ctrl-3 (!)
      • Continue setting up $choices...
      • Change DNS for k8s-dev services from the k8s-dev-ctrl-1 IP to the load balancer IP
  • Migrate etcd to a cluster (see the verification sketch after the links below)
  • Migrate the Control Plane to a Stacked Control Plane

Links:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/setup-ha-etcd-with-kubeadm/
https://github.com/kubernetes/kubeadm/blob/main/docs/ha-considerations.md
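
As a rough sketch of how the etcd migration could be verified afterwards (assuming the stacked layout and kubeadm's default certificate paths; the hostnames stand in for whatever addresses the etcd members actually advertise):

```
# After the migration there should be one etcd member per control plane node
# (k8s-dev-ctrl-1/2/3) rather than a single member.
ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    member list -w table

# Every endpoint in the cluster should report healthy.
ETCDCTL_API=3 etcdctl \
    --endpoints=https://k8s-dev-ctrl-1:2379,https://k8s-dev-ctrl-2:2379,https://k8s-dev-ctrl-3:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    endpoint health
```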

@nickatnceas
Contributor

I'm stuck on needing to install the same version of K8s on k8s-dev-ctrl-2 and k8s-dev-ctrl-3. I'm not able to find any 1.23.3 .deb packages hosted on official websites or repositories. The older repo, which hosted 1.23.3, was apparently shut down, and the newer repo starts at version 1.24: https://kubernetes.io/blog/2023/08/31/legacy-package-repository-deprecation/

There are some ways around this, such as cloning the k8s-dev-ctrl-1 RBD and running a kubeadm reset, installing binaries directly (rough sketch below), etc. These have some tradeoffs and take time. The best option is probably going to be upgrading K8s from 1.23 to 1.24 (or later), since we need to upgrade anyway, and then migrating to HA from there.
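
For the record, the "installing binaries" route would look roughly like the package-manager-free install from the upstream docs, pinned to the existing cluster version; the install directory is an example and the kubelet's systemd wiring is omitted:

```
# Download version-pinned binaries directly from the Kubernetes release CDN,
# since 1.23.x packages are no longer published in the apt repositories.
K8S_VERSION=v1.23.3
ARCH=amd64
cd /usr/local/bin
curl -L --remote-name-all \
    "https://dl.k8s.io/release/${K8S_VERSION}/bin/linux/${ARCH}"/{kubeadm,kubelet,kubectl}
chmod +x kubeadm kubelet kubectl

# The kubelet still needs its systemd unit and kubeadm drop-in configured by
# hand, which is part of the extra work that makes upgrading first more attractive.
```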

I'm moving over to the upgrade issue: #35
