
Discuss etcd load balancing within Options for Highly Available Topology #40028

Closed
natereid72 opened this issue Mar 15, 2023 · 13 comments
Labels
language/en: Issues or PRs related to English language
priority/backlog: Higher priority than priority/awaiting-more-evidence.
sig/cluster-lifecycle: Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.
sig/scalability: Categorizes an issue or PR as relevant to SIG Scalability.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@natereid72
Contributor

natereid72 commented Mar 15, 2023

I thought that kube-apiserver utilized gRPC client-side load balancing. I see that Operating etcd clusters for Kubernetes mentions using a load balancer in front of the etcd cluster, with a single etcd address (the LB's) supplied to the control plane (see here).

Is the use of a load balancer in front of the etcd cluster required for load balancing etcd to kube-apiserver? Or is there a benefit to it? Perhaps some explanation of the reason one would consider that option would be useful.

I can see that not having to update the kube-apiserver etcd config when adding or removing etcd nodes is one possible benefit. But I don't know what the cons might be of using an external load balancer in front of etcd vs. client-side LB.
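
For reference, here is a minimal sketch of what I mean by client-side balancing, using the etcd Go client (clientv3), which as far as I understand is what kube-apiserver builds on. The endpoint addresses are placeholders and TLS configuration is omitted; this is only an illustration, not the actual kube-apiserver wiring:

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// The client is given every etcd member directly, similar to starting
	// kube-apiserver with all members listed in --etcd-servers. It balances
	// requests across the endpoints and fails over on its own, with no proxy
	// in between. Addresses are placeholders; TLS config (clientv3.Config.TLS)
	// is omitted for brevity.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints: []string{
			"https://etcd-0.example.internal:2379",
			"https://etcd-1.example.internal:2379",
			"https://etcd-2.example.internal:2379",
		},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// Any member can serve this read; the client decides which endpoint
	// the request goes to.
	resp, err := cli.Get(ctx, "some-key")
	if err != nil {
		panic(err)
	}
	fmt.Println("store revision:", resp.Header.Revision)
}
```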

@k8s-ci-robot k8s-ci-robot added the needs-triage label (Indicates an issue or PR lacks a `triage/foo` label and requires one.) Mar 15, 2023
@natereid72 natereid72 changed the title from "Operating etcd clusters for Kubernetes - Loadbalancer" to "docs: Operating etcd clusters for Kubernetes - Loadbalancer" Mar 15, 2023
@sftim
Contributor

sftim commented Mar 15, 2023

@natereid72 Can you explain what page isn't right, and what about that page should be improved?

I can see that https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/ exists but I'm not sure what improvement you're proposing. I think the advice in https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/ is accurate.

You might be suggesting that we explain the pros and cons of using a managed load balancer in front of the cluster, versus having each API server be aware of the individual etcd instances. Is that what you had in mind?

(If so, I think I'd like to see an evergreen, i.e. maintained, blog article about that.)

/language en
/sig cluster-lifecycle

@k8s-ci-robot k8s-ci-robot added the language/en (Issues or PRs related to English language) and sig/cluster-lifecycle (Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.) labels Mar 15, 2023
@cenkalti
Contributor

Use of an external load balancer is not required to run Kubernetes, but it is an option.

The document should recommend either client-side load balancing or an external load balancer. If a recommendation isn't possible, there should be a section that lists the pros and cons of each option.

@natereid72
Contributor Author

natereid72 commented Mar 15, 2023

You might be suggesting that we explain the pros and cons of using a managed load balancer in front of the cluster, versus having each API server be aware of the individual etcd instances. Is that what you had in mind?

The document should suggest between using client side load-balancing or external load balancer. If not possible to give suggestion, there should be a section that lists pros and cons for each option.

Yes to both. I am not aware of the tradeoffs between client-side vs. external/managed LB. If someone who is could add that to the existing page, it would be very helpful.

Perhaps the pros and cons would be best detailed in this doc: Options for Highly Available Topology

@sftim
Contributor

sftim commented Mar 16, 2023

/retitle Discuss etcd load balancing within Options for Highly Available Topology

/sig scalability
/triage accepted
/priority backlog

@k8s-ci-robot k8s-ci-robot changed the title from "docs: Operating etcd clusters for Kubernetes - Loadbalancer" to "Discuss etcd load balancing within Options for Highly Available Topology" Mar 16, 2023
@k8s-ci-robot k8s-ci-robot added the sig/scalability (Categorizes an issue or PR as relevant to SIG Scalability.), triage/accepted (Indicates an issue or PR is ready to be actively worked on.), and priority/backlog (Higher priority than priority/awaiting-more-evidence.) labels, and removed the needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.) label Mar 16, 2023
@natereid72
Contributor Author

natereid72 commented Mar 18, 2023

JFYI, additional info: from my testing with an external etcd cluster, and kube-apiserver configured with all of the etcd members (vs. one etcd node address per kube-apiserver config), things go haywire.

So I now suspect that the kube-apiserver's use of the etcd client doesn't handle client-side LB well enough to go that route. I'm guessing this is why a managed/external LB is mentioned in the current docs.

Ignore that for now. It may have just been a temporary unrelated symptom.

@baumann-t
Contributor

@sftim @natereid72 I think the explanations and pros and cons listed on this page in the gRPC documentation provide answers to your questions: https://grpc.io/blog/grpc-load-balancing/
If that's what you were looking for, I can add the link to the documentation or I could draft a paragraph to be added.

@natereid72
Contributor Author

Thanks @sftim. On first take of that doc, I read it as leading to the right choice for K8s config being to rely on the client-side architecture. So perhaps just removing the proxy LB architecture reference from the K8s docs altogether is the right path?

@sftim
Contributor

sftim commented Mar 27, 2023

removing the proxy LB architecture reference from the K8s docs altogether

Using a load balancer is still a viable option (this is almost a tenet of non-abstract scalable architecture design). For example, we might not trust Kubernetes to kill etcd nodes, but we might allow a load balancer to send that shutdown signal.

@natereid72
Contributor Author

natereid72 commented Mar 28, 2023

Using a load balancer is still a viable option (this is almost a tenet of non-abstract scalable architecture design). For example, we might not trust Kubernetes to kill etcd nodes, but we might allow a load balancer to send that shutdown signal.

I think this would require the assumption that etcd was being managed by Kubernetes, no? Of course the etcd cluster can be a StatefulSet or Deployment, run as static pods, and/or be completely outside of the kube-apiserver's purview.

From what I understand of etcd client-side LB, this is handled either way.

From the doc referenced above:

Deciding between proxy versus client-side load balancing is a primary architectural choice. In Proxy load balancing, the client issues RPCs to a Load Balancer (LB) proxy. The LB distributes the RPC call to one of the available backend servers that implement the actual logic for serving the call. The LB keeps track of load on each backend and implements algorithms for distributing load fairly. The clients themselves do not know about the backend servers. Clients can be untrusted. This architecture is typically used for user facing services where clients from open internet can connect to servers in a data center, as shown in the picture below. In this scenario, clients make requests to LB (#1). The LB passes on the request to one of the backends (#2), and the backends report load to LB (#3).

and:

In Client side load balancing, the client is aware of multiple backend servers and chooses one to use for each RPC. The client gets load reports from backend servers and the client implements the load balancing algorithms. In simpler configurations server load is not considered and client can just round-robin between available servers. This is shown in the picture below. As you can see, the client makes request to a specific backend (#1). The backends respond with load information (#2), typically on the same connection on which client RPC is executed. The client then updates its internal state.
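
To make that second model concrete, here is a generic gRPC-Go sketch of client-side round-robin. The target name is a placeholder and plaintext credentials are used only to keep the sketch short; this is not the exact way kube-apiserver dials etcd:

```go
package main

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// The dns resolver returns every address behind the name, and the
	// round_robin policy spreads RPCs across those addresses, so the
	// balancing happens in the client with no proxy in the path.
	// The hostname is a placeholder; plaintext credentials are used only
	// to keep the sketch short.
	conn, err := grpc.Dial(
		"dns:///etcd.example.internal:2379",
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// conn could now be handed to any generated gRPC client stub.
	_ = conn
}
```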

So linking the grpc LB doc doesn't clear it up for me.

@sftim
Contributor

sftim commented Mar 28, 2023

I think this would require the assumption that etcd was being managed by Kubernetes, no? Of course the etcd cluster can be a StatefulSet or Deployment, run as static pods, and/or be completely outside of the kube-apiserver's purview.

No: some kinds of load balancer can send a signal to fence or shut down unhealthy targets, even when you don't use Kubernetes. You can run etcd on cloud compute and with a load balancer, without any Kubernetes at all. And then - if you want to - you can point one or more kube-apiserver at the IP address(es) of that load balancer. The number of load balancer IP addresses might or might not be the same as the expected number of etcd cluster members.
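
To illustrate the kind of check such a load balancer runs, here is a rough sketch that probes etcd's /health endpoint the way an LB health check typically would. It's a generic illustration rather than any particular product's configuration; the address is a placeholder and TLS/client certificates are omitted:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// etcd serves a /health endpoint on its client port; a load balancer
	// typically probes it and stops routing to (or fences) members that
	// fail the check. The address is a placeholder and TLS/client
	// certificates are omitted for brevity.
	client := &http.Client{Timeout: 2 * time.Second}

	resp, err := client.Get("http://etcd-0.example.internal:2379/health")
	if err != nil {
		fmt.Println("member unreachable, take it out of rotation:", err)
		return
	}
	defer resp.Body.Close()

	var body struct {
		Health string `json:"health"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil || body.Health != "true" {
		fmt.Println("member unhealthy, take it out of rotation")
		return
	}
	fmt.Println("member healthy, keep it in rotation")
}
```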

(If you'd like to discuss different ways to run Kubernetes and its components, https://discuss.kubernetes.io/ is a good place to have that conversation.)


So, I think we have an opportunity to cover the more unusual cases enough that readers don't see them as unviable or prohibited. At the same time, it's helpful to steer readers towards the most common architectures. A typical reader just wants to set up a cluster, rather than learn how to architect control planes for special scenarios.

@natereid72
Contributor Author

No: some kinds of load balancer can send a signal to fence or shut down unhealthy targets

Fair enough, I misread that originally. I agree that this is a pro in the case of an external LB. I think the etcd client's client-side LB has this functionality, though.

even when you don't use Kubernetes. You can run etcd on cloud compute and with a load balancer, without any Kubernetes at all. And then - if you want to - you can point one or more kube-apiserver at the IP address(es) of that load balancer.

I'm aware of this; it's how I've pointed kube-apiserver at an external etcd cluster. It's actually what started my thinking in the OP here, when I read about using an external LB in the docs.

(If you'd like to discuss different ways to run Kubernetes and its components, https://discuss.kubernetes.io/ is a good place to have that conversation.)

Thanks for that reference; I was unaware of it. That's great to know about, and I'll definitely use it for topics like this going forward.

So, I think we have an opportunity to cover the more unusual cases enough that readers don't see them as unviable or prohibited. At the same time, it's helpful to steer readers towards the most common architectures.

Yes, I think this is where I was landing on it, thanks much.

@natereid72
Contributor Author

@baumann-t I inadvertently addressed my reply to your post to Tim, above. I think that including the gRPC link would suffice. I will admit that after considering that info, and after configuring an HAProxy setup for the etcd cluster, it seems the use case of Kubernetes' etcd client talking to the etcd cluster is best served with client-side LB. I'm still not certain it ever makes sense to use a managed/external LB in this scenario.

@natereid72
Contributor Author

I wrote this blog post covering my thoughts on this topic. I'm happy to close this issue if there is no further need to clarify the K8s docs.
