Tolerate (partial) connection failures in endpoints in the Balancer Client #39

Open
rrichardson opened this issue Jun 6, 2022 · 1 comment

@rrichardson

Motivation:

We connect to a quorum of etcd servers across regions (not the recommended architecture, but it works quite well).

For various reasons, a small subset of the nodes might be unavailable.
Rather than failing outright, the client should tolerate these failures and adjust the pool accordingly, if that is what the consumer of the API wants.

The relevant behavior lives in tower::balancer and tonic::transport::service. The discovery mechanism in balancer_channel connects "lazily", upon receiving its requests. It appears to connect to all endpoints, but if any one of them fails, the entire operation fails.

It seems like the only option here is to work with the Tower team to provide a partial-success route. This is preferred not only because it is the right thing for the initial connection, but also because it should provide the proper behavior on an ongoing basis.
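
To make the desired partial-success semantics concrete, here is a minimal, std-only Rust sketch. It does not use tower or tonic and is not how either library works today. The names `connect_partial`, `PartialConnect`, `NotEnoughEndpoints`, and the `min_healthy` threshold are all hypothetical, and the `connect` closure stands in for whatever actually opens a connection.

```rust
use std::fmt;

/// Result of trying every endpoint: what connected and what did not.
/// (Hypothetical type, for illustration only.)
#[derive(Debug)]
pub struct PartialConnect<C> {
    /// Established connections, paired with their endpoint.
    pub connected: Vec<(String, C)>,
    /// Endpoints that could not be reached, paired with the error text.
    pub failed: Vec<(String, String)>,
}

/// Returned when fewer than the required number of endpoints connected.
#[derive(Debug, PartialEq)]
pub struct NotEnoughEndpoints {
    pub healthy: usize,
    pub required: usize,
}

impl fmt::Display for NotEnoughEndpoints {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "only {} of the required {} endpoints are reachable",
            self.healthy, self.required
        )
    }
}

/// Try every endpoint and keep the ones that connect.
/// A single failure does not abort the whole operation; the call fails
/// only if fewer than `min_healthy` endpoints could be reached.
pub fn connect_partial<C, E, F>(
    endpoints: &[&str],
    min_healthy: usize,
    mut connect: F,
) -> Result<PartialConnect<C>, NotEnoughEndpoints>
where
    E: fmt::Display,
    F: FnMut(&str) -> Result<C, E>,
{
    let mut connected = Vec::new();
    let mut failed = Vec::new();

    for &endpoint in endpoints {
        match connect(endpoint) {
            Ok(conn) => connected.push((endpoint.to_string(), conn)),
            // Record the failure and move on instead of returning early.
            Err(err) => failed.push((endpoint.to_string(), err.to_string())),
        }
    }

    if connected.len() < min_healthy {
        return Err(NotEnoughEndpoints {
            healthy: connected.len(),
            required: min_healthy,
        });
    }
    Ok(PartialConnect { connected, failed })
}
```

A caller that wants today's all-or-nothing behavior would pass `min_healthy = endpoints.len()`, while a caller that can live with degraded connectivity would pass a smaller value. The `failed` list could then feed a retry loop that adds endpoints back to the pool as they recover.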

I will continue to pursue this approach, but I'd like to leave this ticket open because there will likely be some (hopefully non-breaking) changes to the etcd client to optionally utilize the partial-success behavior.

@sylzd
sylzd commented Apr 8, 2024

Ditto, and I think it's an important issue.
