Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite networking to have proper round-robin across instances and retries #218

Merged
merged 6 commits into from
Sep 16, 2024

Conversation

sosthene-nitrokey
Copy link
Contributor

@sosthene-nitrokey sosthene-nitrokey commented Sep 5, 2024

This PR:

  • Adds state for each instance in a slot. This state stores information on whether the instance is is a "failed" state or not
  • On failure (IO or 5xx or 412), mark the instance as failed
  • When an instance is marked as failed and threads are allowed, spawn a background thread to retry it with some backoff.
  • Don't use failed instances
  • If all instances are failed, try failed instances anyway (this will especially happen in the case where spawning threads is not allowed).
  • On IO failure, remove all the idle connections

Depends on:

This will allow keeping track of the "bad" status in each instance
This also makes the round-robin real for the slots, not just for the session
This also rejects slots with no instances
Retry count should only count retries, not the number of attempts.
A retry count of 0  should still mean 1 connection attempt.

A retry count of N should mean 1 attempt + N retry attempt
@sosthene-nitrokey
Copy link
Contributor Author

In the single threaded case it would probably be nice to perform a health-check request with a very short timeout (1s) regularly even if it means slowing down the "real" request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants