Rewrite networking to have proper round-robin across instances and retries #218

sosthene-nitrokey · 2024-09-05T15:06:28Z

This PR:

- Adds state for each instance in a slot. This state records whether the instance is in a "failed" state or not.
- On failure (IO, 5xx, or 412), marks the instance as failed.
- When an instance is marked as failed and threads are allowed, spawns a background thread to retry it with some backoff.
- Doesn't use failed instances.
- If all instances are failed, tries failed instances anyway (this will especially happen when spawning threads is not allowed).
- On IO failure, removes all the idle connections.
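
To illustrate the selection behaviour listed above, here is a minimal sketch of round-robin across instances that skips failed ones and falls back to a failed instance when none is healthy. The names `Instance`, `Slot`, `pick`, `mark_failed` and `mark_healthy` are hypothetical and chosen for illustration; they are not taken from the diff, and the real code may track state differently.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Hypothetical per-instance state: a URL plus a "failed" flag.
pub struct Instance {
    pub url: String,
    failed: AtomicBool,
}

impl Instance {
    pub fn new(url: &str) -> Self {
        Self { url: url.to_string(), failed: AtomicBool::new(false) }
    }
    /// Called after an IO error, a 5xx or a 412 response.
    pub fn mark_failed(&self) {
        self.failed.store(true, Ordering::Relaxed);
    }
    /// Called once a retry against this instance succeeds.
    pub fn mark_healthy(&self) {
        self.failed.store(false, Ordering::Relaxed);
    }
    pub fn is_failed(&self) -> bool {
        self.failed.load(Ordering::Relaxed)
    }
}

/// Hypothetical slot: a non-empty list of instances and a shared cursor.
pub struct Slot {
    instances: Vec<Instance>,
    next: AtomicUsize,
}

impl Slot {
    /// Returns `None` for an empty list, so a slot always has an instance.
    pub fn new(instances: Vec<Instance>) -> Option<Self> {
        if instances.is_empty() {
            return None;
        }
        Some(Self { instances, next: AtomicUsize::new(0) })
    }

    pub fn instances(&self) -> &[Instance] {
        &self.instances
    }

    /// Advances the cursor by one, then returns the first healthy instance
    /// at or after it. If every instance is failed, returns the one under
    /// the cursor anyway.
    pub fn pick(&self) -> &Instance {
        let n = self.instances.len();
        let start = self.next.fetch_add(1, Ordering::Relaxed) % n;
        for offset in 0..n {
            let candidate = &self.instances[(start + offset) % n];
            if !candidate.is_failed() {
                return candidate;
            }
        }
        &self.instances[start]
    }
}
```

Keeping the cursor on the slot rather than on a session is what lets every caller share the same rotation in this sketch.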

Depends on:

This will allow keeping track of the "bad" status in each instance. This also makes the round-robin real for the slots, not just for the session. This also rejects slots with no instances.

Retry count should only count retries, not the number of attempts. A retry count of 0 should still mean 1 connection attempt. A retry count of N should mean 1 attempt + N retry attempts.
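
The retry-count semantics described above can be sketched as a small helper. `with_retries` is a hypothetical name used only for illustration, and the sketch deliberately leaves out backoff and instance switching.

```rust
/// Runs `attempt` once, then up to `retries` more times while it keeps
/// failing. A retry count of 0 therefore still makes exactly one attempt.
pub fn with_retries<T, E>(
    retries: u32,
    mut attempt: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    // The initial attempt is not counted as a retry.
    let mut result = attempt();
    for _ in 0..retries {
        if result.is_ok() {
            break;
        }
        result = attempt();
    }
    result
}
```

With this shape, a closure that always fails is called `retries + 1` times in total, and the last error is what the caller sees.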

sosthene-nitrokey · 2024-09-13T11:28:07Z

In the single-threaded case it would probably be nice to regularly perform a health-check request with a very short timeout (1s), even if it means slowing down the "real" request.
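
One possible shape for this idea, as a rough sketch only: a small scheduler that runs a probe inline before a real request, but at most once per interval. `InlineHealthCheck`, `maybe_probe` and the interval value are assumptions made for illustration; only the 1s timeout comes from the comment above, and nothing like this is part of the PR.

```rust
use std::time::{Duration, Instant};

/// Hypothetical inline health checker for the case without background threads.
pub struct InlineHealthCheck {
    interval: Duration,          // assumed minimum spacing between probes
    timeout: Duration,           // short timeout handed to the probe
    last_probe: Option<Instant>, // when the previous probe ran, if any
}

impl InlineHealthCheck {
    pub fn new(interval: Duration, timeout: Duration) -> Self {
        Self { interval, timeout, last_probe: None }
    }

    /// Runs `probe` with the configured timeout if no probe ran within
    /// `interval`. Returns `Some(healthy)` when the probe ran and `None`
    /// when it was skipped.
    pub fn maybe_probe(
        &mut self,
        now: Instant,
        probe: impl FnOnce(Duration) -> bool,
    ) -> Option<bool> {
        let due = match self.last_probe {
            None => true,
            Some(last) => now.saturating_duration_since(last) >= self.interval,
        };
        if !due {
            return None;
        }
        self.last_probe = Some(now);
        Some(probe(self.timeout))
    }
}
```

The probe is passed in as a closure so the sketch stays independent of the HTTP client; in practice it would issue the health-check request against a failed instance and report whether it succeeded within the timeout.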

sosthene-nitrokey force-pushed the rewrite-networking branch 3 times, most recently from ab93224 to 334812d Compare September 10, 2024 09:03

sosthene-nitrokey force-pushed the rewrite-networking branch from e41521f to 2414f2f Compare September 13, 2024 09:34

sosthene-nitrokey marked this pull request as ready for review September 13, 2024 09:37

sosthene-nitrokey requested review from daringer, ansiwen and robin-nitrokey September 13, 2024 09:55

sosthene-nitrokey added 6 commits September 13, 2024 11:57

LoginCtx: Store reference to the entire slot

e8e5774

This will allow keeping track of the "bad" status in each instance. This also makes the round-robin real for the slots, not just for the session. This also rejects slots with no instances.

Remove unrequired LoginCtx clones

014bdaa

Store metadata with the instance configuration

f10604d

Run regular retries for new failed instances

366c3f9

Fix retry count

94b8838

Retry count should only count retries, not the number of attempts. A retry count of 0 should still mean 1 connection attempt. A retry count of N should mean 1 attempt + N retry attempts.

Clear all connections pool on IO errors

69abbac

sosthene-nitrokey force-pushed the rewrite-networking branch from 2414f2f to 69abbac Compare September 13, 2024 09:57

daringer approved these changes Sep 16, 2024


sosthene-nitrokey merged commit ff3b7de into main Sep 16, 2024
6 checks passed

sosthene-nitrokey deleted the rewrite-networking branch September 16, 2024 09:37
