Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement short retries in single threaded context #223

Merged
merged 6 commits into from
Sep 19, 2024

Conversation

sosthene-nitrokey
Copy link
Contributor

@sosthene-nitrokey sosthene-nitrokey commented Sep 16, 2024

Also, fix round retry mechanism not being used for login This two are merged because the changing of the next_instance signature showed that login() did not use the retry mechanism.

This does not count in the timeout calculation for each requests, so requests are limited to perform up to 3 health check at most (each with 1s timeout, so 3 seconds total).

This also makes all tests run both in single-threaded an multi-threaded modes.

This also removes some memory leaks in tests and in the Finalize call (the background thread did not close the instances it had.

Also, fix round retry mechanism not being used for login
This two are merged because the changing of the `next_instance` signature
showed that `login()` did not use the retry mechanism
Instances stored a Arc reference in the background thread
When `Finalize` is called, this thread would not get cleaned up.

This instead uses `Weak` to clean it up over time in the worst case,
all is cleaned up after 5 minutes
@sosthene-nitrokey
Copy link
Contributor Author

Once this is merged, we can fix #209:

  • Failing Instances are marked as bad and retried in a background thread, so they won't be tried unless all instances are unreachable
  • If no background thread can be spawned (CKF_LIBRARY_CANT_CREATE_OS_THREADS), failed instances will be tried during normal operations, slowing down the requests. To minimise this, such "inline" health checks are limited to 1 seconds timeouts, and only 3 health checks can be attempted per request (can only be reached if a large numbe of instances are failed).
  • The total number of requests is: num_retries + 1
  • The total timeout for 1 request attempt is: (num_retries + 1)*timeout_seconds + 3
  • The total timeout for 1 PKCS11 function call will vary because some functions will lead to multiple API calls in the nethsm.

@sosthene-nitrokey
Copy link
Contributor Author

sosthene-nitrokey commented Sep 18, 2024

@robin-nitrokey @daringer can you please review ?

@sosthene-nitrokey sosthene-nitrokey merged commit d01720b into main Sep 19, 2024
6 checks passed
@sosthene-nitrokey sosthene-nitrokey deleted the improve-retries-single-thread branch September 19, 2024 08:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants