Support for exponential backup logic on retries #65

Closed
dagen opened this issue May 16, 2024 · 1 comment · Fixed by #66

dagen commented May 16, 2024

Is your feature request related to a problem? Please describe.
We operate a large Vault Enterprise cluster and see significant spikes in requests/sec when some teams refresh their infrastructure (applying a highstate to ~10k virtual machines). These spikes reach a few thousand requests/sec, which degrades Vault's performance for other clients.

Describe the solution you'd like
HashiCorp Vault Enterprise supports a rate limiting feature called resource quotas. When a quota is configured, requests to a resource that exceed the limit receive an HTTP 429 response. Example:

Code: 429. Errors:* request path "transit/encrypt/test": rate limit quota exceeded

I'm requesting retry logic for reads (and writes) from minions that receive a 429 response code, so that a state apply can gracefully handle situations where the Vault cluster is under too much load. Obviously, there needs to be an exit condition in case the 429 responses persist. I recommend retrying 5 times with increasing delays between HTTP requests to the Vault server (0.2s, 0.3s, 0.5s, 0.8s, 1.3s, then exit).
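A minimal sketch of the requested behavior, using only the Python standard library. It is not the implementation that landed in #66; `request_with_retries`, `RateLimitExceeded`, and the `send` callable (one HTTP request returning a `(status_code, body)` tuple) are hypothetical names chosen for illustration. The delays are the ones proposed above.

```python
import time

# Delays proposed in this issue (seconds). Once they are used up, give up.
DEFAULT_DELAYS = (0.2, 0.3, 0.5, 0.8, 1.3)


class RateLimitExceeded(Exception):
    """Raised when the server still answers 429 after all retries."""


def request_with_retries(send, delays=DEFAULT_DELAYS, sleep=time.sleep):
    """Call send() and retry on HTTP 429, waiting longer before each retry.

    `send` is a hypothetical zero-argument callable that performs a single
    request and returns a (status_code, body) tuple. `sleep` is injectable
    so the function can be exercised without actually waiting.
    """
    status, body = send()
    for delay in delays:
        if status != 429:
            return status, body
        sleep(delay)
        status, body = send()
    if status == 429:
        # Exit condition: the quota is still exceeded after the last retry.
        raise RateLimitExceeded(f"still rate limited after {len(delays)} retries")
    return status, body
```

With the default delays this makes at most six requests (the initial one plus five retries) and sleeps for at most 3.1 seconds in total before raising.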

Describe alternatives you've considered
One alternative is asking teams with large infrastructure to stage their rollouts to a few hundred instances at a time. We've also considered providing a dedicated Vault cluster for this one infrastructure team, but that means additional operating costs (cloud spend, observability, alerting, configuration management, etc.).

Additional context

lkubb (Member) commented May 16, 2024

Agreed, this (= retrying retryable errors with exponential backoff) should be supported.

A nice-to-have would be to respect the Retry-After header, which can be enabled in Vault: https://developer.hashicorp.com/vault/api-docs/system/quotas-config#enable_rate_limit_response_headers
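As a rough illustration of the nice-to-have, the delay before a retry could be taken from the `Retry-After` header when it is present, falling back to the backoff delay otherwise. This is a sketch under assumptions, not the project's code: `delay_from_headers` is a hypothetical helper, and `headers` is assumed to be a plain dict keyed by the exact header name (real HTTP clients usually provide case-insensitive header mappings). HTTP allows the header to carry either a number of seconds or an HTTP date, so both forms are handled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def delay_from_headers(headers, fallback, now=None):
    """Return the seconds to wait before retrying.

    Uses the Retry-After header when it can be parsed, otherwise `fallback`
    (for example the next backoff delay). `now` is injectable for testing.
    """
    value = headers.get("Retry-After")
    if value is None:
        return fallback
    value = value.strip()
    if value.isdigit():
        # delay-seconds form, e.g. "Retry-After: 3"
        return float(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Unparseable header: ignore it and keep the backoff delay.
        return fallback
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    # A date in the past means the client may retry immediately.
    return max(0.0, (when - now).total_seconds())
```

A retry loop could call this with the response headers and the next backoff delay as `fallback`, so behavior is unchanged when the server does not send the header.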

lkubb self-assigned this May 16, 2024
lkubb closed this as completed in #66 May 19, 2024
lkubb added a commit that referenced this issue May 19, 2024