research AWS retries, reduce fetch hanging behaviour #1391
Comments
"Throttling warnings" don't always mean that all retries (from maxRetries) have been exhausted. It is also possible to get throttling warnings even when the highest observed retry count was 4.
Working theory is that "retry tokens" are shared between different API calls, so even if no single API call retries 20 times, the retry tokens can still get exhausted.
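The working theory above can be illustrated with a toy token bucket. This is a minimal sketch of the idea, not the SDK's actual algorithm; the capacity, retry cost, and class name are all hypothetical values chosen for illustration.

```python
# Sketch of a retry-token bucket shared across API calls. The numbers
# (capacity=100, retry_cost=5) are illustrative assumptions, not the
# SDK's real quota values.
class RetryTokenBucket:
    def __init__(self, capacity=100, retry_cost=5):
        self.tokens = capacity
        self.retry_cost = retry_cost

    def acquire_retry(self):
        """Return True if a retry is allowed; deduct shared tokens if so."""
        if self.tokens < self.retry_cost:
            return False  # shared quota exhausted: client-side throttling
        self.tokens -= self.retry_cost
        return True


bucket = RetryTokenBucket()

# Ten API calls, each retrying at most 4 times (far below maxRetries=20),
# still drain the shared bucket between them.
denied = 0
for call in range(10):
    for attempt in range(4):
        if not bucket.acquire_retry():
            denied += 1

print(denied)  # 20 of the 40 retry attempts are denied by the shared quota
```

The point of the sketch: no individual call exhausts its own retry budget, yet throttling warnings still appear once the pooled tokens run out.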
Definitely looks like there are deeper questions here. However, I would also support not investigating this super-deep and just setting:
I am reminded of a conversation with @bbernays where he said that part of the throttling we are seeing happens on the client side (i.e. in https://github.com/aws/aws-sdk-go).
Also mentioning @roneli @spangenberg @disq for completeness of discussion.
Attaching a summary of retries (extracted from the logs with a Python script).
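The attached script isn't included in the thread, but a summary like this can be produced along these lines. The log line format below is a hypothetical example; the provider's actual log format may differ.

```python
import re
from collections import Counter

# Hypothetical retry log line format -- adjust the pattern to the real logs.
RETRY_RE = re.compile(r"retrying request.*service=(?P<service>\S+)")


def summarize_retries(lines):
    """Count retry log lines per AWS service."""
    counts = Counter()
    for line in lines:
        m = RETRY_RE.search(line)
        if m:
            counts[m.group("service")] += 1
    return counts


logs = [
    "2022-03-01 INFO retrying request attempt=2 service=ec2",
    "2022-03-01 INFO retrying request attempt=3 service=ec2",
    "2022-03-01 INFO fetch finished service=s3",
    "2022-03-01 WARN retrying request attempt=1 service=iam",
]
print(summarize_retries(logs))  # Counter({'ec2': 2, 'iam': 1})
```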
The AWS SDK has provisions for marking any error type as "retryable", and it does a good job of specifying which ones should be retried (source). I don't think there's any way to give certain errors a different retry quota, or to exempt them from client-side throttling depending on the type of error (while still retrying them).
TODO: remove suggestion to user to increase their max_retries. It is unhelpful and might cause what users perceive as "hangs". |
…868) See context at https://github.com/cloudquery/cq-provider-aws/issues/853. Increasing max_retries will not have a positive effect in most cases, and will likely only exacerbate any "hanging" issues.
Leaving this bug as "research AWS retries, reduce fetch hanging behaviour".
Summary:
Further research is required. I think we can leave this as low priority until we get customer-reported hangs.
Per my latest research AWS
Describe the bug
Too high max_retries and max_backoff values can cause "hanging" behaviour, where all goroutines sleep while waiting for retries.
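A back-of-envelope calculation shows why generous retry settings look like a hang. The doubling schedule and delay values below are illustrative assumptions, not the SDK's exact backoff algorithm (which also adds jitter).

```python
# Worst-case cumulative sleep for a single throttled API call under a
# capped exponential backoff. base_delay and the doubling schedule are
# illustrative, not the SDK's exact algorithm.
def worst_case_sleep(max_retries, base_delay=1.0, max_backoff=30.0):
    return sum(min(base_delay * 2 ** i, max_backoff) for i in range(max_retries))


# A modest setting keeps the worst case short...
print(worst_case_sleep(max_retries=5))  # 1+2+4+8+16 = 31.0 seconds

# ...but a "generous" max_retries makes each throttled call sleep for
# minutes; with many goroutines throttled at once, the whole fetch
# appears hung.
print(worst_case_sleep(max_retries=20, max_backoff=60.0))  # 903.0 seconds
```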
Expected Behavior
It's better to fail than to hang.
Steps to Reproduce
The hanging is notoriously difficult to reproduce.
Possible Solution
No response
Provider and CloudQuery version
core: 0.22.10, aws: 0.11.2
Additional Context
No response