Amazon Builder not retrying after RequestLimitExceeded - throttling issue #6330
Throttling shouldn't have status code 503. See https://docs.aws.amazon.com/health/latest/APIReference/CommonErrors.html
I appreciate your point, however the status code is clearly 503 and the error is RequestLimitExceeded - and a lot of Amazon's documentation refers to that as a throttling error. So, pragmatically, one would expect it to be handled like any other throttling error.
Could you add some more lines of the log so we know which API call this is for?
"Error on AMIStateRefresh" narrows it down precisely - that message only appears once in the codebase. I have asked Amazon for more info on the specific request ID, and if I get anything I'll post it here.
No, it doesn't. There are multiple places where we wait for AMIs to become available.
Information from Amazon:

Regarding the specific request provided in the case: this was a DescribeImages API call. I see also that for the same time-frame you got other API calls throttled, most of them Describe actions (e.g. DescribeInstances, DescribeInstanceAttribute, DescribeSecurityGroups, etc.). That said, given that you are using a programmatic approach to API calls, I would recommend you implement exponential back-off in order to retry calls and make sure they will succeed. Reference documentation on this can be found here:

I've checked the code base again and that message is only used in that one procedure call, but I think I may have misunderstood what you were asking - here it is in context of the output:
Oh, I thought #6177 was released, but apparently it is awaiting 1.3.0, which is the next release. It should handle this correctly.
That's great, thanks! Is there an estimated release date? (So I can put the ticket on our backlog with a "revisit after" date.)
Unfortunately no; maybe @SwampDragons or @mwhooker have one.
We just put out 1.2.4 and normally we wait ~6 weeks between releases; I would guess mid-July at the earliest. That said, I'm working on #6177 now, so if you want to follow that ticket I can build you a binary when it's done so you can test it out next week or so.
@CarlInglisBJSS thanks for the report. Hopefully #6177 will solve this, and it would be really useful if you could help us test it when it's ready. In the meantime, I would suggest increasing the AWS_POLL_DELAY_SECONDS env var as a workaround.
I'm watching that issue and will happily help test it when it's ready. I'll try the AWS_POLL_DELAY_SECONDS change - thanks.
Our existing AWS_POLL_DELAY_SECONDS was 30 seconds. I have added a random 0-9 seconds to that for each invocation to stagger the simultaneous calls. This, coupled with a random 1-10 second delay at the start of each invocation, appears (based on one set of 15 simultaneous starts) to have given us a workaround. EDIT: Now up to 3 different sets of 15 simultaneous starts of the script - looks like this is working! Thanks.
This was fixed by #6332
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Packer version: 1.2.3
Platform: Linux
Builder: amazon-ebs
From my examination of the code (I'm not a Go programmer) it appears that the function WaitForState relies on isTransientNetworkError to determine retries, and 503 isn't classed as a retryable error. I can't tell whether the Go SDK is retrying, but I suspect not, given how quickly the error is returned.
I have opened a support case with Amazon to get more information on the specific request and I will post anything significant that I get back from them to this issue.
We do have multiple Packer builds running at the same time, however the issue I am concerned about is not that; it's that the builder isn't retrying after throttling.
It is of course possible that I've misunderstood something in the Go code. If that's the case I'd appreciate being pointed at my misunderstanding. Thanks.