
Amazon Builder not retrying after RequestLimitExceeded - throttling issue #6330

Closed
CarlInglisBJSS opened this issue Jun 1, 2018 · 15 comments


@CarlInglisBJSS

CarlInglisBJSS commented Jun 1, 2018

Packer version: 1.2.3
Platform: Linux
Builder: amazon-ebs

08:28:26 2018/06/01 07:28:24 packer: 2018/06/01 07:28:24 Waiting for state to become: available  
08:28:26 2018/06/01 07:28:24 packer: 2018/06/01 07:28:24 Using 30s as polling delay (change with AWS_POLL_DELAY_SECONDS)  
08:28:26 2018/06/01 07:28:24 packer: 2018/06/01 07:28:24 Allowing 300s to complete (change with AWS_TIMEOUT_SECONDS)  
08:29:33 2018/06/01 07:29:30 packer: 2018/06/01 07:29:30 Error on AMIStateRefresh: RequestLimitExceeded: Request limit exceeded.  
08:29:33 2018/06/01 07:29:30 packer: 	status code: 503, request id: 2c9c6471-16fc-45c4-9361-d92b6c5a78a1  

From my examination of the code (I'm not a Go programmer) it appears that the function WaitForState relies on isTransientNetworkError to decide whether to retry, and a 503 isn't classed as a retryable error.

I can't tell whether the Go SDK is retrying; however, I suspect not, given how quickly the error is returned.
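For illustration only (this is a sketch of the idea, not the actual Packer or SDK code), a retry classifier would need to treat RequestLimitExceeded as retryable regardless of the HTTP status code, along these lines:

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws/awserr"
)

// isThrottleError reports whether err is an AWS throttling error such as
// RequestLimitExceeded. A state waiter could consult a check like this in
// addition to its transient-network-error check before giving up.
func isThrottleError(err error) bool {
	if awsErr, ok := err.(awserr.Error); ok {
		switch awsErr.Code() {
		case "RequestLimitExceeded", "Throttling", "ThrottlingException":
			return true
		}
	}
	return false
}

func main() {
	// Simulate the error from the log above: code "RequestLimitExceeded",
	// returned with HTTP status 503.
	err := awserr.New("RequestLimitExceeded", "Request limit exceeded.", nil)
	fmt.Println(isThrottleError(err)) // true
}
```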

I have opened a support case with Amazon to get more information on the specific request and I will post anything significant that I get back from them to this issue.

We do have multiple Packer builds running at the same time; however, the issue I'm concerned about is not that, it's that the builder isn't retrying after being throttled.

It is of course possible that I've misunderstood something in the Go code. If that's the case I'd appreciate being pointed at my misunderstanding. Thanks.

@rickard-von-essen
Collaborator

Throttling shouldn't have status code 503. See https://docs.aws.amazon.com/health/latest/APIReference/CommonErrors.html

@CarlInglisBJSS
Author

I appreciate your point; however, the status code is clearly 503 and the error is RequestLimitExceeded, which a lot of Amazon's documentation refers to as a throttling error. So, pragmatically, one would expect it to be handled like any other throttling error.

@rickard-von-essen
Collaborator

Could you add some more lines of the log so we know which API call this is for?

@CarlInglisBJSS
Author

"Error on AMIStateRefresh" narrows it down precisely - that message only appears once in the codebase.

I have asked Amazon for more info on the specific request ID, and if I get anything I'll post it here.

@rickard-von-essen
Collaborator

No it doesn't. There are multiple places where we are waiting for AMIs to become available.

@CarlInglisBJSS
Author

Information from Amazon:


Regarding the specific request provided in the case:
request id: 2c9c6471-16fc-45c4-9361-d92b6c5a78a1

this was a DescribeImages API call
by userArn [snipped]
from sourceIP [snipped]

I see also that for the same time-frame you got other API calls throttled, most of them Describe actions (e.g. for DescribeInstances, DescribeInstanceAttribute, DescribeSecurityGroups, etc.)

That said, since you are using a programmatic approach to API calls, I would recommend implementing exponential back-off to retry calls and make sure they succeed.

Reference documentation on this can be found here:
https://docs.aws.amazon.com/general/latest/gr/api-retries.html
https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/
https://docs.aws.amazon.com/AWSEC2/latest/APIReference/query-api-troubleshooting.html
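For reference, exponential back-off with "full jitter" as described in those links looks roughly like this in Go (a sketch only, not Packer's actual waiter code; the parameters and the simulated operation in main are made up for illustration):

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// retryWithBackoff retries op up to maxAttempts times, sleeping a random
// duration between 0 and min(maxDelay, base*2^attempt) after each failure
// ("full jitter").
func retryWithBackoff(op func() error, maxAttempts int, base, maxDelay time.Duration) error {
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		ceiling := base << uint(attempt)
		if ceiling <= 0 || ceiling > maxDelay {
			ceiling = maxDelay // also guards against shift overflow
		}
		time.Sleep(time.Duration(rand.Int63n(int64(ceiling))))
	}
	return fmt.Errorf("giving up after %d attempts: %v", maxAttempts, err)
}

func main() {
	calls := 0
	err := retryWithBackoff(func() error {
		calls++
		if calls < 3 {
			return fmt.Errorf("RequestLimitExceeded (simulated)")
		}
		return nil // e.g. the state refresh finally succeeding
	}, 5, 500*time.Millisecond, 30*time.Second)
	fmt.Println("result:", err)
}
```

In this case the operation being retried would be the AMI state refresh, i.e. the DescribeImages call identified above.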


I've checked the code base again and that message is only used in that one procedure call, but I think I may have misunderstood what you were asking - here it is in the context of the output:

08:27:54 2018/06/01 07:27:53 ui:     amazon-ebs: Complete!
08:27:54 2018/06/01 07:27:53 packer: 2018/06/01 07:27:53 [INFO] RPC endpoint: Communicator ended with: 0
08:27:54 2018/06/01 07:27:53 [INFO] 19043 bytes written for 'stdout'
08:27:54 2018/06/01 07:27:53 [INFO] 0 bytes written for 'stderr'
08:27:54 2018/06/01 07:27:53 [INFO] RPC client: Communicator ended with: 0
08:27:54 2018/06/01 07:27:53 [INFO] RPC endpoint: Communicator ended with: 0
08:27:54 2018/06/01 07:27:53 packer: 2018/06/01 07:27:53 [INFO] 19043 bytes written for 'stdout'
08:27:54 2018/06/01 07:27:53 packer: 2018/06/01 07:27:53 [INFO] 0 bytes written for 'stderr'
08:27:54 2018/06/01 07:27:53 packer: 2018/06/01 07:27:53 [INFO] RPC client: Communicator ended with: 0
08:27:54 2018/06/01 07:27:53 packer: 2018/06/01 07:27:53 [DEBUG] Opening new ssh session
08:27:54 2018/06/01 07:27:53 packer: 2018/06/01 07:27:53 [DEBUG] starting remote command: rm -f /tmp/script_4177.sh
08:27:54 2018/06/01 07:27:53 packer: 2018/06/01 07:27:53 [INFO] RPC endpoint: Communicator ended with: 0
08:27:54 2018/06/01 07:27:53 [INFO] RPC client: Communicator ended with: 0
08:27:54 2018/06/01 07:27:53 [INFO] RPC endpoint: Communicator ended with: 0
08:27:54 2018/06/01 07:27:53 packer: 2018/06/01 07:27:53 [INFO] RPC client: Communicator ended with: 0
08:27:54 2018/06/01 07:27:53 [INFO] (telemetry) ending shell
08:27:54 2018/06/01 07:27:53 ui: ==> amazon-ebs: Stopping the source instance...
08:27:54 2018/06/01 07:27:53 ui:     amazon-ebs: Stopping instance, attempt 1
08:27:54 2018/06/01 07:27:53 ui: ==> amazon-ebs: Waiting for the instance to stop...
08:28:26 2018/06/01 07:28:23 ui: ==> amazon-ebs: Creating the AMI: mot2-api_java_mot_test-ops-OPSTEAM-1750-update-packer/40-79081
08:28:26 2018/06/01 07:28:24 ui:     amazon-ebs: AMI: ami-7db7b204
08:28:26 2018/06/01 07:28:24 ui: ==> amazon-ebs: Waiting for AMI to become ready...
08:28:26 2018/06/01 07:28:24 packer: 2018/06/01 07:28:24 Waiting for state to become: available
08:28:26 2018/06/01 07:28:24 packer: 2018/06/01 07:28:24 Using 30s as polling delay (change with AWS_POLL_DELAY_SECONDS)
08:28:26 2018/06/01 07:28:24 packer: 2018/06/01 07:28:24 Allowing 300s to complete (change with AWS_TIMEOUT_SECONDS)
08:29:33 2018/06/01 07:29:30 packer: 2018/06/01 07:29:30 Error on AMIStateRefresh: RequestLimitExceeded: Request limit exceeded.
08:29:33 2018/06/01 07:29:30 packer: 	status code: 503, request id: 2c9c6471-16fc-45c4-9361-d92b6c5a78a1
08:29:33 2018/06/01 07:29:30 ui error: ==> amazon-ebs: Error waiting for AMI: RequestLimitExceeded: Request limit exceeded.
08:29:33 2018/06/01 07:29:30 ui: ==> amazon-ebs: Terminating the source AWS instance...
08:29:33 2018/06/01 07:29:30 packer: 2018/06/01 07:29:30 Waiting for state to become: terminated
08:29:33 2018/06/01 07:29:30 packer: 2018/06/01 07:29:30 Using 30s as polling delay (change with AWS_POLL_DELAY_SECONDS)
08:29:33 2018/06/01 07:29:30 packer: 2018/06/01 07:29:30 Allowing 300s to complete (change with AWS_TIMEOUT_SECONDS)
08:30:06 2018/06/01 07:30:01 ui: ==> amazon-ebs: Cleaning up any extra volumes...
08:30:06 2018/06/01 07:30:01 ui: ==> amazon-ebs: No volumes to clean up, skipping
08:30:06 2018/06/01 07:30:01 ui: ==> amazon-ebs: Deleting temporary security group...

@rickard-von-essen
Collaborator

rickard-von-essen commented Jun 1, 2018

Oh, I thought #6177 was released, but apparently it is awaiting 1.3.0, which is the next release. It should handle this correctly.

@CarlInglisBJSS
Author

That's great, thanks! Is there an estimated release date? (So I can shove the ticket on our backlog with a "revisit after" date)

@rickard-von-essen
Collaborator

Unfortunately no, maybe @SwampDragons or @mwhooker have one.

@SwampDragons
Contributor

We just put out 1.2.4, and we normally wait ~6 weeks between releases, so I would guess mid-July at the earliest. That said, I'm working on #6177 now, so if you want to follow that ticket I can build you a binary when it's done and you can test it out in the next week or so.

@mwhooker
Contributor

mwhooker commented Jun 2, 2018

@CarlInglisBJSS thanks for the report. Hopefully #6177 will solve this, and it would be really useful if you could help us test it when it's ready. In the meantime, I would suggest increasing the AWS_POLL_DELAY_SECONDS env var as a workaround.

@CarlInglisBJSS
Author

I'm watching that issue and will happily help test it when it's ready.

I'll try the AWS_POLL_DELAY_SECONDS change - thanks.

@CarlInglisBJSS
Author

CarlInglisBJSS commented Jun 4, 2018

Our existing AWS_POLL_DELAY_SECONDS was 30 seconds. I have added a random 0-9 seconds to that for each invocation to stagger the simultaneous calls. This, coupled with a random 1-10 second delay at the start of each invocation, appears (based on one set of 15 simultaneous starts) to have given us a workaround.

EDIT: Now up to 3 different sets of 15 simultaneous starts of the script - looks like this is working! Thanks.
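For anyone wanting to reproduce that staggering, the logic looks something like this (a hypothetical Go sketch of the wrapper; the actual change was made in our invocation script, and the packer command line and template name here are placeholders):

```go
package main

import (
	"math/rand"
	"os"
	"os/exec"
	"strconv"
	"time"
)

func main() {
	rand.Seed(time.Now().UnixNano())

	// Stagger simultaneous starts by a random 1-10 seconds.
	time.Sleep(time.Duration(1+rand.Intn(10)) * time.Second)

	// Base 30-second poll delay plus a random 0-9 seconds of jitter.
	os.Setenv("AWS_POLL_DELAY_SECONDS", strconv.Itoa(30+rand.Intn(10)))

	// "template.json" is a placeholder for the real Packer template.
	cmd := exec.Command("packer", "build", "template.json")
	cmd.Env = os.Environ()
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		os.Exit(1)
	}
}
```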

@SwampDragons
Contributor

This was fixed by #6332.

@ghost

ghost commented Mar 31, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Mar 31, 2020