Use amazon SDK waiters with robust defaults #6177
Comments
Looking at #6330, I find it odd that the default waiters use a constant delay. I wonder if we should add an option to use exponential backoff.
I like the approach of using the SDK waiters for this solution - any ETA on having
It'll probably be at least a month -- I'm working on implementing the waiters right now in #6332 and will hopefully be done with testing today. I can build you a binary if you'd like to try it out.
@mwhooker I'm not sure points 2 and 4 are necessary in the way I've implemented it; I have used Packer's current env vars instead of overriding them with anything, so the default is already set by those env vars (point 2), and it should already be documented because nothing's changed (point 4).
Could I please point out that the issue reported in #6305 that ends up getting tracked here is about the timeout happening when we have specifically requested that no timeout should apply via disable_stop_instance.
Using Packer 1.0.2, Packer waits indefinitely, as we would want it to.
@nstreet
@nstreet Looking closer at this, you've found a bug unrelated to the timeouts -- the issue here is that the waiter should be located inside an if statement that it's actually outside of. I can fix that for the next release.
Okay, final comment @nstreet -- I looked into this and it turns out it's a flag for a very specific use case; for that use case it works fine, but it does not do what you want it to do. I've made a PR updating the docs to make them clearer.
@SwampDragons any chance I could get a build with this fix in it to try out?
@PCSshell sure -- what's your host OS?
@SwampDragons Ubuntu 16.04
@SwampDragons Apparently I am not doing something correctly. Below are the commands I am running to set the timeout settings. In the log it is telling me that the values to override are not set.

```
$ export AWS_POLL_DELAY_SECONDS=10
==> amazon-ebs: Error waiting for instance to stop: ResourceNotReady: exceeded wait attempts
$ echo $AWS_MAX_ATTEMPTS
```
Oh weird. And of course I merged before I saw your comment. I'll take a look at this.
Ugh, found it. I made some last-minute changes to make testing easier, and... broke the functionality. 😅
I am still seeing the same behavior. The --version number I am seeing now is [INFO] Packer version: 1.3.0-dev
The same behavior as the previous build I gave you? I just tested again locally, and I only get the
I was bouncing back and forth between several issues yesterday, so I guess it's possible I built from the wrong commit (I can't test the binary right now since I'm on a darwin machine)... just in case, here's a new build of the master branch.
Looks like it is recognizing the environment variables now, but it does not seem to respect what I am entering for them.

```
export AWS_POLL_DELAY_SECONDS=30
2018/07/12 13:47:41 ui: ==> amazon-ebs: Automatic instance stop disabled. Please stop instance manually.
```

Okay, I dug around some more -- it looks like you've found a bug where this code was not governed by our environment variables even before I swapped out the waiters in the attached PR; it just used the waiter's default. I missed it when swapping out our custom code for the AWS waiters, because this particular wait had already been swapped out. Third time's the charm?
@SwampDragons - Working as expected now. Thanks for all the assistance!
No problem -- thanks for your patience as I looked into this.
@SwampDragons I have tried both Packer v1.3 and v1.2.3, and I wasn't able to get rid of the error saying unexpected state 'failed', wanted target 'available'. I am currently building 4 different AMIs from Jenkins slaves for 4 different AWS accounts, and three of them are working fine, meaning Packer builds AMIs on all three accounts without hitting this error. Only one account is giving me the error, even though the Packer version is v1.2.3 and it runs the same Ansible (v2.4.1) Packer code.

```
==> aws: Copying AMI: us-west-2 (ami-0a30229bcbd197499)
==> Some builds didn't complete successfully and had errors:
==> Builds finished but no artifacts were created.
2018/08/29 01:23:06 packer: 2018/08/29 01:23:06 Detected home directory from env var: /home/jenkins-slave
```

On the Jenkins slave machine that doesn't work, I have run pip list to make sure all pip-installed components are the same as on the others, including awscli, pip, ansible, requests, etc. There are very minimal version discrepancies; they are almost the same. Any suggestions?
I'll try to look at this today.
@SwampDragons Thanks for your attention. I added some logs at https://github.com/WeekendsBull/packerbuild-error/tree/master/packer_debug with both CentOS and Windows 2012r2 build logs, covering both working and failing runs. I have only provided Packer v1.2.3; would you want me to provide the v1.3.0-dev logs as well? I put an Excel file there showing what I have installed in each Jenkins slave instance, but the installed versions are almost identical. Here is a copy of the CentOS template I use: https://github.com/WeekendsBull/packerbuild-error/blob/master/centos7build.json (each build name is associated with each account).
@WeekendsBull I figured this out and responded to you on the mailing list, but I'll leave a note here in case future users run into the same problem. You need to increase the AWS timeout by setting the environment variables. For example:
or simply:
|
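The concrete examples appear to have been lost from the comment above; the following is a plausible sketch using the env vars discussed in this thread (the specific values are illustrative, not recommendations):

```shell
# Give the waiter up to ~30 minutes: 10s between polls, 180 attempts.
export AWS_POLL_DELAY_SECONDS=10
export AWS_MAX_ATTEMPTS=180

# Or simply raise the attempt count and keep the default poll delay:
export AWS_MAX_ATTEMPTS=400

# Then run the build as usual, e.g.:
# packer build template.json
```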
Using the 1.4.2 version of Packer, and still hitting the issue, in spite of setting
|
This issue is a year old -- if you're still experiencing problems then please open a new ticket with full steps to reproduce and full debug logs. |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further. |
Before the AWS SDK had a viable wait method (i.e. to wait for resources to become ready), Packer implemented its own version of this. See builder/amazon/common/state.go.
We routinely get issues reported that we're not waiting long enough. I've slowly been replacing the homegrown waiters with the SDK waiters, which generally seems to solve the problems. I think we should remove the homegrown waiters for good and replace them with the SDK waiters.
There is a BC aspect to consider:
The common/state.go waiters use the env vars `AWS_POLL_DELAY_SECONDS` and `AWS_TIMEOUT_SECONDS` to control how long to wait. The SDK waiters implement defaults for the above values per resource. For example, the wait for instances to be running looks like this:
As you can see, it has the delay hardcoded to 15s and retries set to 40, whereas we currently do a 5-minute absolute timeout with a 2-second delay.
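The constant-delay strategy described above can be sketched with nothing but the standard library. This is a hypothetical illustration for the discussion, not the SDK's or Packer's actual code; the function name `waitUntil` is invented:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout mirrors the "exceeded wait attempts" failure mode.
var errTimeout = errors.New("exceeded wait attempts")

// waitUntil polls check every delay until it reports ready or
// maxAttempts polls have been made -- the constant-delay strategy
// described above. (The SDK default for a running instance would be
// delay=15s, maxAttempts=40; Packer's homegrown waiter uses a 2s
// delay with a 5-minute absolute timeout.)
func waitUntil(check func() bool, delay time.Duration, maxAttempts int) error {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if check() {
			return nil
		}
		time.Sleep(delay)
	}
	return errTimeout
}

func main() {
	// Simulate a resource that becomes ready on the third poll.
	polls := 0
	ready := func() bool {
		polls++
		return polls >= 3
	}
	// A tiny delay keeps the example fast; the SDK default is 15s.
	if err := waitUntil(ready, time.Millisecond, 40); err != nil {
		fmt.Println("wait failed:", err)
		return
	}
	fmt.Println("ready after", polls, "polls")
}
```

The point of the sketch is that both knobs (delay and attempt count) are fixed at the call site, which is exactly why user-facing overrides are needed.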
I believe Packer has a different use case than the SDK defaults. We don't care so much about liveness; instead we care deeply about the request eventually succeeding. I believe we should use the SDK waiters with the default retries set to 10x the SDK default.
We must also allow these to be overridable by the user in case the values are unacceptable. Ideally we could reuse the existing env vars in a way that makes sense, but we may have to come up with an alternate configuration syntax.
Todo