Correct entrypoint.sh causing bootloop in some cases #1628
The lines "kill -15 $SUPERVISOR_PID" followed by "wait $SUPERVISOR_PID" may fail if the process is killed quickly enough. The "ps -p" command ensures the process still exists before waiting for its state to change.
@adferrand Seems to be working for me. I tested it with a restart after a migration.
Is there anything left to do, or can we start on a new release?
@solidnerd I also deployed it in my production environment without any issue. I think we are good to go.
Okay, I will start on this now. Thanks for your work 👍
IIUC, you've tried to address a race between the
wait $SUPERVISOR_PID &
kill -15 $SUPERVISOR_PID
fg
Not sure if If using
I thought about the possibility of a race condition between ps and wait, but I think there is in fact no race. First, empirically: the issue is solved under the same conditions. Second, theoretically. Before, kill and wait were executed as two different commands. I think there is no guarantee that two consecutive bash commands are executed in consecutive CPU cycles, so enough CPU cycles could elapse for the process to actually be killed before wait is invoked, leading to the original error. A boolean evaluation is quite different, and that is what my correction uses for ps and wait. The evaluation of the two operands of a binary combination has to be consistent, or a boolean test could give different results depending on the runtime. So I think the two operands of a binary association are evaluated as one single command, with all operands executed in consecutive CPU cycles. That guarantees that nothing happens between them, hence the consistency, and hence the assurance that what ps reported is still relevant for wait. In your example, however, wait and kill will never finish normally: at the time of the wait, nothing has yet been sent to the process that could change its state (killing it, in our case), so wait will stall until timeout.
About the empirical evidence, I just had both 10.8.3 and 10.8.3-1 loop on a clean startup 😒. I use As for your theoretical analysis, I think it depends a lot on the implementation details of the shell that's in use, i.c.
And so died my hypothesis. As a scientist, I have to admit my mistake ^^ I will remember that shells are definitely programming languages, and that everything must be designed with the multi-threading nature of the underlying OS in mind. So your solution is better, as it ensures that no race can occur. I will make a new PR. Thanks!
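To make the conceded point concrete, here is a minimal sketch of how a process can disappear between the two operands of an "&&" list. It is an illustration only, not code from the entrypoint: the helper name check_then_delay, the variable names, and the sleep durations are all hypothetical, and the delay is exaggerated so that the window between "check" and "act" is easy to observe.

```shell
#!/usr/bin/env bash
# Sketch only: the sleeps are hypothetical stand-ins that make the window
# between the two operands of "&&" wide enough to observe.

# Background process that exits on its own after one second.
sleep 1 &
pid=$!

# Left operand: confirm the process exists, then take a while to return,
# much as any external check (such as ps) takes some time to run.
check_then_delay() {
  kill -0 "$1" 2> /dev/null && sleep 2
}

alive_after="not evaluated"
check_then_delay "$pid" && {
  # Right operand: re-check the process that the left operand saw alive.
  if kill -0 "$pid" 2> /dev/null; then
    alive_after=yes
  else
    alive_after=no
  fi
}
echo "process still alive when the right operand ran: $alive_after"
```

The left operand reports success, yet the process has gone by the time the right operand starts. With a real "ps -p" the window is far shorter, but nothing in the shell closes it.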
Hum. In fact executing Started from the foreground, wait, as a built-in command, stays in the shell and then can act normally. So of course we could make a while loop on the @paddy-hack, do you have an idea?
Or will just do a
Following issue #1626
The entrypoint executes "kill -15 $SUPERVISOR_PID" followed by "wait $SUPERVISOR_PID". This may fail if the process is killed quickly enough: the process may already be gone when wait is invoked, which results in an error. The "ps -p" command ensures the process still exists before waiting for its state to change.
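The guard described here might look roughly like the sketch below. It rests on assumptions and is not the actual entrypoint.sh: "sleep 30" is a hypothetical stand-in for the supervisor process, and the guarded_wait helper and the status variable are introduced only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: "sleep 30" is a hypothetical stand-in for the supervisor process.

# Wait for a child PID only if ps still lists it (the guard described above).
guarded_wait() {
  ps -p "$1" > /dev/null 2>&1 && wait "$1"
}

sleep 30 &
SUPERVISOR_PID=$!

# Ask the process to terminate.
kill -15 "$SUPERVISOR_PID"

# Record the status rather than letting a non-zero value abort a script
# that runs under "set -e".
status=0
guarded_wait "$SUPERVISOR_PID" || status=$?
echo "guarded wait returned $status"
```

The status is non-zero on both paths: either ps no longer finds the process and wait is skipped, or wait reports that the process was terminated by a signal. As the conversation points out, the check and the wait are still two separate steps.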