#3227: Wait for the keepalive period to end before stopping the acceptance of new requests #3236
base: master
Conversation
Force-pushed from 29b7866 to 34ce875
To test such functionality, consider using an approach inspired by Apache httpd: https://github.com/apache/httpd/blob/trunk/test/modules/core/test_002_restarts.py. This approach involves making assertions based on log output. While it would require implementing some additional tools, this method seems to simplify testing complex scenarios such as multiprocessing and greenlets.
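A minimal sketch of what such a log-assertion test could look like in pytest. The `gunicorn_log_lines` fixture is hypothetical (it would need to capture the server's log output); the two messages checked are log lines gunicorn actually emits, but the test wiring is illustrative only.

    import re

    def test_worker_autorestart(gunicorn_log_lines):
        # After max_requests is reached, the worker announces the restart...
        assert any(re.search(r"Autorestarting worker after current request", line)
                   for line in gunicorn_log_lines)
        # ...and the arbiter logs the replacement worker booting.
        assert any(re.search(r"Booting worker with pid", line)
                   for line in gunicorn_log_lines)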
gunicorn/workers/ggevent.py (outdated)

    # Retrieve all active greenlets and repeatedly check, until the keepalive period ends,
    # if any of those greenlets are still active. If none are active, exit the loop.
    greenlets = {id(greenlet) for server in servers for greenlet in server.pool.greenlets}
When this line is executed, self.alive is already False, which means that all keepalive connections are either closed or are being handled by the existing greenlets. No new keepalive connections will be created until the worker restarts.
Therefore, once the greenlets captured on this line have finished, we do not need to wait any longer: we are assured that there are no more active keepalive connections.
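In other words, the wait can terminate early. A rough sketch of that early-exit variant, assuming gunicorn's gevent worker internals as shown in the diff; the function name and loop body are illustrations, not the actual patch:

    import time

    def wait_for_keepalive_greenlets(servers, keepalive_timeout, poll_interval=0.1):
        # Snapshot the greenlets that were active when shutdown began.
        initial = {id(g) for server in servers for g in server.pool.greenlets}
        deadline = time.monotonic() + keepalive_timeout
        while time.monotonic() < deadline:
            active = {id(g) for server in servers for g in server.pool.greenlets}
            # All original handlers finished: no live keepalive connections remain.
            if not (initial & active):
                break
            time.sleep(poll_interval)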
I am not yet confident I understand all possible implications, but whether further changed or not, this definitely needs documentation (it already needed documentation before, but after the change the behaviour can be even more unexpected). Maybe a time table that lists the state transitions from hitting …
gunicorn/workers/ggevent.py (outdated)

    greenlets = {id(greenlet) for server in servers for greenlet in server.pool.greenlets}
    ts = time.time()
    while time.time() - ts <= self.cfg.keepalive:
obligatory "only do that with a monotonic clock" comment
Thanks for pointing this out. I changed it to use a monotonic clock. By the way, shouldn't it be changed in other places as well? For example, I can see: https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/ggevent.py#L98
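For reference, a minimal illustration of the reviewer's point: `time.time()` follows the wall clock and can jump when the system clock is adjusted, while `time.monotonic()` only ever moves forward, which makes it the right choice for measuring elapsed intervals like the keepalive window.

    import time

    start = time.monotonic()
    time.sleep(0.5)
    # Unlike a time.time() difference, this cannot go negative or jump
    # if the system clock is adjusted (e.g. by NTP) in between.
    elapsed = time.monotonic() - start
    print(f"elapsed: {elapsed:.3f}s")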
I think only the first part of this change (ensuring we don't process keepalive during closing) is OK.
I disagree with the second part. Can you update your patch to only take care of the first one?
Force-pushed from bcdbc09 to fb62e1e
@benoitc thanks for the review, and sorry for the delay. I have removed the code you suggested to remove. I still think this is not the final fix for the issue, but it is at least a step forward. I agree that the idea of waiting for all keepalive sessions is not good and would lead to other problems.
This change will prevent potential disruptions in request handling.
Currently, when a worker is restarted due to the max_requests limit and the keepalive period is greater than zero, there is a gap between the old worker ceasing to accept new requests and a new worker starting. This happens because the process waits for the lesser of the graceful timeout and the keepalive period even when there are no pending requests: open keepalive connections remain active for the duration of the keepalive period.
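A hypothetical configuration that exhibits the gap described above. The option names are gunicorn's real settings; the values are illustrative only.

    # gunicorn.conf.py
    worker_class = "gevent"
    max_requests = 1000     # restart each worker after 1000 requests
    keepalive = 5           # hold idle keepalive connections open for 5 seconds
    graceful_timeout = 30   # grace period for in-flight requests on shutdown
    # A worker that hits max_requests stops accepting new connections and may
    # wait up to min(graceful_timeout, keepalive) before its replacement starts,
    # even when no requests are pending.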
See also: #3227
cc: @benoitc @pajod