Make sure handler.flush() doesn't deadlock. #1112
Currently handler.flush() deadlocks during process termination, when atexit first calls handler.close() and then logging.shutdown(), which in turn calls handler.flush() without arguments. handler.close() kills the worker, and then handler.flush() waits forever for the dead worker to send the messages from the queue. After this change, the deadlock is still possible if something concurrently closes the handler from another thread during the flush. However, this scenario is much less likely.
0bdd025 to 033bfdb
LGTM, I'll let @lzchen merge this since it's azure related. Thanks for the PR and description 🙂
I might be missing something, but where does |
@lzchen here: This code is executed while importing |
@gukoff |
Because
I too thought about this option but didn't want to introduce a breaking change to Now when I think about it, we could make such a change non-breaking with the sentinel pattern:

    _sentinel = object()
    ...
    def close(self, timeout=_sentinel):
        if timeout is _sentinel:  # no arguments passed -> close with the default grace_period
            timeout = self.options.grace_period
        ...

What do you think? I don't have a preference.
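For reference, a minimal self-contained sketch of the sentinel pattern suggested above. The class name `ToyHandler` and the plain `grace_period` attribute are hypothetical stand-ins, not the exporter's actual code; the point is only that a private sentinel lets `close()` tell "no argument passed" apart from an explicit `timeout=None`.

```python
# Module-private marker object: no caller can pass it by accident.
_sentinel = object()


class ToyHandler:
    """Hypothetical handler used only to illustrate the sentinel default."""

    def __init__(self, grace_period=5.0):
        self.grace_period = grace_period

    def close(self, timeout=_sentinel):
        # No argument passed -> fall back to the default grace period.
        if timeout is _sentinel:
            timeout = self.grace_period
        # An explicit None (or any number) is passed through untouched.
        # Returned here only so the effective value can be inspected.
        return timeout


handler = ToyHandler(grace_period=5.0)
default_timeout = handler.close()        # uses grace_period
explicit_none = handler.close(None)      # caller explicitly asked for None
explicit_value = handler.close(1.5)      # caller-chosen timeout
```

Existing callers that pass nothing keep the old behaviour, which is why the change would be non-breaking.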
@gukoff |
@gukoff |
If I'm reading the CI log correctly, Try rerunning CI checks? ;) |
@lzchen would it be possible to release a new version with this fix in it?
@gukoff |
Currently `flush()` deadlocks during process termination if there are any unsent messages in the queue.

This is because `atexit` first calls `handler.close()` and then `logging.shutdown()`, which in turn calls `handler.flush()` without arguments. I.e. `handler.close()` kills the worker, and then `handler.flush()` waits forever for the dead worker to send the messages from the queue.

Stacktrace dump by `py-spy` of the application in a deadlock:

After this change, the deadlock is still possible if another thread concurrently closes the handler during the flush. However, this scenario is much less likely.
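The failure mode can be illustrated with a toy model. This is a sketch under assumptions, not the exporter's implementation: `ToyHandler`, its queue-marker flush, and the `is_alive()` guard are all hypothetical names and choices made for illustration. Without the guard, a `flush()` with no timeout after `close()` would wait on a marker that no worker will ever process.

```python
import queue
import threading


class ToyHandler:
    """Hypothetical queue-backed handler; not the opencensus code."""

    def __init__(self):
        self._queue = queue.Queue()
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Drain the queue until close() asks the worker to stop.
        while not self._stop.is_set():
            try:
                item = self._queue.get(timeout=0.05)
            except queue.Empty:
                continue
            if isinstance(item, threading.Event):
                item.set()  # flush marker reached: wake the waiting flush()

    def emit(self, record):
        self._queue.put(record)

    def close(self):
        # Stops the worker; anything still queued is never sent.
        self._stop.set()
        self._worker.join()

    def flush(self, timeout=None):
        # Guard: a dead worker will never reach the marker, so do not wait.
        if not self._worker.is_alive():
            return False
        done = threading.Event()
        self._queue.put(done)
        return done.wait(timeout)


handler = ToyHandler()
handler.emit("first")
flushed_while_alive = handler.flush(timeout=2.0)
handler.close()
handler.emit("unsent")                  # stays in the queue forever
flushed_after_close = handler.flush()   # returns instead of hanging
```

The guard is only a check-then-wait: if another thread closes the handler between `is_alive()` and `done.wait()`, an untimed flush can still hang, which matches the residual race described above.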