
calling handler.close() from different thread causes deadlock when using AzureLogHandler #1125

Closed
csadam opened this issue Jun 1, 2022 · 5 comments · Fixed by #1134
Labels: azure (Microsoft Azure), bug

Comments


csadam commented Jun 1, 2022

Describe your environment.
Python 3.8.13 on Linux, opencensus-ext-azure == 1.1.4, azureml-sdk[automl] ~= 1.42.0, azureml-mlflow==1.42.0, mlflow-skinny==1.26.0

Steps to reproduce.
I can't provide exact reproduction code. The problem is related to #1112.
We add an AzureLogHandler on the main thread. Later we run an automl Experiment in multiple threads.

At some point the code in the thread runs `import mlflow`, and the program gets stuck.

The import triggers `_configure_mlflow_loggers(root_module_name=__name__)`, which in turn calls `logging.config.dictConfig`.

This runs `_clearExistingHandlers` in `logging.config`, which invokes the `shutdown` function.
The `shutdown` function runs the following calls for every registered logging handler:

   h.acquire()
   h.flush()
   h.close()

With previous versions, for example opencensus-ext-azure == 1.1.3, the program got stuck at the h.flush() call.
With version 1.1.4 the h.flush() finishes, but execution still hangs when self._worker.stop(timeout) is called inside h.close().
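The chain above can be reproduced with the stdlib alone: a handler subclass that records calls shows that a non-incremental `dictConfig` clears pre-existing handlers via `logging.shutdown`, which acquires each handler's lock and then calls `flush` and `close` while that lock is still held (a sketch; no opencensus involved):

```python
import logging
import logging.config

calls = []

class RecordingHandler(logging.Handler):
    """Records what logging.shutdown does to a pre-existing handler."""

    def acquire(self):
        calls.append("acquire")
        super().acquire()

    def flush(self):
        calls.append("flush")
        super().flush()

    def close(self):
        # In AzureLogHandler this is the point where self._worker.stop(timeout)
        # runs -- with the lock from acquire() still held.
        calls.append("close")
        super().close()

handler = RecordingHandler()  # registered in logging._handlerList by __init__

# A non-incremental dictConfig (what mlflow's import ends up triggering) runs
# _clearExistingHandlers, which calls logging.shutdown on all known handlers.
logging.config.dictConfig({"version": 1})

print(calls)  # acquire, then flush, then close
```

If the `close` step blocks forever, the whole `import` hangs, which matches the deadlock described above.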

What is the expected behavior?
The import finishes successfully and the program continues

What is the actual behavior?
The program hangs in a deadlock.

@csadam csadam added the bug label Jun 1, 2022
@lzchen lzchen added the azure Microsoft Azure label Jun 2, 2022
@jorgenader

I had a similar issue. I resolved it by patching the handler's close to add a sentinel guard for the timeout and fall back to a default automatically; it was freezing when timeout=None.

Versions used:
Python 3.9 on Linux, opencensus-ext-azure == 1.1.4, Django==3.2, django-dramatiq == 0.10.0, django-periodiq == 0.2.0


csadam commented Jun 3, 2022

I'm using the following monkeypatch as a workaround, but I don't think this is the "correct" way to solve this problem.

    ### AzureLogHandler deadlock workaround
    # original signature: def close(self, timeout=None):
    def close(self, timeout=20.0):
        if self.storage:
            self.storage.close()
        if self._worker:
            self._worker.stop(timeout)

    from opencensus.ext.azure.log_exporter import BaseLogHandler
    BaseLogHandler.close = close
    ###
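Why a bounded timeout avoids the hang can be shown with a generic sketch (this is not opencensus internals; `stuck_worker` is a hypothetical stand-in for an exporter worker thread that never receives its stop signal):

```python
import threading

def stuck_worker():
    # Stands in for a worker that never gets its stop signal.
    threading.Event().wait()  # blocks forever

t = threading.Thread(target=stuck_worker, daemon=True)
t.start()

# t.join(None) would block the calling thread indefinitely -- the hang seen
# inside h.close(). A finite timeout returns even if the thread is still alive:
t.join(timeout=0.2)
print(t.is_alive())  # True: the worker is still stuck, but we did not deadlock
```

The monkeypatch above relies on exactly this: `worker.stop(20.0)` bounds the wait, where `worker.stop(None)` could wait forever.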


lzchen commented Jun 21, 2022

@jorgenader
Are you referring to this?


lzchen commented Jun 21, 2022

@csadam
Ideally, we would use the sentinel pattern as described here to avoid breaking changes, and then remove the custom atexit hook in favor of logging's own shutdown.
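A minimal sketch of that sentinel pattern (names like `resolve_timeout` and `GRACE_PERIOD` are illustrative, not the actual opencensus API): a module-level sentinel lets `close` tell "no timeout passed" apart from an explicit `timeout=None`, so the `logging.shutdown` path gets a bounded default without changing behavior for callers that pass a value:

```python
_DEFAULT = object()  # sentinel: "caller did not pass a timeout"

GRACE_PERIOD = 5.0   # illustrative bounded default for worker shutdown

def resolve_timeout(timeout=_DEFAULT):
    """Return the timeout that close() should hand to worker.stop()."""
    if timeout is _DEFAULT:
        return GRACE_PERIOD  # no argument given: use a safe bounded default
    return timeout           # explicit value (including None) is honored

print(resolve_timeout())      # 5.0  -- the logging.shutdown path cannot hang
print(resolve_timeout(None))  # None -- explicit "wait forever" still possible
print(resolve_timeout(1.5))   # 1.5
```

This is non-breaking: existing callers that explicitly pass `timeout=None` keep the old unbounded wait, while the stdlib shutdown path, which passes nothing, gets the bounded default.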

@jorgenader

> @jorgenader Are you referring to this?

Yes, but more like adding a default timeout instead.
