Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock at ExecuteSubmission vs LoggingService #6717

Merged

Conversation

rokonec
Copy link
Contributor

@rokonec rokonec commented Jul 29, 2021

Fixes #6712

Context

During blocking execution of LoggingService.WaitForThreadToProcessEvents from BuildManager, event processing can invoke callback into the same BuildManager instance and cause deadlock.

Changes Made

Blocking execution of LoggingService.WaitForThreadToProcessEvents is now not called from inside of lock so thread handling event processing callback can than acquire BuildManager._syncLock without deadlock.
Core was reviewed considering possible state mutation during critical sections splitting => lock{}; other-thread-mutation; lock{} and considered safe.

Testing

Manual testing by throwing exception during code pause.

Notes

I was trying my best, but still, this code requires careful review!

@Forgind Forgind requested a review from cdmihai August 2, 2021 15:24
@ladipro
Copy link
Member

ladipro commented Aug 3, 2021

During blocking execution of LoggingService.WaitForThreadToProcessEvents from BuildManager, event processing can invoke callback into the same BuildManager instance and cause deadlock.

Can you please post sample callstacks where this happens? Which BuildManager methods are called while we're waiting for the logging service?

@rokonec
Copy link
Contributor Author

rokonec commented Aug 3, 2021

During blocking execution of LoggingService.WaitForThreadToProcessEvents from BuildManager, event processing can invoke callback into the same BuildManager instance and cause deadlock.

Can you please post sample callstacks where this happens? Which BuildManager methods are called while we're waiting for the logging service?

In related issue #6712 there is screen shot of parallel stack view, which defines those conflicting callstacks. I will copy/paste bellow with detailed explanation:
image
Right thread 14088 inside of BuildManager.ExecuteSubmission locks BuildManager._syncLock and calls LoggingService.TerminateLoggingEventQueue which block and wait for event signalized that all queued messages from LoggingService has been handled.
Left thread, 19940, is handling queued messages from LoggingService and is just handling particular LoggingService message which processing calls back into BuildManager.OnProjectFinished which requests lock (_syncLock) but _syncLokc will not be released until ALL messages, including just processing message, is process.

@cdmihai
Copy link
Contributor

cdmihai commented Aug 3, 2021

I tried splitting the lock in ExecuteSubmission in the past but ended up reverting it because I just couldn't reason about all the interleaving states :)

The deadlock is happening because two threads want to achieve the same high level action at the same time: complete the logging for a given submission. So instead of dividing the locks, a less intrusive change would be to detect when two threads want to complete the same submission and have one of them back off. In this case, OnProjectFinished should back off if it sees that someone else (ExecuteSubmission) is completing the submission.

@rokonec
Copy link
Contributor Author

rokonec commented Aug 4, 2021

@cdmihai

The deadlock is happening because two threads want to achieve the same high level action at the same time: complete the logging for a given submission.

In this particular deadlock instance the submission 'A' trying to be completed by OnProjectFinished thread is different from submission 'B' being completed by BuildManager.ExecuteSubmission. LoggingService.TerminateLoggingEventQueue blocks until ALL events from its current queue are handled, it could be, and mostly probably is, empty but it could also contains many ProjectFinishedEventArgs for different projects/submissions.
From the code it is not obvious, why completing submissions, in some cases (mostly cancel or failure), require all current logging events to be handled. To me it does not seems to be good/correct design, but it is what we have, and changing it seems more risky to me than changes I have made in this PR.

Copy link
Member

@ladipro ladipro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't look like a low-risk change but the deadlock has to be fixed. Let's give it a try 👍

// This method may be called in the shutdown submission callback, this callback may be called after the logging service has
// shutdown and nulled out the events we were going to wait on.
if (_logMode == LoggerMode.Asynchronous && _loggingQueue != null)
lock (_lockObject)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change related to the fix or does it address another issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is another bug. TerminateLoggingEventQueue() put LoggingService into state which will either silently reject events coming into its queue while it is waiting for all existing events to be handled.
Also CreateLoggingEventQueue() modifies _loggingQueue which shall be protected for concurrent access.
However, I will have to modify it, because the way I have fixed it, it could cause different deadlock, when any registered logger callback into LoggingService for example with intent to log something. That seems to me as quite possible.
I'll modify it and let you know.

@danmoseley
Copy link
Member

From the code it is not obvious, why completing submissions, in some cases (mostly cancel or failure), require all current logging events to be handled.

Is it simply to ensure that when a build ends, nothing of the log is lost? It would be a surprise if eg., the build copied a file, but it didn't appear in the log as around the same time the build was canceled .

@Forgind Forgind added the merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now. label Aug 5, 2021
@rokonec rokonec merged commit 6bc1e0e into dotnet:main Aug 11, 2021
AR-May pushed a commit that referenced this pull request Sep 15, 2021
Fixes #6823

Context
There is possible deadlock in BuildManager vs LoggingService when:

verbosity=diagnostic or higher or /bl
LoggingService works in LoggerMode.Synchronous
Changes Made
LoggingService callbacks (OnProjectStarted, OnProjectFinished and OnThreadException) has been modified into async by leveraging existing workQueue.

Testing
Compiled Orchardcore in CLI and VS2022.

Notes
If these changes are considered safe, we can optionally revert PR #6717
@ladipro ladipro added this to the VS 17.0 milestone Dec 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deadlock at ExecuteSubmission vs LoggingService during shutdown
5 participants