-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix potential lags/deadlocks during tab close #14041
Conversation
ab2abe8
to
b8511bd
Compare
b8511bd
to
2f09d15
Compare
@@ -631,12 +647,6 @@ namespace winrt::Microsoft::Terminal::TerminalConnection::implementation | |||
const auto result{ til::u8u16(std::string_view{ _buffer.data(), read }, _u16Str, _u8State) }; | |||
if (FAILED(result)) | |||
{ | |||
if (_isStateAtOrBeyond(ConnectionState::Closing)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we certain that we'll never make it here while a connection is closing? As long as we never transition a connection into the Failed state after we transition it into Closing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's actually entirely possible and the source of the race condition that I fixed.
While _isStateAtOrBeyond
itself might be atomic, that doesn't mean it protects us from other threads changing the state under us concurrently. Unless I misunderstood your question, I don't believe that this is any more or less correct than before. (But the change is required to make the CancelSynchronousIo
handling work correctly.)
// MSFT:40146639 | ||
// The premise of this function is that 1 thread enters and 0 threads leave alive. | ||
// We need to prevent anyone from calling us until we actually ExitProcess(), | ||
// so that we don't TriggerTeardown() twice. LockConsole() can't be used here, | ||
// because doing so would prevent the render thread from progressing. | ||
AcquireSRWLockExclusive(&s_shutdownLock); | ||
if (locked.exchange(true, std::memory_order_relaxed)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this safe on all architectures? Is relaxed correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it's safe on all architectures. Atomic operations themselves are always exactly as atomic as any other, no matter what memory order is specified. Relaxed only means that no additional memory is synchronized/flushed (which is fine here).
AcquireSRWLockExclusive(&s_shutdownLock); | ||
if (locked.exchange(true, std::memory_order_relaxed)) | ||
{ | ||
Sleep(INFINITE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So just to confirm.
If we get here, it is the first entry and we take the lock, we proceed .We might cause either...
- A recursive call to
RundownAndExit
- Another thread to
RundownAndExit
The first one is fixed by preventRecursion
, and the second one is fixed by locked
-> Sleep(INFINITE)
?
What if we trigger another thread to RundownAndExit
while we're waiting on it to complete? Is that possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No the change to s_shutdownLock
is optional. I did it because I initially tried to implement this with a recursive mutex (which SRWLOCK
can't do), failed and then wrote the atomic+sleep code, which I kept because I felt like it's easier to see what it does compared to abusing a SRWLOCK
for "don't actually lock, only suspend anyone but me".
@@ -39,14 +39,30 @@ void ServiceLocator::SetOneCoreTeardownFunction(void (*pfn)()) noexcept | |||
s_oneCoreTeardownFunction = pfn; | |||
} | |||
|
|||
[[noreturn]] void ServiceLocator::RundownAndExit(const HRESULT hr) | |||
void ServiceLocator::RundownAndExit(const HRESULT hr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes here are scary. Are we sure this won't cause the OneCore console to regress?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I'm not sure, but I don't believe so. The only effective difference is the addition of the thread_local
and I could revert the change to s_shutdownLock
if you'd like.
What behavior do you see for process management when you open a hundred tabs and then close the window (f.ex)? |
Closing ~40 pwsh tabs took about 2-3s for WT to disappear. The pwsh/OpenConsole processes however quit instantly. (Maybe some sort of WinUI lag? Didn't investigate it any further.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with this. I'll pack it into a 1.16 build and ship it out for folks...!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ngl, it's a bit hard to follow. I think with a few answered questions I'll feel comfortable approving though.
@@ -382,7 +382,7 @@ void ConPtyTests::DiesOnClose() | |||
Log::Comment(NoThrowString().Format(L"Sleep a bit to let the process attach")); | |||
Sleep(100); | |||
|
|||
_ClosePseudoConsoleMembers(&pty); | |||
_ClosePseudoConsoleMembers(&pty, TRUE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Y'all don't like default params, huh? :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a C function - that's why it can't have that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also FWIW no, I personally don't like default parameters! I also don't like underdocumented boolean parameters for reasons I've mentioned before, but since this is an internal/in-file interface I don't care as much.
@@ -86,7 +86,6 @@ namespace Microsoft::Console::Interactivity | |||
|
|||
static HWND LocatePseudoWindow(const HWND owner = nullptr /*HWND_DESKTOP = 0*/); | |||
|
|||
protected: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove protected? Should we keep protected for some of these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class is final
so that protected
has no effect. It's also not usual / best practice to make classes any more private than necessary. A non-public copy constructor for instance is pretty unusual since you would only be able to use them inside the class. It gets even more unusual when you consider that the functions are explicitly deleted, so not even the class itself can use them. In short, this code was kinda whacky. 😅
AcquireSRWLockExclusive(&s_shutdownLock); | ||
if (locked.exchange(true, std::memory_order_relaxed)) | ||
{ | ||
Sleep(INFINITE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put a big ol' comment here why it's ok to put Sleep(INFINITE);
. I think it'd be nice instead of having to find this PR and read through the discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It ended up being a bit of a shorter comment. But I think that one, plus the longer explanation in front of the if
condition should sufficiently explain what this does... 🤔
Hello @DHowett! Because this pull request has the p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (
|
`ConptyClosePseudoConsole` blocks until OpenConsole exits. This is problematic for the changes in 666c446, which stopped calling that function on a background thread to solve a race condition. This commit fixes the potential lags/deadlocks from waiting on OpenConsole's exit, by adding `ConptyClosePseudoConsoleNoWait` which only closes the IO handles and allows OpenConsole to exit naturally. This uncovered another potential deadlock in `ServiceLocator::RundownAndExit` which might call itself recursively. Closes #14032 ## Validation Steps Performed * Print tons of text and concurrently close the tab. Tab closes, OpenConsole/pwsh exits instantly ✅ * Use `Enter-VsDevShell` and close the tab. Tab closes instantly, OpenConsole/pwsh exits after ~5 seconds ✅ (cherry picked from commit 274bdb3) Service-Card-Id: 85767341 Service-Version: 1.16
🎉 Handy links: |
`ConptyClosePseudoConsole` blocks until OpenConsole exits. This is problematic for the changes in 666c446, which stopped calling that function on a background thread to solve a race condition. This commit fixes the potential lags/deadlocks from waiting on OpenConsole's exit, by adding `ConptyClosePseudoConsoleNoWait` which only closes the IO handles and allows OpenConsole to exit naturally. This uncovered another potential deadlock in `ServiceLocator::RundownAndExit` which might call itself recursively. Closes #14032 ## Validation Steps Performed * Print tons of text and concurrently close the tab. Tab closes, OpenConsole/pwsh exits instantly ✅ * Use `Enter-VsDevShell` and close the tab. Tab closes instantly, OpenConsole/pwsh exits after ~5 seconds ✅ (cherry picked from commit 274bdb3) Service-Card-Id: 86209670 Service-Version: 1.15
🎉 Handy links: |
2 new ConPTY APIs were added as part of this commit: * `ClosePseudoConsoleTimeout` Complements `ClosePseudoConsole`, allowing users to override the `INFINITE` wait for OpenConsole to exit with any arbitrary timeout, including 0. * `ConptyReleasePseudoConsole` This releases the `\Reference` handle held by the `HPCON`. While it makes launching further PTY clients via `PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE` impossible, it does allow conhost/OpenConsole to exit naturally once all console clients have disconnected. This makes it unnecessary to having to monitor the exit of the spawned shell/application, just to then call `ClosePseudoConsole`, while carefully continuing to read from the output pipe. Instead users can now just read from the output pipe until it is closed, knowing for sure that no more data will come or clients connect. This is especially useful in combination with `ClosePseudoConsoleTimeout` and a timeout of 0, to allow conhost/OpenConsole to exit asynchronously. These new APIs are used to fix an array of bugs around Windows Terminal exiting either too early or too late. They also make usage of the ConPTY API simpler in most situations (when spawning a single application and waiting for it to exit). Depends on #13882, #14041, #14160, #14282 Closes #4564 Closes #14416 Closes MSFT-42244182 ## Validation Steps Performed * Calling `FreeConsole` in a handoff'd application closes the tab ✅ * Create a .bat file containing only `start /B cmd.exe`. If WT stable is the default terminal the tab closes instantly ✅ With these changes included the tab stays open with a cmd.exe prompt ✅ * New ConPTY tests ✅
ConptyClosePseudoConsole
blocks until OpenConsole exits.This is problematic for the changes in 666c446, which stopped calling that
function on a background thread to solve a race condition. This commit fixes
the potential lags/deadlocks from waiting on OpenConsole's exit, by adding
ConptyClosePseudoConsoleNoWait
which only closes the IO handles and allowsOpenConsole to exit naturally. This uncovered another potential deadlock
in
ServiceLocator::RundownAndExit
which might call itself recursively.Closes #14032
Validation Steps Performed
Tab closes, OpenConsole/pwsh exits instantly ✅
Enter-VsDevShell
and close the tab.Tab closes instantly, OpenConsole/pwsh exits after ~5 seconds ✅