-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wait for clients to exit on ConPTY shutdown #14282
Conversation
Notes from chat on Saturday: pseudoconsole teardown should have the same effect as clicking the |
Please check all prior criteria, including VSCode. :) |
I think only VS Code was missing. Seems to work great. Including running |
if (!_GetData(&msg, sizeof(msg))) | ||
{ | ||
break; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these were missing. _GetData
returns a bool
after all.
{ | ||
_Shutdown(); | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that RundownAndExit
is missing, we aren't [[noreturn]]
anymore and need to return something. Almost didn't catch this...
// If we failed to read because the terminal broke our pipe (usually due | ||
// to dying itself), close gracefully with ERROR_BROKEN_PIPE. | ||
// Otherwise throw an exception. ERROR_BROKEN_PIPE is the only case that | ||
// we want to gracefully close in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is a copy/paste from PtySignalInputThread
, but it doesn't actually apply to this code. Also there's only 4 uses of _exitRequested
in total which makes this code easy to follow anyways IMO.
src/host/PtySignalInputThread.cpp
Outdated
{ | ||
THROW_HR(E_UNEXPECTED); | ||
} | ||
return S_OK; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't throw an exception here - no one is catching it and it would cross an ABI boundary otherwise. This is also why this function is now noexcept
- it should've always been.
Should I add some logging here? If so what would be best? RETURN_HR_MSG
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could wrap the whole function in a try
block and add a CATCH_LOG_RETURN_HR(S_OK);
at the end. Then just keep the default
case as a THROW_HR(E_UNEXPECTED)
.
Looks a bit ugly, but it's true, the default:
case is definitely unexpected.
if (!_GetData(&signalId, sizeof(signalId))) | ||
{ | ||
return S_OK; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this in the end, so that the _GetData
calls are all uniform and consistent. I just felt that this looks better...
6c09273
to
6f177cf
Compare
void ConPtyTests::DiesOnBreakBoth() | ||
{ | ||
PseudoConsole pty = { 0 }; | ||
wil::unique_handle outPipeOurSide; | ||
wil::unique_handle inPipeOurSide; | ||
wil::unique_handle outPipePseudoConsoleSide; | ||
wil::unique_handle inPipePseudoConsoleSide; | ||
SECURITY_ATTRIBUTES sa; | ||
sa.nLength = sizeof(sa); | ||
sa.bInheritHandle = TRUE; | ||
sa.lpSecurityDescriptor = nullptr; | ||
VERIFY_IS_TRUE(CreatePipe(inPipePseudoConsoleSide.addressof(), inPipeOurSide.addressof(), &sa, 0)); | ||
VERIFY_IS_TRUE(CreatePipe(outPipeOurSide.addressof(), outPipePseudoConsoleSide.addressof(), &sa, 0)); | ||
VERIFY_IS_TRUE(SetHandleInformation(inPipeOurSide.get(), HANDLE_FLAG_INHERIT, 0)); | ||
VERIFY_IS_TRUE(SetHandleInformation(outPipeOurSide.get(), HANDLE_FLAG_INHERIT, 0)); | ||
|
||
VERIFY_SUCCEEDED( | ||
_CreatePseudoConsole(defaultSize, | ||
inPipePseudoConsoleSide.get(), | ||
outPipePseudoConsoleSide.get(), | ||
0, | ||
&pty)); | ||
auto closePty1 = wil::scope_exit([&] { | ||
_ClosePseudoConsoleMembers(&pty, TRUE); | ||
}); | ||
|
||
DWORD dwExit; | ||
VERIFY_IS_TRUE(GetExitCodeProcess(pty.hConPtyProcess, &dwExit)); | ||
VERIFY_ARE_EQUAL(dwExit, (DWORD)STILL_ACTIVE); | ||
|
||
wil::unique_process_information piClient; | ||
std::wstring realCommand = L"cmd.exe"; | ||
VERIFY_SUCCEEDED(AttachPseudoConsole(&pty, realCommand, piClient.addressof())); | ||
|
||
VERIFY_IS_TRUE(GetExitCodeProcess(piClient.hProcess, &dwExit)); | ||
VERIFY_ARE_EQUAL(dwExit, (DWORD)STILL_ACTIVE); | ||
|
||
// Close one of the pipes... | ||
VERIFY_IS_TRUE(CloseHandle(outPipeOurSide.get())); | ||
|
||
// ... Wait for a couple seconds, make sure the child is still alive. | ||
VERIFY_ARE_EQUAL(WaitForSingleObject(pty.hConPtyProcess, 2000), (DWORD)WAIT_TIMEOUT); | ||
VERIFY_IS_TRUE(GetExitCodeProcess(pty.hConPtyProcess, &dwExit)); | ||
VERIFY_ARE_EQUAL(dwExit, (DWORD)STILL_ACTIVE); | ||
|
||
// Tricky - write some input to the pcon. We need to do this so conhost can | ||
// realize that the output pipe has broken. | ||
VERIFY_SUCCEEDED(WriteFile(inPipeOurSide.get(), L"a", sizeof(wchar_t), nullptr, nullptr)); | ||
|
||
// Close the other pipe, and make sure conhost dies | ||
VERIFY_IS_TRUE(CloseHandle(inPipeOurSide.get())); | ||
|
||
VERIFY_ARE_EQUAL(WaitForSingleObject(pty.hConPtyProcess, 10000), (DWORD)WAIT_OBJECT_0); | ||
VERIFY_IS_TRUE(GetExitCodeProcess(pty.hConPtyProcess, &dwExit)); | ||
VERIFY_ARE_NOT_EQUAL(dwExit, (DWORD)STILL_ACTIVE); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test doesn't work anymore because now conhost will keep running unless all clients have disconnected. But winconpty holds a console client handle inside hPtyReference
and it's only closed once you call ClosePseudoConsole
. This is already being tracked in the DiesOnClose()
test.
LockConsole(); | ||
const auto unlock = wil::scope_exit([] { UnlockConsole(); }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To quote myself from the PR:
During testing I found that continuously printing text inside msys2 will cause child processes to only exit slowly one by one every 5 seconds.
This happens because CloseConsoleProcessState
calls HandleCtrlEvent
without holding the console lock. This creates a race condition where most of the time the console IO thread is the one picking up the CTRL_CLOSE_EVENT
. But that's problematic because the CTRL_CLOSE_EVENT
leads to a ConsoleControl
call of type ConsoleEndTask
which calls back into conhost's IO thread and so you got the IO thread waiting on itself to respond.
Solution: Don't race conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So wait, calling RundownAndExit
is safe to do while holding the console lock? Isn't it another recipe for a deadlock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's safe at all, but the conhost code did exactly that. Window::ConsoleWindowProc
acquires the console lock and then calls Window::_CloseWindow
which calls CloseConsoleProcessState
which potentially calls ServiceLocator::RundownAndExit
.
In the past without ConPTY RundownAndExit
was trivial since it only did ProcessExit
but now with the render flush it has been a disaster recipe for quite a while unfortunately...
// ctrl event will never actually get dispatched. | ||
// So, lock and unlock here, to make sure the ctrl event gets handled. | ||
LockConsole(); | ||
UnlockConsole(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay just so I understand... the Unlock happening after the scope_exit will have the same effect of making sure the CTRL event gets handled? Can you look up the repro for MSFT-19419231?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I have to trust this. We've gotten it wrong a number of times, but this must be better?
…tdown-deadlock-part2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes to each file make sense to me. Full disclosure, I don't get how the functions flow into each other though (like, the overall code flow). If you feel uncomfortable with that, definitely have @zadjii-msft take a look, I won't mind 😊.
Changes look good though and I agree it's an improvement. I added a few comments to add more logging, but that's up to you if you want to follow them.
src/host/PtySignalInputThread.cpp
Outdated
{ | ||
THROW_HR(E_UNEXPECTED); | ||
} | ||
return S_OK; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could wrap the whole function in a try
block and add a CATCH_LOG_RETURN_HR(S_OK);
at the end. Then just keep the default
case as a THROW_HR(E_UNEXPECTED)
.
Looks a bit ugly, but it's true, the default:
case is definitely unexpected.
…tdown-deadlock-part2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aaa
With @DHowett supplying me with a patched The most recent commits contain a fix (fb14982) that caused Ctrl+C to not work in VS Code. This happened because Furthermore I found a bug in WSL's usage of the pseudoconsole API, which can't be easily patched on our side and is also reproducible on vanilla conhost/kernelbase build 25247:
From what I can tell, WSL is neither reading from, nor closing the output pipe conhost is writing VT/text to, while being stuck waiting on |
…tdown-deadlock-part2
64eb22f
to
4c2da9e
Compare
@@ -230,6 +230,8 @@ HRESULT _CreatePseudoConsole(const HANDLE hToken, | |||
pPty->hConPtyProcess = pi.hProcess; | |||
pi.hProcess = nullptr; | |||
|
|||
// The hPtyReference we create here is used when the PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE attribute is processed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did we ever solve the issue where conpty looks like a client to conhost?
Hello @lhecker! Because this pull request has the p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (
|
where (on my system) 9 out of 10 times everything works correctly, but sometimes OpenConsole exits, while pwsh and bash keep running. My leading theory is that the new code is exiting OpenConsole faster than the old code. This prevents clients from calling console APIs, etc. causing them to get stuck. The old code (and new code) calls `ExitProcess` when the ConPTY pipes break and I think this is wrong: In conhost when you close the window we only call `CloseConsoleProcessState` via the `WM_CLOSE` event and that's it. Solution: Remove the call to `RundownAndExit` for ConPTY. During testing I found that continuously printing text inside msys2 will cause child processes to only exit slowly one by one every 5 seconds. This happens because `CloseConsoleProcessState` calls `HandleCtrlEvent` without holding the console lock. This creates a race condition where most of the time the console IO thread is the one picking up the `CTRL_CLOSE_EVENT`. But that's problematic because the `CTRL_CLOSE_EVENT` leads to a `ConsoleControl` call of type `ConsoleEndTask` which calls back into conhost's IO thread and so you got the IO thread waiting on itself to respond. Solution: Don't race conditions. * `Enter-VsDevShell` and close the tab Everything exits after 5s ✅ * Run msys2 bash from within pwsh and close the tab Everything exits instantly ✅ * Run `cat bigfile.txt` and close the tab Everything exits instantly ✅ * Patch `conhost.exe` with `sfpcopy`, as well as `KernelBase.dll` with the recent changes to `winconpty`, then launch and exit shells and applications via VS Code's terminal ✅ * On the main branch without this modification remove the call to `TriggerTeardown` in `RundownAndExit` (this speeds up the shutdown). Run (msys2's) `bash.exe --login` and hold enter and then press Ctrl+Shift+W simultaneously. The tab should close and randomly OpenConsole should exit early while pwsh/bash keep running. Then retry this with this branch and observe how the child processes don't stick around forever anymore. ✅
where (on my system) 9 out of 10 times everything works correctly, but sometimes OpenConsole exits, while pwsh and bash keep running. My leading theory is that the new code is exiting OpenConsole faster than the old code. This prevents clients from calling console APIs, etc. causing them to get stuck. The old code (and new code) calls `ExitProcess` when the ConPTY pipes break and I think this is wrong: In conhost when you close the window we only call `CloseConsoleProcessState` via the `WM_CLOSE` event and that's it. Solution: Remove the call to `RundownAndExit` for ConPTY. During testing I found that continuously printing text inside msys2 will cause child processes to only exit slowly one by one every 5 seconds. This happens because `CloseConsoleProcessState` calls `HandleCtrlEvent` without holding the console lock. This creates a race condition where most of the time the console IO thread is the one picking up the `CTRL_CLOSE_EVENT`. But that's problematic because the `CTRL_CLOSE_EVENT` leads to a `ConsoleControl` call of type `ConsoleEndTask` which calls back into conhost's IO thread and so you got the IO thread waiting on itself to respond. Solution: Don't race conditions. * `Enter-VsDevShell` and close the tab Everything exits after 5s ✅ * Run msys2 bash from within pwsh and close the tab Everything exits instantly ✅ * Run `cat bigfile.txt` and close the tab Everything exits instantly ✅ * Patch `conhost.exe` with `sfpcopy`, as well as `KernelBase.dll` with the recent changes to `winconpty`, then launch and exit shells and applications via VS Code's terminal ✅ * On the main branch without this modification remove the call to `TriggerTeardown` in `RundownAndExit` (this speeds up the shutdown). Run (msys2's) `bash.exe --login` and hold enter and then press Ctrl+Shift+W simultaneously. The tab should close and randomly OpenConsole should exit early while pwsh/bash keep running. Then retry this with this branch and observe how the child processes don't stick around forever anymore. ✅ (cherry picked from commit b6b1ff8)
#14160 didn't fix #14132 entirely. There seems to be a race condition left where (on my system) 9 out of 10 times everything works correctly, but sometimes OpenConsole exits, while pwsh and bash keep running. My leading theory is that the new code is exiting OpenConsole faster than the old code. This prevents clients from calling console APIs, etc. causing them to get stuck. The old code (and new code) calls `ExitProcess` when the ConPTY pipes break and I think this is wrong: In conhost when you close the window we only call `CloseConsoleProcessState` via the `WM_CLOSE` event and that's it. Solution: Remove the call to `RundownAndExit` for ConPTY. During testing I found that continuously printing text inside msys2 will cause child processes to only exit slowly one by one every 5 seconds. This happens because `CloseConsoleProcessState` calls `HandleCtrlEvent` without holding the console lock. This creates a race condition where most of the time the console IO thread is the one picking up the `CTRL_CLOSE_EVENT`. But that's problematic because the `CTRL_CLOSE_EVENT` leads to a `ConsoleControl` call of type `ConsoleEndTask` which calls back into conhost's IO thread and so you got the IO thread waiting on itself to respond. Solution: Don't race conditions. ## Validation Steps Performed * `Enter-VsDevShell` and close the tab Everything exits after 5s ✅ * Run msys2 bash from within pwsh and close the tab Everything exits instantly ✅ * Run `cat bigfile.txt` and close the tab Everything exits instantly ✅ * Patch `conhost.exe` with `sfpcopy`, as well as `KernelBase.dll` with the recent changes to `winconpty`, then launch and exit shells and applications via VS Code's terminal ✅ * On the main branch without this modification remove the call to `TriggerTeardown` in `RundownAndExit` (this speeds up the shutdown). Run (msys2's) `bash.exe --login` and hold enter and then press Ctrl+Shift+W simultaneously. The tab should close and randomly OpenConsole should exit early while pwsh/bash keep running. Then retry this with this branch and observe how the child processes don't stick around forever anymore. ✅ (cherry picked from commit b6b1ff8) Service-Card-Id: 87207831 Service-Version: 1.15
#14160 didn't fix #14132 entirely. There seems to be a race condition left where (on my system) 9 out of 10 times everything works correctly, but sometimes OpenConsole exits, while pwsh and bash keep running. My leading theory is that the new code is exiting OpenConsole faster than the old code. This prevents clients from calling console APIs, etc. causing them to get stuck. The old code (and new code) calls `ExitProcess` when the ConPTY pipes break and I think this is wrong: In conhost when you close the window we only call `CloseConsoleProcessState` via the `WM_CLOSE` event and that's it. Solution: Remove the call to `RundownAndExit` for ConPTY. During testing I found that continuously printing text inside msys2 will cause child processes to only exit slowly one by one every 5 seconds. This happens because `CloseConsoleProcessState` calls `HandleCtrlEvent` without holding the console lock. This creates a race condition where most of the time the console IO thread is the one picking up the `CTRL_CLOSE_EVENT`. But that's problematic because the `CTRL_CLOSE_EVENT` leads to a `ConsoleControl` call of type `ConsoleEndTask` which calls back into conhost's IO thread and so you got the IO thread waiting on itself to respond. Solution: Don't race conditions. ## Validation Steps Performed * `Enter-VsDevShell` and close the tab Everything exits after 5s ✅ * Run msys2 bash from within pwsh and close the tab Everything exits instantly ✅ * Run `cat bigfile.txt` and close the tab Everything exits instantly ✅ * Patch `conhost.exe` with `sfpcopy`, as well as `KernelBase.dll` with the recent changes to `winconpty`, then launch and exit shells and applications via VS Code's terminal ✅ * On the main branch without this modification remove the call to `TriggerTeardown` in `RundownAndExit` (this speeds up the shutdown). Run (msys2's) `bash.exe --login` and hold enter and then press Ctrl+Shift+W simultaneously. The tab should close and randomly OpenConsole should exit early while pwsh/bash keep running. Then retry this with this branch and observe how the child processes don't stick around forever anymore. ✅ (cherry picked from commit b6b1ff8) Service-Card-Id: 87207832 Service-Version: 1.16
🎉 Handy links: |
🎉 Handy links: |
2 new ConPTY APIs were added as part of this commit: * `ClosePseudoConsoleTimeout` Complements `ClosePseudoConsole`, allowing users to override the `INFINITE` wait for OpenConsole to exit with any arbitrary timeout, including 0. * `ConptyReleasePseudoConsole` This releases the `\Reference` handle held by the `HPCON`. While it makes launching further PTY clients via `PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE` impossible, it does allow conhost/OpenConsole to exit naturally once all console clients have disconnected. This makes it unnecessary to having to monitor the exit of the spawned shell/application, just to then call `ClosePseudoConsole`, while carefully continuing to read from the output pipe. Instead users can now just read from the output pipe until it is closed, knowing for sure that no more data will come or clients connect. This is especially useful in combination with `ClosePseudoConsoleTimeout` and a timeout of 0, to allow conhost/OpenConsole to exit asynchronously. These new APIs are used to fix an array of bugs around Windows Terminal exiting either too early or too late. They also make usage of the ConPTY API simpler in most situations (when spawning a single application and waiting for it to exit). Depends on #13882, #14041, #14160, #14282 Closes #4564 Closes #14416 Closes MSFT-42244182 ## Validation Steps Performed * Calling `FreeConsole` in a handoff'd application closes the tab ✅ * Create a .bat file containing only `start /B cmd.exe`. If WT stable is the default terminal the tab closes instantly ✅ With these changes included the tab stays open with a cmd.exe prompt ✅ * New ConPTY tests ✅
#14160 didn't fix #14132 entirely. There seems to be a race condition left
where (on my system) 9 out of 10 times everything works correctly,
but sometimes OpenConsole exits, while pwsh and bash keep running.
My leading theory is that the new code is exiting OpenConsole faster than the
old code. This prevents clients from calling console APIs, etc. causing them
to get stuck. The old code (and new code) calls
ExitProcess
when the ConPTYpipes break and I think this is wrong: In conhost when you close the window we
only call
CloseConsoleProcessState
via theWM_CLOSE
event and that's it.Solution: Remove the call to
RundownAndExit
for ConPTY.During testing I found that continuously printing text inside msys2 will cause
child processes to only exit slowly one by one every 5 seconds.
This happens because
CloseConsoleProcessState
callsHandleCtrlEvent
withoutholding the console lock. This creates a race condition where most of the time
the console IO thread is the one picking up the
CTRL_CLOSE_EVENT
. But that'sproblematic because the
CTRL_CLOSE_EVENT
leads to aConsoleControl
call oftype
ConsoleEndTask
which calls back into conhost's IO thread andso you got the IO thread waiting on itself to respond.
Solution: Don't race conditions.
Validation Steps Performed
Enter-VsDevShell
and close the tabEverything exits after 5s ✅
Everything exits instantly ✅
cat bigfile.txt
and close the tabEverything exits instantly ✅
conhost.exe
withsfpcopy
, as well asKernelBase.dll
with the recent changes to
winconpty
, then launch and exitshells and applications via VS Code's terminal ✅
TriggerTeardown
inRundownAndExit
(this speeds up the shutdown).Run (msys2's)
bash.exe --login
and hold enter and then press Ctrl+Shift+Wsimultaneously. The tab should close and randomly OpenConsole should exit
early while pwsh/bash keep running. Then retry this with this branch and
observe how the child processes don't stick around forever anymore. ✅