Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ConPty somethimes hangs when calling ClosePseudoConsole, (pseudo console created with PSUEDOCONSOLE_INHERIT_CURSOR) #17688

Closed
FrancescElies opened this issue Aug 9, 2024 · 7 comments
Labels
Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting

Comments

@FrancescElies
Copy link

FrancescElies commented Aug 9, 2024

Other Software

wezterm 20240807-131622-ee063330

I am opening an issue here because as I understood conpty code lives here in this repo

Steps to reproduce

Sadly we couldn't manage to get consistent way to reproduce this.

We see wezterm in windows sometimes hanging when closing panes, e.g. wez/wezterm#5882

I am suspecting this might be related to #1810

this comment points out to a race condition, when ConPTY is being started with CreatePseudoConsole() with the flag PSEUDOCONSOLE_INHERIT_CURSOR (which wezterm does) this kind of fit with the fact we can't consistenly reproduce this issue.

Here the start call in wezterm.
image

Actual Behavior

ClosePseudoConsole hangs

@FrancescElies FrancescElies added Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting labels Aug 9, 2024
@lhecker
Copy link
Member

lhecker commented Aug 9, 2024

Regarding that PSEUDOCONSOLE_INHERIT_CURSOR comment:

In #17510 I added a timeout to cursor inheritance so in the worst case it will only block for 3 seconds. I was about to repurpose this issue to track the remaining problem: A shutdown signal should kill ConPTY instantly and not only after 3 seconds (hence the edits on your message). But then I realized that I also fixed that, so I think this has actually fundamentally been fixed in that PR.

However, then I also realized that this can't cause the freeze on shutdown. I've replaced ConPTY with a build from main in my local wezterm and it still happened. This indicates that the freeze occurs due to a different issue.


I don't know whether that causes your specific deadlock, but wezterm does definitely (and unfortunately) make the primary mistake when using the ConPTY API: It fails to read from the output pipe when calling ClosePseudoConsole. This is also documented here: https://learn.microsoft.com/en-us/windows/console/creating-a-pseudoconsole-session#preparing-the-communication-channels

To reproduce the issue, run a child process inside your shell (for instance WSL), print a ton of text in large chunks (for instance with cat = chunk size 128KiB) and close the tab. The deadlock inside OpenConsole then looks like this:

image

This is one of the reasons why I've chosen to remove the blocking wait from the ClosePseudoConsole API. Starting Windows 26100 (Windows 11 24H2) it'll now instantly return without waiting for the PTY to exit. (This reminds me that I need to update the winconpty package to also use that new, safer behavior.)

My suggestion for wezterm is that it reads from the output pipe until ReadFile either returns FALSE or lpNumberOfBytesRead is zero. I recommend against testing for an ERROR_BROKEN_PIPE error, because for regular pipes that's the same anyway, but for special pipes that support "graceful shutdown" (like TCP's FIN/ACK), ReadFile will return TRUE while lpNumberOfBytesRead is zero. This still means that the other side wants to end things.

With the next version of ConPTY at the end of this month, the new, safer API will be available. It'll allow you to safely call ClosePseudoConsole on the UI thread and/or in between Read/WriteFile calls. Even better, it'll support overlapped IO, which may be helpful for wezterm, considering that it uses poll() for UNIX as well, from what I can tell. That way both implementations can use async IO with the PTY.

P.S.: I just noticed that wezterm calls it "psuedo". Not just the flag, but also the entire file ("psuedocon.rs"). 😅

@FrancescElies
Copy link
Author

Thanks for your quick reply.

I’m a regular Wezterm user, what i noticed is that ClosePseudoConsole is blocking indefinitely (definitely more than 3 seconds).

Next week I should be able (hopefully will be easy) to compile conpty.dll from this repo from the main branch and reproduce the test wit ‘cat’ and closing the term.

Psuedocon typo i can report it back to wezterm devs, doing a pr with the rename might be received as inappropriate.

Regarding your reafile suggestion sounds something like I could try to make a patch for but apart from referring tour comment i am not sure i will be able to skillfully respond any questions reviewers might ask.

PS: how did you catch de deadlock stack? With procdump, could you give some details on that?

@lhecker
Copy link
Member

lhecker commented Aug 9, 2024

Next week I should be able (hopefully will be easy) to compile conpty.dll from this repo from the main branch [...]

If you hit any problems and got questions, please let me know! I'll try to get the ConPTY API updated by then.

[...] but apart from referring tour comment i am not sure i will be able to skillfully respond any questions reviewers might ask.

If the reviewers have any questions specific to ConPTY and you aren't sure about, please feel free to ping me. 🙂 (I'm familiar with Rust so that shouldn't be a problem either. 😅)

PS: how did you catch de deadlock stack?

That's Process Hacker, now renamed System Informer (probably to avoid getting banned by non-IT admins). It's similar to Microsoft's own Process Explorer, but in my opinion it's better because I find it visually more pleasing, it has more features, and it's open source on GitHub.

Stacks can be observed by double-clicking any application, clicking on "Threads", sorting by TID (so that their order is stable and stops changing), and click each individual thread. System Informer will automatically download PDBs and resolve symbols for you.

image

Of course, this only works for basic triage, but in my experience that's often enough to get an initial idea of the problem.

That aside, if you open its Options dialog, you'll find a ton of features that you can optionally enable. I wouldn't recommend enabling them outright, but I think it's worth toying around with them. Something I use for basic testing of our text renderer for instance, is under "ExtendedTools". There, you can check "Enable FPS monitoring" and you'll be able to see DWM present statistics for each individual application.

All of these are also super useful:

image

These are great:

image

You can alternatively also click this button:

image

These are completely nuts:

image

None of this requires a kernel driver, and it really only scratches the surface. System Informer is one of the greatest Windows apps I know.

@FrancescElies
Copy link
Author

FrancescElies commented Aug 12, 2024

Leonard thanks for the crash course on Process Informer, in the passt I installed but felt overwehlmed the the amount of options and I used to use only find handles or dlls, you convinced me to install it again and invest some time with it.

I am trying to give you some more info to make sure what we see it's the same issue you described, today it hanged again and decided to collect infos the way you showed me.

The main problem was identifying which OpenConsole to inspect, how do I know?. As you can see I started wezterm from windows term to see logs and had several sessions open.
image

Since I did not now which OpenConsole to inspect I decided to create a dump for each of them and wezterm too, in this case with procdump and a bit of shell ps | find opencons | each { procdump -ma $in.pid} and inspected these files with windbg.

image

Out of those dumps I could see
wezterm-gui
image

Openconsole (most of them having a similar stack)
image

Except one, which was showing something with ticket_lock, I suspect this was the one hanging?💥
image

Observations

  • Wezterm hangs forever after closing a pane
  • Wezterm hanged when closing pane with an interactive cli app, either broot or nvim (not sure right now)
  • wezterm delivers it's own copy of conpty.dll in assets folder version 1.19.240130002, couldn't see where the current version is defined in winconpty
  • with these infos do you still think this is the same issue you described me originally?

Next

  • I will try to build it the ReadFile into wezterm and see if that helps
  • I will do the experiment with cat 128KB and close pane

PS: I am happy to learn better ways of doing things, please feel free to suggest me at any time how you would do it if you know a nicer way.

@lhecker
Copy link
Member

lhecker commented Aug 12, 2024

When you check out these older OpenConsole versions (prior to #17510), you need to check the stack of the RenderThread::s_ThreadProc thread. That's the thread which calls WriteFile on the output pipe that wezterm is supposed to read from. While it's writing into the pipe it's holding the ticket lock and if no one reads from the pipe it holds the lock forever.

If I'm correct, then that's why the ConsoleIoThread (= handles incoming API calls) is stuck waiting to acquire the lock.

@FrancescElies
Copy link
Author

building conpty.dll

I will need some help building the latest conpty.dll, I read building.md and ran Invoke-OpenConsoleBuild

I got some errors

"C:\s\oss\terminal\OpenConsole.sln" (default target) (1) ->
"C:\s\oss\terminal\src\tools\TerminalStress\TerminalStress.csproj" (default target) (114) ->
  C:\s\oss\terminal\src\tools\TerminalStress\TerminalStress.csproj : error NU1101: Unable to find package Microsoft.Wi
ndowsDesktop.App.Ref. No packages exist with this id in source(s): C:\Program Files\dotnet\library-packs, TerminalDepe
ndencies
  C:\s\oss\terminal\src\tools\TerminalStress\TerminalStress.csproj : error NU1101: Unable to find package Microsoft.NE
TCore.App.Ref. No packages exist with this id in source(s): C:\Program Files\dotnet\library-packs, TerminalDependencie
s
  C:\s\oss\terminal\src\tools\TerminalStress\TerminalStress.csproj : error NU1101: Unable to find package Microsoft.As
pNetCore.App.Ref. No packages exist with this id in source(s): C:\Program Files\dotnet\library-packs, TerminalDependen
cies

image

Despite the errors I got a conpty.dll built.

❯ : fd -HI conpty.dll
bin\x64\Debug\conpty.dll
obj\x64\Debug\winconpty.DLL\
obj\x64\Debug\winconpty.DLL\conpty.dll.recipe
obj\x64\Debug\winconpty.DLL\winconpty.DLL.tlog\
obj\x64\Debug\winconpty.DLL\winconpty.DLL.tlog\winconpty.DLL.lastbuildstate

terminal on  main [?] via .NET v8.0.303
❯ : sigcheck -nobanner -q -n bin\x64\Debug\conpty.dll
n/a

Sadly I can't see it's version, I guess because I am building directly from master..
I will put that dll into wezterm and and modify wezterm to call ReadFile before closing, let's see how that one works out.

cat bigfile experiment

I tried to cat and close a pane with a 1MB file big, yes the term blocked but eventually it got closed as soon as cat was done doing it's thing.

@lhecker
Copy link
Member

lhecker commented Aug 13, 2024

I will need some help building the latest conpty.dll, I read building.md and ran Invoke-OpenConsoleBuild

Despite the name, that command builds the entire solution including Windows Terminal and so it takes a long time to complete. It'll also build in Debug mode by default. Here's a command to compile it in Release mode and to only compile the two projects that are needed:

Invoke-OpenConsoleBuild '-p:Configuration=Release' '-t:Conhost\Host_EXE,Conhost\winconpty_DLL'

You can also compile it from within Visual Studio if you open the solution:
image

The output will be in bin\x64\Release.

Sadly I can't see it's version, I guess because I am building directly from master..

Yes, we add the version in our build pipelines as a separate step.

I will put that dll into wezterm and and modify wezterm to call ReadFile before closing, let's see how that one works out.

If I'm correct with my previous assumption, you don't need this new version to fix this issue in wezterm. The issue is merely that it needs to continue reading from the output pipe until after ClosePseudoConsole has returned. The new version is simply faster and more robust, but it's not required to fix this.

Also, yesterday I realized that wezterm can simply close/drop the output pipe handle before calling ClosePseudoConsole. That would also fix the issue.

@microsoft microsoft locked and limited conversation to collaborators Aug 14, 2024
@carlos-zamora carlos-zamora converted this issue into discussion #17716 Aug 14, 2024

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting
Projects
None yet
Development

No branches or pull requests

2 participants