Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Writing more than 4096 bytes to stdout results in deadlock on Windows #121

Closed
kbknapp opened this issue Aug 15, 2022 · 11 comments · Fixed by #123
Closed

Writing more than 4096 bytes to stdout results in deadlock on Windows #121

kbknapp opened this issue Aug 15, 2022 · 11 comments · Fixed by #123
Labels
bug Not as expected

Comments

@kbknapp
Copy link

kbknapp commented Aug 15, 2022

The company I work for is using trycmd and we noticed our CI timing out on Windows. After some investigation, it appears writing more than 4096 bytes to stdout will result in trycmd deadlocking on Windows. This is presumably related to #45572 in Rust (std::process::Command hangs if piped stdout buffer fills) and makes sense since Windows uses a much smaller buffer than Linux.

A reproducible test case on Windows:

// src/main.rs
fn main() {
    // Change to 4096 and no deadlock appears
    for _ in 0..4097 {
        print!("a");
    }
}

A simple trycmd is all that it takes because we aren't looking for anything to actually pass or fail.

// tests/bug.rs
#[test]
fn trycmd_bug() {
    let t = trycmd::TestCases::new();
    t.cases("tests/bug.md");
}

In bug.md:

$ trycmd-bug
a
@kbknapp
Copy link
Author

kbknapp commented Aug 15, 2022

Forgot to add version info:

  • Windows 10
  • trycmd v0.13.5 (also tested main@ea37f34)
  • tested both Rust stable 1.63.0 and Nightly

@epage epage added the bug Not as expected label Aug 15, 2022
@epage
Copy link
Contributor

epage commented Aug 15, 2022

Thanks for reporting!

@epage
Copy link
Contributor

epage commented Aug 15, 2022

Sincey you are testing .md files, that means stdout/stderr are merged. In that case, we are using os_pipe::piped to do the merging for us. In the code I stole for that, it had a comment about avoiding one type of deadlock by dropping one of the Arcs before reading, so we do that.

We then spawn threads for stdout and stderr to read them before we start writing to stdin.

All of this seems like we should be set up to avoid these kinds of deadlocks unless read_to_end isn't as smart as I would expect.

@epage
Copy link
Contributor

epage commented Aug 15, 2022

Is it possible to reproduce this on Linux with larger output values? Or are you able to get a backtrace of where the deadlock occurs? Those would be big helps for trying to narrow this down.

@kbknapp
Copy link
Author

kbknapp commented Aug 15, 2022

My debugging experience in Windows is...not so great...I don't even know what the equivalent of GDB/LLDB are? 😄 I'll look around and see what I can find.

So I'll try and see if I can get this same behavior from Linux with a larger output get you a backtrace and maybe that at least gets us something.

@epage
Copy link
Contributor

epage commented Aug 15, 2022

My debugging experience in Windows is...not so great...I don't even know what the equivalent of GDB/LLDB are? 😄 I'll look around and see what I can find.

If you aren't using Visual Studio's debugger, probably WinDBG (affectionately pronounced as "Wind Bag") though I have no idea how Rust support is for either. At least WinDBG isn't as stuck in the 90s as when I last used it.

@kbknapp
Copy link
Author

kbknapp commented Aug 15, 2022

While I figure out the Windows side of debugging, here is the Linux backtrace:

(lldb) bt
error: need to add support for DW_TAG_base_type '()' encoded with DW_ATE = 0x7, bit_size = 0
* thread #1, name = 'bug-4ccbe4ac60f', stop reason = signal SIGSTOP
  * frame #0: 0x00007ffff7d5673d libc.so.6`syscall at syscall.S:38
    frame #1: 0x0000555555c3bad5 bug-4ccbe4ac60f594b5`std::sys::unix::futex::futex_wait::hd704c4155d318c56 at futex.rs:62:21
    frame #2: 0x0000555555c2bbff bug-4ccbe4ac60f594b5`std::thread::park_timeout::h9eef80e39ca53d26 [inlined] std::sys_common::thread_parker::futex::Parker::park_timeout::h0b458ff71b965d71 at futex.rs:72:9
    frame #3: 0x0000555555c2bbda bug-4ccbe4ac60f594b5`std::thread::park_timeout::h9eef80e39ca53d26 at mod.rs:997
    frame #4: 0x0000555555c35c6f bug-4ccbe4ac60f594b5`std::sync::mpsc::blocking::WaitToken::wait_max_until::h29818c7eff829dfe at blocking.rs:78:13
    frame #5: 0x0000555555bdcefc bug-4ccbe4ac60f594b5`std::sync::mpsc::shared::Packet$LT$T$GT$::recv::h79f0e2d1b67a051a at shared.rs:230:34
    frame #6: 0x0000555555c0694b bug-4ccbe4ac60f594b5`test::run_tests::h53095661548092f5 [inlined] std::sync::mpsc::Receiver$LT$T$GT$::recv_deadline::h1458501570c81bf8 at mod.rs:1358:48
    frame #7: 0x0000555555c06749 bug-4ccbe4ac60f594b5`test::run_tests::h53095661548092f5 [inlined] std::sync::mpsc::Receiver$LT$T$GT$::recv_timeout::h402c8c1ae7f04795 at mod.rs:1277
    frame #8: 0x0000555555c0654b bug-4ccbe4ac60f594b5`test::run_tests::h53095661548092f5 at lib.rs:335
    frame #9: 0x0000555555bee48c bug-4ccbe4ac60f594b5`test::console::run_tests_console::h2597a829edaf8b36 at console.rs:293:5
    frame #10: 0x0000555555c032f7 bug-4ccbe4ac60f594b5`test::test_main::he73249663906ad00 at lib.rs:113:15
    frame #11: 0x0000555555c042a1 bug-4ccbe4ac60f594b5`test::test_main_static::ha3523c6644549016 at lib.rs:132:5
    frame #12: 0x00005555555e6f73 bug-4ccbe4ac60f594b5`bug::main::h0802a32688c95483 at bug.rs:1:1
    frame #13: 0x00005555555e715b bug-4ccbe4ac60f594b5`core::ops::function::FnOnce::call_once::hfd96b6971aca838f((null)=(bug-4ccbe4ac60f594b5`bug::main::h0802a32688c95483 at bug.rs:1), (null)=<unavailable>) at function.rs:248:5
    frame #14: 0x00005555555e71ae bug-4ccbe4ac60f594b5`std::sys_common::backtrace::__rust_begin_short_backtrace::hb663ec441010ca2e(f=(bug-4ccbe4ac60f594b5`bug::main::h0802a32688c95483 at bug.rs:1)) at backtrace.rs:122:18
    frame #15: 0x00005555555e7241 bug-4ccbe4ac60f594b5`std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h7b46dab09d1c03be at rt.rs:145:18
    frame #16: 0x0000555555c2b12e bug-4ccbe4ac60f594b5`std::rt::lang_start_internal::h4db69ed48eaca005 [inlined] core::ops::function::impls::_$LT$impl$u20$core..ops..function..FnOnce$LT$A$GT$$u20$for$u20$$RF$F$GT$::call_once::h4937aaa125c8d4b2 at function.rs:280:13
    frame #17: 0x0000555555c2b127 bug-4ccbe4ac60f594b5`std::rt::lang_start_internal::h4db69ed48eaca005 [inlined] std::panicking::try::do_call::h6f5c70e8b0a34f92 at panicking.rs:492
    frame #18: 0x0000555555c2b127 bug-4ccbe4ac60f594b5`std::rt::lang_start_internal::h4db69ed48eaca005 [inlined] std::panicking::try::h68766ba264ecf2e2 at panicking.rs:456
    frame #19: 0x0000555555c2b127 bug-4ccbe4ac60f594b5`std::rt::lang_start_internal::h4db69ed48eaca005 [inlined] std::panic::catch_unwind::hc36033d2f9cc04af at panic.rs:137
    frame #20: 0x0000555555c2b127 bug-4ccbe4ac60f594b5`std::rt::lang_start_internal::h4db69ed48eaca005 [inlined] std::rt::lang_start_internal::_$u7b$$u7b$closure$u7d$$u7d$::h78c037f4a1a28ded at rt.rs:128
    frame #21: 0x0000555555c2b127 bug-4ccbe4ac60f594b5`std::rt::lang_start_internal::h4db69ed48eaca005 [inlined] std::panicking::try::do_call::he6e1fffda4c750ee at panicking.rs:492
    frame #22: 0x0000555555c2b127 bug-4ccbe4ac60f594b5`std::rt::lang_start_internal::h4db69ed48eaca005 [inlined] std::panicking::try::h48a77ddbb2f4c87a at panicking.rs:456
    frame #23: 0x0000555555c2b127 bug-4ccbe4ac60f594b5`std::rt::lang_start_internal::h4db69ed48eaca005 [inlined] std::panic::catch_unwind::hfa809b06a550a9e7 at panic.rs:137
    frame #24: 0x0000555555c2b127 bug-4ccbe4ac60f594b5`std::rt::lang_start_internal::h4db69ed48eaca005 at rt.rs:128
    frame #25: 0x00005555555e7210 bug-4ccbe4ac60f594b5`std::rt::lang_start::hab834fa8d26e2413(main=(bug-4ccbe4ac60f594b5`bug::main::h0802a32688c95483 at bug.rs:1), argc=1, argv=0x00007fffffffde08) at rt.rs:144:17
    frame #26: 0x00005555555e6f9c bug-4ccbe4ac60f594b5`main + 28
    frame #27: 0x00007ffff7c62083 libc.so.6`__libc_start_main(main=(bug-4ccbe4ac60f594b5`main), argc=1, argv=0x00007fffffffde08, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fffffffddf8) at libc-start.c:308:16
    frame #28: 0x00005555555e6bce bug-4ccbe4ac60f594b5`_start + 46

From the following:

fn main() {
    for _ in 0..(128*1024) {
        print!("A");
    }
}

@epage
Copy link
Contributor

epage commented Aug 15, 2022

Ok, testing it on Linux.

We are hanging when we are waiting for the process to exit. In theory, the stdout/stderr threads should be doing everything we need. I might try changing those from read_to_end just to make debugging easier

@kbknapp
Copy link
Author

kbknapp commented Aug 15, 2022

Appreciate it! Also this isn't a major issue for us, so no need to prioritize over other things. We just have some UI snapshots disabled on Windows which isn't a super big deal at the moment 😉

@epage
Copy link
Contributor

epage commented Aug 15, 2022

Looks like I misunderstood how it worked and I wasn't reading from the merged pipe until everything was done. Should be easy to fix

epage added a commit to epage/snapbox that referenced this issue Aug 15, 2022
Apparently, I got confused between the different stdout's and was
reading the non-exixtent one from a thread rather than the one that
existed, causing it to deadlock on large buffers.

This mostly affects trycmd's testing with markdown files.

Fixes assert-rs#121
@epage
Copy link
Contributor

epage commented Aug 15, 2022

Well, consider me nerdsniped.

The fix is released in v0.13.6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Not as expected
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants