Re-open #2138, #11112 - WSL hangs on rsync/ssh #11157

Closed
1 of 2 tasks
bunnie opened this issue Feb 15, 2024 · 5 comments

Comments

@bunnie

bunnie commented Feb 15, 2024

Windows Version

Microsoft Windows [Version 10.0.22621.3085]

WSL Version

2.0.9.0

Are you using WSL 1 or WSL 2?

  • WSL 2
  • WSL 1

Kernel Version

Linux version 4.4.0-19041-Microsoft ([email protected]) (gcc version 5.4.0 (GCC) ) #3996-Microsoft Thu Jan 18 16:36:00 PST 2024

Distro Version

Ubuntu 20.04

Other Software

Copying via ssh/rsync to another WSL instance, which runs WSL 2.0.9.0 on Windows 10.0.19045.3996.

Repro Steps

I run a script to rsync several files from one computer to another:

rsync --log-file=log.log -aiv --delete [email protected]:/mnt/c/dir .

The target computer is running sshd, with public key authentication (Ed25519-only).

Expected Behavior

The rsync process should run to completion.

Actual Behavior

After a couple of minutes (that is, after transferring a few GiB or thousands of files; I have seen it fail with a few large files and also with thousands of small files), the rsync process hangs. This is evidenced by:

  • Network traffic going from ~1Gbps to ~0
  • rsync process consuming no more CPU
  • rsync process still visible in the process table

The work-around I have for this, and have been using for years now, is to spawn in parallel a script that runs this:

while killall -CHLD ssh; do sleep 0.1; done

However, as recently as last month I forgot to spawn that and had an incomplete rsync several hours later (I seem to recall the rsync actually terminated with an error eventually, without copying all the files).

The prior issue #2138 has been open for a while and I have been holding out hoping there would be a fix for this, someday. The "keep killing SSH" helper process is a viable work-around but if I forget to run it, things fail.
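
One way to avoid forgetting the helper is to wrap it and the transfer in a single function. This is only a sketch of that idea, not a fix for the underlying hang: `run_with_nudge` is a hypothetical name, and unlike the one-liner above the loop runs unconditionally (it does not exit when no ssh process exists), so it can be started before rsync has spawned ssh.

```shell
# Hypothetical wrapper: run a command while periodically sending SIGCHLD
# to every ssh process, then stop the helper loop when the command ends.
run_with_nudge() {
    (
        # Errors are ignored so the loop survives moments with no ssh running.
        while true; do
            killall -CHLD ssh 2>/dev/null || true
            sleep 0.1
        done
    ) &
    nudger=$!

    # Run the wrapped command and remember its exit status.
    status=0
    "$@" || status=$?

    # Stop the helper loop and reap it quietly.
    kill "$nudger" 2>/dev/null || true
    wait "$nudger" 2>/dev/null || true

    return "$status"
}
```

Usage would be `run_with_nudge rsync` followed by the same arguments as in the repro steps. The function returns rsync's own exit status, so a calling script can still detect a failed transfer.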

Would really like my rsync to be more reliable...or at least have some satisfying explanation of root cause and why this might be specific just to rsync and not indicative of structural unsoundness inside WSL leading to race conditions.

Diagnostic Logs

Running strace -f on the job returns the following log, truncated to the last few megabytes because otherwise this would be a gigabyte-sized log. If you want the whole 643MiB ball of wax let me know.

submission-log.txt

Here's probably the relevant snippet of the log:

[pid  8411] read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 260096) = 260096
[pid  8411] read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 261120) = 261120
[pid  8411] read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 324608) = 324608
[pid  8411] select(7, [6], [5], [6], {tv_sec=60, tv_usec=0}) = 2 (in [6], out [5], left {tv_sec=60, tv_usec=0})
[pid  8411] read(6, "\177\334-\24\312\230\v\2257\310\200\247\7\367q\351\316\254$\204\202\27\314r\3\240\226\30\353<\243\273"..., 28968) = 28968
[pid  8411] write(5, "O\377\0\7\1\10\200\37\376\0\0\20\376\0\0\4\0\0\0 Z\0\0\377\377\377\377\376\377\377\377\375"..., 65363) = -1 EAGAIN (Resource temporarily unavailable)
[pid  8411] select(6, [], [5], [], {tv_sec=60, tv_usec=0}) = 0 (Timeout)     << it's hanging here
[pid  8411] select(6, [], [5], [], {tv_sec=60, tv_usec=0}) = 0 (Timeout)
[pid  8411] select(6, [], [5], [], {tv_sec=60, tv_usec=0}) = 0 (Timeout)
[pid  8411] select(6, [], [5], [], {tv_sec=60, tv_usec=0}) = 0 (Timeout)
[pid  8411] select(6, [], [5], [], {tv_sec=60, tv_usec=0}) = 0 (Timeout)
[pid  8411] select(6, [], [5], [], {tv_sec=60, tv_usec=0}) = 0 (Timeout)
[pid  8411] select(6, [], [5], [], {tv_sec=60, tv_usec=0}) = 0 (Timeout)
[pid  8411] select(6, [], [5], [], {tv_sec=60, tv_usec=0}) = 0 (Timeout)
[pid  8411] select(6, [], [5], [], {tv_sec=60, tv_usec=0}) = 0 (Timeout)
[pid  8411] select(6, [], [5], [], {tv_sec=60, tv_usec=0}) = 0 (Timeout)    << several minutes pass....
[pid  8411] select(6, [], [5], [], {tv_sec=60, tv_usec=0}strace: Process 8411 detached     << i killed the process here
 <detached ...>
strace: Process 8412 detached
rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(644) [sender=3.1.3]
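
For anyone comparing their own trace against this one, a rough check is to count the select() calls that returned on timeout; a long run of them is the signature shown in the snippet above. This is a sketch: `count_select_timeouts` is a hypothetical helper name, and the pattern assumes strace output formatted like this log.

```shell
# Hypothetical helper: count select() calls that timed out in an strace log.
count_select_timeouts() {
    # grep -c prints the number of matching lines. It exits with status 1
    # when that number is 0, so the status is masked with "|| true".
    grep -c 'select(.*) = 0 (Timeout)' "$1" || true
}
```

For example, `count_select_timeouts submission-log.txt` prints the number of timed-out select() calls in the attached log.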

Hi, I'm an AI-powered bot that finds similar issues based on the issue title.

Please view the issues below to see if they solve your problem, and if the issue describes your problem please consider closing this one and thumbs upping the other issue to help us prioritize it. Thank you!

Open similar issues:

Closed similar issues:

Note: You can give me feedback by thumbs upping or thumbs downing this comment.

@acmuller

I still have this problem. I don't know why this issue was closed.

@craigloewen-msft
Member

I reopened the original issue. Thank you for filing this and letting us know!

@gitsquit

Getting exactly the same problem. rsync hangs indefinitely after a minute or even a few seconds. With the -v option, the last output always looks like this (always starting with 32,768 0%):
32,768 0% 35.60kB/s 0:03:25

@gitsquit

... the workaround (while killall -CHLD ssh; do sleep 0.1; done) works for me too.
