SSH to remote causes disconnect with client_loop: send disconnect: Broken pipe #7966

Closed
1 of 2 tasks
rochecompaan opened this issue Jan 24, 2022 · 21 comments

Comments

@rochecompaan

rochecompaan commented Jan 24, 2022

Version

Microsoft Windows [Version 10.0.22000.434]

WSL Version

  • WSL 2
  • WSL 1

Kernel Version

5.10.60.1

Distro Version

Ubuntu 20.04

Other Software

No response

Repro Steps

ssh to any remote machine from Ubuntu 20.04 under WSL. ssh from Powershell works without issues.

Expected Behavior

ssh connection should persist and stay connected

Actual Behavior

Consistent disconnects after a few seconds:

$ ssh upfront4
Linux upfront4 4.19.0-18-amd64 #1 SMP Debian 4.19.208-1 (2021-09-29) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
You have new mail.
Last login: Mon Jan 24 13:19:23 2022 from 102.39.132.97
roche@upfront4:~$ sudo -i
[sudo] password for roche: client_loop: send disconnect: Broken pipe

Diagnostic Logs

networking.bat.log
wsl.etl.zip

@eqn-group

eqn-group commented Jan 25, 2022

read answer here: 616355

For per-user configuration, edit the file ~/.ssh/config to contain:

Host *
    ServerAliveInterval 60
    TCPKeepAlive no

This can also be set globally in /etc/ssh/ssh_config. This issue is not related to WSL.

ServerAliveInterval: the number of seconds without data from the server after which the client sends a keepalive message through the encrypted channel and expects a reply (to keep the connection alive).
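
As a rough illustration of how this setting plays out, OpenSSH also has a ServerAliveCountMax option (default 3), and the client gives up after roughly ServerAliveInterval × ServerAliveCountMax seconds without a reply from the server. The sketch below only does that arithmetic; the helper name keepalive_timeout is hypothetical and not part of OpenSSH.

```shell
# Hypothetical helper: approximate seconds before the ssh client drops an
# unresponsive server, i.e. ServerAliveInterval * ServerAliveCountMax.
# The second argument defaults to 3, OpenSSH's default ServerAliveCountMax.
keepalive_timeout() {
    interval=$1
    count_max=${2:-3}
    echo $(( interval * count_max ))
}

keepalive_timeout 60    # ServerAliveInterval 60 with the default count: prints 180
keepalive_timeout 20    # ServerAliveInterval 20 with the default count: prints 60
```

A shorter interval mainly means keepalive traffic is sent more often, which can help when an idle connection is being dropped by something in between. It does not help if packets are being lost on the path.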

@rochecompaan
Author

I already had the following in ~/.ssh/config before logging this issue:

Host *
ServerAliveInterval 60

I just added TCPKeepAlive no and I still hit the same error. If it is not a WSL issue, why don't I experience the same issue when using ssh under Windows?

@rochecompaan
Author

change ServerAliveInterval to 20

Looks promising, thanks! I'll do a few more tests and confirm that the error is gone.

@rochecompaan
Author

Unfortunately, the issue is not resolved. SSH aside, if I run side-by-side pings from Windows and WSL to a remote host, I see up to 20% packet loss on the WSL network, but 0% on Windows.
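
A comparison like this can be scripted so the loss is logged over time. The following is a minimal sketch, assuming the iputils ping summary format on Linux ("... 20% packet loss ..."); the function names, the log file name, and the host example.com are hypothetical.

```shell
# Extract the packet-loss percentage from ping's summary line on stdin.
# Assumes a summary such as:
#   20 packets transmitted, 16 received, 20% packet loss, time 19028ms
loss_percent() {
    sed -n 's/.* \([0-9.][0-9.]*\)% packet loss.*/\1/p'
}

# Send a batch of pings to a host and append a timestamped loss figure to a log.
log_loss() {
    host=$1
    count=${2:-20}
    loss=$(ping -c "$count" "$host" 2>/dev/null | loss_percent)
    echo "$(date '+%F %T') $host loss=${loss:-unknown}%" >> ping-loss.log
}

# Usage (run inside WSL and compare against ping run from Windows):
#   while :; do log_loss example.com; sleep 60; done
```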

ghost removed the needs-author-feedback label Jan 26, 2022
@rochecompaan
Author

My issue seems similar to #7326 but updating the name server in resolv.conf does not resolve my issue. Other issues that seem related are #7254 and #6416.

@eqn-group

eqn-group commented Jan 26, 2022

You can try debug mode with the command ssh -v [email protected] 2>result.txt and share the debug log here.
Also, did you try pinging other servers?

@rochecompaan
Author

ssh client:

dpkg -l | grep ssh-client
ii  openssh-client                       1:8.2p1-4ubuntu0.4                    amd64        secure shell (SSH) client, for secure access to remote machines

I manage clusters of servers on multiple continents and I hit the same issue no matter what server I ping or ssh.
result.txt

I honestly fail to see how this is an ssh issue; it is more likely an issue with TCP NAT again: #6416 (comment)

@victorpablosceruelo

victorpablosceruelo commented Jan 26, 2022

Hi there.
I'm having the same issues.

My log with verbose:

vpablos@WKS0001108274:~$ ssh -v -o TCPKeepAlive=yes -o ServerAliveCountMax=20 -o ServerAliveInterval=15 localhost
OpenSSH_8.2p1 Ubuntu-4ubuntu0.4, OpenSSL 1.1.1f  31 Mar 2020
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files
debug1: /etc/ssh/ssh_config line 21: Applying options for *
debug1: Connecting to localhost [127.0.0.1] port 22.
debug1: Connection established.
debug1: identity file /home/vpablos/.ssh/id_rsa type 0
debug1: identity file /home/vpablos/.ssh/id_rsa-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.4
debug1: Remote protocol version 2.0, remote software version OpenSSH_8.2p1 Ubuntu-4ubuntu0.4
debug1: match: OpenSSH_8.2p1 Ubuntu-4ubuntu0.4 pat OpenSSH* compat 0x04000000
debug1: Authenticating to localhost:22 as 'vpablos'
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: curve25519-sha256
debug1: kex: host key algorithm: ecdsa-sha2-nistp256
debug1: kex: server->client cipher: [email protected] MAC: <implicit> compression: none
debug1: kex: client->server cipher: [email protected] MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ecdsa-sha2-nistp256 SHA256:41liDTj8s2Y10brK8MN63e8eoMVs3q0GrKc/AVk+RDE
debug1: Host 'localhost' is known and matches the ECDSA host key.
debug1: Found key in /home/vpablos/.ssh/known_hosts:6
debug1: rekey out after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: rekey in after 134217728 blocks
debug1: Will attempt key: /home/vpablos/.ssh/id_rsa RSA SHA256:pvyMfM8ata4DoXBG9jnPkTUHqrQIFoVBxGaS9L0sf0U agent
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,[email protected],ssh-rsa,rsa-sha2-256,rsa-sha2-512,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,[email protected]>
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,password,keyboard-interactive
debug1: Next authentication method: publickey
debug1: Offering public key: /home/vpablos/.ssh/id_rsa RSA SHA256:pvyMfM8ata4DoXBG9jnPkTUHqrQIFoVBxGaS9L0sf0U agent
debug1: Authentications that can continue: publickey,password,keyboard-interactive
debug1: Next authentication method: keyboard-interactive
Password:
debug1: Authentication succeeded (keyboard-interactive).
Authenticated to localhost ([127.0.0.1]:22).
debug1: channel 0: new [client-session]
debug1: Requesting [email protected]
debug1: Entering interactive session.
debug1: pledge: exec
client_loop: send disconnect: Broken pipe
vpablos@WKS0001108274:~$

@rochecompaan
Author

@equation-group can you explain why this would make a difference? We don't permit root logins on any of our servers in any event. I'm seeing disconnects on almost any utility or script that uses the network so this is clearly not an ssh issue.

@RandolphBack

I was having the same problem setting up an sftp user. Every sftp connection was immediately disconnected with the "Broken pipe" message as above.

I discovered this message in /var/log/messages:
fatal: bad ownership or modes for chroot directory "/srv/sftp/data/ftpuser"

The sshd_config file specified "/srv/sftp/data/ftpuser" as the ChrootDirectory.

The ftpuser directory was owned by ftpuser.
I changed it to be owned by root and created a subdirectory, owned by ftpuser, as an upload directory.

Now sftp connections are working.
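
For context, sshd requires the ChrootDirectory and every component of its path to be owned by root and not writable by group or others, which is what the fatal log line reports. The following is a minimal sketch for checking a path before retrying, assuming GNU stat on Linux. The function names are hypothetical, and the "upload" subdirectory name in the usage comment is only an illustration.

```shell
# Succeeds when an octal permission mode (e.g. 755) has no group/other write bits.
mode_is_chroot_safe() {
    [ $(( 0$1 & 022 )) -eq 0 ]
}

# Walk from the given directory up to /, printing every component that is
# not owned by root or that is writable by group or others.
# Returns non-zero if any such component was found.
check_chroot_path() {
    dir=$1
    rc=0
    while :; do
        owner=$(stat -c '%U' "$dir")
        mode=$(stat -c '%a' "$dir")
        if [ "$owner" != "root" ] || ! mode_is_chroot_safe "$mode"; then
            echo "unsafe: $dir (owner=$owner mode=$mode)"
            rc=1
        fi
        [ "$dir" = "/" ] && break
        dir=$(dirname "$dir")
    done
    return $rc
}

# Usage:
#   check_chroot_path /srv/sftp/data/ftpuser
# A layout like the one described above could then be created with, e.g.:
#   sudo chown root:root /srv/sftp/data/ftpuser
#   sudo chmod 755 /srv/sftp/data/ftpuser
#   sudo mkdir /srv/sftp/data/ftpuser/upload
#   sudo chown ftpuser /srv/sftp/data/ftpuser/upload
```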

@eqn-group

eqn-group commented Jan 30, 2022

@rochecompaan @RandolphBack The fix is to change the MTU to 1350 using the command netsh interface ipv4 set subinterface "vEthernet (WSL)" mtu=1350 store=persistent

and then do this: #4901 (comment)
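
Whether a lower MTU is likely to help can be probed from inside WSL before changing anything. The following is a minimal sketch, assuming IPv4 and the iputils ping on Linux, where -M do forbids fragmentation and -s sets the ICMP payload size. The payload is the MTU minus 28 bytes (a 20-byte IPv4 header without options plus an 8-byte ICMP header). The function name and the host example.com are hypothetical.

```shell
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet
# for a given MTU: MTU - 20 (IPv4 header) - 8 (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Usage: if the first probe fails but the second succeeds, the usable path
# MTU is below 1500 but at least 1350.
#   ping -c 3 -M do -s "$(icmp_payload_for_mtu 1500)" example.com
#   ping -c 3 -M do -s "$(icmp_payload_for_mtu 1350)" example.com
```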

@rochecompaan
Author

@equation-group changing MTU or DNS settings does not work for me. I saw this comment explaining how to change "Auto metric" on the TCP/IP settings for the adapter to "1" and that seems to work! I will log ping stats for a few days to confirm.

@rochecompaan
Author

I have been running a ping script for several hours now, and I do not see any packet loss, so this is resolved for me.

I just noticed that I have 2 WiFi adapters, which was probably causing the issue. After reading up on the automatic metric feature, I guess setting the interface metric to 1 on one of the adapters forces all the network traffic through that adapter. I don't know enough to be sure, but with auto metric enabled, traffic might not always go through the correct bridge interface on WSL.

@svdHero

svdHero commented Feb 7, 2022

@rochecompaan @RandolphBack The fix is change MTU to 1350 using cmd netsh interface set ipv4 subinterface "vEthernet (WSL)" mtu=1350 store=persistent

and then do this: #4901 (comment)

@equation-group
When I paste that command into a CMD, I get the error message

The following command was not found: interface set ipv4 subinterface "vEthernet (WSL)" mtu=1350 store=persistent.

What's wrong here? I can enter netsh in order to get to the REPL just fine.

@eqn-group

@svdHero there was a typo in the command; I corrected it.

@jetnet

jetnet commented Jun 23, 2022

What helped a little in my case (client_loop: send disconnect: Broken pipe) to keep a connection alive longer was setting the ServerAliveCountMax parameter to a higher value, e.g. 100, in .ssh/config:

Host *
    ForwardAgent yes
    AddressFamily inet
    TCPKeepAlive yes
    ServerAliveInterval 60
    ServerAliveCountMax 100
    IPQoS 0

The following did not help, but is currently in place:
in WSL:

# echo 1 | sudo tee /proc/sys/net/ipv4/tcp_mtu_probing # did not help, commented out
sudo ifconfig eth0 mtu 1350

in Windows:

netsh interface ipv4 set subinterface "vEthernet (WSL)" mtu=1350 store=persistent

I have not changed the interface metric settings, as I use an Ethernet adapter on my workstation.

This is definitely a WSL2 issue, since the same .ssh/config is used for WSL ssh and for the Windows OpenSSH client.

I don't understand why the ticket has been closed.
Guys at MS, you cannot imagine how much time I waste working around this issue.

@ghost

ghost commented Jul 8, 2022

Using a proxy might help

@xanderificnl

xanderificnl commented Oct 5, 2022

This issue only started for me after enabling stricter cryptographic settings on the server. Connecting over the WireGuard tunnel went fine, and so did connecting directly via PowerShell. Connecting to the server's public IP didn't.

Changing the MTU did resolve the issue. I don't use a local ssh config for the client.

FWIW, it was coupled with a "sshd[1186]: fatal: mm_answer_sign: sign: error in libcrypto" on the server side.

@jetnet

jetnet commented Oct 5, 2022

finally figured out: group policy update forces restart of WSL network interface: https://learn.microsoft.com/en-us/answers/questions/723272/group-policy-update-and-hyper-v-vethernet-disconne.html

@devloloper

finally figured out: group policy update forces restart of WSL network interface: https://learn.microsoft.com/en-us/answers/questions/723272/group-policy-update-and-hyper-v-vethernet-disconne.html

Can confirm this: running "gpupdate /force" always leads to an ssh disconnect.

@jbowler

jbowler commented Feb 18, 2023

Just a note: it's nothing to do with WSL. I can repro this on a Linux (gentoo) system with an ssh into localhost.

Clearly that is a bug, but my best guess is that it is caused by having a local system that uses IPv6; if the IPv6 connection drops, the local IPv6 addresses change, and I suspect that reveals a bug in ssh.
