[24.0 backport] ssh: fix error on commandconn close, add ping and default timeout #4395
+316
−110
backports:
- Related
- TLDR
During CLI initialization we ping the configured host. Initially, we set no context timeout for these connections. After b397391 we set timeouts, but only for TCP connections; #3722 later narrowed that exclusion so that only SSH connections are skipped, not sockets.
The SSH exclusion appears to be due to error-looking logs from `onClose()` in `cli/connhelper/commandconn/commandconn.go`, which gets called by `http.Transport` when the request context times out. However, these logs are being thrown erroneously: `commandconn.kill()` states that it will return `nil` if the command terminates, regardless of its exit status (`cli/cli/connhelper/commandconn/commandconn.go`, line 113 in 26a7357).
But the implementation does not hold up to this contract (`cli/cli/connhelper/commandconn/commandconn.go`, line 130 in 26a7357), since `ProcessState.Exited()` returns true only if the program exited by calling `exit`, not if it was terminated by a signal. (This probably works on Windows, where `ProcessState.Exited()` returns true if the program has exited either way, but not on POSIX-like OSes.) If `commandconn.kill()` is fixed to always return `nil` once the command has terminated, setting timeouts on contexts passed to SSH connections works fine and we no longer need special handling.

Other than that, I also added `-o ConnectTimeout=30` to the created SSH command, to align with the timeout of the client returned for non-SSH connections in `cli/cli/context/docker/load.go` (line 134 in 26a7357). Since this only bounds connection setup, long-running commands such as `docker stats` or an `attach` won't time out after 30s.

During this I also found a racy call to `cmd.Wait()` between `Close()` and `onEOF()`. Calling `Close()` closes the command's stdin/stdout pipes, which returns EOF to pending reads and triggers a call to `onEOF()`; but afterwards `Close()` also calls `kill()`, which calls `cmd.Wait()` as well, and an `exec.Cmd` must not be waited on more than once.
- What I did

- `cli/connhelper/commandconn/commandconn.go`:
  - `kill()`: Fix implementation so we don't return an error when the process is successfully closed
  - `Write()`/`Read()`: Prevent race by checking if the connection is being closed
- `cli/command/cli.go`:
  - `initializeFromClient()`: Remove SSH special handling so we also add a timeout for SSH connections
- `cli/context/docker/load.go`:
  - `ClientOpts()`: Add a 30s timeout to align with the non-SSH client

- How to verify it

- `DOCKER_HOST=ssh://example.com docker`: no longer hangs forever; it now returns after the default 2s CLI init timeout
- `DOCKER_HOST=ssh://example.com docker version`: no longer hangs forever; it now times out after 30s (can be more if the host resolves to more than one IP address)

- Description for the changelog
- A picture of a cute animal (not mandatory but encouraged)