Experimental Managed Transport - Known Issues #636
Some of the SSH issues may be connected to a concurrency issue calling
The first release with the fixes is now out. To test them, you must first opt in to the managed transport by setting an environment variable. You can do that directly in your patches:
- patch: |
    - op: add
      path: /spec/template/spec/containers/0/env/0
      value:
        name: EXPERIMENTAL_GIT_TRANSPORT
        value: "true"
  target:
    kind: Deployment
    name: "(image-automation-controller|source-controller)"
Note that managed transport only works with the
The official images with the fixes: source-controller ->
Re-opening due to new issues reported (items 6 to 8) - issue description updated accordingly.
Version v0.22 introduced an experimental managed transport to move towards fixing some stability issues when executing git network operations.
This issue catalogues all known issues with the new transport and their respective statuses. Please note that some of these issues could also be experienced with the go-git and the non-managed libgit2 implementations.
1) ssh.Dial hangs indefinitely ✔️
SSH connections hang indefinitely during an ssh.Dial call. Behind the scenes, the transport handshake seems to get stuck during key exchange (at kexLoop). More information can be found in the upstream issue.
Fixed from:
source-controller -> ghcr.io/fluxcd/source-controller:v0.22.4
image-automation-controller -> image-automation-controller:v0.21.2
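To illustrate the general idea of bounding a handshake that might otherwise block forever, here is a minimal sketch using only the Go standard library. It is not the fix that shipped in the controllers. The names `dialWithDeadline`, `handshakeFunc`, `isTimeout` and `demoSilentPeer` are hypothetical, and the "handshake" is just a read that waits for a byte from a peer that never answers. The technique shown is setting a deadline on the raw connection before the handshake starts and clearing it afterwards.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"os"
	"time"
)

// handshakeFunc stands in for any blocking protocol handshake (for example an
// SSH key exchange) performed over an already established connection.
type handshakeFunc func(conn net.Conn) error

// dialWithDeadline dials addr and runs handshake, bounding each step by timeout.
func dialWithDeadline(addr string, timeout time.Duration, handshake handshakeFunc) (net.Conn, error) {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return nil, err
	}
	// With a deadline set, blocked reads and writes fail instead of waiting forever.
	if err := conn.SetDeadline(time.Now().Add(timeout)); err != nil {
		conn.Close()
		return nil, err
	}
	if err := handshake(conn); err != nil {
		conn.Close()
		return nil, fmt.Errorf("handshake with %s: %w", addr, err)
	}
	// Clear the deadline so later operations are not cut short.
	if err := conn.SetDeadline(time.Time{}); err != nil {
		conn.Close()
		return nil, err
	}
	return conn, nil
}

// isTimeout reports whether err was caused by an expired deadline.
func isTimeout(err error) bool {
	return errors.Is(err, os.ErrDeadlineExceeded)
}

// demoSilentPeer starts a local server that accepts connections but never
// replies, then attempts a "handshake" that waits for one byte from it.
func demoSilentPeer() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	defer ln.Close()
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			// Hold the connection open without ever writing to it.
			go func() { defer c.Close(); io.Copy(io.Discard, c) }()
		}
	}()
	waitForByte := func(conn net.Conn) error {
		_, err := conn.Read(make([]byte, 1))
		return err
	}
	conn, err := dialWithDeadline(ln.Addr().String(), 200*time.Millisecond, waitForByte)
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	fmt.Println("timed out:", isTimeout(demoSilentPeer()))
}
```

With a real SSH client, the same deadline would typically be set on the `net.Conn` before handing it to the SSH handshake, since a dial timeout alone only covers establishing the TCP connection.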
2) HTTP leaked connections ✔️
The controllers showed an ever-increasing number of established HTTP connections (e.g. as reported by netstat).
Upon investigation, some requests were not completely processed and closed, which reduced the likelihood of the underlying connections being reused. In addition, the transport instances were created per request and never shared.
Fixed from:
source-controller -> ghcr.io/fluxcd/source-controller:v0.22.4
image-automation-controller -> image-automation-controller:v0.21.2
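The two causes described above (response bodies not fully consumed and closed, and a new transport per request) map onto a well-known pattern in Go's `net/http`. The sketch below is illustrative only and is not the controllers' code. `fetch` and `demoReuse` are hypothetical names, and the local test server exists purely for the demonstration. It shares one `http.Client`, drains and closes each body, and uses `httptrace` to observe whether a connection was reused.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
)

// fetch performs a GET, fully drains and closes the response body, and reports
// whether the request was served over a reused connection.
func fetch(client *http.Client, url string) (reused bool, err error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return false, err
	}
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) { reused = info.Reused },
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	// Reading to EOF lets the Transport return the connection to its idle pool.
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return reused, err
	}
	return reused, nil
}

// demoReuse issues two sequential requests through one shared client.
func demoReuse() (first, second bool, err error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	// One client (and therefore one Transport) shared by every request.
	client := &http.Client{Transport: &http.Transport{}}
	defer client.CloseIdleConnections()

	if first, err = fetch(client, srv.URL); err != nil {
		return first, false, err
	}
	second, err = fetch(client, srv.URL)
	return first, second, err
}

func main() {
	first, second, err := demoReuse()
	fmt.Println("first reused:", first, "second reused:", second, "err:", err)
}
```

Creating a fresh `http.Transport` inside `fetch` instead would give every request its own connection pool, so no connection could ever be reused.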
3) SSH leaked connections ✔️
The controllers showed an ever-increasing number of established SSH connections (e.g. as reported by netstat).
SSH connections are now cached based on the remote target, meaning that all operations take place over the same connection instead of the previous one connection per command (clone/push).
Fixed from:
source-controller -> ghcr.io/fluxcd/source-controller:v0.22.4
image-automation-controller -> image-automation-controller:v0.21.2
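A cache keyed by remote target can be sketched roughly as follows. This is an assumption-laden illustration, not the controllers' implementation. `connCache`, `getOrDial`, `evict` and `cacheKey` are hypothetical names, the key format is invented for the example, and `io.Closer` stands in for whatever SSH client type is actually cached.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

// connCache keeps at most one open connection per remote target.
type connCache struct {
	mu    sync.Mutex
	conns map[string]io.Closer
}

func newConnCache() *connCache {
	return &connCache{conns: make(map[string]io.Closer)}
}

// cacheKey builds a key from the parts that identify a remote target.
// The exact format is an assumption made for this sketch.
func cacheKey(user, host string, port int) string {
	return fmt.Sprintf("%s@%s:%d", user, host, port)
}

// getOrDial returns the cached connection for key, or calls dial and stores
// the result. The lock is held while dialing, which serializes dials.
func (c *connCache) getOrDial(key string, dial func() (io.Closer, error)) (io.Closer, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if conn, ok := c.conns[key]; ok {
		return conn, nil
	}
	conn, err := dial()
	if err != nil {
		return nil, err
	}
	c.conns[key] = conn
	return conn, nil
}

// evict closes and forgets the connection for key, if there is one.
func (c *connCache) evict(key string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	conn, ok := c.conns[key]
	if !ok {
		return nil
	}
	delete(c.conns, key)
	return conn.Close()
}

// fakeConn is a stand-in for a real connection.
type fakeConn struct{}

func (fakeConn) Close() error { return nil }

// demoDialCount requests the same target three times and counts the dials.
func demoDialCount() int {
	cache := newConnCache()
	dials := 0
	dial := func() (io.Closer, error) { dials++; return fakeConn{}, nil }
	key := cacheKey("git", "example.com", 22)
	for i := 0; i < 3; i++ {
		if _, err := cache.getOrDial(key, dial); err != nil {
			return -1
		}
	}
	return dials
}

func main() {
	fmt.Println("dials:", demoDialCount())
}
```

Holding the mutex during `dial` keeps the sketch simple but blocks requests for other targets while one dial is in progress. Note also that item 7 in this issue reports that removing cached connections later helped with at least one provider, so caching is a trade-off and not a universal fix.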
4) Intermittent SSH errors ✔️
The upstream git and crypto libraries do not support multiple concurrent SSH connections very well (e.g. golang/go#27140).
An initial attempt to cache SSH connections and reuse them across SSH commands completely eliminated the intermittent errors (e.g. #439) during long-running tests.
Fixed from:
source-controller -> ghcr.io/fluxcd/source-controller:v0.22.4
image-automation-controller -> image-automation-controller:v0.21.2
5) Panic when closing SSH connections ✔️
The upstream git2go implementation was trying to call .Wait() and .Close() on session or stdin objects that could be nil, leading to a panic.
Fixed from:
source-controller -> ghcr.io/fluxcd/source-controller:v0.22.5
image-automation-controller -> image-automation-controller:v0.21.3
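The class of bug described here can be avoided with a guard before each call. The following is a minimal sketch under assumed types and does not reproduce the git2go code. `stream`, `waitCloser`, `release` and `fakeSession` are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"io"
)

// waitCloser mirrors the two session methods mentioned above.
type waitCloser interface {
	Wait() error
	Close() error
}

// stream holds resources that may not have been set if setup failed early.
type stream struct {
	session waitCloser
	stdin   io.Closer
}

// release closes stdin and the session, skipping any field that was never set.
// It returns the first error encountered and is safe to call more than once.
// Caveat: an interface holding a typed nil pointer is not equal to nil, so
// fields should be left unset rather than assigned a nil pointer.
func (s *stream) release() error {
	var first error
	keep := func(err error) {
		if err != nil && first == nil {
			first = err
		}
	}
	if s.stdin != nil {
		keep(s.stdin.Close())
		s.stdin = nil
	}
	if s.session != nil {
		keep(s.session.Wait())
		keep(s.session.Close())
		s.session = nil
	}
	return first
}

// fakeSession records which methods were called.
type fakeSession struct{ waited, closed bool }

func (f *fakeSession) Wait() error  { f.waited = true; return nil }
func (f *fakeSession) Close() error { f.closed = true; return nil }

// demoRelease releases an empty stream and a populated one.
func demoRelease() (emptyErr error, waited, closed bool) {
	emptyErr = (&stream{}).release() // nothing set: must not panic
	fs := &fakeSession{}
	s := &stream{session: fs}
	_ = s.release()
	_ = s.release() // second call is a no-op
	return emptyErr, fs.waited, fs.closed
}

func main() {
	fmt.Println(demoRelease())
}
```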
6) multi-ack protocol over SSH ✔️
Connecting to SSH servers that require Git's multi-ack feature (e.g. Azure DevOps) results in consistent errors:
EOF
transport closed
This seems to occur because the remote server closes the connection mid-flight.
Connections to Azure DevOps will fall back to the unmanaged transport, and users will also gain opt-in/out powers based on #662.
7) BitBucket ✔️
Multiple concurrent Git connections (one per key type, for example) lead to one of the following errors:
ssh.Dial: dial tcp xxx.xxx.xxx.xxx:22: i/o timeout
ssh: rejected: administratively prohibited (cannot open additional channels)
The removal of cached connections and servicing the PipeStdOut fast enough has fixed this.
8) git2go/libgit2 may panic and force the controller to crash ✔️
9) Stale connections leading to continuous errors ✔️
Cached connections may go stale over time. With some Git providers (e.g. GitLab) this may happen sooner than with others.
Once the connections become stale, reconciliation errors become common.
Fixed from:
source-controller -> ghcr.io/fluxcd/source-controller:v0.23.0
image-automation-controller -> pending
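One common way to keep cached connections from going stale is to expire entries that have been idle for too long. The sketch below illustrates that idea only. It is not necessarily how the fix in v0.23.0 works, and `expiringCache`, `entry`, `maxIdle` and the injectable `now` clock are all hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"time"
)

// entry pairs a cached connection with the time it was last handed out.
type entry struct {
	conn     io.Closer
	lastUsed time.Time
}

// expiringCache redials when a cached connection has been idle for longer
// than maxIdle. It is not safe for concurrent use; a real cache would also
// need a mutex.
type expiringCache struct {
	maxIdle time.Duration
	now     func() time.Time // injectable clock, so the demo is deterministic
	entries map[string]*entry
}

// get returns a fresh-enough cached connection or dials a new one.
func (c *expiringCache) get(key string, dial func() (io.Closer, error)) (io.Closer, error) {
	now := c.now()
	if e, ok := c.entries[key]; ok {
		if now.Sub(e.lastUsed) <= c.maxIdle {
			e.lastUsed = now
			return e.conn, nil
		}
		// Idle for too long: assume the connection is stale and drop it.
		_ = e.conn.Close()
		delete(c.entries, key)
	}
	conn, err := dial()
	if err != nil {
		return nil, err
	}
	c.entries[key] = &entry{conn: conn, lastUsed: now}
	return conn, nil
}

// countingConn increments a shared counter when it is closed.
type countingConn struct{ closes *int }

func (c countingConn) Close() error { *c.closes++; return nil }

// demoExpiry uses a fake clock to exercise both the reuse and expiry paths.
func demoExpiry() (dials, closes int) {
	clock := time.Date(2022, 1, 1, 0, 0, 0, 0, time.UTC)
	cache := &expiringCache{
		maxIdle: 5 * time.Minute,
		now:     func() time.Time { return clock },
		entries: map[string]*entry{},
	}
	dial := func() (io.Closer, error) {
		dials++
		return countingConn{closes: &closes}, nil
	}
	cache.get("target", dial) // first use: dials
	clock = clock.Add(time.Minute)
	cache.get("target", dial) // idle for 1m: reused
	clock = clock.Add(10 * time.Minute)
	cache.get("target", dial) // idle for 10m: closed and redialed
	return dials, closes
}

func main() {
	fmt.Println(demoExpiry()) // prints "2 1"
}
```

An idle limit is only a heuristic, since a provider can drop a connection sooner than the limit. A more robust design would also evict and redial when an operation on a cached connection fails.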