v1.7.0-beta.0 talosctl health fails due to temporary error connect: connection refused #8552
Closed
Tracked by #8549
Comments
smira added a commit to smira/talos that referenced this issue on Apr 12, 2024
Fixes siderolabs#8552
When `apid` notices update in the PKI, it flushes its client connections to other machines (used for proxying), as it might need to use new client certificate. While flushing, just calling `Close` might abort already running connections. So instead, try to close gracefully with a timeout when the connection is idle.
Signed-off-by: Andrey Smirnov <[email protected]>
(cherry picked from commit 336e611)
@smira, I've just tried 1.7.0-beta.1 and the error reported in this issue is still happening.
yep, I saw that in the integration tests as well, probably the fix is not complete.
smira added a commit to smira/talos that referenced this issue on Apr 16, 2024
Fixes siderolabs#8552
This fixes up the previous fix where `for` condition was inverted, and also updates the idle timeout, so that the transition to idle happens before the timeout expires.
Signed-off-by: Andrey Smirnov <[email protected]>
smira added a commit to smira/talos that referenced this issue on Apr 19, 2024
Fixes siderolabs#8552
This fixes up the previous fix where `for` condition was inverted, and also updates the idle timeout, so that the transition to idle happens before the timeout expires.
Signed-off-by: Andrey Smirnov <[email protected]>
(cherry picked from commit 5d07ac5)
Bug Report
Description
While trying the new v1.7.0-beta.0 release, the `talosctl health` command seems to have a regression relative to v1.6.7. I think it should ignore temporary errors and only return once the cluster is healthy.
I'm not sure if this is related to the "spurious 'Connection closing' errors in integration tests" mentioned in #8549.
Logs
Immediately after launching the cluster with terraform, calling `talosctl health` systematically fails with the following error.
FWIW, after manually waiting for the cluster to be actually healthy, calling `talosctl health` works as expected.
v1.6.7 also shows that error, but ignores it. Here's the v1.6.7 output:
Environment
The full terraform program is at https://github.com/rgl/terraform-libvirt-talos/tree/upgrade-to-talos-1.7.0-beta.0.