helm: retry on connection refused #1245

Merged
merged 2 commits into from
Feb 22, 2023

Conversation

3u13r
Member

@3u13r 3u13r commented Feb 21, 2023

Proposed change(s)

  • Let the helm installer retry on "connection refused" errors

Problem:
constellation init fails with:

Your Constellation master secret was successfully written to ./constellation-mastersecret.json
Initializing cluster   
Cluster initialization failed. This error is not recoverable.
Terminate your cluster and try again.
Error: init call: rpc error: code = Internal desc = initializing cluster: installing pod network: helm install: Kubernetes cluster unreachable: Get "https://X.X.X.X:6443/version": dial tcp X.X.X.X:6443: connect: connection refused

Detailed description:

Interestingly, after kubeadm init we run an explicit kubewaiter, which waits until it can list all K8s namespaces using the kubeclient, followed by further kubeclient requests such as annotating the current node. Nevertheless, the execution of the first helm command fails.

The kubeclient automatically retries in almost all error cases. The helm client retries only on timeout errors, and the error we see is not a timeout but a "connection refused" error.
During all of the requests mentioned above, all targets (i.e., nodes) are unhealthy, which means that the AWS LB "fails open" and routes to all targets.
This is most apparent on AWS, where the switch from unhealthy to healthy takes at least 20 seconds. On GCP and Azure it should take 2-5 seconds.

Checklist

  • Add labels (e.g., for changelog category)
  • Link to Milestone

@3u13r 3u13r added the bug fix Fixing a bug label Feb 21, 2023
@3u13r 3u13r added this to the v2.6.0 milestone Feb 21, 2023
@3u13r 3u13r requested a review from derpsteb February 21, 2023 14:53
@3u13r 3u13r assigned katexochen and unassigned katexochen Feb 21, 2023
@netlify

netlify bot commented Feb 21, 2023

Deploy Preview for constellation-docs canceled.

Name Link
🔨 Latest commit c9da251
🔍 Latest deploy log https://app.netlify.com/sites/constellation-docs/deploys/63f5d720af82ac000737ece9

@derpsteb
Member

derpsteb commented Feb 21, 2023

🟢 AWS, 1.25, 2+3

@katexochen
Member

katexochen commented Feb 22, 2023

@katexochen katexochen left a comment

lgtm

@katexochen katexochen added the needs backport This PR needs to be backported to a previous release label Feb 22, 2023
@3u13r 3u13r merged commit 3339ae2 into main Feb 22, 2023
@3u13r 3u13r deleted the fix/helmRetrierAWS branch February 22, 2023 08:58
derpsteb pushed a commit that referenced this pull request Feb 22, 2023
* bootstrapper: directly return kubewaiter error

* helm: retry on connection refused