Backport of Add refreshes and retries to server-acl-init job into release/1.2.x #3249

hc-github-team-consul-core

Backport

This PR is auto-generated from #3137 to be assessed for backporting due to the inclusion of the label backport/1.2.x.

🚨

Warning automatic cherry-pick of commits failed. If the first commit failed,
you will see a blank no-op commit below. If at least one commit succeeded, you
will see the cherry-picked commits up to, not including, the commit where
the merge conflict occurred.

The person who merged in the original PR is:
@curtbushko
This person should manually cherry-pick the original PR into a new backport PR,
and close this one when the manual backport PR is merged in.

merge conflict error: POST https://api.github.com/repos/hashicorp/consul-k8s/merges: 409 Merge conflict []

The below text is copied from the body of the original PR.


Changes proposed in this PR:

  • When the server-acl-init job runs, it bootstraps ACL tokens and sets up a number of things on the servers.
  • It gets the static IPs for the servers, creates a bootstrap token and sets a policy using that IP.
  • Then consul-server-connection-manager runs and other ACLs get set up using the consul client.
  • Usually this is fine.
  • But during upgrades the server statefulsets can change, and thus the server IPs change.
  • Our code was set to retry over and over again and would eventually time out after 10 minutes.
  • This does not work, as it forces customers to run the upgrade several times in the hope that the server IPs do not change.
  • I've created a wrapper around the consul client called DynamicClient that allows you to refresh the IPs using consul-server-connection-manager.
  • The DynamicClient will be useful in other places as well.
  • A customer reported a single error, but in my testing I discovered about 5 other places where the IPs would be incorrect and we needed to refresh.
  • The key areas are around 'untilSucceeds()', as these consul calls would otherwise fail forever.

How I've tested this PR:

  • Added a unit test to DynamicClient that simulates a bad server IP, a refresh with a good IP and a client call.
  • The race condition does not occur every time, but I have been able to reproduce it manually by upgrading over and over again, which sometimes forces the servers to change IPs.

How I expect reviewers to test this PR:

👀

Checklist:


Overview of commits

@hc-github-team-consul-core hc-github-team-consul-core force-pushed the backport/NET-5991-acl-init-upgrade/similarly-quality-griffon branch 2 times, most recently from 32ffe63 to 96ae13d Compare November 21, 2023 20:40
@hashicorp-cla

CLA assistant check

Thank you for your submission! We require that all contributors sign our Contributor License Agreement ("CLA") before we can accept the contribution. Read and sign the agreement

Learn more about why HashiCorp requires a CLA and what the CLA includes


temp seems not to be a GitHub user.
You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

Have you signed the CLA already but the status is still pending? Recheck it.

@curtbushko curtbushko closed this Nov 21, 2023