fix(controller): leader election preventing two controllers running and gracefully shutting down #2291

Merged
merged 29 commits into from
Oct 13, 2022


zachaller
Collaborator

@zachaller zachaller commented Oct 6, 2022

Fixes #2117

This PR changes behavior: the metrics server now always starts, even on standby controllers. Standbys report zero values because they are not processing anything.

@zachaller zachaller changed the title from "Fix leader election bug ctx" to "fix:(controller) leader election bug with ctx" Oct 6, 2022
@zachaller zachaller changed the title from "fix:(controller) leader election bug with ctx" to "fix(controller): leader election bug with ctx" Oct 6, 2022
Signed-off-by: zachaller <[email protected]>
@github-actions
Contributor

github-actions bot commented Oct 6, 2022

Go Published Test Results

1 775 tests   1 775 ✔️   2m 31s ⏱️
  101 suites      0 💤
    1 files       0 ❌

Results for commit 273ac79.

♻️ This comment has been updated with latest results.

@codecov

codecov bot commented Oct 6, 2022

Codecov Report

Base: 82.38% // Head: 82.75% // Increases project coverage by +0.37% 🎉

Coverage data is based on head (273ac79) compared to base (33ddaf5).
Patch coverage: 93.51% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2291      +/-   ##
==========================================
+ Coverage   82.38%   82.75%   +0.37%     
==========================================
  Files         121      121              
  Lines       18476    18511      +35     
==========================================
+ Hits        15221    15319      +98     
+ Misses       2468     2408      -60     
+ Partials      787      784       -3     
| Impacted Files | Coverage Δ |
|---|---|
| utils/controller/controller.go | 82.77% <50.00%> (ø) |
| rollout/trafficrouting/istio/controller.go | 53.36% <72.22%> (+2.55%) ⬆️ |
| controller/controller.go | 91.41% <95.04%> (+1.96%) ⬆️ |
| analysis/controller.go | 61.53% <100.00%> (+13.05%) ⬆️ |
| controller/metrics/metrics.go | 100.00% <100.00%> (ø) |
| experiments/controller.go | 71.42% <100.00%> (+5.67%) ⬆️ |
| ingress/ingress.go | 74.00% <100.00%> (+12.94%) ⬆️ |
| rollout/controller.go | 80.77% <100.00%> (+2.77%) ⬆️ |
| service/service.go | 78.51% <100.00%> (+10.13%) ⬆️ |

... and 6 more


Signed-off-by: zachaller <[email protected]>
Signed-off-by: zachaller <[email protected]>
Signed-off-by: zachaller <[email protected]>
@github-actions
Contributor

github-actions bot commented Oct 6, 2022

E2E Tests Published Test Results

 1 files    1 suites   52m 44s ⏱️
89 tests   82 ✔️   3 💤   4 ❌
93 runs    86 ✔️   3 💤   4 ❌

For more details on these failures, see this check.

Results for commit 273ac79.

♻️ This comment has been updated with latest results.

@zachaller zachaller changed the title from "fix(controller): leader election bug with ctx" to "fix(controller): leader election allowing two controllers to run and gracefully shutting down" Oct 7, 2022
@zachaller zachaller added the ready-for-review Ready for final review label Oct 7, 2022
@zachaller zachaller changed the title from "fix(controller): leader election allowing two controllers to run and gracefully shutting down" to "fix(controller): leader election preventing two controllers running and gracefully shutting down" Oct 7, 2022
@zachaller zachaller marked this pull request as ready for review October 7, 2022 12:25
@sonarcloud

sonarcloud bot commented Oct 7, 2022

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 5 (rating A)

Coverage: no coverage information
Duplication: 2.9%

@zachaller zachaller requested a review from leoluz October 7, 2022 15:59
@zachaller zachaller added this to the v1.4 milestone Oct 11, 2022
Contributor

@leoluz leoluz left a comment

Pair reviewed this with @zachaller
LGTM

@leoluz leoluz merged commit 35da539 into argoproj:master Oct 13, 2022
jenciso pushed a commit to jenciso/argo-rollouts that referenced this pull request Oct 25, 2022
…nd gracefully shutting down (argoproj#2291)

* WIP on fixing leader election fix

Signed-off-by: zachaller <[email protected]>

* Start and stop informers as well

Signed-off-by: zachaller <[email protected]>

* lint

Signed-off-by: zachaller <[email protected]>

* Remove tests that do not test anything

Signed-off-by: zachaller <[email protected]>

* fix lint

Signed-off-by: zachaller <[email protected]>

* github trigger re-run

Signed-off-by: zachaller <[email protected]>

* Cleanup

Signed-off-by: zachaller <[email protected]>

* cleanup

Signed-off-by: zachaller <[email protected]>

* Add back one test

Signed-off-by: zachaller <[email protected]>

* remove secondary metric server

Signed-off-by: zachaller <[email protected]>

* Remove secondary metric test

Signed-off-by: zachaller <[email protected]>

* Add single instance test to catch log lines

Signed-off-by: zachaller <[email protected]>

* We should shutdown if we can not sync

Signed-off-by: zachaller <[email protected]>

* fix lint

Signed-off-by: zachaller <[email protected]>

* Redo for loop will have another pr that stops via context

Signed-off-by: zachaller <[email protected]>

* Fix comment

Signed-off-by: zachaller <[email protected]>

* Add context and graceful shutdown

Signed-off-by: zachaller <[email protected]>

* lint

Signed-off-by: zachaller <[email protected]>

* Fix test

Signed-off-by: zachaller <[email protected]>

* github trigger re-run

Signed-off-by: zachaller <[email protected]>

* add more time for startup

Signed-off-by: zachaller <[email protected]>

* add individual controller startup tests

Signed-off-by: zachaller <[email protected]>

* standardize shutdown

Signed-off-by: zachaller <[email protected]>

* Standardize leader test

Signed-off-by: zachaller <[email protected]>

* fix test

Signed-off-by: zachaller <[email protected]>

* We can not turn on release on cancel

Signed-off-by: zachaller <[email protected]>

* fix release on cancel

Signed-off-by: zachaller <[email protected]>

Signed-off-by: zachaller <[email protected]>
jandersen-plaid pushed a commit to jandersen-plaid/argo-rollouts that referenced this pull request Nov 8, 2022
jandersen-plaid pushed a commit to jandersen-plaid/argo-rollouts that referenced this pull request Nov 26, 2022
@zachaller zachaller mentioned this pull request Nov 30, 2022
2 tasks
Labels
ready-for-review Ready for final review
Development

Successfully merging this pull request may close these issues.

argo-rollouts controller leader resumes working after it stops leading, running alongside the new leader
2 participants