fix: prevent hot loop when fully promoted rollout is aborted #3064

Merged
merged 2 commits into from
Sep 29, 2023

@jessesuen (Member) commented Sep 28, 2023

Resolves #2982

If someone ran kubectl argo rollouts abort against a fully promoted rollout, the controller could enter a reconciliation hot loop in which it kept adding and removing the scale-down-deadline annotation on a ReplicaSet.

This fix detects when a rollout is both aborted and fully promoted, and removes the abort condition in that case. I do not think it ever makes sense for a rollout in a fully promoted state to be aborted.
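The check described above can be sketched roughly as follows. This is a minimal illustration, not the controller's actual code: RolloutStatus, isFullyPromoted, and clearAbortIfFullyPromoted are hypothetical names, and the sketch assumes that "fully promoted" means the stable ReplicaSet hash equals the current pod hash.

```go
package main

import "fmt"

// RolloutStatus is a simplified, hypothetical stand-in for the rollout status.
// Abort and AbortedAt mirror the "abort" and "abortedAt" keys visible in the
// test output quoted in this PR; StableRS and CurrentPodHash are assumed names.
type RolloutStatus struct {
	Abort          bool
	AbortedAt      string // empty means "not set"
	StableRS       string // pod hash of the stable ReplicaSet
	CurrentPodHash string // pod hash of the desired ReplicaSet
}

// isFullyPromoted assumes that "fully promoted" means the stable ReplicaSet
// is already the desired one.
func isFullyPromoted(s RolloutStatus) bool {
	return s.StableRS != "" && s.StableRS == s.CurrentPodHash
}

// clearAbortIfFullyPromoted resets the abort fields when the rollout is both
// aborted and fully promoted. It returns the updated status and whether
// anything changed.
func clearAbortIfFullyPromoted(s RolloutStatus) (RolloutStatus, bool) {
	if s.Abort && isFullyPromoted(s) {
		s.Abort = false
		s.AbortedAt = ""
		return s, true
	}
	return s, false
}

func main() {
	s := RolloutStatus{
		Abort:          true,
		AbortedAt:      "2023-09-28T03:49:00Z",
		StableRS:       "abc123",
		CurrentPodHash: "abc123",
	}
	s, changed := clearAbortIfFullyPromoted(s)
	fmt.Println(s.Abort, changed) // prints "false true"
}
```

Clearing the contradictory state once, rather than letting two code paths react to it on every reconcile, is what would stop the add/remove oscillation in a design like this.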

When testing, I reproduced the hot loop, then ran my version of the controller and it prevented the loop.

This PR also changes a number of tests to use assert.JSONEq instead of assert.Equal, which produces much more readable diffs on failure. For example:

    --- Expected
    +++ Actual
    @@ -3,3 +3,3 @@
       (string) (len=5) "abort": (interface {}) <nil>,
    -  (string) (len=4) "asdf": (interface {}) <nil>,
    +  (string) (len=9) "abortedAt": (interface {}) <nil>,
       (string) (len=10) "conditions": ([]interface {}) (len=3) {
    Test: TestHandleCanaryAbort/Do_not_reset_currentStepCount_and_reset_abort_if_newRS_is_stableRS
FAIL
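As a rough illustration of why a JSON-aware comparison helps here: comparing two JSON documents as raw strings fails on any difference in whitespace or key order, while decoding both and comparing the resulting values reports only real differences, key by key. The sketch below uses only the standard library to show the idea. jsonEqual is a hypothetical helper written for this example; it is not testify's implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// jsonEqual (hypothetical helper) decodes both documents into generic values
// and compares the decoded structures, so formatting and key order are ignored.
func jsonEqual(expected, actual string) (bool, error) {
	var e, a interface{}
	if err := json.Unmarshal([]byte(expected), &e); err != nil {
		return false, fmt.Errorf("expected value is not valid JSON: %w", err)
	}
	if err := json.Unmarshal([]byte(actual), &a); err != nil {
		return false, fmt.Errorf("actual value is not valid JSON: %w", err)
	}
	return reflect.DeepEqual(e, a), nil
}

func main() {
	expected := `{"status": {"abort": null, "abortedAt": null}}`
	actual := `{"status":{"abortedAt":null,"abort":null}}`

	// Raw string comparison: differs because of spacing and key order.
	fmt.Println(expected == actual) // prints "false"

	// Structural comparison: the two documents are equivalent.
	eq, err := jsonEqual(expected, actual)
	fmt.Println(eq, err) // prints "true <nil>"
}
```

With a structural comparison, a failing test points at the specific key that differs (as in the abortedAt line of the diff above) instead of printing two long patch strings.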

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this is a chore.
  • The title of the PR is (a) conventional with a list of types and scopes found here, (b) states what changed, and (c) suffixes the related issues number. E.g. "fix(controller): Updates such and such. Fixes #1234".
  • I've signed my commits with DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My builds are green. Try syncing with master if they are not.
  • My organization is added to USERS.md.

@github-actions bot (Contributor) commented Sep 28, 2023

Go Published Test Results

2 045 tests: 2 045 ✔️ passed, 0 💤 skipped, 0 failed
118 suites, 1 file, 2m 40s ⏱️

Results for commit 2c6e8b5.

♻️ This comment has been updated with latest results.

@github-actions bot (Contributor) commented Sep 28, 2023

E2E Tests Published Test Results

4 files, 4 suites, 4h 29m 12s ⏱️
102 tests: 82 ✔️ passed, 6 💤 skipped, 14 failed
440 runs: 386 ✔️ passed, 24 💤 skipped, 30 failed

For more details on these failures, see this check.

Results for commit 2c6e8b5.

♻️ This comment has been updated with latest results.

@jessesuen jessesuen marked this pull request as draft September 28, 2023 03:49
@jessesuen jessesuen force-pushed the fix/scale-down-deadline-hot-loop branch from 607fa12 to d5d88e2 Compare September 28, 2023 05:08
@codecov bot commented Sep 28, 2023

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (f650a1f) 81.75% compared to head (2c6e8b5) 81.74%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3064      +/-   ##
==========================================
- Coverage   81.75%   81.74%   -0.02%     
==========================================
  Files         134      134              
  Lines       20395    20398       +3     
==========================================
  Hits        16674    16674              
- Misses       2865     2866       +1     
- Partials      856      858       +2     
Files Coverage Δ
rollout/controller.go 81.65% <100.00%> (+0.08%) ⬆️

... and 1 file with indirect coverage changes


@jessesuen jessesuen force-pushed the fix/scale-down-deadline-hot-loop branch from d5d88e2 to 2c6e8b5 Compare September 28, 2023 18:01
@sonarcloud bot commented Sep 28, 2023

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No Coverage information
7.8% Duplication

@zachaller zachaller merged commit 5ac4a48 into argoproj:master Sep 29, 2023
26 checks passed
@jessesuen jessesuen deleted the fix/scale-down-deadline-hot-loop branch October 3, 2023 01:27
phclark pushed a commit to phclark/argo-rollouts that referenced this pull request Oct 13, 2023
…j#3064)

* fix: prevent hot loop when fully promoted rollout is aborted

Signed-off-by: Jesse Suen <[email protected]>

* test: change expectations of abort tests

Signed-off-by: Jesse Suen <[email protected]>

---------

Signed-off-by: Jesse Suen <[email protected]>
Signed-off-by: Philip Clark <[email protected]>
phclark pushed a commit to phclark/argo-rollouts that referenced this pull request Oct 15, 2023
phclark pushed a commit to phclark/argo-rollouts that referenced this pull request Oct 15, 2023
zachaller pushed a commit that referenced this pull request Oct 25, 2023
* fix: prevent hot loop when fully promoted rollout is aborted

Signed-off-by: Jesse Suen <[email protected]>

* test: change expectations of abort tests

Signed-off-by: Jesse Suen <[email protected]>

---------

Signed-off-by: Jesse Suen <[email protected]>
zachaller pushed a commit that referenced this pull request Oct 25, 2023
* fix: prevent hot loop when fully promoted rollout is aborted

Signed-off-by: Jesse Suen <[email protected]>

* test: change expectations of abort tests

Signed-off-by: Jesse Suen <[email protected]>

---------

Signed-off-by: Jesse Suen <[email protected]>
Signed-off-by: zachaller <[email protected]>
zachaller pushed a commit that referenced this pull request Oct 25, 2023
@zachaller zachaller added the cherry-pick-completed Used once we have cherry picked the PR to all requested releases label Oct 25, 2023
Successfully merging this pull request may close these issues.

Aborting a fully-promoted traffic-routed rollout causes controller scale-down-deadline hot-loop