Do not perform explicit cluster deletion in prod e2e #3513
/azp run ci,e2e
Azure Pipelines successfully started running 2 pipeline(s).
pkg/util/cluster/cluster.go
Outdated
		errs = append(errs, err)
	}
}

if c.ci {
	// Only perform explicit cluster deletion when in local development mode, otherwise
Why doesn't deleting the resource group only work in local development mode?
Discussed during sync call: The implementation here, with the two boolean flags (CI and localDevelopmentMode), leads to confusion about which flags are set and what behavior we expect for each permutation. We should follow up this change with a refactor that makes it explicit which steps are performed in which contexts, even if it does lead to code duplication.
LGTM - nit: would it be better to use %w vs %v in all these cases? idk if it matters in a logging context or not?
Perhaps a better change here is to wrap the errors themselves using
I pushed up an opinionated refactor to address some legibility/intent concerns - we can back this change out or defer it to a follow-up PR if desired.
/azp run ci,e2e
Azure Pipelines successfully started running 2 pipeline(s).
Might be good to switch the Prod E2E case to use the deleteCluster call again in a future PR. We don't test our cluster deletion code the way it is.
Approving anyway to unblock the next release.
Which issue this PR addresses:
Fixes no Jira yet
What this PR does / why we need it:
Skips explicit cluster deletion when CI=true and RP_MODE!=development, and lets deletion of the cluster's resource group take care of cluster deletion for us. This is required in production e2e contexts to avoid a race condition in which we and ARM perform cluster deletion simultaneously, causing one of the requests to fail.
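The condition described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in pkg/util/cluster/cluster.go: `shouldDeleteClusterExplicitly` is a hypothetical helper, and the case-insensitive comparison against "development" is an assumption about how RP_MODE is interpreted.

```go
package main

import (
	"fmt"
	"strings"
)

// shouldDeleteClusterExplicitly is a hypothetical helper for illustration only.
// ci mirrors CI=true; rpMode mirrors the value of the RP_MODE environment variable.
func shouldDeleteClusterExplicitly(ci bool, rpMode string) bool {
	// Assumption: RP_MODE is compared case-insensitively.
	localDevelopmentMode := strings.EqualFold(rpMode, "development")

	// Prod e2e (CI=true and RP_MODE!=development): skip the explicit call and
	// rely on resource group deletion, so we do not race ARM's own deletion.
	if ci && !localDevelopmentMode {
		return false
	}
	return true
}

func main() {
	// Print the decision for every permutation of the two inputs.
	for _, ci := range []bool{true, false} {
		for _, mode := range []string{"development", ""} {
			fmt.Printf("CI=%t RP_MODE=%q explicit=%t\n",
				ci, mode, shouldDeleteClusterExplicitly(ci, mode))
		}
	}
}
```

Collapsing the two flags into one named decision is one way to make each permutation easy to read. The behavior for CI=false is inferred from the description above and may not match the actual implementation.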
Test plan for issue:
Is there any documentation that needs to be updated for this PR?
No
How do you know this will function as expected in production?