Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not perform explicit cluster deletion in prod e2e #3513

Merged

Conversation

tsatam
Copy link
Collaborator

@tsatam tsatam commented Apr 10, 2024

Which issue this PR addresses:

Fixes no Jira yet

What this PR does / why we need it:

Skips explicit cluster deletion when CI=true and RP_MODE!=development, and let deletion of the cluster's resource group take care of cluster deletion for us. This is required in production e2e contexts to avoid a race condition between us and ARM performing cluster deletion simultaneously and causing one of the requests to fail.

Test plan for issue:

  • E2E should still work on this PR

Is there any documentation that needs to be updated for this PR?

No

How do you know this will function as expected in production?

  • E2E should still work in releases

@tsatam
Copy link
Collaborator Author

tsatam commented Apr 10, 2024

/azp run ci,e2e

@tsatam
Copy link
Collaborator Author

tsatam commented Apr 10, 2024

/azp run ci,e2e

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@tsatam tsatam added next-release To be included in the next RP release rollout ready-for-review labels Apr 15, 2024
errs = append(errs, err)
}
}

if c.ci {
// Only perform explicit cluster deletion when in local development mode, otherwise
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't deleting the resource group only work in local development mode?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed during sync call: The implementation here, with the two boolean flags (CI and localDevelopmentMode) leads to confusion on which flags are set and what behavior we expect for each permutation. We should follow-up this change with a refactor to make it more explicit and clear what steps are being performed in what contexts, even if it does lead to code duplication.

SudoBrendan
SudoBrendan previously approved these changes Apr 17, 2024
Copy link
Collaborator

@SudoBrendan SudoBrendan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - nit: would it be better to use %w vs %v in all these cases? idk if it matters in a logging context or not?

@tsatam
Copy link
Collaborator Author

tsatam commented Apr 17, 2024

nit: would it be better to use %w vs %v in all these cases? idk if it matters in a logging context or not?

%w only has special behavior if within a fmt.Errorf call - it will implement the Unwrap method on the returned error that returns the error(s) included in any %ws specified, otherwise it behaves identically to %v. https://pkg.go.dev/fmt#Errorf

Perhaps a better change here is to wrap the errors themselves using fmt.Errorf and both log and return the wrapped error, but without being 100% certain what downstream consumers of the error are expecting I did not want to make that change.

@tsatam
Copy link
Collaborator Author

tsatam commented Apr 17, 2024

I pushed up an opinionated refactor to address some legibility/intent concerns - we can back this change out or defer it to a follow-up PR if desired.

@tsatam
Copy link
Collaborator Author

tsatam commented Apr 17, 2024

/azp run ci,e2e

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

Copy link
Collaborator

@UlrichSchlueter UlrichSchlueter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to switch the Prod E2E case to use the deleteCluster call again in a future PR. We don't test our cluster deletion code the way it is.
Approving anyway to unblock the next release.

@UlrichSchlueter UlrichSchlueter merged commit 5da8670 into master Apr 25, 2024
20 checks passed
@mociarain mociarain deleted the tsatam/hotfix-fix-e2e-cluster-deletion-race-condition branch April 25, 2024 09:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
next-release To be included in the next RP release rollout ready-for-review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants