Centralize cleanup of created resources #261

Merged
merged 11 commits into from
Aug 4, 2020

Contributor

@timoreimann timoreimann commented Apr 27, 2020

What type of PR is this?

/kind feature

What this PR does / why we need it:

This change revamps the way resources (like volumes and now also snapshots) are managed in tests with regard to cleanup. Instead of putting the onus of cleaning up on the test author, we extend Cleanup to automatically (un-)register resources as they are being used.

Cleanup now exposes a single API that implements both ControllerClient and NodeClient to make it easier for all garbage collection-worthy requests to be funnelled through the new API. The way this is implemented in Cleanup is by embedding both ControllerClient and NodeClient, and proxying to the actual methods before registering cleanup tasks and returning the results.

Consequently, we can throw away large chunks of cleanup test code and unify all {Controller,Node}Client access to the Cleanup variable. In essence, this makes it much easier to do the right thing as a test author since each existing Describe context will provide a single interaction point to the CSI APIs only.

For frequently used resource creation operations, we also provide Must* equivalents that fail the test if the results are unexpected. This makes our test code even more streamlined by DRYing out the number of assertions called.

List of other changes:

cleanup.go:

  • Key volume and snapshot objects by ID instead of name. We have a few tests that omit or reuse the name, which makes it impossible to do automatic cleanup. Not printing the name of the resource as we clean up is a small price we have to pay for this adjustment, though.
  • Fail tests when any cleanup operation errors out, except when we see error codes indicating that the resource is already cleaned up. Using a small logger wrapper to simplify automatic test failure.
  • Rename DeleteVolumes to Cleanup.
  • Provide convenience method MustCreateSnapshotFromVolumeRequest to create a sourcing volume and a snapshot in one go.

controller.go, node.go:

  • Change all tests to use the API exposed by Cleanup only. (That is, do not offer ControllerClient and NodeClient directly anymore.)
  • Register Cleanup.Cleanup in AfterEach where missing.
  • Drop cleanup steps from various tests as this is now being taken care of by Cleanup.
  • Use Must* equivalents where applicable.
  • Use HaveLen to simplify length assertions.
  • Make order of Cleanup variable initialization consistent.
  • Minor cosmetic improvements.

Rename Cleanup to Resources and the file name accordingly.

Which issue(s) this PR fixes:

Fixes #260

Special notes for your reviewer:

Cleanup probably deserves a more generic name at this point, like Resources. I hesitated to rename the variable (and the hosting file name) though to ease diffing the change. If this change and the rename proposal find consensus, I'm happy to carry it out either through another commit or a follow-up PR. (As agreed on during the review, this PR now also does the rename.)

Does this PR introduce a user-facing change?:

Rename Cleanup to Resources and unexport cleanup (un-)registration, which is now handled implicitly and automatically.

/cc @pohly

@k8s-ci-robot k8s-ci-robot requested a review from pohly April 27, 2020 14:43
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 27, 2020
@k8s-ci-robot
Contributor

Hi @timoreimann. Thanks for your PR.

I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Apr 27, 2020
Contributor

@pohly pohly left a comment

I like this a lot. Much cleaner (no pun intended, or did I?) cleanup code...

The csi-test repo is also a Go module that others may import and call directly (we do that in PMEM-CSI), so this is a breaking API change which must be dealt with accordingly:

  • announce it in the release note
  • bump the version of the repo to v4.0.0, which implies updating the import paths

If this change and the rename proposal find consensus, I'm happy to carry it out either through another commit or a follow-up PR.

I think we should do that in a separate commit.

NodeClient csi.NodeClient
Context *TestContext
// ControllerClient is meant for struct-internal use only
csi.ControllerClient

Contributor

It's meant for that, but we don't enforce that because the embedded member is getting exported, right?

I'm on the fence about whether the API should prevent access by making it an unexported member (controllerClient csi.ControllerClient). There may be valid cases where a user may want to call the methods that aren't wrapped.

I think I prefer keeping it like this.

Contributor Author

I left a comment only because unexporting wouldn't help: the (repo-internal) consumers are all part of the same sanity package, so they'd still be able to access controllerClient. It'd have to be moved into a sub-package, but I didn't want to go that far in my PR.

I can't think of too many reasons why users wouldn't want to go through the Cleanup layer: unless csi-test was buggy, it should do the right thing. The one case I can think of is when the Cleanup() part shouldn't be executed. Maybe that's reasonable, so let's stick to keeping it exported. I updated the comment to make the implications more clear.

},
); err != nil {
logger.Printf("warning: NodeUnstageVolume: %s", err)
if status.Code(err) != codes.NotFound {
Fail(fmt.Sprintf("NodeUnpublishVolume failed: %s", err))

Contributor

We can't Fail during cleanup while there is still work to do. Instead the code should try to execute all operations, log and/or collect failures, and then in the end fail the test.

Otherwise cleaning up stops early although some other volumes perhaps could be deleted successfully.

Contributor Author

That's a good point. I created a small logger wrapper that tries to simplify the task.

// successfully created.
func (cl *Cleanup) MustCreateSnapshot(ctx context.Context, req *csi.CreateSnapshotRequest) *csi.CreateSnapshotResponse {
snap, err := cl.createSnapshot(ctx, req)
Expect(err).NotTo(HaveOccurred())

Contributor

A problem with assertions in helper functions is that Ginkgo only reports one source line for the error by default; even with -trace, that initial line is typically not very informative.

Better use ExpectWithOffset and add an offset parameter to MustCreateSnapshot so that this helper function can also be called indirectly through other helper functions.

Also, NotTo(HaveOccurred()) without additional explanation is potentially problematic, depending on how much information is in the error. Much too often the error is very generic, in which case the assertion produced by Gomega doesn't say anything about what failed.

Better always use NotTo(HaveOccurred(), "create snapshot"), potentially even with further parameters.

I know, much of the existing code doesn't do that properly either 😢

Contributor Author

Ah yes, I remember you previously mentioned how To() should have a description but forgot about it again. I made sure it's now set.

I also updated the code to specify and/or pass through the offset everywhere. It's not a beauty though, I wonder if ginkgo could do better here by deriving the offset automatically (at least after the top-level t.Helper()-like indicator).

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 30, 2020
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 8, 2020
@timoreimann
Contributor Author

@pohly all comments addressed, except for the two regarding the breaking change because I have one more dependent question:

I might not be fully familiar with how consumers of csi-test are expected to be given access. For the DigitalOcean CSI driver, we merely configure and start the tests. Do you (or others) go beyond that, specifically by accessing exported variables from the Cleanup / Resources struct?

The reason I'm asking is that I'm wondering if we should take advantage of the breaking change and start hiding parts of the package under an internal package. What are your thoughts on that?

@pohly
Contributor

pohly commented May 11, 2020

Do you (or others) go beyond that, specifically by accessing exported variables from the Cleanup / Resources struct?

Yes, in PMEM-CSI we do have custom tests that are built on top of the sanity infrastructure, in addition to running the pre-defined tests: https://github.com/intel/pmem-csi/blob/018313154dff214da21fe39e6902d87857bc26e8/test/e2e/storage/sanity.go#L191-L230

The reason I'm asking is that I'm wondering if we should take advantage of the breaking change and start hiding parts of the package under an internal package. What are your thoughts on that?

Please don't 😅

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels May 11, 2020
@timoreimann
Contributor Author

@pohly alrighty, I added a release note and moved v3 to v4 while also updating the links. Let me know if I missed something.

@timoreimann
Contributor Author

Would love to see this getting approved and merged soonish because it touches a fair amount of existing tests, so other merges happening meanwhile stand a fair chance of generating merge conflicts.

@pohly
Contributor

pohly commented May 18, 2020

/kind api-change

@k8s-ci-robot k8s-ci-robot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label May 18, 2020
@pohly
Contributor

pohly commented May 18, 2020

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 18, 2020

// Assert fails the spec if any error was logged.
func (l *logger) Assert(offset int) {
ExpectWithOffset(offset+1, l.failed).To(BeFalse())

Contributor

Directly calling https://pkg.go.dev/github.com/onsi/ginkgo?tab=doc#Fail with a caller skip parameter and a suitable message is probably going to look better in the resulting test failure.

If you want to make the failure message more informative, count errors and include the count here.

Contributor

defer logger.Assert() looks odd. Assert - assert what?

Perhaps logger.CheckForErrors() or (similar to framework.ExpectNoError) logger.ExpectNoErrors()?

Contributor Author

Implemented both.

@timoreimann
Contributor Author

timoreimann commented May 20, 2020

I rebased once more. Also noticed I missed some changes needed for the v3->v4 transition, so updated that as well.

It seems like I had to run go mod vendor as well. Did that with Go v1.12 as that seems to be the minimum Go version according to go.mod.

@pohly
Contributor

pohly commented May 25, 2020

It seems like I had to run go mod vendor as well. Did that with Go v1.12 as that seems to be the minimum Go version according to go.mod.

I think go.mod specifies what we are compatible with. However, in practice this isn't getting tested: we only test with the Go version specified in

As long as Go 1.12 and 1.13 produce the same output, that doesn't matter. It worked here, so this is FYIO. However, I have seen cases where it didn't work and the pre-merge check with Go 1.13 complained.

This is true for all Kubernetes-CSI repos. I wonder whether we should:

  • extend our testing to cover building and testing with several Go releases or
  • bump up the version in go.mod.

The latter has the problem that it prevents downstream users from using an older Go even when that would still technically work. This is only an issue for repos that may get imported by others as a dependency (csi-test, csi-lib-utils).

@msau42: any thoughts on this?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 25, 2020
@pohly
Contributor

pohly commented May 25, 2020

@timoreimann please rebase.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 25, 2020
@timoreimann
Contributor Author

@pohly rebased. I also verified that 1.13 does not differ with regards to go mod tidying / vendoring, even though I think we figured that already.

@timoreimann timoreimann force-pushed the cleanup-consistently branch 4 times, most recently from 36e1977 to 2e872cb Compare July 29, 2020 23:43
@timoreimann
Contributor Author

timoreimann commented Jul 30, 2020

@pohly I figured out why the tests were failing: two AfterEach() blocks to clean up after creating and deleting snapshot tests were missing, so the leftover snapshots affected other tests. I suppose it worked locally for me because of different execution orders.

I pushed a fixing commit and rebased from master. From my point of view, the PR is now good to move on.

Contributor

@pohly pohly left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 4, 2020
@pohly
Contributor

pohly commented Aug 4, 2020

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly, timoreimann

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 4, 2020
@k8s-ci-robot k8s-ci-robot merged commit 09bd3cf into kubernetes-csi:master Aug 4, 2020
@timoreimann timoreimann deleted the cleanup-consistently branch August 11, 2020 07:48
timoreimann added a commit to timoreimann/csi-test that referenced this pull request Aug 31, 2020
[1] accidentally swapped the cleanup order, which is a deviation
from the previous behavior that not all CSI drivers may be able to handle.
This change restores the original order.

[1]: kubernetes-csi#261
timoreimann added a commit to timoreimann/csi-test that referenced this pull request Sep 29, 2020
This addresses a regression in [1] causing plugins to return an in-use
error (FAILED_PRECONDITION) when a sourcing resource (i.e., a snapshot
or a volume) is deleted before the sourced volume is.

[1]: kubernetes-csi#261
timoreimann added a commit to timoreimann/csi-test that referenced this pull request Jan 20, 2021
This addresses a regression in [1] causing plugins to return an in-use
error (FAILED_PRECONDITION) when a sourcing resource (i.e., a snapshot
or a volume) is deleted before the sourced volume is.

[1]: kubernetes-csi#261
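
The ordering problem described in these follow-up commits can be illustrated with a small sketch. How the commits actually restore the order is not shown on this page; the version below simply assumes that deleting in reverse creation order is sufficient, since a resource can only be sourced from something that already existed. `tracker` and `fakeDriver` are hypothetical, and the fake's in-use error mimics a plugin answering FAILED_PRECONDITION.

```go
package main

import "fmt"

// tracker remembers resource IDs in creation order.
type tracker struct{ created []string }

func (t *tracker) register(id string) { t.created = append(t.created, id) }

// cleanup deletes in reverse creation order, so a sourced volume is removed
// before the snapshot or volume it was created from.
func (t *tracker) cleanup(del func(id string) error) []error {
	var errs []error
	for i := len(t.created) - 1; i >= 0; i-- {
		if err := del(t.created[i]); err != nil {
			errs = append(errs, err)
		}
	}
	t.created = nil
	return errs
}

// fakeDriver maps each live resource ID to the ID of its source ("" if none)
// and refuses to delete a resource that something else was created from.
type fakeDriver struct{ live map[string]string }

func (d *fakeDriver) delete(id string) error {
	for dependent, source := range d.live {
		if source == id {
			return fmt.Errorf("%s is in use by %s", id, dependent)
		}
	}
	delete(d.live, id)
	return nil
}

func main() {
	t := &tracker{}
	d := &fakeDriver{live: map[string]string{}}
	create := func(id, source string) {
		d.live[id] = source
		t.register(id)
	}

	create("vol-src", "")
	create("snap-1", "vol-src")      // snapshot of vol-src
	create("vol-restored", "snap-1") // volume sourced from snap-1

	errs := t.cleanup(d.delete)
	fmt.Println("errors:", len(errs), "resources left:", len(d.live))
}
```

Deleting in creation order instead would make the fake driver reject `vol-src` and `snap-1`, which mirrors the in-use errors the commit messages describe.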
Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • kind/api-change: Categorizes issue or PR as related to adding, removing, or otherwise changing an API
  • lgtm: "Looks good to me", indicates that a PR is ready to be merged.
  • ok-to-test: Indicates a non-member PR verified by an org member that is safe to test.
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Failed tests leak CSI resources
4 participants