🐛 Filter cluster-wide objects asserted in ResourceVersion tests to exclude objects of parallel tests #10560

Merged

chrischdi
Member

What this PR does / why we need it:

In the rare case where the k8s-upgrade-with-runtime-sdk test and the quickstart test's ValidateResourceVersionStable assertion run in parallel, the test can flake:

The owner graph included the RuntimeExtension created by k8s-upgrade-with-runtime-sdk. When that test finishes, it deletes the CR again, so the ValidateResourceVersionStable assertion in the quickstart test fails and the test fails.

xref: failure occurrence

Also enables the ValidateResourceVersionStable test for k8s-upgrade-with-runtime-sdk, so that the ExtensionConfig is tested as well.
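
The filtering idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `object` type and `filterByClusterName` are hypothetical stand-ins (the real helpers operate on `unstructured.Unstructured`), and the object names are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// object is a hypothetical stand-in for an object in the owner graph.
// An empty Namespace means the object is cluster-wide.
type object struct {
	Namespace string
	Name      string
}

// filterByClusterName (hypothetical name) returns a filter that keeps every
// namespaced object and keeps a cluster-wide object only if its name
// contains clusterName.
func filterByClusterName(clusterName string) func(o object) bool {
	return func(o object) bool {
		if o.Namespace != "" {
			return true
		}
		return strings.Contains(o.Name, clusterName)
	}
}

func main() {
	keep := filterByClusterName("quick-start-abc")
	objs := []object{
		{Namespace: "quick-start-ns", Name: "quick-start-abc-md-0"},
		{Name: "k8s-upgrade-with-runtimesdk-xyz"}, // cluster-wide, owned by a parallel test
		{Name: "quick-start-abc-extension"},       // cluster-wide, named after this cluster
	}
	for _, o := range objs {
		fmt.Printf("%q kept=%v\n", o.Name, keep(o))
	}
}
```

With a filter like this, a cluster-wide object that a parallel test creates and later deletes never enters the set of objects whose resourceVersion is asserted.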

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

/area e2e-testing

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. area/e2e-testing Issues or PRs related to e2e testing cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 6, 2024
@chrischdi chrischdi force-pushed the pr-e2e-fix-resourceversion-tests branch from a3cf70c to 4dbe1f7 Compare May 6, 2024 15:46
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 6, 2024
@chrischdi
Member Author

/test help

@k8s-ci-robot
Contributor

@chrischdi: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test pull-cluster-api-build-main
  • /test pull-cluster-api-e2e-blocking-main
  • /test pull-cluster-api-e2e-conformance-ci-latest-main
  • /test pull-cluster-api-e2e-conformance-main
  • /test pull-cluster-api-e2e-dualstack-and-ipv6-main
  • /test pull-cluster-api-e2e-main
  • /test pull-cluster-api-e2e-mink8s-main
  • /test pull-cluster-api-e2e-upgrade-1-30-1-31-main
  • /test pull-cluster-api-test-main
  • /test pull-cluster-api-test-mink8s-main
  • /test pull-cluster-api-verify-main

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-apidiff-main

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-apidiff-main
  • pull-cluster-api-build-main
  • pull-cluster-api-e2e-blocking-main
  • pull-cluster-api-test-main
  • pull-cluster-api-verify-main

In response to this:

/test help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@chrischdi
Member Author

/test pull-cluster-api-e2e-main

@chrischdi
Member Author

@chrischdi: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cluster-api-apidiff-main | 4dbe1f7 | link | false | /test pull-cluster-api-apidiff-main |
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

This is fine: the changed function signature is explicitly marked as not stable because it is only used in e2e tests.

@chrischdi chrischdi force-pushed the pr-e2e-fix-resourceversion-tests branch from 4dbe1f7 to 1588a2c Compare May 6, 2024 16:20
@chrischdi
Member Author

/test pull-cluster-api-e2e-main

@chrischdi chrischdi changed the title [WIP] 🐛 test: filter cluster-wide objects asserted in ResourceVersion tests to exclude objects of parallel tests 🐛 test: filter cluster-wide objects asserted in ResourceVersion tests to exclude objects of parallel tests May 6, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 6, 2024
@chrischdi
Member Author

/assign sbueringer fabriziopandini

@chrischdi
Member Author

@chrischdi: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cluster-api-apidiff-main | 4dbe1f7 | link | false | /test pull-cluster-api-apidiff-main |
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

This is fine: the changed function signature is explicitly marked as not stable because it is only used in e2e tests.

/override pull-cluster-api-apidiff-main

@k8s-ci-robot
Contributor

@chrischdi: chrischdi unauthorized: /override is restricted to Repo administrators.

In response to this:

@chrischdi: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cluster-api-apidiff-main | 4dbe1f7 | link | false | /test pull-cluster-api-apidiff-main |
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

This is fine: the changed function signature is explicitly marked as not stable because it is only used in e2e tests.

/override pull-cluster-api-apidiff-main

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@chrischdi
Member Author

@chrischdi: chrischdi unauthorized: /override is restricted to Repo administrators.

Let's ignore this, it's optional anyway :D

Member

@fabriziopandini fabriziopandini left a comment

just one nit (naming 😄 )

// to filter out cluster-wide objects which don't have the clusterName in their
// object name. This avoids assertions on objects which are part of in-parallel
// running tests like ExtensionConfig.
func GetOwnerGraphFilterByClusterNameFunc(clusterName string) func(u unstructured.Unstructured) bool {
Member

what about:
SkipClusterObjectsWithoutNamePrefix

Member Author

SkipClusterObjectsWithoutNameContains

Because we don't match prefix
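
To illustrate the distinction, here is a small sketch of how a prefix match and a contains match differ. The helper names and object names are hypothetical and only serve the example; they are not taken from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesPrefix reports whether objectName starts with clusterName.
func matchesPrefix(objectName, clusterName string) bool {
	return strings.HasPrefix(objectName, clusterName)
}

// matchesContains reports whether clusterName appears anywhere in objectName.
func matchesContains(objectName, clusterName string) bool {
	return strings.Contains(objectName, clusterName)
}

func main() {
	clusterName := "quick-start-abc" // hypothetical cluster name
	names := []string{
		"quick-start-abc-md-0", // cluster name at the start
		"ext-quick-start-abc",  // cluster name in the middle or at the end
		"k8s-upgrade-xyz",      // unrelated object
	}
	for _, name := range names {
		fmt.Printf("%-22s prefix=%-5v contains=%v\n",
			name, matchesPrefix(name, clusterName), matchesContains(name, clusterName))
	}
}
```

A name such as `ext-quick-start-abc` passes the contains check but not the prefix check, which is why a name with "Prefix" in it would be misleading here.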

Member

SkipClusterObjectsWithNameFilter

@chrischdi chrischdi added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label May 7, 2024
@chrischdi
Member Author

/test pull-cluster-api-e2e-main

@chrischdi
Member Author

Note: #10530 was cherry-picked to v1.7, so we should fix at least the resourceVersion test in v1.7 too.

For v1.7, I suggest filtering inside the new function so that the function signatures there don't change.

@chrischdi chrischdi force-pushed the pr-e2e-fix-resourceversion-tests branch from fc5e8e6 to e183cb2 Compare May 7, 2024 08:48
@chrischdi
Member Author

/test pull-cluster-api-e2e-main

Resolved review threads:

  • cmd/clusterctl/client/cluster/ownergraph.go (outdated)
  • test/e2e/cluster_upgrade_runtimesdk.go
  • test/e2e/cluster_upgrade_runtimesdk.go (outdated)
  • test/framework/ownerreference_helpers.go (outdated, 3 threads)

@sbueringer
Member

For v1.7, I suggest filtering inside the new function so that the function signatures there don't change.

WDYT about checking via cs.k8s.io if someone even uses those funcs? Because I think if they don't, let's just keep it simple and backport this PR (CAPV can easily be adjusted)

@sbueringer
Member

sbueringer commented May 7, 2024

Just checked. Only CAPI & CAPV. I would simply cherry-pick. We also have a disclaimer for our test packages regarding changes and it seems very unlikely that anyone is affected anyway. (+ it's easy to just pass in nil to get the previous behavior)
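
A rough sketch of why passing nil restores the previous behavior, assuming the filter is an optional function argument and that a nil filter means "keep everything". The `object` type and `selectObjects` are hypothetical and only illustrate the pattern; the real function signatures are not shown in this thread.

```go
package main

import "fmt"

// object is a hypothetical stand-in for an object in the owner graph.
type object struct {
	Namespace string
	Name      string
}

// selectObjects (hypothetical) returns the objects accepted by filter.
// A nil filter accepts every object, which matches the unfiltered behavior.
func selectObjects(objs []object, filter func(o object) bool) []object {
	var out []object
	for _, o := range objs {
		if filter != nil && !filter(o) {
			continue
		}
		out = append(out, o)
	}
	return out
}

func main() {
	objs := []object{
		{Namespace: "ns", Name: "cluster-a-md-0"},
		{Name: "some-cluster-wide-object"},
	}

	// nil filter: nothing is dropped.
	fmt.Println(len(selectObjects(objs, nil)))

	// Example filter: keep only namespaced objects.
	namespacedOnly := func(o object) bool { return o.Namespace != "" }
	fmt.Println(len(selectObjects(objs, namespacedOnly)))
}
```

Callers that do not care about filtering would only need to add a nil argument, which is why the signature change is cheap to absorb downstream.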

@chrischdi
Member Author

/cherry-pick release-1.7

@k8s-infra-cherrypick-robot

@chrischdi: once the present PR merges, I will cherry-pick it on top of release-1.7 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.7

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@chrischdi
Member Author

/test pull-cluster-api-e2e-main

@k8s-ci-robot
Contributor

@chrischdi: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cluster-api-apidiff-main | 73cbfb8 | link | false | /test pull-cluster-api-apidiff-main |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@sbueringer
Member

Thx

/lgtm

/assign @fabriziopandini

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 7, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 998456d3d9cad07d1854d8eea2a5f768465a9a6b

@fabriziopandini
Member

/lgtm
/approve

great job in catching this race condition while running E2E tests!

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 7, 2024
@k8s-ci-robot k8s-ci-robot merged commit 99866da into kubernetes-sigs:main May 7, 2024
20 of 21 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.8 milestone May 7, 2024
@k8s-infra-cherrypick-robot

@chrischdi: new pull request created: #10570

In response to this:

/cherry-pick release-1.7

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@sbueringer sbueringer changed the title 🐛 test: filter cluster-wide objects asserted in ResourceVersion tests to exclude objects of parallel tests 🐛 filter cluster-wide objects asserted in ResourceVersion tests to exclude objects of parallel tests Jul 19, 2024
@sbueringer sbueringer changed the title 🐛 filter cluster-wide objects asserted in ResourceVersion tests to exclude objects of parallel tests 🐛 Filter cluster-wide objects asserted in ResourceVersion tests to exclude objects of parallel tests Jul 19, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/e2e-testing Issues or PRs related to e2e testing cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

5 participants