OCPVE-619: Refactor E2E Tests to be context aware #378

Merged

Conversation

jakobmoellerdev
Contributor

@jakobmoellerdev jakobmoellerdev commented Aug 8, 2023

Also introduces a flag to skip snapshot tests on clusters without Snapshot CRDs by default (OpenShift Local).

With the given guide and tests, I can currently run the entire snapshot test suite locally in about 10 minutes. The remaining latency fixes have to come from a follow-up PR that investigates minDeviceAge (I opened OCPVE-622 for this). After some testing, I believe we could get the test runtime below 5 minutes with some small optimizations.

Known Issues:
In the PVC and ephemeral tests, the cleanup is run as a test. When a test fails, the suite cannot clean up properly and resources are left over. Fixing this would require a bigger rewrite of the tests, so I'm leaving it out of this PR.

Also:

  • Bumps the Ginkgo CLI tool download to 2.9.5 to avoid the warning about a mismatch between the Go and CLI versions
  • Fixes the kubeconfig lookup in the E2E tests: it did not return the error when the kubeconfig was not found; the error is now reported with fmt.Errorf
  • Fixes klog crashing in tests because the logger was not redirected correctly
  • Fixes the seccomp profile on test pods, which caused warnings in the KubeAPI warning logger when the pods were created during tests
  • Reduces the testing timeout from 15 minutes to 2 minutes
  • Reduces the testing polling interval from 15 seconds to 3 seconds
  • Reduces the deletion-pending check interval from 1 minute to 10 seconds
  • Introduces an easy-to-follow E2E testing guide for OpenShift Local, which also explains how to enable CRC containers for snapshot and cloning support (not supported out of the box; one has to deploy the volume snapshot capabilities oneself)
  • Moves HACKING.md into the docs folder and combines it with the E2E guide

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 8, 2023
@openshift-ci-robot

openshift-ci-robot commented Aug 8, 2023

@jakobmoellerdev: This pull request references OCPVE-619 which is a valid jira issue.

In response to this:

Also introduces flag to skip Snapshot Tests on clusters without Snapshot CRDs by default (Openshift Local).

TODO:

  • change tests for ephemeral

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Aug 8, 2023
@openshift-ci openshift-ci bot requested review from jerpeter1 and qJkee August 8, 2023 15:57
@jakobmoellerdev jakobmoellerdev force-pushed the OCPVE-619-e2e-refactor branch 3 times, most recently from 9acc2d5 to 92477b2 Compare August 8, 2023 20:33
@openshift-ci openshift-ci bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 8, 2023
@jakobmoellerdev jakobmoellerdev force-pushed the OCPVE-619-e2e-refactor branch 2 times, most recently from 45d76ef to 36a23a4 Compare August 9, 2023 10:45
@jakobmoellerdev
Contributor Author

/cc @suleymanakbas91

r.Log.Info("topolvm volume snapshot class is deleted", "VolumeSnapshotClass", vscName)
return nil
}
if runtime.IsNotRegisteredError(err) || meta.IsNoMatchError(err) ||
Contributor

Shouldn't we have the same check in ensureCreated as well?

Contributor Author

Yes, you are completely right. This wasn't caught because a failure does not actually change the LVMCluster to failed; it just logs "failed to reconcile". This has to be reflected properly in the status, but that will be tackled with https://issues.redhat.com/browse/OCPVE-627 when we revisit the resource creation / status checks.

Contributor Author

In any case, I have now added the check to create as well.

@@ -126,82 +127,91 @@ func lvmClusterTest() {
Describe("Filesystem Type", Serial, func() {

var clusterConfig *v1alpha1.LVMCluster
ctx := context.Background()
Contributor

There are a few more context.Background() calls left in setupPodAndPVC and cleanupPVCAndPod functions.

Contributor Author

Good catch. These methods were actually never used (that's why I didn't catch it), so I simply removed them.

@suleymanakbas91
Contributor

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 14, 2023
@openshift-ci
Contributor

openshift-ci bot commented Aug 14, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jakobmoellerdev, suleymanakbas91

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 14, 2023
@codecov-commenter

Codecov Report

Merging #378 (b831233) into main (a962b90) will increase coverage by 39.93%.
Report is 16 commits behind head on main.
The diff coverage is 60.00%.

@@             Coverage Diff             @@
##             main     #378       +/-   ##
===========================================
+ Coverage   16.59%   56.52%   +39.93%     
===========================================
  Files          24       26        +2     
  Lines        2061     2259      +198     
===========================================
+ Hits          342     1277      +935     
+ Misses       1693      897      -796     
- Partials       26       85       +59     

Files Changed                           Coverage Δ
controllers/topolvm_snapshotclass.go    61.22% <14.28%> (+61.22%) ⬆️
pkg/cluster/leaderelection.go           65.21% <65.21%> (ø)
controllers/lvmcluster_controller.go    60.00% <100.00%> (+60.00%) ⬆️

... and 12 files with indirect coverage changes

@jakobmoellerdev
Contributor Author

/test lvm-operator-e2e-aws

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD bb74044 and 2 for PR HEAD b831233 in total

@jakobmoellerdev
Contributor Author

/test lvm-operator-e2e-aws

@openshift-ci
Contributor

openshift-ci bot commented Aug 15, 2023

@jakobmoellerdev: all tests passed!

Full PR test history. Your PR dashboard.


@openshift-merge-robot openshift-merge-robot merged commit 610fa3a into openshift:main Aug 15, 2023
6 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
6 participants