Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[STRMCMP-1639] Integration Test Fix #277

Merged
merged 68 commits into from
Mar 31, 2023
Merged

Conversation

sethsaperstein-lyft
Copy link
Contributor

@sethsaperstein-lyft sethsaperstein-lyft commented Mar 21, 2023

overview

Integrations tests working in CI again

Additional Info

  • Change to minikube from microk8s. Largely due to being able to reproduce the integration env locally. Iterating bugs that occurred only in CI on microk8s due to memory was inefficient.
  • Upgrade k8s to 1.20 from 1.13. Changes for v1 CRD's were introduced to support 1.22 k8s however k8s client, on 1.14, for integration tests only support v1beta1 which was deprecated in 1.22. Upgrading the client is non-trivial and will be done separately. Using older k8s versions met issues with docker and local system cgroup. 1.20 became the decided upon version.
  • Create service account, role, and role binding in CI since 1.20 has RBAC by default. Disabling rbac is hacky and resulted in issues with the kube proxy
  • Remove base mount of tmp directory due to permission issues. Flink application writes 0755. Integration test downgrades 0777 to 0755. Using minikube ssh directly to create and delete the necessary files. This creates a strong dependency on minikube which is acceptable for now.
  • Comment out test for rescale given that default github runner only has 8gb memory and 2 cpu. Rescale needs 2 flink clusters up with above average size which busts limits. Separate ticket created to address using larger runners (in beta). Flink test apps were also scaled down to parallelism 2 using 1 TM to fit on the github runner
  • Changes made to build the test app image locally as opposed to pull from dockerhub. Not in use currently as the tests are remaining on 1.8 due to simpler memory configuration until larger github runners are used. Additional changes made to upgrade flink app from 1.8 to 1.11.
  • Upgrade ubuntu for CI as ubuntu version is deprecated on april 1st 2023 by github

integ/README.md Outdated Show resolved Hide resolved
integ/README.md Outdated Show resolved Hide resolved
integ/README.md Outdated
8. Set the following for the Go test:
Package path: github.com/lyft/flinkk8soperator/integ
Env: INTEGRATION=true;OPERATOR_IMAGE=flinkk8soperator:local;RUN_DIRECT=true
Program Args: -timeout 40m -check.vv IntegTest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought IntegTest is not needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this when attempting to target a single test. No luck there for now so leaving as is since this is what was there for the integration test prior. In the future we'll figure out how to target a single test as I'll likely write an integration test for fallback without savepoint

@sethsaperstein-lyft sethsaperstein-lyft marked this pull request as ready for review March 30, 2023 13:05
@sethsaperstein-lyft sethsaperstein-lyft changed the title [TEST] Integration Test Fix Integration Test Fix Mar 30, 2023
@sethsaperstein-lyft sethsaperstein-lyft changed the title Integration Test Fix [STRMCMP-1639] Integration Test Fix Mar 30, 2023
premsantosh
premsantosh previously approved these changes Mar 30, 2023
integ/README.md Outdated
is non-trivial due to cgroup configurations. Instead, we will use a version
that is compatible with v1beta1 CRD's which corresponds to <1.22. CRD's v1
is only available with client >=1.16, however, the client used here is 1.14
and the upgrade is non-trivial.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just add a TODO here with the jira to upgrade client?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a ticket for the 1.22 upgrade and added

failingJobTest(s, c, "taskfailure", func() {
f, err := os.OpenFile(s.Util.CheckpointDir+"/fail", os.O_RDONLY|os.O_CREATE, 0666)
err := s.Util.ExecuteCommand("minikube", "ssh", "touch /tmp/checkpoints/fail && chmod 0644 /tmp/checkpoints/fail")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this test, you're creating and modifying privs on a file and then just asserting the commands ran without errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -163,7 +166,7 @@ func (s *IntegSuite) TestCancelledJobWithoutSavepoint(c *C) {
c.Assert(err, IsNil)

// wait a bit
time.Sleep(1 * time.Second)
time.Sleep(5 * time.Second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't have to change anything but generally speaking I am never a fan of such sleep timers in tests (just makes it flakey).
I would always promote to change this to an polling method with exponential delay (and an exit criteria). Again, don't need to do that now but just wanted to point it out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Will leave as a future update or at least keep in mind while writing future integration tests for this repo

log.Info("All pods torn down")
}
// TODO: https://github.com/lyft/flinkk8soperator/issues/278
//func (s *IntegSuite) TestInPlaceScaleUp(c *C) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of commenting all this out I believe you can just do
c.Skip("Skipping because of blah")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added! Didn't know this existed! Thx!

leoluoInSea
leoluoInSea previously approved these changes Mar 30, 2023
@leoluoInSea
Copy link
Contributor

Tons of thanks for tackling down complicated problem!

@sethsaperstein-lyft
Copy link
Contributor Author

/PTAL @leoluoInSea

@sethsaperstein-lyft sethsaperstein-lyft merged commit d3a6fe4 into master Mar 31, 2023
@sethsaperstein-lyft sethsaperstein-lyft deleted the integration_test_fix branch March 31, 2023 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants