Split functional testings via github action matrix #14078

mcmilk · 2022-10-23T19:26:58Z

Motivation and Context

The functional testimgs are timing out more and more ... the max. time for execution is 360m ... this seems to be reached :(

Description

This commit changes the workflow of the github actions.

We split the workflow into different parts:

build zfs modules for Ubuntu 20.04 and 22.04 (~25m)
2x zloop test (~10m) + 2x sanity test (~25m)
functional testings in parts 1..5 (each ~1h)
- these could be triggered, when sanity tests are ok
- currently I just start them all in the same time
cleanup and create summary

When everything is fine, the full run with all testings
should be done in around 2 hours.

The codeql.yml and checkstyle.yml are not part in this circle.

The testings are also modfied a bit:

report info about CPU and checksum benchmarks
reset the debugging logs for each test
- when some error occurred, we call dmesg with -c to get
  only the log output for the last failed test
- we empty also the dbgsys

The testings are done this way

flowchart TB
subgraph CleanUp and Summary
  Part1-20.04-->CleanUp+nice+Summary
  Part2-20.04-->CleanUp+nice+Summary
  PartN-20.04-->CleanUp+nice+Summary
  Part1-22.04-->CleanUp+nice+Summary
  Part2-22.04-->CleanUp+nice+Summary
  PartN-22.04-->CleanUp+nice+Summary
end

subgraph Testings Ubuntu 20.04
  sanity-checks-20.04
  functional-testing-20.04-->Part1-20.04
  functional-testing-20.04-->Part2-20.04
  functional-testing-20.04-->PartN-20.04
  zloop-checks-20.04
end

subgraph Testings Ubuntu 22.04
  sanity-checks-22.04
  functional-testing-22.04-->Part1-22.04
  functional-testing-22.04-->Part2-22.04
  functional-testing-22.04-->PartN-22.04
  zloop-checks-22.04
end

subgraph Code Checking + Building
  codeql.yml
  Build-Ubuntu-20.04-->sanity-checks-20.04
  Build-Ubuntu-22.04-->sanity-checks-22.04
  Build-Ubuntu-20.04-->zloop-checks-20.04
  Build-Ubuntu-22.04-->zloop-checks-22.04
  checkstyle.yml
end

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

.github/workflows/zfs-build.yaml

.github/workflows/test-dependencies.txt

gmelikov · 2022-10-24T10:20:45Z

1h 49m looks like a great result for full test suite run even as a start!

mcmilk · 2022-10-24T10:34:09Z

1h 49m looks like a great result for full test suite run even as a start!

Yes ;-)

I have removed the Ubuntu 22.04 sanity and functional tests, so this PR could maybe merged a bit faster.
The commit message should also be fine now and also some ZFS module cleanup was added as last step.

.github/workflows/build-dependencies.txt

gmelikov · 2022-10-24T10:45:52Z

Overall thoughts:

speed gain is really a good reason to use it, and we won't get into timeouts in the foreseeable future
one workflow file makes it harder to read manifest, but we can impove it in future by moving common parts in scripts
many additional checks in PR's status worsens its readability, but looks like there is no other way to parallelize now https://github.com/orgs/community/discussions/26246

andrewc12 · 2022-10-25T10:32:19Z

I like this idea of splitting it up like this in a matrix. tried it myself just didn't end up writing a script. :)

mcmilk · 2022-10-25T12:28:08Z

I like this idea of splitting it up like this in a matrix. tried it myself just didn't end up writing a script. :)

We should use this for MacOS and Windows as well... when you have problems with this yaml stuff, I could help a bit.

andrewc12 · 2022-10-25T21:30:56Z

We should use this for MacOS and Windows as well... when you have problems with this yaml stuff, I could help a bit.

Heh, I'm usually fine with that stuff, but honestly if you think you could help with anything I'd be glad to take it.

behlendorf · 2022-10-27T00:28:46Z

we can exclude sanity tests group from full runners after all

I agree, since the sanity tests are just a subset of the full test run there's not much value in running them, unless they're used to gate a full test run.

If we parallelize the tests like this we could also split them up in some meaningful way. Perhaps "cli tests" (cli_root, cli_user), "feature tests" (initialize, trim, compression, fallocate, quota, send/recv, snapshots, checkpoints, etc), and "resilience tests" (raidz, redundancy, removal, replacement, slog, online_offline, scrub_mirror, etc). Or some other relatively equal split. We'd just need to settle on the group names and add them to the tags sections of the run file.

My inclination would be to try and split it only 3 or 4 ways to cut down a bit on the number of status reports and try and keep it as readable as possible.

mcmilk · 2022-10-27T05:39:26Z

we can exclude sanity tests group from full runners after all

I agree, since the sanity tests are just a subset of the full test run there's not much value in running them, unless they're used to gate a full test run.

Then I will remove it.

If we parallelize the tests like this we could also split them up in some meaningful way. Perhaps "cli tests" (cli_root, cli_user), "feature tests" (initialize, trim, compression, fallocate, quota, send/recv, snapshots, checkpoints, etc), and "resilience tests" (raidz, redundancy, removal, replacement, slog, online_offline, scrub_mirror, etc). Or some other relatively equal split. We'd just need to settle on the group names and add them to the tags sections of the run file.

Okay, i will try that ;)

My inclination would be to try and split it only 3 or 4 ways to cut down a bit on the number of status reports and try and keep it as readable as possible.

We could also collect each summary of the tags and create a big one in the last step.

behlendorf · 2022-10-27T17:04:14Z

We could also collect each summary of the tags and create a big one in the last step.

That would be pretty cool. Personally what I've always found to be the most useful as a starting point when a test run fails in the buildbot summary page. It attempts to grab the meaningful log sections for FAILED and KILLED tests so they're easier to pull up an spot check. That's handled by just a little bit of shell here.

https://github.com/openzfs/zfs-buildbot/blob/master/scripts/bb-test-zfstests.sh#L33-L38

mcmilk · 2022-10-30T11:06:39Z

If we parallelize the tests like this we could also split them up in some meaningful way. Perhaps "cli tests" (cli_root, cli_user), "feature tests" (initialize, trim, compression, fallocate, quota, send/recv, snapshots, checkpoints, etc), and "resilience tests" (raidz, redundancy, removal, replacement, slog, online_offline, scrub_mirror, etc). Or some other relatively equal split. We'd just need to settle on the group names and add them to the tags sections of the run file.

I created a helper script for modifying the common.run file: https://gist.github.com/mcmilk/acc54a671ad2164ce98a0996a2ad6562

@behlendorf - maybe you want to have some changes on this initial grouping ?

gmelikov · 2022-10-30T11:11:15Z

If we go into hardcoding tags, then we may want to at least find non-described tags and run it in one of parts

.github/workflows/README.md

.github/workflows/scripts/generate-summary.sh

.github/workflows/build-dependencies.txt

.github/workflows/scripts/generate-summary.sh

mcmilk · 2023-03-13T11:06:25Z

Okay, the main thing is done now.

Sometimes the functional testings (part3) will fail. This seems unrelated to this functional tests splitting.
Also the old workflows seem to hang sometimes on the slog tests. The test slog_014_pos (run as root) [00:21] seems fine, and the next one is timing out ....

I splitted the "Cleanup and summary" into 5 calls -because we have a limit of 1 MiB per step which can be put into the summary. The first call does the summary things and the next 4 calls will print debugging details (dmesg, dbgmsg) of the failed tests.

This commit changes the workflow of the github actions. We split the workflow into different parts: 1) build zfs modules for Ubuntu 20.04 and 22.04 (~25m) 2) 2x zloop test (~10m) + 2x sanity test (~25m) 3) functional testings in parts 1..5 (each ~1h) - these could be triggered, when sanity tests are ok - currently I just start them all in the same time 4) cleanup and create summary When everything is fine, the full run with all testings should be done in around 2 hours. The codeql.yml and checkstyle.yml are not part in this circle. The testings are also modfied a bit: - report info about CPU and checksum benchmarks - reset the debugging logs for each test - when some error occurred, we call dmesg with -c to get only the log output for the last failed test - we empty also the dbgsys Signed-off-by: Tino Reichardt <[email protected]>

mcmilk · 2023-03-14T14:20:14Z

This is request for review now?

We went down from >4h to 1h 45m ;-)
We also have a sumary in the end and some first debugging can be done with the debug files in the end. These files are truncated to ~1MB ... but this should be okay for most cases.

behlendorf

Having a summary page where we can easily see the logs from tests which failed is real quality of life improvement! As are the faster test run times. I'm good with this, let's go ahead and merge this and refine as needed. Thanks.

gmelikov

LGTM, let's try and use it! Great work @mcmilk , thank you!

gmelikov · 2023-03-15T09:48:08Z

.github/workflows/zfs-linux-tests.yml

+          /var/tmp/zloop/*/vdev/
+        if-no-files-found: ignore
+
+  sanity:


IIRC we discussed that now we may not even need sanity tests, but this PR is already to good to be true (thanks @mcmilk ) so let it be

Yes I lknow, I should remove them. But it's an first most things seem to work or do not work.
When some test within sanity will fail ... then the errors are also listed like in the functional testings.
But in good pull requests, this should never happen. So when it's happening, it's a first sth. is not okay here info.

gmelikov · 2023-03-15T09:50:07Z

.github/workflows/zfs-linux.yml

+      with:
+        name: Summary Files
+        path: Summary/
+    - uses: geekyeggo/delete-artifact@v2


We may leave artifacts, it may be convenient for someone to download already built debug build-release, but just loud thoughts, never mind for now.

Yes, the generated modules will be deleted, but all logs are saved now.
Currently the logs are only saved, when sth. failed .... this is changed to we log always.

behlendorf · 2023-03-15T17:41:33Z

Merged. Let's see how it goes and refine as needed. Thanks!

mcmilk · 2023-03-15T19:08:38Z

There is an error, when the images rotate: https://github.com/actions/runner-images#available-images

So currently some runners have 20230313, and some have 20230305 .... I checked this via the file tests/ImageOS.txt ... and the module should compile if this applies. But of cause, we didn't checkout the current release with git ... I need to add this.

I will create a pull request for it... sorry, but this issue could not be testet, cause all runners had the same version ...

gmelikov · 2023-03-15T19:27:25Z

@mcmilk that was fast to get into this corner case! Glad we can just rerun whole pipeline

mcmilk · 2023-03-15T19:30:57Z

The simple fix for this, is to add the repo checkout on each functional testing ... this could be replace by one single line within the script ... but this needs to be tested a bit more then.

This commit changes the workflow of the github actions. We split the workflow into different parts: 1) build zfs modules for Ubuntu 20.04 and 22.04 (~25m) 2) 2x zloop test (~10m) + 2x sanity test (~25m) 3) functional testings in parts 1..5 (each ~1h) - these could be triggered, when sanity tests are ok - currently I just start them all in the same time 4) cleanup and create summary When everything is fine, the full run with all testings should be done in around 2 hours. The codeql.yml and checkstyle.yml are not part in this circle. The testings are also modified a bit: - report info about CPU and checksum benchmarks - reset the debugging logs for each test - when some error occurred, we call dmesg with -c to get only the log output for the last failed test - we empty also the dbgsys Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes openzfs#14078

gmelikov reviewed Oct 23, 2022

View reviewed changes

.github/workflows/zfs-build.yaml Outdated Show resolved Hide resolved

gmelikov reviewed Oct 23, 2022

View reviewed changes

.github/workflows/test-dependencies.txt Outdated Show resolved Hide resolved

mcmilk force-pushed the split-actions branch 5 times, most recently from 44580e1 to 00a75d1 Compare October 24, 2022 06:33

mcmilk force-pushed the split-actions branch from 00a75d1 to 4b66402 Compare October 24, 2022 10:31

gmelikov reviewed Oct 24, 2022

View reviewed changes

.github/workflows/build-dependencies.txt Outdated Show resolved Hide resolved

gmelikov reviewed Oct 24, 2022

View reviewed changes

.github/workflows/build-dependencies.txt Outdated Show resolved Hide resolved

mcmilk force-pushed the split-actions branch from 4b66402 to 237f3c7 Compare October 24, 2022 10:40

mcmilk mentioned this pull request Oct 24, 2022

Update github tests-functional-ubuntu actions #14033

Closed

14 tasks

behlendorf added the Status: Code Review Needed Ready for review and testing label Oct 27, 2022

mcmilk force-pushed the split-actions branch 6 times, most recently from 2850c9b to 51775d7 Compare March 3, 2023 20:20

mcmilk force-pushed the split-actions branch 3 times, most recently from 055230c to 4e88c94 Compare March 9, 2023 17:00

behlendorf reviewed Mar 9, 2023

View reviewed changes

behlendorf reviewed Mar 10, 2023

View reviewed changes

.github/workflows/scripts/generate-summary.sh Outdated Show resolved Hide resolved

mcmilk force-pushed the split-actions branch 7 times, most recently from 476fe47 to 54b7072 Compare March 13, 2023 10:55

mcmilk force-pushed the split-actions branch from 54b7072 to 8c84396 Compare March 14, 2023 10:54

behlendorf approved these changes Mar 14, 2023

View reviewed changes

behlendorf requested a review from gmelikov March 14, 2023 21:25

behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Mar 14, 2023

gmelikov approved these changes Mar 15, 2023

View reviewed changes

behlendorf merged commit b7bc334 into openzfs:master Mar 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Split functional testings via github action matrix #14078

Split functional testings via github action matrix #14078

mcmilk commented Oct 23, 2022 •

edited

Loading

gmelikov commented Oct 24, 2022

mcmilk commented Oct 24, 2022

gmelikov commented Oct 24, 2022

andrewc12 commented Oct 25, 2022

mcmilk commented Oct 25, 2022

andrewc12 commented Oct 25, 2022

behlendorf commented Oct 27, 2022

mcmilk commented Oct 27, 2022

behlendorf commented Oct 27, 2022

mcmilk commented Oct 30, 2022

gmelikov commented Oct 30, 2022

mcmilk commented Mar 13, 2023

mcmilk commented Mar 14, 2023

behlendorf left a comment

gmelikov left a comment

gmelikov Mar 15, 2023

mcmilk Mar 15, 2023 •

edited

Loading

gmelikov Mar 15, 2023

mcmilk Mar 15, 2023

behlendorf commented Mar 15, 2023

mcmilk commented Mar 15, 2023

gmelikov commented Mar 15, 2023

mcmilk commented Mar 15, 2023 •

edited

Loading

Split functional testings via github action matrix #14078

Split functional testings via github action matrix #14078

Conversation

mcmilk commented Oct 23, 2022 • edited Loading

Motivation and Context

Description

The testings are done this way

Types of changes

Checklist:

gmelikov commented Oct 24, 2022

mcmilk commented Oct 24, 2022

gmelikov commented Oct 24, 2022

andrewc12 commented Oct 25, 2022

mcmilk commented Oct 25, 2022

andrewc12 commented Oct 25, 2022

behlendorf commented Oct 27, 2022

mcmilk commented Oct 27, 2022

behlendorf commented Oct 27, 2022

mcmilk commented Oct 30, 2022

gmelikov commented Oct 30, 2022

mcmilk commented Mar 13, 2023

mcmilk commented Mar 14, 2023

behlendorf left a comment

Choose a reason for hiding this comment

gmelikov left a comment

Choose a reason for hiding this comment

gmelikov Mar 15, 2023

Choose a reason for hiding this comment

mcmilk Mar 15, 2023 • edited Loading

Choose a reason for hiding this comment

gmelikov Mar 15, 2023

Choose a reason for hiding this comment

mcmilk Mar 15, 2023

Choose a reason for hiding this comment

behlendorf commented Mar 15, 2023

mcmilk commented Mar 15, 2023

gmelikov commented Mar 15, 2023

mcmilk commented Mar 15, 2023 • edited Loading

mcmilk commented Oct 23, 2022 •

edited

Loading

mcmilk Mar 15, 2023 •

edited

Loading

mcmilk commented Mar 15, 2023 •

edited

Loading