Merge summaries #1

Open
wants to merge 8 commits into qemu
Conversation

tonyhutter (Owner) commented:

Just testing, please ignore

mcmilk commented Jun 21, 2024

Thanks a lot, I will use it.

mcmilk commented Jun 21, 2024

The FreeBSD images now have their own repository: https://github.com/mcmilk/openzfs-freebsd-images/releases

mcmilk commented Jun 25, 2024

FreeBSD 13 has problems with the virtio NIC. Just use the e1000 NIC, like I have done here: https://github.com/mcmilk/zfs/tree/qemu-machines2

mcmilk commented Jun 25, 2024

FreeBSD 13 has problems with the virtio NIC. Just use the e1000 NIC, like I have done here: https://github.com/mcmilk/zfs/tree/qemu-machines2

Ah, some other problem :(

tonyhutter force-pushed the qemu2 branch 3 times, most recently from 136cf60 to 73f5c57 on June 25, 2024 at 20:42
mcmilk and others added 3 commits August 5, 2024 16:17
The timezone "US/Mountain" isn't supported on newer Linux versions.
Using the canonical timezone "America/Denver", as is already done for
FreeBSD, fixes this. Older Linux distros also behave correctly with it.

Signed-off-by: Tino Reichardt <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: George Melikov <[email protected]>
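
In shell terms, the change above boils down to using the canonical zone name wherever the tests export TZ (a sketch of the idea only; the exact file being changed isn't shown in this summary):

    # "US/Mountain" is only a legacy tzdata alias and may be absent on newer distros
    #export TZ="US/Mountain"
    export TZ="America/Denver"   # canonical name, works on both Linux and FreeBSD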
This test was failing before:
- FAIL cli_root/zfs_copies/zfs_copies_006_pos (expected PASS)

Signed-off-by: Tino Reichardt <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: George Melikov <[email protected]>
This includes the last 12.x release (now EOL) and 13.0 development
versions (<1300139).

Sponsored-by: https://despairlabs.com/sponsor/

Signed-off-by: Rob Norris <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tino Reichardt <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
tonyhutter (Owner, Author) commented:

@mcmilk the zpool_status_008_pos failures are just a timing issue. This fixes it:

diff --git a/tests/zfs-tests/tests/functional/cli_root/zpool_status/zpool_status_008_pos.ksh b/tests/zfs-tests/tests/functional/cli_root/zpool_status/zpool_status_008_pos.ksh
index 6be2ad5a7..70f480cbb 100755
--- a/tests/zfs-tests/tests/functional/cli_root/zpool_status/zpool_status_008_pos.ksh
+++ b/tests/zfs-tests/tests/functional/cli_root/zpool_status/zpool_status_008_pos.ksh
@@ -69,12 +69,12 @@ for raid_type in "draid2:3d:6c:1s" "raidz2"; do
        log_mustnot eval "zpool status -e $TESTPOOL2 | grep ONLINE"
 
        # Check no ONLINE slow vdevs are show.  Then mark IOs greater than
-       # 10ms slow, delay IOs 20ms to vdev6, check slow IOs.
+       # 40ms slow, delay IOs 80ms to vdev6, check slow IOs.
        log_must check_vdev_state $TESTPOOL2 $TESTDIR/vdev6 "ONLINE"
        log_mustnot eval "zpool status -es $TESTPOOL2 | grep ONLINE"
 
-       log_must set_tunable64 ZIO_SLOW_IO_MS 10
-       log_must zinject -d $TESTDIR/vdev6 -D20:100 $TESTPOOL2
+       log_must set_tunable64 ZIO_SLOW_IO_MS 40
+       log_must zinject -d $TESTDIR/vdev6 -D80:100 $TESTPOOL2
        log_must mkfile 1048576 /$TESTPOOL2/testfile
        sync_pool $TESTPOOL2
        log_must set_tunable64 ZIO_SLOW_IO_MS $OLD_SLOW_IO

I'm still trying to figure out why raidz_expand_001_pos.ksh is reporting errors. That seems to be the last test that is failing on QEMU.

This commit adds functional tests for these systems:
- AlmaLinux 8, AlmaLinux 9
- ArchLinux
- CentOS Stream 9
- Fedora 39, Fedora 40
- Debian 11, Debian 12
- FreeBSD 13, FreeBSD 14, FreeBSD 15
- Ubuntu 20.04, Ubuntu 22.04, Ubuntu 24.04

Workflow for each operating system:
- install QEMU on the GitHub runner
- download the current cloud image
- start and initialize that image via cloud-init
- install dependencies and power off the system
- start the system, build OpenZFS, then power off again
- clone the system and start QEMU workers for parallel testing
- do the functional testing, hopefully in < 3h

Signed-off-by: Tino Reichardt <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
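
For illustration, the per-OS bring-up step of that workflow might look roughly like this (a sketch only: the image URL variable, resource sizes, forwarded port, and the use of cloud-localds from cloud-image-utils are assumptions, not the workflow's exact commands; the e1000 NIC is chosen because of the FreeBSD 13 virtio issue mentioned above):

    # fetch the current cloud image for the target OS (URL differs per OS)
    wget "$CLOUD_IMAGE_URL" -O os-image.qcow2

    # build a cloud-init seed from the usual user-data/meta-data pair
    cloud-localds seed.img user-data meta-data

    # first boot: cloud-init initializes the system, installs deps, then powers off
    qemu-system-x86_64 -enable-kvm -m 4G -smp 4 -nographic \
        -drive file=os-image.qcow2,format=qcow2,if=virtio \
        -drive file=seed.img,format=raw,if=virtio \
        -nic user,model=e1000,hostfwd=tcp::2222-:22

The later steps (building OpenZFS, then cloning the disk for parallel workers) would reuse a similar invocation against the cloned images.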
mcmilk commented Aug 6, 2024

20240806T173422/results:Test: /usr/share/zfs/zfs-tests/tests/functional/raidz/raidz_expand_001_pos.ksh (run as root) [01:15] [PASS]
20240806T173539/results:Test: /usr/share/zfs/zfs-tests/tests/functional/raidz/raidz_expand_001_pos.ksh (run as root) [00:53] [PASS]
20240806T173633/results:Test: /usr/share/zfs/zfs-tests/tests/functional/raidz/raidz_expand_001_pos.ksh (run as root) [01:19] [PASS]
20240806T173755/results:Test: /usr/share/zfs/zfs-tests/tests/functional/raidz/raidz_expand_001_pos.ksh (run as root) [03:14] [PASS]
20240806T174243/results:Test: /usr/share/zfs/zfs-tests/tests/functional/raidz/raidz_expand_001_pos.ksh (run as root) [02:19] [PASS]
20240806T174503/results:Test: /usr/share/zfs/zfs-tests/tests/functional/raidz/raidz_expand_001_pos.ksh (run as root) [00:55] [PASS]
[root@vm1 test_results]# grep -r ".*raidz_expand_001_pos.ksh.*\[FAIL\]$"|grep result
20240806T170150/results:Test: /usr/share/zfs/zfs-tests/tests/functional/raidz/raidz_expand_001_pos.ksh (run as root) [02:55] [FAIL]
20240806T170812/results:Test: /usr/share/zfs/zfs-tests/tests/functional/raidz/raidz_expand_001_pos.ksh (run as root) [02:57] [FAIL]
20240806T172246/results:Test: /usr/share/zfs/zfs-tests/tests/functional/raidz/raidz_expand_001_pos.ksh (run as root) [03:04] [FAIL]
20240806T173017/results:Test: /usr/share/zfs/zfs-tests/tests/functional/raidz/raidz_expand_001_pos.ksh (run as root) [03:02] [FAIL]
20240806T174111/results:Test: /usr/share/zfs/zfs-tests/tests/functional/raidz/raidz_expand_001_pos.ksh (run as root) [01:29] [FAIL]
[root@vm1 test_results]# grep -r ".*raidz_expand_001_pos.ksh.*\[FAIL\]$"|grep result |wc -l
5
[root@vm1 test_results]# grep -r ".*raidz_expand_001_pos.ksh.*\[PASS\]$"|grep result |wc -l
40

The error comes from the command zpool scrub -w testpool within the ksh function test_scrub:

SUCCESS: zpool import -o cachefile=none -d /var/tmp testpool
SUCCESS: zpool scrub -w testpool
SUCCESS: zpool clear testpool
SUCCESS: zpool export testpool
124+0 records in
124+0 records out
130023424 bytes (130 MB, 124 MiB) copied, 0.295698 s, 440 MB/s
124+0 records in
124+0 records out
130023424 bytes (130 MB, 124 MiB) copied, 0.266317 s, 488 MB/s
124+0 records in
124+0 records out
130023424 bytes (130 MB, 124 MiB) copied, 0.198369 s, 655 MB/s
SUCCESS: zpool import -o cachefile=none -d /var/tmp testpool
SUCCESS: zpool scrub -w testpool
ERROR: check_pool_status testpool errors No known data errors exited 1
NOTE: Performing test-fail callback (/usr/share/zfs/zfs-tests/callbacks/zfs_dbgmsg.ksh)

I think the scrub -w wants to start scrubbing a pool which is already scrubbing.

Option 1: check the status first, and if a scrub has already started, just wait for it.
Option 2: always stop any running scrub via zpool scrub -s and ignore its exit status.

What would you prefer?
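
For illustration, the two options might look roughly like this in the test's ksh, reusing the existing is_pool_scrubbing / wait_scrubbed helpers (a sketch, not the final patch):

    # Option 1: if an (auto-)scrub is already running, wait for it to finish first
    is_pool_scrubbing $pool && wait_scrubbed $pool
    log_must zpool scrub -w $pool

    # Option 2: unconditionally cancel any running scrub, ignoring the exit status
    zpool scrub -s $pool 2>/dev/null
    log_must zpool scrub -w $pool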

tonyhutter (Owner, Author) commented:

@mcmilk I'm currently testing with this:

diff --git a/tests/zfs-tests/tests/functional/raidz/raidz_expand_001_pos.ksh b/tests/zfs-tests/tests/functional/raidz/raidz_expand_001_pos.ksh
index 063d7fa73..167f39cfc 100755
--- a/tests/zfs-tests/tests/functional/raidz/raidz_expand_001_pos.ksh
+++ b/tests/zfs-tests/tests/functional/raidz/raidz_expand_001_pos.ksh
@@ -153,8 +153,12 @@ function test_scrub # <pool> <parity> <dir>
        done
 
        log_must zpool import -o cachefile=none -d $dir $pool
+       if is_pool_scrubbing $pool ; then
+               wait_scrubbed $pool
+       fi
 
        log_must zpool scrub -w $pool
+
        log_must zpool clear $pool
        log_must zpool export $pool
 
@@ -165,7 +169,9 @@ function test_scrub # <pool> <parity> <dir>
        done
 
        log_must zpool import -o cachefile=none -d $dir $pool
-
+       if is_pool_scrubbing $pool ; then
+               wait_scrubbed $pool
+       fi
        log_must zpool scrub -w $pool
 
        log_must check_pool_status $pool "errors" "No known data errors"
diff --git a/tests/zfs-tests/tests/functional/raidz/raidz_expand_002_pos.ksh b/tests/zfs-tests/tests/functional/raidz/raidz_expand_002_pos.ksh
index 004f3d1f9..e416926d1 100755
--- a/tests/zfs-tests/tests/functional/raidz/raidz_expand_002_pos.ksh
+++ b/tests/zfs-tests/tests/functional/raidz/raidz_expand_002_pos.ksh
@@ -105,6 +105,10 @@ for disk in ${disks[$(($nparity+2))..$devs]}; do
                log_fail "pool $pool not expanded"
        fi
 
+       # It's possible the pool could be auto scrubbing here.  If so, wait.
+       if is_pool_scrubbing $pool ; then
+               wait_scrubbed $pool
+       fi
        verify_pool $pool
 
        pool_size=$expand_size

I think that might help some of the raidz_expand_001_pos failures, but not eliminate them completely. There may actually be a legitimate problem that's just being exposed, but I'm not sure yet. I want to get some more test runs done to see.

Also, I tweaked my commit a little to add more time in zpool_status_008_pos and fix a rare timing bug in crtime_001_pos.

mcmilk commented Aug 6, 2024

(mcmilk quotes tonyhutter's previous comment, including the diff, in full)

I will run this in a for loop; I think 50 times should be a good start.
Only these two tests, via -t raidz_expand_001_pos + -t raidz_expand_002_pos.

It's running with the -I 55 option:
https://github.com/mcmilk/zfs/actions/runs/10272187541
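
For reference, such a restricted run could look something like this from a zfs source tree (a sketch; -I repeats the selected tests, and I'm assuming multiple -t options accumulate as they do in current zfs-tests.sh):

    # repeat only the two raidz expansion tests 55 times
    ./scripts/zfs-tests.sh -t raidz_expand_001_pos -t raidz_expand_002_pos -I 55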

mcmilk commented Aug 6, 2024

Hm, the raidz_expand_001_pos tests are still failing, even with the wait-for-scrub change :(

I used this: is_pool_scrubbing $pool && wait_scrubbed $pool
Diff is here: openzfs@02a14e7

mcmilk commented Aug 6, 2024

AlmaLinux 8 and Ubuntu 20.04 are fine.
Maybe it's bclone related, since block cloning is disabled on the older kernels?
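
A quick way to check whether block cloning is even in play on a given VM would be something like this (a sketch; zfs_bclone_enabled and the pool feature flag are the usual knobs on 2.2+ builds, and testpool is just a placeholder name):

    # is the bclone tunable present and enabled in this kernel module build?
    cat /sys/module/zfs/parameters/zfs_bclone_enabled

    # is the feature enabled/active on the pool?
    zpool get feature@block_cloning testpool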

mcmilk commented Aug 12, 2024

Some points on the raidz_expand_001_pos testing problem:

  • I limited the code to the scalar implementation (to exclude possible assembly failures)
  • FreeBSD 13/14 does not have this issue at all; all tests run fine at around 2m 30s
  • FreeBSD 15 does not have this issue at all; all tests run fine at around 4m
  • on Linux 5.4: timings are around 3m and the failing test rate is 1/120
  • on Linux 6.x: timings are around 5m 30s and the failing test rate is 1/2, maybe 1/3

Special test run with only raidz_expand_001_pos: https://github.com/mcmilk/zfs/actions/runs/10346648527

So the raidz code is maybe okay... but some SPL thing?
Should we run against zfs-2.2.5?
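
For context, one way to limit raidz to the scalar implementation on Linux is the module parameter below; this is an assumption about how the restriction was applied here, not a quote from the test setup:

    # force the scalar raidz implementation (rules out the SIMD/assembly code paths)
    echo scalar > /sys/module/zfs/parameters/zfs_vdev_raidz_impl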
