Proper async reaping (or alarm(2)-based delooping at least) for ZED #11807

nabijaczleweli · 2021-03-27T01:47:00Z

Motivation and Context

Confer #11769, also zed(8) and fsck.zfs(8) are the only manpages with BUGS sexions.

Description

The first patch replaces the weird waitpid(2)/nanosleep(2) loop with alarm(2), allowing zedlets to cancel or ignore the signal if they can usefully live longer than 10 seconds.

The third patch launches zedlets without blocking and starts a separate reaping thread to wait on them asynchronously.

The fourth patch effectively undoes the first one – we don't need a time-out anymore since long-running zedlets aren't going to hold up execution!

These can be squashed in 4 sensible ways and I'm not in a state to figure out which one is best, so I'll leave that decision to a maintainer.

The second patch names all the zed threads – upstream OpenZFS only supports FreeBSD and Linux, which both have pthread_setname_np(3); NetBSD has an additional void * for formatting; OpenBSD has pthread_set_name_np(3) with the same signature as FreeBSD; the only problematic(ish) platform is Darwin, whose pthread_setname_np(3) takes only the name and sets the name for the current thread. These are all trivial patches for distributions to make.

How Has This Been Tested?

I ran it a bunch in all the failure modes I could think of.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

behlendorf · 2021-03-28T22:59:13Z

The async reaping itself looks great. However, I'm concerned that after this change there wouldn't be any form of ratelimitting if a large number of events are generated. I think we need to add a limit to the maximum number of threads created at once.

nabijaczleweli · 2021-03-28T23:09:13Z

There's exactly one reaper thread (but it gets started lazily on first entry to zed_exec_process(), which is why that may've been confusing, I guess).

behlendorf · 2021-03-29T02:52:27Z

Sure, that all makes good sense. But if I'm not mistaken all of the zedlets are now run asynchronously with this change. So if say the kernel posts 1000 events then the ZED will fork and exec 1000 threads to handle them, whereas before they'd be done serially. Clearly doing them serially was never ideal, but neither is potentially kicking off an unbounded number of threads. The kernel has its own rate limiting on events to prevent this to some degree. However, there could be a backlog of events which haven't been processed yet, so it seems like a reasonable precaution to limit the number of concurrent zedlets to say 16 or 32.

nabijaczleweli · 2021-03-29T09:40:22Z

Oh! right, you meant processes, not threads. That makes sense, yeah. I'll add a child limit.

nabijaczleweli · 2021-03-29T13:25:51Z

Updated: 16 max jobs by default, tunable with -j argument

nabijaczleweli · 2021-03-29T14:54:50Z

Uhh, the FreeBSD build ran out of disk space?

behlendorf · 2021-03-29T20:00:54Z

Uhh, the FreeBSD build ran out of disk space?

Ugh, yeah. Feel free to ignore that, it's actually for FreeBSD 14 which I'll disable until we sort that out.

However, this events/zed_rc_filter ZTS failure you'll need to sort out. There are several ZED tests under events/ and they expect the zedlets to run in a specific order. That's really not the case any more with your change and the default value. The easy fix here would be to add -j1 to the zed_start ksh function in `tests/zfs-tests/include/libtest.shlib".

http://build.zfsonlinux.org/builders/Fedora%2033%20x86_64%20%28TEST%29/builds/1619/steps/shell_4/logs/summary

nabijaczleweli · 2021-03-30T22:22:55Z

Hm; I've added -j 1, as you suggested, then an explicit zed_stop before it in the test itself since some test runs failed on that. The first round all worked (well, of the ones didn't ENOSPC), but now Debian 10 appears to be failing with

16:41:31.32 NOTE: Stopping ZED
16:41:31.33 SUCCESS: zed_stop
16:41:31.34 NOTE: ZED already running
16:41:31.34 SUCCESS: zed_start

and then timing out, which makes no sense to me? How can it be killed (and zed_stop does wait for it to actually exit) and then be running again, not a second later (since pgrep -x zed returned 0), and hence not started? The Fedora 33 build worked. Dunno, don't get it. A short sleep after zed_stop and before zed_start maybe? Or is there something starting the daemon with different arguments there?

behlendorf · 2021-03-30T23:12:15Z

Or is there something starting the daemon with different arguments there?

That's may be possible. For the the CI we build the packages then, install them, and then kick off the testing. But I believe on Debian in particular daemons are started by default on installation. This different than on some others distributions, so that may be what's happening here. I can take a look if that's happening in our CI environment.

edit: It looks like the same thing may be happening on Ubuntu.

behlendorf · 2021-03-31T03:29:25Z

Can you try including this change in the PR to update the workflow to stop the ZED daemon which may be started after the package installation. This won't resolve the Debian issue but I've opened a PR against the buildbot for that we can merge if this does resolve the issue for the Ubuntu GitHub actions builders.

diff --git a/.github/workflows/zfs-tests-functional.yml b/.github/workflows/zfs>
index 631f174b7..79aa8a25b 100644
--- a/.github/workflows/zfs-tests-functional.yml
+++ b/.github/workflows/zfs-tests-functional.yml
@@ -44,6 +44,8 @@ jobs:
         sudo sed -i.bak 's/updates/extra updates/' /etc/depmod.d/ubuntu.conf
         sudo depmod
         sudo modprobe zfs
+        # Stop the ZED daemon so it won't conflict with the ZTS event tests
+        sudo systemctl stop zfs-zed
     - name: Tests
       run: |
         /usr/share/zfs/zfs-tests.sh -v -s 3G
diff --git a/.github/workflows/zfs-tests-sanity.yml b/.github/workflows/zfs-tes>
index e03399757..09f317142 100644
--- a/.github/workflows/zfs-tests-sanity.yml
+++ b/.github/workflows/zfs-tests-sanity.yml
@@ -40,6 +40,8 @@ jobs:
         sudo sed -i.bak 's/updates/extra updates/' /etc/depmod.d/ubuntu.conf
         sudo depmod
         sudo modprobe zfs
+        # Stop the ZED daemon so it won't conflict with the ZTS event tests
+        sudo systemctl stop zfs-zed
     - name: Tests
       run: |
         /usr/share/zfs/zfs-tests.sh -v -s 3G -r sanity

nabijaczleweli · 2021-03-31T14:00:15Z

I've added your patch and it still fails (not sure why, I personally thought it would, but we may never know). I've added a variation to actually just murder all ZEDs instead of flakily detecting the pidfile and log the PIDs it's killing.

behlendorf · 2021-03-31T16:22:30Z

Let's see how it goes. That's probably fine for the tests.

nabijaczleweli · 2021-04-01T11:49:36Z

FreeBSD 12 failed with filesystem full somewhere, CentOS Stream 8 dropped off, and GitHub Actions Ubuntu 20 ENOSPCed.

But on the buildds that didn't fail for unrelated reasons, it seems to have worked! As a reference, consider this log and jump to "Test (Linux): /usr/share/zfs/zfs-tests/tests/functional/events/zed_rc_filter (run as root) [00:14] [PASS]".

behlendorf · 2021-04-01T23:22:27Z

@nabijaczleweli OK. Tests are passing now and the change itself makes sense to me. I resubmitted the failed builds except for the ENOSPC Ubuntu 20 which has a separate fix to the GitHub workflow was merged for today.

That leaves us with how best to squash these commits. I'd suggesting something like this, and just squash away the alarm changes.

zed: set names for all threads
zed: use separate reaper thread and collect ZEDLETs asynchronously
zed: remove unused zed_file_close_on_exec()
zed: allow limiting concurrent jobs

Signed-off-by: Ahelenia Ziemiańska <[email protected]>

The FIXME comment was there since the initial implementation in 2014, there are no users Signed-off-by: Ahelenia Ziemiańska <[email protected]>

200ms time-out is relatively long, but if we already hit the cap, then we'll likely be able to spawn multiple new jobs when we wake up Signed-off-by: Ahelenia Ziemiańska <[email protected]>

nabijaczleweli · 2021-04-02T00:13:01Z

Agree, that does seem best. Squashed onto current master to hopefully catch #11826 as well.

behlendorf

Looks great. I resubmitted the failed builds (which failed for unrelated reasons) and when they come back clean this should be ready to merge. Thanks for making this improvement, it feels like its been on the TODO list forever.

nabijaczleweli · 2021-04-02T19:37:34Z

Since the initial implementation in January of 2014, it seems? My pleasure.

Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes #11807

The FIXME comment was there since the initial implementation in 2014, there are no users Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes #11807

200ms time-out is relatively long, but if we already hit the cap, then we'll likely be able to spawn multiple new jobs when we wake up Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes #11807

Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11807

The FIXME comment was there since the initial implementation in 2014, there are no users Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11807

200ms time-out is relatively long, but if we already hit the cap, then we'll likely be able to spawn multiple new jobs when we wake up Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11807

Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11807

The FIXME comment was there since the initial implementation in 2014, there are no users Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11807

200ms time-out is relatively long, but if we already hit the cap, then we'll likely be able to spawn multiple new jobs when we wake up Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11807

Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11807

The FIXME comment was there since the initial implementation in 2014, there are no users Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11807

200ms time-out is relatively long, but if we already hit the cap, then we'll likely be able to spawn multiple new jobs when we wake up Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11807

nabijaczleweli changed the title ~~Proper async reaping (or alarm(2)-based delooping at least) for zed~~ Proper async reaping (or alarm(2)-based delooping at least) for ZED Mar 27, 2021

nabijaczleweli force-pushed the ring-the-alarum-bells-v1 branch from 577b927 to 4525fb0 Compare March 27, 2021 14:16

behlendorf added the Status: Code Review Needed Ready for review and testing label Mar 28, 2021

nabijaczleweli force-pushed the ring-the-alarum-bells-v1 branch from 4525fb0 to 3abccd6 Compare March 29, 2021 13:24

nabijaczleweli force-pushed the ring-the-alarum-bells-v1 branch 2 times, most recently from 3b1ba79 to 108f9c7 Compare March 29, 2021 13:48

nabijaczleweli force-pushed the ring-the-alarum-bells-v1 branch 3 times, most recently from 2499c78 to f64422b Compare March 30, 2021 14:56

nabijaczleweli force-pushed the ring-the-alarum-bells-v1 branch from 0212c14 to c7e9c97 Compare March 31, 2021 19:40

nabijaczleweli added 4 commits April 2, 2021 02:08

zed: set names for all threads

45d0c9e

Signed-off-by: Ahelenia Ziemiańska <[email protected]>

zed: use separate reaper thread and collect ZEDLETs asynchronously

3de14ba

Signed-off-by: Ahelenia Ziemiańska <[email protected]>

zed: remove unused zed_file_close_on_exec()

5785220

The FIXME comment was there since the initial implementation in 2014, there are no users Signed-off-by: Ahelenia Ziemiańska <[email protected]>

zed: allow limiting concurrent jobs

932077b

200ms time-out is relatively long, but if we already hit the cap, then we'll likely be able to spawn multiple new jobs when we wake up Signed-off-by: Ahelenia Ziemiańska <[email protected]>

nabijaczleweli force-pushed the ring-the-alarum-bells-v1 branch from c7e9c97 to 932077b Compare April 2, 2021 00:09

behlendorf approved these changes Apr 2, 2021

View reviewed changes

behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Apr 2, 2021

nabijaczleweli mentioned this pull request Apr 2, 2021

Assorted cleanup for ZED, but mostly just the thousand-fold speed improvement #11834

Closed

13 tasks

behlendorf closed this in 3ef80ee Apr 2, 2021

behlendorf pushed a commit that referenced this pull request Apr 2, 2021

zed: use separate reaper thread and collect ZEDLETs asynchronously

ca2ce9c

Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes #11807

adamdmoss pushed a commit to adamdmoss/zfs that referenced this pull request Apr 10, 2021

zed: set names for all threads

a5e959e

Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11807

adamdmoss pushed a commit to adamdmoss/zfs that referenced this pull request Apr 10, 2021

zed: use separate reaper thread and collect ZEDLETs asynchronously

1079b32

Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11807

behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Apr 14, 2021

zed: set names for all threads

84c1652

Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11807

behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Apr 14, 2021

zed: use separate reaper thread and collect ZEDLETs asynchronously

9160c44

Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11807

sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021

zed: set names for all threads

a3d316d

Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11807

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proper async reaping (or alarm(2)-based delooping at least) for ZED #11807

Proper async reaping (or alarm(2)-based delooping at least) for ZED #11807

nabijaczleweli commented Mar 27, 2021

behlendorf commented Mar 28, 2021

nabijaczleweli commented Mar 28, 2021 •

edited

Loading

behlendorf commented Mar 29, 2021

nabijaczleweli commented Mar 29, 2021

nabijaczleweli commented Mar 29, 2021

nabijaczleweli commented Mar 29, 2021

behlendorf commented Mar 29, 2021

nabijaczleweli commented Mar 30, 2021

behlendorf commented Mar 30, 2021 •

edited

Loading

behlendorf commented Mar 31, 2021

nabijaczleweli commented Mar 31, 2021 •

edited

Loading

behlendorf commented Mar 31, 2021

nabijaczleweli commented Apr 1, 2021

behlendorf commented Apr 1, 2021

nabijaczleweli commented Apr 2, 2021

behlendorf left a comment •

edited

Loading

nabijaczleweli commented Apr 2, 2021 •

edited

Loading

Proper async reaping (or alarm(2)-based delooping at least) for ZED #11807

Proper async reaping (or alarm(2)-based delooping at least) for ZED #11807

Conversation

nabijaczleweli commented Mar 27, 2021

Motivation and Context

Description

How Has This Been Tested?

Types of changes

Checklist:

behlendorf commented Mar 28, 2021

nabijaczleweli commented Mar 28, 2021 • edited Loading

behlendorf commented Mar 29, 2021

nabijaczleweli commented Mar 29, 2021

nabijaczleweli commented Mar 29, 2021

nabijaczleweli commented Mar 29, 2021

behlendorf commented Mar 29, 2021

nabijaczleweli commented Mar 30, 2021

behlendorf commented Mar 30, 2021 • edited Loading

behlendorf commented Mar 31, 2021

nabijaczleweli commented Mar 31, 2021 • edited Loading

behlendorf commented Mar 31, 2021

nabijaczleweli commented Apr 1, 2021

behlendorf commented Apr 1, 2021

nabijaczleweli commented Apr 2, 2021

behlendorf left a comment • edited Loading

Choose a reason for hiding this comment

nabijaczleweli commented Apr 2, 2021 • edited Loading

nabijaczleweli commented Mar 28, 2021 •

edited

Loading

behlendorf commented Mar 30, 2021 •

edited

Loading

nabijaczleweli commented Mar 31, 2021 •

edited

Loading

behlendorf left a comment •

edited

Loading

nabijaczleweli commented Apr 2, 2021 •

edited

Loading