-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Assorted cleanup for ZED, but mostly just the thousand-fold speed improvement #11834
Assorted cleanup for ZED, but mostly just the thousand-fold speed improvement #11834
Conversation
Uh-oh! Posted this early by accident (this includes #11807 and needs to be rebased when that's applied), soz |
fad68e4
to
1f881cb
Compare
Should be good to go, assuming the test suite doesn't rely on something weird again? |
uhh, the Fedora 33 tests hit an assert in zdb?
I didn't touch zdb, so |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bulk of these all make good sense, thanks for the cleanup. I've added a few comments inline with the relevant patch.
/* | ||
* Absolute path for the default zed configuration file. | ||
*/ | ||
#define ZED_CONF_FILE SYSCONFDIR "/zfs/zed.conf" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is functionality we'd always intended to add and thus has stubbed out. Though clearly we haven't gotten to it. In particular, in zfs_diagnosis.c
there are two functions fmd_prop_get_int32
and fmd_prop_get_int64
which return hard coded values we thought could be set in the config file. That said, at this point since we're only talking about a couple of value it might make more sense to add a few command line options to override the defaults. So I'm OK with shedding this long dormant and partial config file support.
1f881cb
to
7ab5cb5
Compare
Hopefully addressed all review comments, rebased on |
7ab5cb5
to
706e675
Compare
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
/dev/fd on Darwin Consider the following strace output: prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0 Yes, that is well over a million file descriptors! This reduces the ZED start-up time from "at least a second" to "instantaneous", and, under strace, from "don't even try" to "usable" by simple virtue of doing five syscalls instead of over a million; in most cases the main loop does nothing Recent Linuxes (5.8+) have close_range(2) for this, but that's an overoptimisation (and libcs don't have wrappers for it yet) This is also run by the ZEDLET pre-exec. Compare: Finished "all-syslog.sh" eid=13 pid=6717 time=1.027100s exit=0 Finished "history_event-zfs-list-cacher.sh" eid=13 pid=6718 time=1.046923s exit=0 to Finished "all-syslog.sh" eid=12 pid=4834 time=0.001836s exit=0 Finished "history_event-zfs-list-cacher.sh" eid=12 pid=4835 time=0.001346s exit=0 lol Signed-off-by: Ahelenia Ziemiańska <[email protected]>
There simply isn't a need for one, since the flags the daemon takes are all short (mostly just toggles) and administrative in nature, and are therefore better served by the age-old tradition of sourcing an environment file and preparing the cmdline in the init-specific handler itself, if needed at all Signed-off-by: Ahelenia Ziemiańska <[email protected]>
And add a note on /why/ ZEDLETs need to be owned by root Quoth chown(2), Linux man-pages project: Only a privileged process (Linux: one with the CAP_CHOWN capability) may change the owner of a file. Quoth chown(2), FreeBSD: [EPERM] The operation would change the ownership, but the effective user ID is not the super-user. Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Same deal as zed_file_close_on_exec() Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Nobody generates them anyway, I'm pretty sure Also untag _zed_event_add_nvpair() from merge with zpool_do_events_nvprint() ‒ they serve different purposes (machine, usually script vs human consumption) and format the output differently as it stands Signed-off-by: Ahelenia Ziemiańska <[email protected]>
We set SA_RESTART early on, which will prevent EINTRs (indeed, to the point of needing to clear it in the reaper, since it interferes with pause(2)), which is the only error zed_file_write_n() actually handled (plus, the pid write is no bigger than 12 bytes anyway) Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Consider the following strace log: prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0 dup2(0, 30) = 30 dup2(0, 300) = 300 dup2(0, 3000) = -1 EBADF (Bad file descriptor) dup2(0, 30000) = -1 EBADF (Bad file descriptor) dup2(0, 300000) = -1 EBADF (Bad file descriptor) prlimit64(0, RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}, NULL) = 0 dup2(0, 30) = 30 dup2(0, 300) = 300 dup2(0, 3000) = 3000 dup2(0, 30000) = 30000 dup2(0, 300000) = 300000 Even a privileged process needs to bump its rlimit before being able to use fds higher than rlim_cur Signed-off-by: Ahelenia Ziemiańska <[email protected]>
6f377cb
to
62d353c
Compare
CI failures make no sense to me (an OOM in zfs_mount_and_share(), Some™ error in zfs_destroy_dev_removal_condense and auto_spare_shared), but all appear unrelated |
All previously observed occasional unrelated failures. This is ready to merge. |
These events should currently never be generated. Also untag _zed_event_add_nvpair() from merge with zpool_do_events_nvprint() ‒ they serve different purposes (machine, usually script vs human consumption) and format the output differently as it stands Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
We set SA_RESTART early on, which will prevent EINTRs (indeed, to the point of needing to clear it in the reaper, since it interferes with pause(2)), which is the only error zed_file_write_n() actually handled (plus, the pid write is no bigger than 12 bytes anyway) Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
Consider the following strace log: prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0 dup2(0, 30) = 30 dup2(0, 300) = 300 dup2(0, 3000) = -1 EBADF (Bad file descriptor) dup2(0, 30000) = -1 EBADF (Bad file descriptor) dup2(0, 300000) = -1 EBADF (Bad file descriptor) prlimit64(0, RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}, NULL) = 0 dup2(0, 30) = 30 dup2(0, 300) = 300 dup2(0, 3000) = 3000 dup2(0, 30000) = 30000 dup2(0, 300000) = 300000 Even a privileged process needs to bump its rlimit before being able to use fds higher than rlim_cur. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
/dev/fd on Darwin Consider the following strace output: prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0 Yes, that is well over a million file descriptors! This reduces the ZED start-up time from "at least a second" to "instantaneous", and, under strace, from "don't even try" to "usable" by simple virtue of doing five syscalls instead of over a million; in most cases the main loop does nothing Recent Linuxes (5.8+) have close_range(2) for this, but that's an overoptimisation (and libcs don't have wrappers for it yet) This is also run by the ZEDLET pre-exec. Compare: Finished "all-syslog.sh" eid=13 pid=6717 time=1.027100s exit=0 Finished "history_event-zfs-list-cacher.sh" eid=13 pid=6718 time=1.046923s exit=0 to Finished "all-syslog.sh" eid=12 pid=4834 time=0.001836s exit=0 Finished "history_event-zfs-list-cacher.sh" eid=12 pid=4835 time=0.001346s exit=0 lol Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
There simply isn't a need for one, since the flags the daemon takes are all short (mostly just toggles) and administrative in nature, and are therefore better served by the age-old tradition of sourcing an environment file and preparing the cmdline in the init-specific handler itself, if needed at all Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
And add a note on /why/ ZEDLETs need to be owned by root Quoth chown(2), Linux man-pages project: Only a privileged process (Linux: one with the CAP_CHOWN capability) may change the owner of a file. Quoth chown(2), FreeBSD: [EPERM] The operation would change the ownership, but the effective user ID is not the super-user. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
Same deal as zed_file_close_on_exec() Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
These events should currently never be generated. Also untag _zed_event_add_nvpair() from merge with zpool_do_events_nvprint() ‒ they serve different purposes (machine, usually script vs human consumption) and format the output differently as it stands Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
We set SA_RESTART early on, which will prevent EINTRs (indeed, to the point of needing to clear it in the reaper, since it interferes with pause(2)), which is the only error zed_file_write_n() actually handled (plus, the pid write is no bigger than 12 bytes anyway) Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
Consider the following strace log: prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0 dup2(0, 30) = 30 dup2(0, 300) = 300 dup2(0, 3000) = -1 EBADF (Bad file descriptor) dup2(0, 30000) = -1 EBADF (Bad file descriptor) dup2(0, 300000) = -1 EBADF (Bad file descriptor) prlimit64(0, RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}, NULL) = 0 dup2(0, 30) = 30 dup2(0, 300) = 300 dup2(0, 3000) = 3000 dup2(0, 30000) = 30000 dup2(0, 300000) = 300000 Even a privileged process needs to bump its rlimit before being able to use fds higher than rlim_cur. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
/dev/fd on Darwin Consider the following strace output: prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0 Yes, that is well over a million file descriptors! This reduces the ZED start-up time from "at least a second" to "instantaneous", and, under strace, from "don't even try" to "usable" by simple virtue of doing five syscalls instead of over a million; in most cases the main loop does nothing Recent Linuxes (5.8+) have close_range(2) for this, but that's an overoptimisation (and libcs don't have wrappers for it yet) This is also run by the ZEDLET pre-exec. Compare: Finished "all-syslog.sh" eid=13 pid=6717 time=1.027100s exit=0 Finished "history_event-zfs-list-cacher.sh" eid=13 pid=6718 time=1.046923s exit=0 to Finished "all-syslog.sh" eid=12 pid=4834 time=0.001836s exit=0 Finished "history_event-zfs-list-cacher.sh" eid=12 pid=4835 time=0.001346s exit=0 lol Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
There simply isn't a need for one, since the flags the daemon takes are all short (mostly just toggles) and administrative in nature, and are therefore better served by the age-old tradition of sourcing an environment file and preparing the cmdline in the init-specific handler itself, if needed at all Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
And add a note on /why/ ZEDLETs need to be owned by root Quoth chown(2), Linux man-pages project: Only a privileged process (Linux: one with the CAP_CHOWN capability) may change the owner of a file. Quoth chown(2), FreeBSD: [EPERM] The operation would change the ownership, but the effective user ID is not the super-user. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
Same deal as zed_file_close_on_exec() Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
These events should currently never be generated. Also untag _zed_event_add_nvpair() from merge with zpool_do_events_nvprint() ‒ they serve different purposes (machine, usually script vs human consumption) and format the output differently as it stands Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
We set SA_RESTART early on, which will prevent EINTRs (indeed, to the point of needing to clear it in the reaper, since it interferes with pause(2)), which is the only error zed_file_write_n() actually handled (plus, the pid write is no bigger than 12 bytes anyway) Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
Consider the following strace log: prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0 dup2(0, 30) = 30 dup2(0, 300) = 300 dup2(0, 3000) = -1 EBADF (Bad file descriptor) dup2(0, 30000) = -1 EBADF (Bad file descriptor) dup2(0, 300000) = -1 EBADF (Bad file descriptor) prlimit64(0, RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}, NULL) = 0 dup2(0, 30) = 30 dup2(0, 300) = 300 dup2(0, 3000) = 3000 dup2(0, 30000) = 30000 dup2(0, 300000) = 300000 Even a privileged process needs to bump its rlimit before being able to use fds higher than rlim_cur. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
/dev/fd on Darwin Consider the following strace output: prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0 Yes, that is well over a million file descriptors! This reduces the ZED start-up time from "at least a second" to "instantaneous", and, under strace, from "don't even try" to "usable" by simple virtue of doing five syscalls instead of over a million; in most cases the main loop does nothing Recent Linuxes (5.8+) have close_range(2) for this, but that's an overoptimisation (and libcs don't have wrappers for it yet) This is also run by the ZEDLET pre-exec. Compare: Finished "all-syslog.sh" eid=13 pid=6717 time=1.027100s exit=0 Finished "history_event-zfs-list-cacher.sh" eid=13 pid=6718 time=1.046923s exit=0 to Finished "all-syslog.sh" eid=12 pid=4834 time=0.001836s exit=0 Finished "history_event-zfs-list-cacher.sh" eid=12 pid=4835 time=0.001346s exit=0 lol Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
Consider the following strace log: prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0 dup2(0, 30) = 30 dup2(0, 300) = 300 dup2(0, 3000) = -1 EBADF (Bad file descriptor) dup2(0, 30000) = -1 EBADF (Bad file descriptor) dup2(0, 300000) = -1 EBADF (Bad file descriptor) prlimit64(0, RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}, NULL) = 0 dup2(0, 30) = 30 dup2(0, 300) = 300 dup2(0, 3000) = 3000 dup2(0, 30000) = 30000 dup2(0, 300000) = 300000 Even a privileged process needs to bump its rlimit before being able to use fds higher than rlim_cur. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
/dev/fd on Darwin Consider the following strace output: prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0 Yes, that is well over a million file descriptors! This reduces the ZED start-up time from "at least a second" to "instantaneous", and, under strace, from "don't even try" to "usable" by simple virtue of doing five syscalls instead of over a million; in most cases the main loop does nothing Recent Linuxes (5.8+) have close_range(2) for this, but that's an overoptimisation (and libcs don't have wrappers for it yet) This is also run by the ZEDLET pre-exec. Compare: Finished "all-syslog.sh" eid=13 pid=6717 time=1.027100s exit=0 Finished "history_event-zfs-list-cacher.sh" eid=13 pid=6718 time=1.046923s exit=0 to Finished "all-syslog.sh" eid=12 pid=4834 time=0.001836s exit=0 Finished "history_event-zfs-list-cacher.sh" eid=12 pid=4835 time=0.001346s exit=0 lol Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes #11834
Consider the following strace log: prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0 dup2(0, 30) = 30 dup2(0, 300) = 300 dup2(0, 3000) = -1 EBADF (Bad file descriptor) dup2(0, 30000) = -1 EBADF (Bad file descriptor) dup2(0, 300000) = -1 EBADF (Bad file descriptor) prlimit64(0, RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}, NULL) = 0 dup2(0, 30) = 30 dup2(0, 300) = 300 dup2(0, 3000) = 3000 dup2(0, 30000) = 30000 dup2(0, 300000) = 300000 Even a privileged process needs to bump its rlimit before being able to use fds higher than rlim_cur. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes #11834
/dev/fd on Darwin Consider the following strace output: prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0 Yes, that is well over a million file descriptors! This reduces the ZED start-up time from "at least a second" to "instantaneous", and, under strace, from "don't even try" to "usable" by simple virtue of doing five syscalls instead of over a million; in most cases the main loop does nothing Recent Linuxes (5.8+) have close_range(2) for this, but that's an overoptimisation (and libcs don't have wrappers for it yet) This is also run by the ZEDLET pre-exec. Compare: Finished "all-syslog.sh" eid=13 pid=6717 time=1.027100s exit=0 Finished "history_event-zfs-list-cacher.sh" eid=13 pid=6718 time=1.046923s exit=0 to Finished "all-syslog.sh" eid=12 pid=4834 time=0.001836s exit=0 Finished "history_event-zfs-list-cacher.sh" eid=12 pid=4835 time=0.001346s exit=0 lol Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
Consider the following strace log: prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0 dup2(0, 30) = 30 dup2(0, 300) = 300 dup2(0, 3000) = -1 EBADF (Bad file descriptor) dup2(0, 30000) = -1 EBADF (Bad file descriptor) dup2(0, 300000) = -1 EBADF (Bad file descriptor) prlimit64(0, RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}, NULL) = 0 dup2(0, 30) = 30 dup2(0, 300) = 300 dup2(0, 3000) = 3000 dup2(0, 30000) = 30000 dup2(0, 300000) = 300000 Even a privileged process needs to bump its rlimit before being able to use fds higher than rlim_cur. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#11834
Motivation and Context
See individual commits for analysis where applicable.
Description
The first patch prints out the sum times from the accounting information we get out of wait().
The second patch makes ZED start-up and ZEDLET pre-exec roughly a thousand times faster lol.
The third patch cleans out all traces of the config file that never was.
The fourth patch removes a long and confusing (also wrong) paragraph from the man-page.
The fifth patch implements the suggested
zfs_zevent_len_max
bump when we drop events.The sixth and eighth patches dump
zed_file_{read,write}_n()
.The seventh patch makes
_zed_event_add_nvpair()
slightly more similar tozpool_do_events_nvprint()
I guess, in a way?The ninth patch removes the blatant lie that the Diagnosis Engine doesn't exist. It might be prudent to link to
cmd/zed/agents/README.md
from the manpage somewhere under some key words, but I don't understand a damn thing about it (that might be because I missed Illumos by a few years), so I defer to someone who does to determine those.The tenth patch depessimises the close_from() no-/proc fallback.
How Has This Been Tested?
Dunno, just ran it a bunch.
Types of changes
Checklist:
Signed-off-by
.