Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assorted cleanup for ZED, but mostly just the thousand-fold speed improvement #11834

Conversation

nabijaczleweli
Copy link
Contributor

@nabijaczleweli nabijaczleweli commented Apr 2, 2021

Motivation and Context

See individual commits for analysis where applicable.

Description

The first patch prints out the sum times from the accounting information we get out of wait().

The second patch makes ZED start-up and ZEDLET pre-exec roughly a thousand times faster lol.

The third patch cleans out all traces of the config file that never was.

The fourth patch removes a long and confusing (also wrong) paragraph from the man-page.

The fifth patch implements the suggested zfs_zevent_len_max bump when we drop events.

The sixth and eighth patches dump zed_file_{read,write}_n().

The seventh patch makes _zed_event_add_nvpair() slightly more similar to zpool_do_events_nvprint() I guess, in a way?

The ninth patch removes the blatant lie that the Diagnosis Engine doesn't exist. It might be prudent to link to cmd/zed/agents/README.md from the manpage somewhere under some key words, but I don't understand a damn thing about it (that might be because I missed Illumos by a few years), so I defer to someone who does to determine those.

The tenth patch depessimises the close_from() no-/proc fallback.

How Has This Been Tested?

Dunno, just ran it a bunch.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the OpenZFS code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes. – none apply
  • I have run the ZFS Test Suite with this change applied. – CI take my hand
  • All commit messages are properly formatted and contain Signed-off-by.

@nabijaczleweli nabijaczleweli marked this pull request as draft April 2, 2021 20:31
@nabijaczleweli
Copy link
Contributor Author

nabijaczleweli commented Apr 2, 2021

Uh-oh! Posted this early by accident (this includes #11807 and needs to be rebased when that's applied), soz

@behlendorf behlendorf added the Status: Work in Progress Not yet ready for general review label Apr 3, 2021
@nabijaczleweli nabijaczleweli force-pushed the about-average-gamer-skill-above-average-gamer-commentary branch 2 times, most recently from fad68e4 to 1f881cb Compare April 3, 2021 10:24
@nabijaczleweli
Copy link
Contributor Author

Should be good to go, assuming the test suite doesn't rely on something weird again?

@nabijaczleweli nabijaczleweli marked this pull request as ready for review April 3, 2021 10:42
@nabijaczleweli
Copy link
Contributor Author

uhh, the Fedora 33 tests hit an assert in zdb?

loading concrete vdev 1, metaslab 14 of 15 ...zcb->zcb_checkpoint_size == spa_get_checkpoint_space(spa) (0x169e9400 == 0xb1e4000) ASSERT at zdb.c:6006:zdb_leak_init()

I didn't touch zdb, so

Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bulk of these all make good sense, thanks for the cleanup. I've added a few comments inline with the relevant patch.

cmd/zed/zed_file.c Outdated Show resolved Hide resolved
/*
* Absolute path for the default zed configuration file.
*/
#define ZED_CONF_FILE SYSCONFDIR "/zfs/zed.conf"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is functionality we'd always intended to add and thus has stubbed out. Though clearly we haven't gotten to it. In particular, in zfs_diagnosis.c there are two functions fmd_prop_get_int32 and fmd_prop_get_int64 which return hard coded values we thought could be set in the config file. That said, at this point since we're only talking about a couple of value it might make more sense to add a few command line options to override the defaults. So I'm OK with shedding this long dormant and partial config file support.

man/man8/zed.8.in Show resolved Hide resolved
cmd/zed/zed_event.c Outdated Show resolved Hide resolved
cmd/zed/zed_event.c Outdated Show resolved Hide resolved
@nabijaczleweli nabijaczleweli force-pushed the about-average-gamer-skill-above-average-gamer-commentary branch from 1f881cb to 7ab5cb5 Compare April 7, 2021 09:45
@nabijaczleweli
Copy link
Contributor Author

nabijaczleweli commented Apr 7, 2021

Hopefully addressed all review comments, rebased on master.

@nabijaczleweli nabijaczleweli force-pushed the about-average-gamer-skill-above-average-gamer-commentary branch from 7ab5cb5 to 706e675 Compare April 7, 2021 09:57
/dev/fd on Darwin

Consider the following strace output:
  prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0

Yes, that is well over a million file descriptors!

This reduces the ZED start-up time from "at least a second" to
"instantaneous", and, under strace, from "don't even try" to "usable"
by simple virtue of doing five syscalls instead of over a million;
in most cases the main loop does nothing

Recent Linuxes (5.8+) have close_range(2) for this, but that's an
overoptimisation (and libcs don't have wrappers for it yet)

This is also run by the ZEDLET pre-exec. Compare:
  Finished "all-syslog.sh" eid=13 pid=6717 time=1.027100s exit=0
  Finished "history_event-zfs-list-cacher.sh" eid=13 pid=6718 time=1.046923s exit=0
to
  Finished "all-syslog.sh" eid=12 pid=4834 time=0.001836s exit=0
  Finished "history_event-zfs-list-cacher.sh" eid=12 pid=4835 time=0.001346s exit=0
lol

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
There simply isn't a need for one, since the flags the daemon takes
are all short (mostly just toggles) and administrative in nature,
and are therefore better served by the age-old tradition of sourcing an
environment file and preparing the cmdline in the init-specific handler
itself, if needed at all

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
And add a note on /why/ ZEDLETs need to be owned by root

Quoth chown(2), Linux man-pages project:
  Only a privileged process (Linux: one with the CAP_CHOWN capability)
  may change the owner of a file.

Quoth chown(2), FreeBSD:
     [EPERM]  The operation would change the ownership,
              but the effective user ID is not the super-user.

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Same deal as zed_file_close_on_exec()

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Nobody generates them anyway, I'm pretty sure

Also untag _zed_event_add_nvpair() from merge with
zpool_do_events_nvprint() ‒ they serve different purposes (machine,
usually script vs human consumption) and format the output differently
as it stands

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
We set SA_RESTART early on, which will prevent EINTRs (indeed, to the
point of needing to clear it in the reaper, since it interferes with
pause(2)), which is the only error zed_file_write_n() actually handled
(plus, the pid write is no bigger than 12 bytes anyway)

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Consider the following strace log:
  prlimit64(0, RLIMIT_NOFILE,
            NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = -1 EBADF (Bad file descriptor)
  dup2(0, 30000)                      = -1 EBADF (Bad file descriptor)
  dup2(0, 300000)                     = -1 EBADF (Bad file descriptor)
  prlimit64(0, RLIMIT_NOFILE,
            {rlim_cur=1024*1024, rlim_max=1024*1024}, NULL) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = 3000
  dup2(0, 30000)                      = 30000
  dup2(0, 300000)                     = 300000

Even a privileged process needs to bump its rlimit before being able
to use fds higher than rlim_cur

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
@nabijaczleweli nabijaczleweli force-pushed the about-average-gamer-skill-above-average-gamer-commentary branch 2 times, most recently from 6f377cb to 62d353c Compare April 7, 2021 14:37
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Work in Progress Not yet ready for general review labels Apr 7, 2021
@nabijaczleweli
Copy link
Contributor Author

CI failures make no sense to me (an OOM in zfs_mount_and_share(), Some™ error in zfs_destroy_dev_removal_condense and auto_spare_shared), but all appear unrelated

@behlendorf
Copy link
Contributor

All previously observed occasional unrelated failures. This is ready to merge.

@behlendorf behlendorf closed this in 3bc3eef Apr 7, 2021
mcmilk pushed a commit to mcmilk/zfs that referenced this pull request Apr 10, 2021
These events should currently never be generated.

Also untag _zed_event_add_nvpair() from merge with
zpool_do_events_nvprint() ‒ they serve different purposes (machine,
usually script vs human consumption) and format the output differently
as it stands

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
mcmilk pushed a commit to mcmilk/zfs that referenced this pull request Apr 10, 2021
We set SA_RESTART early on, which will prevent EINTRs (indeed, to the
point of needing to clear it in the reaper, since it interferes with
pause(2)), which is the only error zed_file_write_n() actually handled
(plus, the pid write is no bigger than 12 bytes anyway)

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
mcmilk pushed a commit to mcmilk/zfs that referenced this pull request Apr 10, 2021
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
mcmilk pushed a commit to mcmilk/zfs that referenced this pull request Apr 10, 2021
Consider the following strace log:
  prlimit64(0, RLIMIT_NOFILE,
            NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = -1 EBADF (Bad file descriptor)
  dup2(0, 30000)                      = -1 EBADF (Bad file descriptor)
  dup2(0, 300000)                     = -1 EBADF (Bad file descriptor)
  prlimit64(0, RLIMIT_NOFILE,
            {rlim_cur=1024*1024, rlim_max=1024*1024}, NULL) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = 3000
  dup2(0, 30000)                      = 30000
  dup2(0, 300000)                     = 300000

Even a privileged process needs to bump its rlimit before being able
to use fds higher than rlim_cur.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Apr 14, 2021
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Apr 14, 2021
/dev/fd on Darwin

Consider the following strace output:
  prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0

Yes, that is well over a million file descriptors!

This reduces the ZED start-up time from "at least a second" to
"instantaneous", and, under strace, from "don't even try" to "usable"
by simple virtue of doing five syscalls instead of over a million;
in most cases the main loop does nothing

Recent Linuxes (5.8+) have close_range(2) for this, but that's an
overoptimisation (and libcs don't have wrappers for it yet)

This is also run by the ZEDLET pre-exec. Compare:
  Finished "all-syslog.sh" eid=13 pid=6717 time=1.027100s exit=0
  Finished "history_event-zfs-list-cacher.sh" eid=13 pid=6718 time=1.046923s exit=0
to
  Finished "all-syslog.sh" eid=12 pid=4834 time=0.001836s exit=0
  Finished "history_event-zfs-list-cacher.sh" eid=12 pid=4835 time=0.001346s exit=0
lol

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Apr 14, 2021
There simply isn't a need for one, since the flags the daemon takes
are all short (mostly just toggles) and administrative in nature,
and are therefore better served by the age-old tradition of sourcing an
environment file and preparing the cmdline in the init-specific handler
itself, if needed at all

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Apr 14, 2021
And add a note on /why/ ZEDLETs need to be owned by root

Quoth chown(2), Linux man-pages project:
  Only a privileged process (Linux: one with the CAP_CHOWN capability)
  may change the owner of a file.

Quoth chown(2), FreeBSD:
     [EPERM]  The operation would change the ownership,
              but the effective user ID is not the super-user.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Apr 14, 2021
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Apr 14, 2021
Same deal as zed_file_close_on_exec()

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Apr 14, 2021
These events should currently never be generated.

Also untag _zed_event_add_nvpair() from merge with
zpool_do_events_nvprint() ‒ they serve different purposes (machine,
usually script vs human consumption) and format the output differently
as it stands

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Apr 14, 2021
We set SA_RESTART early on, which will prevent EINTRs (indeed, to the
point of needing to clear it in the reaper, since it interferes with
pause(2)), which is the only error zed_file_write_n() actually handled
(plus, the pid write is no bigger than 12 bytes anyway)

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Apr 14, 2021
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Apr 14, 2021
Consider the following strace log:
  prlimit64(0, RLIMIT_NOFILE,
            NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = -1 EBADF (Bad file descriptor)
  dup2(0, 30000)                      = -1 EBADF (Bad file descriptor)
  dup2(0, 300000)                     = -1 EBADF (Bad file descriptor)
  prlimit64(0, RLIMIT_NOFILE,
            {rlim_cur=1024*1024, rlim_max=1024*1024}, NULL) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = 3000
  dup2(0, 30000)                      = 30000
  dup2(0, 300000)                     = 300000

Even a privileged process needs to bump its rlimit before being able
to use fds higher than rlim_cur.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
/dev/fd on Darwin

Consider the following strace output:
  prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0

Yes, that is well over a million file descriptors!

This reduces the ZED start-up time from "at least a second" to
"instantaneous", and, under strace, from "don't even try" to "usable"
by simple virtue of doing five syscalls instead of over a million;
in most cases the main loop does nothing

Recent Linuxes (5.8+) have close_range(2) for this, but that's an
overoptimisation (and libcs don't have wrappers for it yet)

This is also run by the ZEDLET pre-exec. Compare:
  Finished "all-syslog.sh" eid=13 pid=6717 time=1.027100s exit=0
  Finished "history_event-zfs-list-cacher.sh" eid=13 pid=6718 time=1.046923s exit=0
to
  Finished "all-syslog.sh" eid=12 pid=4834 time=0.001836s exit=0
  Finished "history_event-zfs-list-cacher.sh" eid=12 pid=4835 time=0.001346s exit=0
lol

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
There simply isn't a need for one, since the flags the daemon takes
are all short (mostly just toggles) and administrative in nature,
and are therefore better served by the age-old tradition of sourcing an
environment file and preparing the cmdline in the init-specific handler
itself, if needed at all

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
And add a note on /why/ ZEDLETs need to be owned by root

Quoth chown(2), Linux man-pages project:
  Only a privileged process (Linux: one with the CAP_CHOWN capability)
  may change the owner of a file.

Quoth chown(2), FreeBSD:
     [EPERM]  The operation would change the ownership,
              but the effective user ID is not the super-user.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
Same deal as zed_file_close_on_exec()

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
These events should currently never be generated.

Also untag _zed_event_add_nvpair() from merge with
zpool_do_events_nvprint() ‒ they serve different purposes (machine,
usually script vs human consumption) and format the output differently
as it stands

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
We set SA_RESTART early on, which will prevent EINTRs (indeed, to the
point of needing to clear it in the reaper, since it interferes with
pause(2)), which is the only error zed_file_write_n() actually handled
(plus, the pid write is no bigger than 12 bytes anyway)

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
Consider the following strace log:
  prlimit64(0, RLIMIT_NOFILE,
            NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = -1 EBADF (Bad file descriptor)
  dup2(0, 30000)                      = -1 EBADF (Bad file descriptor)
  dup2(0, 300000)                     = -1 EBADF (Bad file descriptor)
  prlimit64(0, RLIMIT_NOFILE,
            {rlim_cur=1024*1024, rlim_max=1024*1024}, NULL) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = 3000
  dup2(0, 30000)                      = 30000
  dup2(0, 300000)                     = 300000

Even a privileged process needs to bump its rlimit before being able
to use fds higher than rlim_cur.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
rincebrain pushed a commit to rincebrain/zfs that referenced this pull request Oct 2, 2021
/dev/fd on Darwin

Consider the following strace output:
  prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0

Yes, that is well over a million file descriptors!

This reduces the ZED start-up time from "at least a second" to
"instantaneous", and, under strace, from "don't even try" to "usable"
by simple virtue of doing five syscalls instead of over a million;
in most cases the main loop does nothing

Recent Linuxes (5.8+) have close_range(2) for this, but that's an
overoptimisation (and libcs don't have wrappers for it yet)

This is also run by the ZEDLET pre-exec. Compare:
  Finished "all-syslog.sh" eid=13 pid=6717 time=1.027100s exit=0
  Finished "history_event-zfs-list-cacher.sh" eid=13 pid=6718 time=1.046923s exit=0
to
  Finished "all-syslog.sh" eid=12 pid=4834 time=0.001836s exit=0
  Finished "history_event-zfs-list-cacher.sh" eid=12 pid=4835 time=0.001346s exit=0
lol

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
rincebrain pushed a commit to rincebrain/zfs that referenced this pull request Oct 2, 2021
Consider the following strace log:
  prlimit64(0, RLIMIT_NOFILE,
            NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = -1 EBADF (Bad file descriptor)
  dup2(0, 30000)                      = -1 EBADF (Bad file descriptor)
  dup2(0, 300000)                     = -1 EBADF (Bad file descriptor)
  prlimit64(0, RLIMIT_NOFILE,
            {rlim_cur=1024*1024, rlim_max=1024*1024}, NULL) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = 3000
  dup2(0, 30000)                      = 30000
  dup2(0, 300000)                     = 300000

Even a privileged process needs to bump its rlimit before being able
to use fds higher than rlim_cur.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
tonyhutter pushed a commit that referenced this pull request Nov 1, 2021
/dev/fd on Darwin

Consider the following strace output:
  prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0

Yes, that is well over a million file descriptors!

This reduces the ZED start-up time from "at least a second" to
"instantaneous", and, under strace, from "don't even try" to "usable"
by simple virtue of doing five syscalls instead of over a million;
in most cases the main loop does nothing

Recent Linuxes (5.8+) have close_range(2) for this, but that's an
overoptimisation (and libcs don't have wrappers for it yet)

This is also run by the ZEDLET pre-exec. Compare:
  Finished "all-syslog.sh" eid=13 pid=6717 time=1.027100s exit=0
  Finished "history_event-zfs-list-cacher.sh" eid=13 pid=6718 time=1.046923s exit=0
to
  Finished "all-syslog.sh" eid=12 pid=4834 time=0.001836s exit=0
  Finished "history_event-zfs-list-cacher.sh" eid=12 pid=4835 time=0.001346s exit=0
lol

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #11834
tonyhutter pushed a commit that referenced this pull request Nov 1, 2021
Consider the following strace log:
  prlimit64(0, RLIMIT_NOFILE,
            NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = -1 EBADF (Bad file descriptor)
  dup2(0, 30000)                      = -1 EBADF (Bad file descriptor)
  dup2(0, 300000)                     = -1 EBADF (Bad file descriptor)
  prlimit64(0, RLIMIT_NOFILE,
            {rlim_cur=1024*1024, rlim_max=1024*1024}, NULL) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = 3000
  dup2(0, 30000)                      = 30000
  dup2(0, 300000)                     = 300000

Even a privileged process needs to bump its rlimit before being able
to use fds higher than rlim_cur.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #11834
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Nov 13, 2021
/dev/fd on Darwin

Consider the following strace output:
  prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0

Yes, that is well over a million file descriptors!

This reduces the ZED start-up time from "at least a second" to
"instantaneous", and, under strace, from "don't even try" to "usable"
by simple virtue of doing five syscalls instead of over a million;
in most cases the main loop does nothing

Recent Linuxes (5.8+) have close_range(2) for this, but that's an
overoptimisation (and libcs don't have wrappers for it yet)

This is also run by the ZEDLET pre-exec. Compare:
  Finished "all-syslog.sh" eid=13 pid=6717 time=1.027100s exit=0
  Finished "history_event-zfs-list-cacher.sh" eid=13 pid=6718 time=1.046923s exit=0
to
  Finished "all-syslog.sh" eid=12 pid=4834 time=0.001836s exit=0
  Finished "history_event-zfs-list-cacher.sh" eid=12 pid=4835 time=0.001346s exit=0
lol

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Nov 13, 2021
Consider the following strace log:
  prlimit64(0, RLIMIT_NOFILE,
            NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = -1 EBADF (Bad file descriptor)
  dup2(0, 30000)                      = -1 EBADF (Bad file descriptor)
  dup2(0, 300000)                     = -1 EBADF (Bad file descriptor)
  prlimit64(0, RLIMIT_NOFILE,
            {rlim_cur=1024*1024, rlim_max=1024*1024}, NULL) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = 3000
  dup2(0, 30000)                      = 30000
  dup2(0, 300000)                     = 300000

Even a privileged process needs to bump its rlimit before being able
to use fds higher than rlim_cur.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11834
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants