I have a curious situation involving zfs_prefetch_disable=1, but prefetch counters are non-zero. The reason for the intrigue re: prefetch is that some Ubuntu 22.04 hosts are running into #14516 and #14120 (want #11980), which was causing the OOM killer to run about and kill database processes:
root@ip-172-31-84-153:/proc/spl/kstat/zfs# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy
root@ip-172-31-84-153:/proc/spl/kstat/zfs# zfs version
zfs-2.1.5-1ubuntu6~22.04.4
zfs-kmod-2.2.0-0ubuntu1~23.10.2
root@ip-172-31-84-153:/proc/spl/kstat/zfs# uname -r
6.5.0-1020-aws
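As a sanity check, the tunable can be read back at runtime to confirm it is actually applied; a minimal sketch using the standard module-parameter path:

# Confirm the tunable is applied on the running module (should print 1)
cat /sys/module/zfs/parameters/zfs_prefetch_disable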
Also, we think we were hitting #14686, which was fixed in #14692 (and is why we're upgrading to 24.04):
But, on the 24.04 systems, we still see significant prefetch activity:
The workload is interesting because this feature store runs on top of CockroachDB. When one feature finishes, the workload moves to the next feature, and the DB needs to fault in ~100% net-new data. This appears to trigger the prefetch activity, which pushes the host past its ARC max, leading to the unnecessary OOMs that prompted the upgrade from 22.04 to 24.04.
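The prefetch activity and ARC pressure described above can be watched directly in arcstats; a minimal sketch, assuming the standard counter names:

# Prefetch hit/miss counters (non-zero here is what prompted this issue)
grep -E '^prefetch_(data|metadata)_(hits|misses)' /proc/spl/kstat/zfs/arcstats
# Current ARC size vs. configured maximum
grep -E '^(size|c_max) ' /proc/spl/kstat/zfs/arcstats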
#15214 looks promising, except we have prefetch disabled but still see behavior similar to what's reported there regarding arc_anon usage (though not necessarily the pegged core).
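The arc_anon growth can be sampled from the same kstat while the workload runs; a small sketch, again assuming the standard field name:

# Sample anonymous (in-flight, not yet evictable) ARC buffers every few seconds
while true; do grep '^anon_size ' /proc/spl/kstat/zfs/arcstats; sleep 5; done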
But why is prefetch activity happening in the first place? Is it logbias=throughput or something else?
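To rule logbias in or out, the relevant properties can be read per dataset; a minimal sketch, where tank/cockroach is a hypothetical dataset name standing in for the dataset backing the DB:

# Check logbias and related properties on the dataset backing CockroachDB
zfs get logbias,sync,primarycache tank/cockroach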
zfs_prefetch_disable=1 disables only speculative prefetches, based on tracked activity. Some prefetches ZFS executes internally, based on hard-coded logic. Many of those are accounted as prescient prefetch, which means ZFS reliably knows that the data will be needed soon, but in some cases prefetches can be predictive. IIRC, marking prefetches as prescient could take some love. If you have a workload where predictive or especially prescient prefetch is unreasonable, it would be good to analyze it.
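One way to see the distinction described here is to compare the speculative prefetcher's own kstat (which zfs_prefetch_disable gates) against the ARC-level prefetch counters (which also account for ZFS's internal prefetches); a hedged sketch, assuming the standard kstat paths:

# Speculative (dmu_zfetch) prefetcher stats; these are expected to stay
# essentially flat with zfs_prefetch_disable=1
cat /proc/spl/kstat/zfs/zfetchstats
# ARC-level prefetch counters; these can still grow because internal,
# hard-coded prefetches are issued regardless of the tunable
grep -E '^prefetch_' /proc/spl/kstat/zfs/arcstats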