-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FreeBSD: Fix zvol_*_open() lock inversion #12934
Conversation
I need to write the body of the commit message for the main commit before this goes in, but I'll let ZTS run for now. |
module/os/freebsd/zfs/zvol_os.c
Outdated
if (!mutex_tryenter(&spa_namespace_lock)) { | ||
mutex_exit(&zv->zv_state_lock); | ||
rw_exit(&zv->zv_suspend_lock); | ||
/* XXX: Should we delay like Linux does here? */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need to delay here, but you may want to schedule()
. The situation may be different on FreeBSD, but on Linux at least it was possible to get stuck spinning here on a single processor system without the schedule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kern_yield(PRI_USER), I suppose. But it will still spin if some other thread is holding spa_namespace_lock, just probably not deadlock on UP, indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not intimate with all the multiple locks there, but it looks wrong to me to first check for mutex_owned(&spa_namespace_lock), that assumes the namespace lock may be held from outside the function, but then try to enforce lock ordering with zv_suspend_lock taken inside this function.
@behlendorf ^^^ the same issue I suppose in Linux.
module/os/freebsd/zfs/zvol_os.c
Outdated
if (!mutex_tryenter(&spa_namespace_lock)) { | ||
mutex_exit(&zv->zv_state_lock); | ||
rw_exit(&zv->zv_suspend_lock); | ||
/* XXX: Should we delay like Linux does here? */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kern_yield(PRI_USER), I suppose. But it will still spin if some other thread is holding spa_namespace_lock, just probably not deadlock on UP, indeed.
These are the changes for FreeBSD corresponding to the changes made for Linux in openzfs#12863, see that PR for details. Changes from openzfs#12863 are applied for zvol_geom_open and zvol_cdev_open on FreeBSD. This also adds a check for the zvol dying which we had in zvol_geom_open but was missing in zvol_cdev_open. The check causes the open to fail early with ENXIO when we are in the middle of changing volmode. Signed-off-by: Ryan Moeller <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
The lock ordering requirement is that we need to make sure the take the |
But in your commit you've moved zv_suspend_lock acquisition up, before the spa_namespace_lock. I am at loss. |
Sorry, I really should have proofread my comment better, I got it exactly backwards. We need to take the |
I have the commit updated and am letting it run through 50 iterations of the rsend tag in ZTS to be safe. The locking is certainly spaghetti code here but it seems correct to me, or at least it obeys the ordering described in the comment at the top of module/zfs/zvol.c (which doesn't mention spa_namespace_lock, fwiw). |
@behlendorf That resolves my confusion, but does not answer my question: Can zvol_open()/zvol_*_open() be called with spa_namespace_lock held (I can think of only nested pools)? If yes, then there is still chance of lock order reversal, since caller won't hold zv_suspend_lock/zv_state_lock for the zvol in advance. If no, then check for mutex_owned(&spa_namespace_lock) makes no sense to me. |
I don't see any way to enter the zvol open functions with spa_namespace_lock held. My understanding is that the mutex_owned check is there for the first open retry logic, which should acquire the locks in the correct order with spa_namespace_lock taken last. |
That would be true before the change(s), since previously spa_namespace_lock was acquired before the retry. But with the change(s) retry is done without the lock acquisition, spinning in retrial instead of waiting for the lock once. |
It can, but nested pools are the only scenario I'm aware of where it's possible. Sadly you're right, this change doesn't handle that particular case. It only covers the more common case we happened were seeing with some |
Yes, I can confirm, tried this (actually one half of a mirror vdev on a zvol ) on SmartOS some years ago and ran into deadlocks every now and then. Although I can't find any pointers right now, I vaguely remember discussing nested zfs years ago. If memory serves me well, the standpoint was that while not officially supported, the team tries to not break it lightly. Indeed nesting a pool on another can come in handy on rare occasions ,but I agree that the effort has to be weight up. |
@freqlabs @amotin if you guys are happy with this I think it's reasonable to merge. |
It's okay with me. Fixing the locking for nested pools is not where want to go with this. |
I am not happy with that custom "spinlock", but I accept there is a reason for such a mess. |
Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #12934
These are the changes for FreeBSD corresponding to the changes made for Linux in openzfs#12863, see that PR for details. Changes from openzfs#12863 are applied for zvol_geom_open and zvol_cdev_open on FreeBSD. This also adds a check for the zvol dying which we had in zvol_geom_open but was missing in zvol_cdev_open. The check causes the open to fail early with ENXIO when we are in the middle of changing volmode. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes openzfs#12934
These are the changes for FreeBSD corresponding to the changes made for Linux in openzfs#12863, see that PR for details. Changes from openzfs#12863 are applied for zvol_geom_open and zvol_cdev_open on FreeBSD. This also adds a check for the zvol dying which we had in zvol_geom_open but was missing in zvol_cdev_open. The check causes the open to fail early with ENXIO when we are in the middle of changing volmode. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes openzfs#12934
These are the changes for FreeBSD corresponding to the changes made for Linux in openzfs#12863, see that PR for details. Changes from openzfs#12863 are applied for zvol_geom_open and zvol_cdev_open on FreeBSD. This also adds a check for the zvol dying which we had in zvol_geom_open but was missing in zvol_cdev_open. The check causes the open to fail early with ENXIO when we are in the middle of changing volmode. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes openzfs#12934
These are the changes for FreeBSD corresponding to the changes made for Linux in #12863, see that PR for details. Changes from #12863 are applied for zvol_geom_open and zvol_cdev_open on FreeBSD. This also adds a check for the zvol dying which we had in zvol_geom_open but was missing in zvol_cdev_open. The check causes the open to fail early with ENXIO when we are in the middle of changing volmode. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #12934
These are the changes for FreeBSD corresponding to the changes made for Linux in openzfs#12863, see that PR for details. Changes from openzfs#12863 are applied for zvol_geom_open and zvol_cdev_open on FreeBSD. This also adds a check for the zvol dying which we had in zvol_geom_open but was missing in zvol_cdev_open. The check causes the open to fail early with ENXIO when we are in the middle of changing volmode. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes openzfs#12934
Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes openzfs#12934
These are the changes for FreeBSD corresponding to the changes made for Linux in openzfs#12863, see that PR for details. Changes from openzfs#12863 are applied for zvol_geom_open and zvol_cdev_open on FreeBSD. This also adds a check for the zvol dying which we had in zvol_geom_open but was missing in zvol_cdev_open. The check causes the open to fail early with ENXIO when we are in the middle of changing volmode. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes openzfs#12934
Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes openzfs#12934
This is not associated with a specific upstream commit but apparently a local diff applied as part of: commit e92ffd9b626833ebdbf2742c8ffddc6cd94b963e Merge: 3c3df3660072 17b2ae0 Author: Martin Matuska <[email protected]> Date: Sat Jan 22 23:05:15 2022 +0100 zfs: merge openzfs/zfs@17b2ae0b2 (master) into main Notable upstream pull request merges: openzfs#12766 Fix error propagation from lzc_send_redacted openzfs#12805 Updated the lz4 decompressor openzfs#12851 FreeBSD: Provide correct file generation number openzfs#12857 Verify dRAID empty sectors openzfs#12874 FreeBSD: Update argument types for VOP_READDIR openzfs#12896 Reduce number of arc_prune threads openzfs#12934 FreeBSD: Fix zvol_*_open() locking openzfs#12947 lz4: Cherrypick fix for CVE-2021-3520 openzfs#12961 FreeBSD: Fix leaked strings in libspl mnttab openzfs#12964 Fix handling of errors from dmu_write_uio_dbuf() on FreeBSD openzfs#12981 Introduce a flag to skip comparing the local mac when raw sending openzfs#12985 Avoid memory allocations in the ARC eviction thread Obtained from: OpenZFS OpenZFS commit: 17b2ae0
Motivation and Context
These are the changes for FreeBSD corresponding to the changes made for Linux in #12863, see that PR for motivation.
Description
Changes from #12863 applied for zvol_geom_open and zvol_cdev_open on FreeBSD.
This also adds a check for the zvol dying which we had in zvol_geom_open but was missing in zvol_cdev_open. The check causes the open to fail early with ENXIO when we are in the middle of changing volmode.
While porting the changes I noticed the comments in this code look a bit nicer on Linux so went ahead and added a second commit to clean them up a bit for FreeBSD as well.
How Has This Been Tested?
ZTS passed on FreeBSD.
Types of changes
Checklist:
Signed-off-by
.