Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FreeBSD: Fix zvol_*_open() lock inversion #12934

Closed
wants to merge 2 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Jan 4, 2022

Motivation and Context

These are the changes for FreeBSD corresponding to the changes made for Linux in #12863, see that PR for motivation.

Description

Changes from #12863 applied for zvol_geom_open and zvol_cdev_open on FreeBSD.
This also adds a check for the zvol dying which we had in zvol_geom_open but was missing in zvol_cdev_open. The check causes the open to fail early with ENXIO when we are in the middle of changing volmode.
While porting the changes I noticed the comments in this code look a bit nicer on Linux so went ahead and added a second commit to clean them up a bit for FreeBSD as well.

How Has This Been Tested?

ZTS passed on FreeBSD.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@ghost ghost requested review from amotin and behlendorf January 4, 2022 22:24
@ghost ghost added the Status: Code Review Needed Ready for review and testing label Jan 4, 2022
@ghost
Copy link
Author

ghost commented Jan 4, 2022

I need to write the body of the commit message for the main commit before this goes in, but I'll let ZTS run for now.

if (!mutex_tryenter(&spa_namespace_lock)) {
mutex_exit(&zv->zv_state_lock);
rw_exit(&zv->zv_suspend_lock);
/* XXX: Should we delay like Linux does here? */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to delay here, but you may want to schedule(). The situation may be different on FreeBSD, but on Linux at least it was possible to get stuck spinning here on a single processor system without the schedule.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kern_yield(PRI_USER), I suppose. But it will still spin if some other thread is holding spa_namespace_lock, just probably not deadlock on UP, indeed.

Copy link
Member

@amotin amotin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not intimate with all the multiple locks there, but it looks wrong to me to first check for mutex_owned(&spa_namespace_lock), that assumes the namespace lock may be held from outside the function, but then try to enforce lock ordering with zv_suspend_lock taken inside this function.

@behlendorf ^^^ the same issue I suppose in Linux.

if (!mutex_tryenter(&spa_namespace_lock)) {
mutex_exit(&zv->zv_state_lock);
rw_exit(&zv->zv_suspend_lock);
/* XXX: Should we delay like Linux does here? */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kern_yield(PRI_USER), I suppose. But it will still spin if some other thread is holding spa_namespace_lock, just probably not deadlock on UP, indeed.

Ryan Moeller added 2 commits January 5, 2022 15:04
These are the changes for FreeBSD corresponding to the changes made for
Linux in openzfs#12863, see that PR for details.

Changes from openzfs#12863 are applied for zvol_geom_open and zvol_cdev_open
on FreeBSD.  This also adds a check for the zvol dying which we had
in zvol_geom_open but was missing in zvol_cdev_open.  The check causes
the open to fail early with ENXIO when we are in the middle of changing
volmode.

Signed-off-by: Ryan Moeller <[email protected]>
@behlendorf
Copy link
Contributor

The lock ordering requirement is that we need to make sure the take the spa_namespace_lock before the zv_suspend_lock. Otherwise, it's possible to encounter this lock inversion, #12863 (comment), between zvol_suspend() and zvol_open()/zvol_geom_open().

@amotin
Copy link
Member

amotin commented Jan 5, 2022

The lock ordering requirement is that we need to make sure the take the spa_namespace_lock before the zv_suspend_lock.

But in your commit you've moved zv_suspend_lock acquisition up, before the spa_namespace_lock. I am at loss.

@behlendorf
Copy link
Contributor

But in your commit you've moved zv_suspend_lock acquisition up, before the spa_namespace_lock. I am at loss.

Sorry, I really should have proofread my comment better, I got it exactly backwards. We need to take the spa_namespace_lock after the zv_suspend_lock since this is the order required by the zvol_suspend call path. Altering that ordering would be invasive, and if you look at the comment at the top of spa_open_common() it does already handle the case where the spa_namespace_lock is already held.

@ghost
Copy link
Author

ghost commented Jan 5, 2022

I have the commit updated and am letting it run through 50 iterations of the rsend tag in ZTS to be safe. The locking is certainly spaghetti code here but it seems correct to me, or at least it obeys the ordering described in the comment at the top of module/zfs/zvol.c (which doesn't mention spa_namespace_lock, fwiw).

@amotin
Copy link
Member

amotin commented Jan 5, 2022

Sorry, I really should have proofread my comment better, I got it exactly backwards. We need to take the spa_namespace_lock after the zv_suspend_lock since this is the order required by the zvol_suspend call path. Altering that ordering would be invasive, and if you look at the comment at the top of spa_open_common() it does already handle the case where the spa_namespace_lock is already held.

@behlendorf That resolves my confusion, but does not answer my question: Can zvol_open()/zvol_*_open() be called with spa_namespace_lock held (I can think of only nested pools)? If yes, then there is still chance of lock order reversal, since caller won't hold zv_suspend_lock/zv_state_lock for the zvol in advance. If no, then check for mutex_owned(&spa_namespace_lock) makes no sense to me.

@ghost
Copy link
Author

ghost commented Jan 5, 2022

I don't see any way to enter the zvol open functions with spa_namespace_lock held. My understanding is that the mutex_owned check is there for the first open retry logic, which should acquire the locks in the correct order with spa_namespace_lock taken last.

@amotin
Copy link
Member

amotin commented Jan 5, 2022

mutex_owned check is there for the first open retry logic, which should acquire the locks in the correct order with spa_namespace_lock taken last.

That would be true before the change(s), since previously spa_namespace_lock was acquired before the retry. But with the change(s) retry is done without the lock acquisition, spinning in retrial instead of waiting for the lock once.

@behlendorf
Copy link
Contributor

Can zvol_open()/zvol_*_open() be called with spa_namespace_lock held

It can, but nested pools are the only scenario I'm aware of where it's possible. Sadly you're right, this change doesn't handle that particular case. It only covers the more common case we happened were seeing with some zvol consumers. Fixing both looked like it would require far more disruptive locking changes that I was trying to avoid. To my knowledge layering a zpool on zvol has always been a bit dodgy on every platform so I wasn't too focused on it.

@AttilaFueloep
Copy link
Contributor

To my knowledge layering a zpool on zvol has always been a bit dodgy on every platform so I wasn't too focused on it.

Yes, I can confirm, tried this (actually one half of a mirror vdev on a zvol ) on SmartOS some years ago and ran into deadlocks every now and then.

Although I can't find any pointers right now, I vaguely remember discussing nested zfs years ago. If memory serves me well, the standpoint was that while not officially supported, the team tries to not break it lightly. Indeed nesting a pool on another can come in handy on rare occasions ,but I agree that the effort has to be weight up.

@behlendorf
Copy link
Contributor

behlendorf commented Jan 13, 2022

@freqlabs @amotin if you guys are happy with this I think it's reasonable to merge.

@ghost
Copy link
Author

ghost commented Jan 13, 2022

It's okay with me. Fixing the locking for nested pools is not where want to go with this.

@amotin
Copy link
Member

amotin commented Jan 13, 2022

I am not happy with that custom "spinlock", but I accept there is a reason for such a mess.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Jan 14, 2022
behlendorf pushed a commit that referenced this pull request Jan 14, 2022
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #12934
@ghost ghost deleted the zvol_open_fbsd branch January 14, 2022 21:54
ghost pushed a commit to truenas/zfs that referenced this pull request Jan 19, 2022
These are the changes for FreeBSD corresponding to the changes made for
Linux in openzfs#12863, see that PR for details.

Changes from openzfs#12863 are applied for zvol_geom_open and zvol_cdev_open
on FreeBSD.  This also adds a check for the zvol dying which we had
in zvol_geom_open but was missing in zvol_cdev_open.  The check causes
the open to fail early with ENXIO when we are in the middle of changing
volmode.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes openzfs#12934
ghost pushed a commit to truenas/zfs that referenced this pull request Jan 20, 2022
These are the changes for FreeBSD corresponding to the changes made for
Linux in openzfs#12863, see that PR for details.

Changes from openzfs#12863 are applied for zvol_geom_open and zvol_cdev_open
on FreeBSD.  This also adds a check for the zvol dying which we had
in zvol_geom_open but was missing in zvol_cdev_open.  The check causes
the open to fail early with ENXIO when we are in the middle of changing
volmode.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes openzfs#12934
ghost pushed a commit to truenas/zfs that referenced this pull request Feb 2, 2022
These are the changes for FreeBSD corresponding to the changes made for
Linux in openzfs#12863, see that PR for details.

Changes from openzfs#12863 are applied for zvol_geom_open and zvol_cdev_open
on FreeBSD.  This also adds a check for the zvol dying which we had
in zvol_geom_open but was missing in zvol_cdev_open.  The check causes
the open to fail early with ENXIO when we are in the middle of changing
volmode.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes openzfs#12934
tonyhutter pushed a commit that referenced this pull request Feb 3, 2022
These are the changes for FreeBSD corresponding to the changes made for
Linux in #12863, see that PR for details.

Changes from #12863 are applied for zvol_geom_open and zvol_cdev_open
on FreeBSD.  This also adds a check for the zvol dying which we had
in zvol_geom_open but was missing in zvol_cdev_open.  The check causes
the open to fail early with ENXIO when we are in the middle of changing
volmode.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #12934
nicman23 pushed a commit to nicman23/zfs that referenced this pull request Aug 22, 2022
These are the changes for FreeBSD corresponding to the changes made for
Linux in openzfs#12863, see that PR for details.

Changes from openzfs#12863 are applied for zvol_geom_open and zvol_cdev_open
on FreeBSD.  This also adds a check for the zvol dying which we had
in zvol_geom_open but was missing in zvol_cdev_open.  The check causes
the open to fail early with ENXIO when we are in the middle of changing
volmode.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes openzfs#12934
nicman23 pushed a commit to nicman23/zfs that referenced this pull request Aug 22, 2022
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes openzfs#12934
nicman23 pushed a commit to nicman23/zfs that referenced this pull request Aug 22, 2022
These are the changes for FreeBSD corresponding to the changes made for
Linux in openzfs#12863, see that PR for details.

Changes from openzfs#12863 are applied for zvol_geom_open and zvol_cdev_open
on FreeBSD.  This also adds a check for the zvol dying which we had
in zvol_geom_open but was missing in zvol_cdev_open.  The check causes
the open to fail early with ENXIO when we are in the middle of changing
volmode.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes openzfs#12934
nicman23 pushed a commit to nicman23/zfs that referenced this pull request Aug 22, 2022
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes openzfs#12934
bsdjhb pushed a commit to CTSRD-CHERI/zfs that referenced this pull request Jul 13, 2023
This is not associated with a specific upstream commit but apparently
a local diff applied as part of:

commit e92ffd9b626833ebdbf2742c8ffddc6cd94b963e
Merge: 3c3df3660072 17b2ae0
Author: Martin Matuska <[email protected]>
Date:   Sat Jan 22 23:05:15 2022 +0100

    zfs: merge openzfs/zfs@17b2ae0b2 (master) into main

    Notable upstream pull request merges:
      openzfs#12766 Fix error propagation from lzc_send_redacted
      openzfs#12805 Updated the lz4 decompressor
      openzfs#12851 FreeBSD: Provide correct file generation number
      openzfs#12857 Verify dRAID empty sectors
      openzfs#12874 FreeBSD: Update argument types for VOP_READDIR
      openzfs#12896 Reduce number of arc_prune threads
      openzfs#12934 FreeBSD: Fix zvol_*_open() locking
      openzfs#12947 lz4: Cherrypick fix for CVE-2021-3520
      openzfs#12961 FreeBSD: Fix leaked strings in libspl mnttab
      openzfs#12964 Fix handling of errors from dmu_write_uio_dbuf() on FreeBSD
      openzfs#12981 Introduce a flag to skip comparing the local mac when raw sending
      openzfs#12985 Avoid memory allocations in the ARC eviction thread

    Obtained from:  OpenZFS
    OpenZFS commit: 17b2ae0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants