
Fix zvol_open() lock inversion #12863

Merged (1 commit) on Dec 17, 2021


behlendorf (Contributor)

Motivation and Context

While investigating the cause of issue #12849 I was able to reproduce
this deadlock by running the rsend tests repeatedly. There's a decent
chance this explains the occasional hangs in the Ubuntu 20.04 CI.

Description

When restructuring the zvol_open() logic for the Linux 5.13 kernel
a lock inversion was accidentally introduced. In the updated code
the spa_namespace_lock is now taken before the zv_suspend_lock
allowing the following scenario to occur:

    down_read <=== waiting for zv_suspend_lock
    zvol_open <=== holds spa_namespace_lock
    __blkdev_get
    blkdev_get_by_dev
    blkdev_open
    ...

     mutex_lock <== waiting for spa_namespace_lock
     spa_open_common
     spa_open
     dsl_pool_hold
     dmu_objset_hold_flags
     dmu_objset_hold
     dsl_prop_get
     dsl_prop_get_integer
     zvol_create_minor
     dmu_recv_end
     zfs_ioc_recv_impl <=== holds zv_suspend_lock via zvol_suspend()
     zfs_ioc_recv
     ...
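The two traces above form a classic ABBA cycle: each path holds one lock while waiting for the other. As a rough illustration only (every name below is hypothetical and this is not OpenZFS code), a checker in the spirit of the Linux kernel's lockdep can record each "acquire B while holding A" edge and flag the first time the reverse edge appears:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock identifiers; these are not the real ZFS lock objects. */
enum lock_id { SPA_NAMESPACE_LOCK, ZV_SUSPEND_LOCK, NLOCKS };

/* order_seen[a][b] is true once some code path took b while holding a. */
static bool order_seen[NLOCKS][NLOCKS];

/*
 * Record that `next` is being acquired while `held` is held.
 * Returns true if an earlier path took the same pair in the opposite
 * order, i.e. the two paths together can deadlock.
 */
static bool record_acquire(enum lock_id held, enum lock_id next)
{
	bool inversion = order_seen[next][held];

	order_seen[held][next] = true;
	return (inversion);
}

/* Forget all recorded orderings (e.g. between test scenarios). */
static void reset_order(void)
{
	for (int a = 0; a < NLOCKS; a++)
		for (int b = 0; b < NLOCKS; b++)
			order_seen[a][b] = false;
}
```

Feeding it the pre-fix zvol_open() order (namespace lock, then suspend lock) followed by the zfs_ioc_recv_impl() order (suspend lock, then namespace lock) makes the second record_acquire() call return true. With the restored order both paths record the same edge and nothing is flagged.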

This commit resolves the issue by moving the acquisition of the
spa_namespace_lock back to after the zv_suspend_lock, which restores
the original ordering.

Additionally, as part of this change the error exit paths were
simplified where possible.
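A minimal sketch of the restored ordering, assuming a simple lock-ranking scheme: the rank values, the fake_zvol structure, and all function names are hypothetical, and no real locks are taken. zv_suspend_lock ranks below spa_namespace_lock, so any attempt to take them the other way round is rejected. The open path uses a single chain of unwind labels, one common way to keep error exits simple; it illustrates the idea and is not the patch itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical ranks: a lock may only be taken above every lock already held. */
enum { RANK_ZV_SUSPEND = 1, RANK_SPA_NAMESPACE = 2 };

#define	MAX_HELD	8
static int held_stack[MAX_HELD];
static int held_depth;

/* Track acquisition of a lock with the given rank; refuse out-of-order requests. */
static bool ranked_lock(int rank)
{
	if (held_depth == MAX_HELD)
		return (false);
	if (held_depth > 0 && held_stack[held_depth - 1] >= rank)
		return (false);
	held_stack[held_depth++] = rank;
	return (true);
}

/* Release the most recently acquired lock (locks are released LIFO). */
static void ranked_unlock(void)
{
	assert(held_depth > 0);
	held_depth--;
}

struct fake_zvol {
	bool fail_setup;	/* hypothetical knob to exercise the error path */
	int open_count;
};

/* Open path with the restored order: suspend lock first, then namespace lock. */
static int fake_zvol_open(struct fake_zvol *zv)
{
	int error = 0;

	if (!ranked_lock(RANK_ZV_SUSPEND))
		return (EDEADLK);
	if (!ranked_lock(RANK_SPA_NAMESPACE)) {
		error = EDEADLK;
		goto out_suspend;
	}
	if (zv->fail_setup) {
		error = EIO;
		goto out_namespace;
	}
	zv->open_count++;

out_namespace:
	ranked_unlock();	/* drops the namespace lock */
out_suspend:
	ranked_unlock();	/* drops the suspend lock */
	return (error);
}
```

Calling the lock functions in the pre-fix order, ranked_lock(RANK_SPA_NAMESPACE) followed by ranked_lock(RANK_ZV_SUSPEND), makes the second call return false instead of silently building the inversion.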

NOTE: The Linux code was modeled after the current FreeBSD code,
which is why I wasn't concerned with changing the ordering.
It sure looks to me like it is possible to hit this on FreeBSD, but I've
seen no reports of it happening. Someone more familiar with FreeBSD
should take a close look at the deadlock to see if it is possible.

How Has This Been Tested?

Locally by running the rsend tests in a loop. As expected, after
correcting the lock ordering no deadlocks have been observed.
Full results from the CI testing are pending.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

behlendorf added the Status: Code Review Needed label Dec 16, 2021
behlendorf requested review from rincebrain, a user and tonyhutter December 16, 2021 01:36
Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#12849
behlendorf (Contributor Author)

From the test results, this did resolve the deadlock. The other failures here are unrelated. That said, I'm going to run it through the GitHub Actions builders again just to be sure.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Dec 17, 2021
@behlendorf behlendorf merged commit 8a02d01 into openzfs:master Dec 17, 2021
ghost commented Dec 17, 2021

I am wondering whether it would make sense to apply similar changes to the FreeBSD zvol open code, or whether the locking differs on Linux for some reason.

behlendorf (Contributor Author)

From my reading of the code, I don't see why this couldn't occur on FreeBSD as well. That said, it's been this way in the OpenZFS FreeBSD platform code since it was added, so if it can happen it's thankfully very unlikely. I think it would be a good idea to make the same logical change on the FreeBSD side. I just didn't want to tackle that in this PR since it was addressing a regression introduced by a prior Linux-specific commit.

On Linux I was eventually able to hit this by running the rsend test group in a loop, since it includes tests which send/recv volumes. You might be able to make it more likely to hit by adding a delay for testing.
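One way to follow that suggestion, sketched under the assumption of a hypothetical test-only knob (recv_race_delay_ms and inject_race_delay() are illustrative names, not existing OpenZFS tunables), is to sleep inside the window where one path already holds its first lock, for example in the receive path after zvol_suspend() and before the minor is created, so that a concurrent open is more likely to land in the middle:

```c
#define	_POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical test-only knob; 0 (the default) disables the delay. */
static unsigned int recv_race_delay_ms = 0;

/*
 * Sleep for recv_race_delay_ms to widen the race window.
 * Resumes the remaining time if a signal interrupts the sleep.
 */
static void inject_race_delay(void)
{
	struct timespec req, rem;

	if (recv_race_delay_ms == 0)
		return;
	req.tv_sec = recv_race_delay_ms / 1000;
	req.tv_nsec = (long)(recv_race_delay_ms % 1000) * 1000000L;
	while (nanosleep(&req, &rem) == -1 && errno == EINTR)
		req = rem;
}

/* Nanoseconds elapsed between two CLOCK_MONOTONIC readings. */
static long long elapsed_ns(const struct timespec *start,
    const struct timespec *end)
{
	return ((long long)(end->tv_sec - start->tv_sec) * 1000000000LL +
	    (end->tv_nsec - start->tv_nsec));
}
```

This is a user-space sketch; in kernel code the sleep would be something like msleep() on Linux, and the knob would need to be compiled out or default to zero in production builds.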

tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Dec 22, 2021
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Rich Ercolani <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#12863
numinit (Contributor) commented Dec 25, 2021

Could this happen on 2.1.2? I've seen these failures on a raw recv on both 2.0.x and 2.1.x:

https://www.reddit.com/r/zfs/comments/rkuf7i/comment/hpcj7v9/

ghost mentioned this pull request Jan 4, 2022
ghost pushed a commit to truenas/zfs that referenced this pull request Jan 6, 2022
These are the changes for FreeBSD corresponding to the changes made for
Linux in openzfs#12863, see that PR for details.

Changes from openzfs#12863 are applied for zvol_geom_open and zvol_cdev_open
on FreeBSD.  This also adds a check for a dying zvol, which zvol_geom_open
already had but zvol_cdev_open was missing.  The check causes the open to
fail early with ENXIO when we are in the middle of changing volmode.

Signed-off-by: Ryan Moeller <[email protected]>
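The early-exit check described in this commit can be sketched as a single helper shared by both open entry points, so neither can skip it. Everything below is a hypothetical, simplified model (the fake_zvol structure, its dying flag, and the function names are illustrative, not the FreeBSD zvol code):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified zvol state. */
struct fake_zvol {
	bool dying;	/* set while the zvol is going away, e.g. mid volmode change */
	int open_count;
};

/* Shared early check: refuse new opens of a dying zvol with ENXIO. */
static int fake_zvol_check_dying(const struct fake_zvol *zv)
{
	return (zv->dying ? ENXIO : 0);
}

/* Both entry points run the same check before touching any other state. */
static int fake_geom_open(struct fake_zvol *zv)
{
	int error = fake_zvol_check_dying(zv);

	if (error == 0)
		zv->open_count++;
	return (error);
}

static int fake_cdev_open(struct fake_zvol *zv)
{
	int error = fake_zvol_check_dying(zv);

	if (error == 0)
		zv->open_count++;
	return (error);
}
```

Routing both paths through one helper is a design choice for the sketch: it makes it harder for one entry point to drift out of sync with the other, which is the gap this commit closed for zvol_cdev_open.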
behlendorf pushed a commit that referenced this pull request Jan 14, 2022
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #12934
ghost pushed a commit to truenas/zfs that referenced this pull request Jan 19, 2022
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Rich Ercolani <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#12863
ghost pushed a commit to truenas/zfs that referenced this pull request Jan 19, 2022
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes openzfs#12934
ghost pushed a commit to truenas/zfs that referenced this pull request Jan 20, 2022
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes openzfs#12934
ghost pushed a commit to truenas/zfs that referenced this pull request Feb 2, 2022
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Rich Ercolani <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#12863
ghost pushed a commit to truenas/zfs that referenced this pull request Feb 2, 2022
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes openzfs#12934
tonyhutter pushed a commit that referenced this pull request Feb 3, 2022
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Rich Ercolani <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #12863
tonyhutter pushed a commit that referenced this pull request Feb 3, 2022
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #12934
Carlo-Hu pushed a commit to Carlo-Hu/zfsonlinux that referenced this pull request Feb 23, 2022
The changes to zvol_open added in 2.1.2 (for coping with kernel
changes in 5.13) seem to have introduced a lock order inversion [0].

(Noticed while reviewing the 2.0.6->2.0.7 changes; the patch was
applied after 2.1.2 was already tagged.)

[0] openzfs/zfs#12863
Signed-off-by: Stoiko Ivanov <[email protected]>
nicman23 pushed a commit to nicman23/zfs that referenced this pull request Aug 22, 2022
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Rich Ercolani <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#12863
nicman23 pushed a commit to nicman23/zfs that referenced this pull request Aug 22, 2022
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes openzfs#12934
nicman23 pushed a commit to nicman23/zfs that referenced this pull request Aug 22, 2022
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Rich Ercolani <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#12863
nicman23 pushed a commit to nicman23/zfs that referenced this pull request Aug 22, 2022
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes openzfs#12934
Labels
Status: Accepted Ready to integrate (reviewed, tested)