Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Again fix race between zil_commit() and zil_suspend(). #14979

Merged
merged 1 commit into from
Jun 30, 2023

Conversation

amotin
Copy link
Member

@amotin amotin commented Jun 14, 2023

With zl_suspend read in zil_commit() not protected by any locks it is possible for new ZIL writes to be in progress while zil_destroy() called by zil_suspend() freeing them. This patch closes the race by taking zl_issuer_lock in zil_suspend() and adding the second zl_suspend check to zil_get_commit_list(), protected by the lock. It allows all already queued transactions to be logged normally, while blocks any new ones, calling txg_wait_synced() for the TXGs.

Comparing to #14514 this does not introduce any new locks in the hot path, so if nothing else it should be faster.

How Has This Been Tested?

Not really. I have no reproduction to the original issue, so just looked on the code on @grwilson request.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

With zl_suspend read in zil_commit() not protected by any locks it
is possible for new ZIL writes to be in progress while zil_destroy()
called by zil_suspend() freeing them.  This patch closes the race
by taking zl_issuer_lock in zil_suspend() and adding the second
zl_suspend check to zil_get_commit_list(), protected by the lock.
It allows all already queued transactions to be logged normally,
while blocks any new ones, calling txg_wait_synced() for the TXGs.

Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
@behlendorf
Copy link
Contributor

@amotin could this potentially explain #14872? If so we could revert e085e98 to test the fix.

@amotin
Copy link
Member Author

amotin commented Jun 15, 2023

could this potentially explain #14872?

@behlendorf As I can see, it should fix only snapshot creation on old pre-SPA_VERSION_FAST_SNAP pools and some very few specific vdev manipulations: mirror splitting, log vdev offlining/removal, vdev passivating. So I can only speculate that it can be a delayed effect from some other tests, but that is probably a stretch.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Jun 30, 2023
@behlendorf behlendorf merged commit 233425a into openzfs:master Jun 30, 2023
@amotin amotin deleted the zil_suspend branch June 30, 2023 16:00
amotin added a commit to amotin/zfs that referenced this pull request Jul 24, 2023
This locking was recently added as part of openzfs#14979. But appears it
is illegal to take zl_issuer_lock while holding dp_config_rwlock,
taken by dsl_pool_hold().  It causes deadlock with sync thread in
spa_sync_upgrades().  On a second throught though, we should not
need this locking, since zil_commit_impl() we call below takes
zl_issuer_lock, that should sufficiently protect zl_suspend reads,
combined with other logic from openzfs#14979.

Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
behlendorf pushed a commit that referenced this pull request Jul 25, 2023
This locking was recently added as part of #14979. But appears it
is illegal to take zl_issuer_lock while holding dp_config_rwlock,
taken by dsl_pool_hold().  It causes deadlock with sync thread in
spa_sync_upgrades().  On a second thought, we should not
need this locking, since zil_commit_impl() we call below takes
zl_issuer_lock, that should sufficiently protect zl_suspend reads,
combined with other logic from #14979.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes #15103
amotin added a commit to amotin/zfs that referenced this pull request Jul 25, 2023
This locking was recently added as part of openzfs#14979. But appears it
is illegal to take zl_issuer_lock while holding dp_config_rwlock,
taken by dsl_pool_hold().  It causes deadlock with sync thread in
spa_sync_upgrades().  On a second thought, we should not
need this locking, since zil_commit_impl() we call below takes
zl_issuer_lock, that should sufficiently protect zl_suspend reads,
combined with other logic from openzfs#14979.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15103
behlendorf pushed a commit that referenced this pull request Jul 25, 2023
This locking was recently added as part of #14979. But appears it
is illegal to take zl_issuer_lock while holding dp_config_rwlock,
taken by dsl_pool_hold().  It causes deadlock with sync thread in
spa_sync_upgrades().  On a second thought, we should not
need this locking, since zil_commit_impl() we call below takes
zl_issuer_lock, that should sufficiently protect zl_suspend reads,
combined with other logic from #14979.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes #15103
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Dec 12, 2023
With zl_suspend read in zil_commit() not protected by any locks it
is possible for new ZIL writes to be in progress while zil_destroy()
called by zil_suspend() freeing them.  This patch closes the race
by taking zl_issuer_lock in zil_suspend() and adding the second
zl_suspend check to zil_get_commit_list(), protected by the lock.
It allows all already queued transactions to be logged normally,
while blocks any new ones, calling txg_wait_synced() for the TXGs.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#14979
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Dec 12, 2023
This locking was recently added as part of openzfs#14979. But appears it
is illegal to take zl_issuer_lock while holding dp_config_rwlock,
taken by dsl_pool_hold().  It causes deadlock with sync thread in
spa_sync_upgrades().  On a second thought, we should not
need this locking, since zil_commit_impl() we call below takes
zl_issuer_lock, that should sufficiently protect zl_suspend reads,
combined with other logic from openzfs#14979.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15103
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants