[2.1.12] Fix concurrent resilvers initiated at same time #14895

Merged

behlendorf (Contributor)

Motivation and Context

Backport of #14892.

Description

For dRAID vdevs, it was possible to initiate both a sequential and a healing resilver at the same time.

This fixes the following two scenarios:

  1. There is a window in which ZED can start a sequential rebuild even though a healing resilver has already been scheduled. This is fixed by adding a check in spa_vdev_attach() for any scheduled resilver and returning an appropriate error code when a resilver is already in progress.

  2. It was possible for zpool clear to start a healing resilver even when none was needed. This happens because, during vdev_open(), the device is presumed healthy until vdev_validate() checks it and marks it unavailable. By that point, however, an async resilver has already been requested if the DTL isn't empty. This is fixed by cancelling the SPA_ASYNC_RESILVER request at the end of vdev_reopen() when a resilver is not needed.

Finally, a test case was added to the ZFS Test Suite (ZTS) for verification.

How Has This Been Tested?

Manually tested; a new test case was added.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)


Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Dipak Ghosh <[email protected]>
Signed-off-by: Akash B <[email protected]>
Closes openzfs#14881
Closes openzfs#14892
behlendorf added the "Status: Code Review Needed" (Ready for review and testing) label on May 25, 2023
behlendorf changed the title from "Fix concurrent resilvers initiated at same time" to "[2.1.12] Fix concurrent resilvers initiated at same time" on May 25, 2023
behlendorf added the "Status: Accepted" (Ready to integrate (reviewed, tested)) label and removed the "Status: Code Review Needed" label on May 26, 2023
behlendorf merged commit c2f0aae into openzfs:zfs-2.1.12-staging on May 26, 2023
Labels
Status: Accepted Ready to integrate (reviewed, tested)

5 participants