Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2.2] BRT and other fixes into 2.2.3-staging #15714

Merged

Conversation

mmatuska
Copy link
Contributor

@mmatuska mmatuska commented Dec 26, 2023

Motivation and Context

Backport BRT fixes + fixes suggested by amotin@ from master into 2.2.3-staging

Description

Backported commits:

86e115e uses ASSERT0P() which is not in 2.2, so use ASSERT0()

How Has This Been Tested?

Build and test on Linux and FreeBSD

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@amotin
Copy link
Member

amotin commented Dec 27, 2023

adcea23 and bee9cfb are not related to block cloning, but I don't mind, though I am not familiar with the last.

But if we are going wider, I'd also propose e007908 , 58398cb , 3a8d9b8 , 35da345 , 2a27fd4 and f9765b1.

@mmatuska
Copy link
Contributor Author

I have removed bee9cfb as it has introduced a new feature.

amotin and others added 17 commits December 27, 2023 18:49
Once we verified the ABDs and asserted the sizes we should never
see premature ABDs ends.  Assert that and remove extra branches
from production builds.

Reviewed-by: Brian Atkinson <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15428
Add a dataset_kstats_rename function, and call it when renaming
a zvol on FreeBSD and Linux.

Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alan Somers <[email protected]>
Sponsored-by: Axcient
Closes openzfs#15482
Closes openzfs#15486
- Use sbuf_new_for_sysctl() to reduce double-buffering on sysctl
output.
- Use much faster sbuf_cat() instead of sbuf_printf("%s").

Together it reduces `sysctl kstat.zfs.misc.dbufs` time from minutes
to seconds, making dbufstat almost usable.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes openzfs#15495
It is unused for 3 years since openzfs#10576.

Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15507
PR openzfs#15457 exposed weird logic in L2ARC write sizing. If it appeared
bigger than device size, instead of liming write it reset all the
system-wide tunables to their default.  Aside of being excessive,
it did not actually help with the problem, still allowing infinite
loop to happen.

This patch removes the tunables reverting logic, but instead limits
L2ARC writes (or at least eviction/trim) to 1/4 of the capacity.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Amanakis <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes openzfs#15519
This should make sure we have log written without overflows.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15517
Since we use a limited set of kmem caches, quite often we have unused
memory after the end of the buffer.  Put there up to a 512-byte canary
when built with debug to detect buffer overflows at the free time.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15553
zil_claim_clone_range() takes references on cloned blocks before ZIL
replay.  Later zil_free_clone_range() drops them after replay or on
dataset destroy.  The total balance is neutral.  It means we do not
need to do anything (drop the references) for not implemented yet
TX_CLONE_RANGE replay for ZVOLs.

This is a logical follow up to openzfs#15603.

Reviewed-by: Kay Pedersen <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15612
ZIL claim can not handle block pointers cloned from the future,
since they are not yet allocated at that point.  It may happen
either if the block was just written when it was cloned, or if
the pool was frozen or somehow else rewound on import.

Handle it from two sides: prevent cloning of blocks with physical
birth time from not yet synced or frozen TXG, and abort ZIL claim
if we still detect such blocks due to rewind or something else.

While there, assert that any cloned blocks we claim are really
allocated by calling metaslab_check_free().

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15617
When two datasets share the same master encryption key, it is safe
to clone encrypted blocks. Currently only snapshots and clones
of a dataset share with it the same encryption key.

Added a test for:
- Clone from encrypted sibling to encrypted sibling with
  non encrypted parent
- Clone from encrypted parent to inherited encrypted child
- Clone from child to sibling with encrypted parent
- Clone from snapshot to the original datasets
- Clone from foreign snapshot to a foreign dataset
- Cloning from non-encrypted to encrypted datasets
- Cloning from encrypted to non-encrypted datasets

Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Original-patch-by: Pawel Jakub Dawidek <[email protected]>
Signed-off-by: Kay Pedersen <[email protected]>
Closes openzfs#15544
Block pointers are not encrypted in TX_WRITE and TX_CLONE_RANGE
records, so we can dump them, that may be useful for debugging.

Related to openzfs#15543.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15629
To improve 128KB block write performance in case of multiple VDEVs
ZIL used to spit those writes into two 64KB ones.  Unfortunately it
was found to cause LWB buffer overflow, trying to write maximum-
sizes 128KB TX_CLONE_RANGE record with 1022 block pointers into
68KB buffer, since unlike TX_WRITE ZIL code can't split it.

This is a minimally-invasive temporary block cloning fix until the
following more invasive prediction code refactoring.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ameer Hamza <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15634
Without this patch on pool of 60 vdevs with ZFS_DEBUG enabled clone
takes much more time than copy, while heavily trashing dbgmsg for
no good reason, repeatedly dumping all vdevs BRTs again and again,
even unmodified ones.

I am generally not sure this dumping is not excessive, but decided
to keep it for now, just restricting its scope to more reasonable.

Reviewed-by: Kay Pedersen <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15625
dmu_assign_arcbuf_by_dnode() should drop dn_struct_rwlock lock in
case dbuf_hold() failed.  I don't have reproduction for this, but
it looks inconsistent with dmu_buf_hold_noread_by_dnode() and co.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes openzfs#15644
In some cases dbuf_assign_arcbuf() may be called on a block that
was recently cloned.  If it happened in current TXG we must undo
the block cloning first, since the only one dirty record per TXG
can't and shouldn't mean both cloning and overwrite same time.

Reviewed-by: Kay Pedersen <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15653
Block cloning normally creates dirty record without dr_data.  But if
the block is read after cloning, it is moved into DB_CACHED state and
receives the data buffer.  If after that we call dbuf_unoverride()
to convert the dirty record into normal write, we should give it the
data buffer from dbuf and release one.

Reviewed-by: Kay Pedersen <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes openzfs#15654
Closes openzfs#15656
While 763ca47 closes the situation of block cloning creating
unencrypted records in encrypted datasets, existing data still causes
panic on read. Setting zfs_recover bypasses this but at the cost of
potentially ignoring more serious issues.

Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chris Peredun <[email protected]>
Closes openzfs#15677
@mmatuska mmatuska changed the title [2.2] BRT fixes into 2.2.3-staging [2.2] BRT and other fixes into 2.2.3-staging Dec 27, 2023
@mmatuska
Copy link
Contributor Author

mmatuska commented Dec 27, 2023

I have updated the patchlist with suggested patches from @amotin

@mmatuska mmatuska requested a review from amotin December 27, 2023 22:29
@amotin
Copy link
Member

amotin commented Jan 3, 2024

Meanwhile I fixed two more cloning issues to add here after master: #15732 and #15735 .

@tonyhutter
Copy link
Contributor

Thanks for these backports.

I see 46c0bfc has in its commit message:

"This is a logical follow up to #15603."

Do we want #15603 included as well?

@amotin
Copy link
Member

amotin commented Jan 4, 2024

Do we want #15603 included as well?

@tonyhutter As I see it was merged to 2.2.2 already.

@behlendorf behlendorf merged commit f71c16a into openzfs:zfs-2.2.3-staging Jan 9, 2024
24 checks passed
@mmatuska mmatuska deleted the zfs-2.2.3-staging-import branch January 17, 2024 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants