zfs-2.2.2 patchset #15602
Conversation
This reverts commit bd7a02c which can trigger an unlikely existing bio alignment issue on Linux. This change is good, but the underlying issue it exposes needs to be resolved before this can be re-applied.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#15533
Over its history, the dirty dnode test has been changed between checking for dnodes being on `os_dirty_dnodes` (`dn_dirty_link`) and checking `dn_dirty_record`:

de198f2 Fix lseek(SEEK_DATA/SEEK_HOLE) mmap consistency
2531ce3 Revert "Report holes when there are only metadata changes"
ec4f9b8 Report holes when there are only metadata changes
454365b Fix dirty check in dmu_offset_next()
66aca24 SEEK_HOLE should not block on txg_wait_synced()

Also illumos/illumos-gate@c543ec060d and illumos/illumos-gate@2bcf0248e9.

It turns out both are actually required. In the case of appending data to a newly created file, the dnode proper is dirtied (at least to change the blocksize) and dirty records are added. Thus, a single logical operation is represented by separate dirty indicators, and they must not be separated.

The incorrect dirty check becomes a problem when the first block of a file is being appended to while another process is calling lseek to skip holes. There is a small window where the dnode part is undirtied while there are still dirty records. In this case, `lseek(fd, 0, SEEK_DATA)` would not know that the file is dirty, and would go to `dnode_next_offset()`. Since the object has no data blocks yet, it returns `ESRCH`, indicating no data found, which results in `ENXIO` being returned to `lseek()`'s caller.

Since coreutils 9.2, `cp` performs sparse copies by default; that is, it uses `SEEK_DATA` and `SEEK_HOLE` against the source file and attempts to replicate the holes in the target. When it hits the bug, its initial search for data fails, and it goes on to call `fallocate()` to create a hole over the entire destination file. This has come up more recently as users upgrade their systems, getting OpenZFS 2.2 as well as a newer coreutils. However, this problem has been reproduced against 2.1, as well as on FreeBSD 13 and 14.

This change simply updates the dirty check to check both types of dirty.
If there's anything dirty at all, we immediately go to the "wait for sync" stage. It doesn't really matter after that; both changes are on disk, so the dirty fields should be correct.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Rich Ercolani <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#15571
Closes openzfs#15526
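The `ENXIO` path described in the commit can be seen with a plain `lseek()`: when `SEEK_DATA` finds no data (as happened during the race window above), the caller gets `ENXIO`, which is what tripped up `cp`. A small Python illustration of that semantic, independent of the bug itself (a sketch; assumes a Linux filesystem with `SEEK_DATA` support or the generic VFS fallback):

```python
import errno
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    # Empty file: there is no data anywhere, so SEEK_DATA fails with
    # ENXIO -- the same errno the buggy dirty check leaked to lseek()'s
    # caller even though dirty data existed.
    try:
        os.lseek(fd, 0, os.SEEK_DATA)
    except OSError as e:
        assert e.errno == errno.ENXIO
    # Once data actually reaches the file, SEEK_DATA finds it at offset 0.
    os.write(fd, b"hello")
    os.fsync(fd)
    assert os.lseek(fd, 0, os.SEEK_DATA) == 0
finally:
    os.close(fd)
    os.unlink(path)
```

This mirrors what coreutils' `cp` does on its first probe of the source file; with the fixed dirty check, a file with in-flight dirty data no longer falls into the empty-file `ENXIO` case.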
In case of crash, cloned blocks need to be claimed on pool import. This is only possible if they (lr_bps) and their count (lr_nbps) are not encrypted but only authenticated, similar to the block pointer in lr_write_t. A few other fields can be, and still are, encrypted. This should fix a panic on ZIL claim after a crash when block cloning is actively used.

Reviewed-by: Richard Yao <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Sean Eric Fagan <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Edmund Nadolski <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes openzfs#15543
Closes openzfs#15513
The zfs_load-key tests were failing on F39 due to their use of the deprecated ssl.wrap_socket function. This commit updates the test to instead use ssl.SSLContext() as described in https://stackoverflow.com/a/65194957.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes openzfs#15534
Closes openzfs#15550
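The shape of that migration is straightforward: the module-level `ssl.wrap_socket()` (deprecated in 3.7, removed in 3.12) is replaced by an explicit `SSLContext` whose `wrap_socket()` method is called instead. A minimal client-side sketch, not the actual test code (the ZFS test wraps a server socket and loads a certificate via `load_cert_chain()`; `example.com` is a placeholder and no connection is made):

```python
import socket
import ssl

# Modern replacement for the removed ssl.wrap_socket():
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.load_default_certs()

raw = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# The TLS handshake is deferred until connect(), so wrapping alone is safe.
tls = ctx.wrap_socket(raw, server_hostname="example.com")
assert isinstance(tls, ssl.SSLSocket)
tls.close()
```

For the server-side case the tests need, the same pattern applies with `ssl.PROTOCOL_TLS_SERVER`, `ctx.load_cert_chain(...)`, and `ctx.wrap_socket(sock, server_side=True)`.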
So that zdb (and others!) can get at the BRT on-disk structures.

Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Kay Pedersen <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#15541
Same idea as the dedup stats, but for block cloning.

Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Kay Pedersen <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#15541
zdb with '-e' (i.e. on an exported zpool) doesn't work along with the '-O' and '-r' options, as we process them before '-e' has been processed. The errors below are seen:

```
~> zdb -e pool-mds65/mdt65 -O oi.9/0x200000009:0x0:0x0
failed to hold dataset 'pool-mds65/mdt65': No such file or directory

~> zdb -e pool-oss0/ost0 -r file1 /tmp/filecopy1 -p
failed to hold dataset 'pool-oss0/ost0': No such file or directory
zdb: internal error: No such file or directory
```

We need to make sure to process the '-O|-r' options after the '-e' option has been processed, which imports the pool into the namespace if it's not in the cachefile.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Akash B <[email protected]>
Closes openzfs#15532
Previously, dmu_buf_will_clone() would roll back any dirty record, but would not clean out the modified data nor reset the state before releasing the lock. That leaves the last-written data in db_data, but the dbuf in the wrong state.

This is eventually corrected when the dbuf state is made NOFILL and dbuf_noread() is called (which clears out the old data), but at that point it's too late, because the lock was already dropped with that invalid state. Any caller acquiring the lock before the call into dmu_buf_will_not_fill() could find what appears to be a clean, readable buffer, and would take the wrong state from it: it should be getting the data from the cloned block, not from the earlier (unwritten) dirty data. Even after the state was switched to NOFILL, the old data was still not cleaned out until dbuf_noread(), which is another gap for a caller to take the lock and read the wrong data.

This commit fixes all this by properly cleaning up the previous state and then setting the new state before dropping the lock. The DBUF_VERIFY() calls confirm that the dbuf is in a valid state when the lock is down.

Sponsored-by: Klara, Inc.
Sponsored-By: OpenDrives Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Pawel Jakub Dawidek <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#15566
Closes openzfs#15526
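The underlying pattern is general, not ZFS-specific: state mutated under a lock must be made fully consistent before the lock is dropped, or another thread can observe the intermediate combination. A deliberately simplified Python sketch of the two shapes (names like `Buf`, `CACHED`, and `NOFILL` are illustrative stand-ins, not the actual dbuf code):

```python
import threading

class Buf:
    """Toy stand-in for a buffer with a state tag and cached data."""
    def __init__(self):
        self.lock = threading.Lock()
        self.state = "CACHED"
        self.data = b"old dirty data"

    def will_clone_buggy_shape(self):
        # Buggy ordering: the lock is released while data/state are stale.
        with self.lock:
            pass  # dirty record rolled back, but data/state untouched
        # <-- window: another thread taking the lock here sees what looks
        #     like a clean, readable "CACHED" buffer with stale data.
        with self.lock:
            self.data = None
            self.state = "NOFILL"

    def will_clone_fixed_shape(self):
        # Fixed ordering: clean up old data and set the new state under a
        # single hold, so no observer can see the inconsistent mix.
        with self.lock:
            self.data = None
            self.state = "NOFILL"

b = Buf()
b.will_clone_fixed_shape()
assert b.state == "NOFILL" and b.data is None
```

The commit's fix corresponds to the second shape: invariants are re-established (and, in the real code, checked with DBUF_VERIFY()) before the lock is ever dropped.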
It was broken for several reasons:

* VOP_UNLOCK lost an argument in 13.0, so OpenZFS should be using VOP_UNLOCK1, but a few direct calls to VOP_UNLOCK snuck in.
* The location of the zlib header moved in 13.0 and 12.1. We can drop support for building on 12.0, which is EoL.
* knlist_init lost an argument in 13.0. OpenZFS change 9d08874 assumed 13.0 or later.
* FreeBSD 13.0 added copy_file_range, and OpenZFS change 67a1b03 assumed 13.0 or later.

Sponsored-by: Axcient
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Alan Somers <[email protected]>
Closes openzfs#15551
If all zfs dkms modules have been removed, a shell-init error message may appear, because /var/lib/dkms/zfs no longer exists. Resolve this by leaving the directory earlier.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Mart Frauenlob <[email protected]>
Closes openzfs#15576
With Linux v6.6.x and clang 16, a configure step fails on a warning that later results in an error while building, due to 'ts' being uninitialized. Add a trivial initialization to silence the warning.

Signed-off-by: Jaron Kent-Dobias <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
(force-pushed from 0b4adaa to 5d20406)
Looks good to me, but I've just found and fixed one more simple block cloning issue that would be nice to include: #15603.
@tonyhutter Two other things: I just verified myself that, when using the tarball https://github.com/openzfs/zfs/releases/download/zfs-2.2.1/zfs-2.2.1.tar.gz, make native-deb-utils fails:
Would be nice not to carry that problem into the next (now 3rd) release.
GitHub's mention tracking kind of makes this redundant, but it's probably worth including:
Bug introduced in 213d682.

Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Warner Losh <[email protected]>
Signed-off-by: Martin Matuska <[email protected]>
Closes openzfs#15606
zil_claim_clone_range() takes references on cloned blocks before ZIL replay. Later, zil_free_clone_range() drops them after replay or on dataset destroy. The total balance is neutral. It means that on actual replay we must take additional references, which would stay in the BRT. Without this, blocks could be freed prematurely when either the original file or its clone is destroyed. I've observed the BRT being emptied and the feature being deactivated after ZIL replay completion, which should not have happened. With the patch I see the expected stats.

Reviewed-by: Kay Pedersen <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes openzfs#15603
Call vfs_exjail_clone() for mounts created under .zfs/snapshot to fill in the mnt_exjail field for the mount. If this is not done, the snapshots under .zfs/snapshot will not be accessible over NFS.

This version has the argument name in vfs.h fixed to match that of the name in spl_vfs.c, although it really does not matter.

External-issue: https://reviews.freebsd.org/D42672
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Rick Macklem <[email protected]>
Closes openzfs#15563
META file and changelog updated.

Signed-off-by: Tony Hutter <[email protected]>
(force-pushed from 5d20406 to 494aaae)
8adf2e3 - would be good to also include, I guess.
Just curious if you'll also revert "Disable block cloning by default" here/now/later.
I would also be interested in whether this would be safe to enable again, since from my understanding block cloning was not responsible for the data loss problems after all, but just increased the possibility of corruption. If I understand it correctly, the real bug got fixed in #15571 (c7fadf2).
@telsasoft @FL140 The idea is to re-enable block cloning in one of the following 2.2 releases. 2.2.2 includes a number of block cloning fixes too, making it more robust. But at least #15617, the issue @ixhamza found just yesterday, I couldn't get in in time. Though we are getting to more and more rare cases, and could now benefit from another round of tests.
@amotin Great, I absolutely appreciate testing. I have no issue with having the feature disabled until this is rock solid. Since I upgraded the pools as recommended after the upgrade to Ubuntu 23.10, I have to ask, to make sure this is the correct approach now: 2.2.1 disabled the block cloning feature by just ignoring the setting all the time, correct? So we DON'T? need to set In addition I am sure about
@FL140 yes, the 2.2.2 defaults are
@robn Thanks for confirming. Unfortunately, I just ran into a really bad "... uncorrectable I/O failure and failure mode property for this pool is set to panic." after using the 2.2.2 branch from @tonyhutter with this patchset!
Honestly, I am really trying to keep being positive here after the last two weeks, but this is getting more frustrating by the day. Upgrading to a stable distro should never end up in such a f**k*p, sorry, but I am getting frustrated right now and need stable systems again ASAP. Trying to be rational: I will investigate this further in ~9h to add as much info as possible, but I will have to dd the disk prior to any further steps; this is getting too dangerous. So right now I can't tell if this is caused by any of the patches, but my guess would be that due to the "block cloning" bug (which we now know has a different background) there is a problem with the cloned data!? which was produced since 2.2.0 and prior to 2.2.2. But I can't tell if that is the case. If anyone can point me in the right direction I am happy to investigate. But I really need working machines again within the next days.
I ran the script in #15586 (comment) to build the Ubuntu 23.10 packages. |
Noted that 2.2.2 just got released while I ran into that. I will compile the packages again with the official 2.2.2, but I guess the result will be the same, as the patchset is the same in the official 2.2.2 branch and in @tonyhutter's 2.2.2 branch, or did I miss something?
Did the script output anything? If so, did it get to print the list of paths? At least that could give an idea of which zdb invocation in the script triggered it.
Thanks for the fast reply. No, the script crashed with the above message very early. I made a screenshot, but I think it will not be of much help here. The exact output (retyped from the screenshot, except the pool name) was:
@0x5c Please don't get me wrong: apart from the fact that the script triggered the panic (which is not your fault!), I am really happy that you wrote it, so I can check the pool to some degree, which already showed that there still is a problem even with 2.2.2 or the data produced between 2.2.0 and 2.2.2.
This output is not from the first zdb invocation. Since it is the only one that touches the BRT (i.e. block cloning), this seems to exclude any bug related to block cloning or the new zdb feature to dump the BRT. The rest of the script is one long series of pipes which
With that output, the failure could be just before or during 1), or one of the

If you try again, make this change to the script:

```diff
-... | xargs -n2 zfs_ids_to_path -v "$zpool" | column -ts: -N DATASET,PATH -l2
+... | xargs -n2 zfs_ids_to_path -v "$zpool"
```

EDIT: Just to make sure, also add
@0x5c Thanks for the input. Actually, I am in the lucky position that I am quite good with shell scripts, so I WILL investigate this for sure, but after a very long night I will first dd the whole SSD prior to any further steps. So expect feedback in the next days.
@FL140 it's unclear to me if you are seeing errors on the released 2.2.2, please confirm? If so, is it only coming from
@robn While that happened with the tonyhutter branch of 2.2.2, this should be the same as the released 2.2.2, which I built and installed just after the tonyhutter branch and am running at the moment. The commits are exactly the same. As written, I will investigate this in depth over the next days, but I want to dd the 1.7TB pool prior to that. Then I will run the script again and report the results back. I will also dissect the script and trace where it happens, if it happens again. What I really don't get, though, is that I get the message that the pool is in a panic mode, and after a reboot it is happily working as if nothing ever happened and
That's why I asked if you got this from
@robn I can't say where the message came from, BUT a quick grep of So I would guess it is not coming from the kernel, but can't say anything related to zdb (zdb was found nowhere in the syslog); is there output produced somewhere else than the terminal? But grepping through the zfs source code with the error message should point us in the right direction anyways.
@robn dissecting 0x5c:zfs-bclonecheck.git/bclonecheck.sh:
runs without an error and produces the file content:
|
That's the only part we already knew works; it's everything else that isn't clear.
@0x5c @robn Trying to be brave here, but running, crashing and rebooting takes time... I also ran
Looking at that, my first guess would be a bad checksum!? But then I don't know how often the output is flushed and whether we are missing stuff at the end... All of the above was produced with the release 2.2.2 branch, installed by building Ubuntu packages. FYI: the output file has 2746022 lines, so it doesn't crash fast. Any ideas on how to proceed from here?
Does it panic the system, exit, or stall after those last lines? If it doesn't trigger a panic proper, it's hard to tell without running the modified script if it is Also, before you test the modified script, can you run
@0x5c I didn't run the script as a whole; I took the relevant commands one after the other and analyzed the output. So I can confirm that
yes, they are as far as I can tell. It boils down to the |
Motivation and Context
Patchset for 2.2.2. This release includes the fix for dirty dbuf corruption: #15526
Description
Include fix for data corruption. Full details in: #15526. Other fixes also included.
How Has This Been Tested?