Move to ZFS volume creates correctly sized files filled with \0 #8816

abrasive · 2019-05-26T00:37:14Z

System information

Type	Version/Name
Distribution Name	Gentoo
Distribution Version	n/a
Linux Kernel	5.0.0-rc1+
Architecture	amd64
ZFS Version	0.8.0-r0-gentoo
SPL Version	0.8.0-r0-gentoo

Describe the problem you're observing

Since upgrading to 0.8.0 and attempting to upgrade system packages, I began to find newly installed files filled with nulls, but with sane sizes.
On this system, packages are compiled on one dataset (compression=lz4), before being installed to other datasets (compression=lz4 and compression=gzip).

The pattern of nulling is not consistent, but it is bursty, ie. several related files with similar inode numbers, which were copied sequentially, tend to be affected.

The behaviour is similar to that observed in #3125. Setting zfs_nopwrite_enabled=0 appears to have no effect.

dedup=off, checksum=on, sync=standard on all datasets.

This problem was not observed on 0.7.13 which I had been running for quite some time. Portage has not changed so I think this is triggered by a ZFS change. Unfortunately, like an optimist, I upgraded the pool already, and since feature@spacemap_v2=active I can't go back to test v0.7.13 again.

Disabling portage's native-extensions USE flag, preventing portage from using fancy kernel ops for faster file copying, prevents the problem from occurring.

Examining a nulled file with zdb -ddddd:

Dataset zwei/sys/debug/src [ZPL], ID 233, cr_txg 18372, 2.68G, 259475 objects, rootbp DVA[0]=<0:a14af22000:2000> DVA[1]=<0:d98c456000:2000> [L0 DMU objset] fletcher4 uncompressed unencrypted LE contiguous unique double size=800L/800P birth=603513L/603513P fill=259475 cksum=11fe4fc52e:136c39a331f8:d4758e007904d:6e79435bbf1f4df

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
    390796    1   128K    32K      0     512    32K    0.00  ZFS plain file
                                               168   bonus  System attributes
	dnode flags: USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED 
	dnode maxblkid: 0
	path	/sys-power/iasl-20190215/acpica-unix-20190215/source/compiler/aslfold.c
	uid     0
	gid     0
	atime	Sun May 26 08:55:58 2019
	mtime	Sat Feb 16 04:38:40 2019
	ctime	Sun May 26 08:49:46 2019
	crtime	Sun May 26 08:49:46 2019
	gen	603441
	mode	100644
	size	32675
	parent	370335
	links	1
	pflags	40800000004
Indirect blocks:

Describe how to reproduce the problem

Choosing iasl as a small package which has exhibited the problem, we can repeatedly reinstall it without rebuilding it every time until the problem occurs:

cd /usr/portage/sys-power/iasl
while [[ -z $stop ]]; do
  sudo FEATURES=noclean ebuild iasl-20190215.ebuild install qmerge
  for f in $(equery --no-color files --filter=obj,conf,cmd,doc,man,info sys-power/iasl 2>/dev/null); do
    if [ -s $f ] && ! (tr -d '\0' < $f | read -n 1); then
      stop=1
      echo $f: null
    fi
  done
done

This will typically cause a failure within 2-3 cycles.

I have attempted to write a test script to copy files using Portage's native extension, but have been unable to reproduce the issue (recursively copying a directory of files from one dataset to another, or generating a single directory of files and then copying those).

Include any warning/errors/backtraces from the system logs

No warnings or errors observed.

The text was updated successfully, but these errors were encountered:

abrasive · 2019-05-26T01:36:17Z

I suspect the origin files have to be very fresh to reproduce this bug.

The most commonly affected files are those which were touched last. For iasl, as I have Portage set up to install the source code as part of the merge, the install phase ends by rsync'ing the source files into the tree of files to install - and then these are most commonly affected.

If I split the install stage, which builds the tree of files to install, from the qmerge stage, which copies them to the target filesystem, I still reproduce the problem, albeit less often:

    sudo FEATURES=noclean ebuild iasl-20190215.ebuild install
    sudo FEATURES=noclean ebuild iasl-20190215.ebuild qmerge

If I insert sleep 5 in between, I can't reproduce the issue.

If I sync or echo 1 > /proc/sys/vm/drop_caches, the issue still occurs.

If I set sync=always on the /var/tmp dataset, the install phase slows down significantly, and I can't reproduce the issue.

fling- · 2019-05-26T04:18:00Z

What features are enabled on the pool?

I had the same problem in #6931 but got it fixed with 454365b.

bunder2015 · 2019-05-26T06:18:45Z

Reported upstream to gentoo https://bugs.gentoo.org/635002#c18

abrasive · 2019-05-26T06:38:40Z

@fling-

Pool features

zwei  feature@async_destroy          enabled                        local
zwei  feature@empty_bpobj            active                         local
zwei  feature@lz4_compress           active                         local
zwei  feature@multi_vdev_crash_dump  enabled                        local
zwei  feature@spacemap_histogram     active                         local
zwei  feature@enabled_txg            active                         local
zwei  feature@hole_birth             active                         local
zwei  feature@extensible_dataset     active                         local
zwei  feature@embedded_data          active                         local
zwei  feature@bookmarks              enabled                        local
zwei  feature@filesystem_limits      enabled                        local
zwei  feature@large_blocks           enabled                        local
zwei  feature@large_dnode            enabled                        local
zwei  feature@sha512                 enabled                        local
zwei  feature@skein                  enabled                        local
zwei  feature@edonr                  enabled                        local
zwei  feature@userobj_accounting     active                         local
zwei  feature@encryption             enabled                        local
zwei  feature@project_quota          enabled                        local
zwei  feature@device_removal         enabled                        local
zwei  feature@obsolete_counts        enabled                        local
zwei  feature@zpool_checkpoint       enabled                        local
zwei  feature@spacemap_v2            active                         local
zwei  feature@allocation_classes     enabled                        local
zwei  feature@resilver_defer         enabled                        local
zwei  feature@bookmark_v2            enabled                        local

I did turn the new autotrim option on initially, as this is on an SSD, then turned it off once the issue became apparent; all of the subsequent tracking-down has been done with autotrim=off.

gyakovlev · 2019-05-26T06:53:24Z

I have plenty of systems running 0.8.0 on gentoo but most of them are running non-upgraded pools.
no issues so far.

I've been hit by previous iteration of this, but can't observe it with 0.8.0 yet.

manual trim couple of times, but no autotrim yet

zroot  autotrim                       off                            default
zroot  feature@async_destroy          enabled                        local
zroot  feature@empty_bpobj            active                         local
zroot  feature@lz4_compress           active                         local
zroot  feature@multi_vdev_crash_dump  enabled                        local
zroot  feature@spacemap_histogram     active                         local
zroot  feature@enabled_txg            active                         local
zroot  feature@hole_birth             active                         local
zroot  feature@extensible_dataset     active                         local
zroot  feature@embedded_data          active                         local
zroot  feature@bookmarks              enabled                        local
zroot  feature@filesystem_limits      enabled                        local
zroot  feature@large_blocks           enabled                        local
zroot  feature@large_dnode            active                         local
zroot  feature@sha512                 enabled                        local
zroot  feature@skein                  enabled                        local
zroot  feature@edonr                  enabled                        local
zroot  feature@userobj_accounting     active                         local
zroot  feature@encryption             enabled                        local
zroot  feature@project_quota          disabled                       local
zroot  feature@device_removal         disabled                       local
zroot  feature@obsolete_counts        disabled                       local
zroot  feature@zpool_checkpoint       disabled                       local
zroot  feature@spacemap_v2            disabled                       local
zroot  feature@allocation_classes     disabled                       local
zroot  feature@resilver_defer         disabled                       local
zroot  feature@bookmark_v2            enabled                        local

abrasive · 2019-05-26T14:40:30Z

I have managed to reproduce this without installing packages (though it still requires Portage's python libraries installed).

Python3 script here: https://gist.github.com/abrasive/dd85c7bb4686200927df660a2b4f1d93
Usage eg.

mkdir ~/scratch/test1 ~/scratch/test2
while python zfs_test_null.py ~/scratch/test1 ~/scratch/test2; echo .; done

It also reproduces the issue when copying within the same dataset, the cross-dataset thing is a red herring.

The repro works if a scrub is running on the pool at the time. I've tried loading it in other ways (eg. reading all the files on the FS sequentially in the background) but have not had a measurable repro rate with those methods.

beren12 · 2019-05-28T01:20:12Z

does it still do it if you disable native-extensions for the build?

abrasive · 2019-05-28T12:13:09Z

@beren12 no, with native-extensions disabled the problem does not occur.

abrasive · 2019-05-28T12:18:45Z

I have managed to observe the issue at the syscall level. The problem here is that the copy routine is sparsity-aware, and so it looks for the first data chunk by doing lseek(fd_in, 0, SEEK_DATA).
On a newly-written file, in particular large files, this occasionally returns ENXIO instead; this tells the copy routine that the entire file is a hole, and it writes the output accordingly.

For example, here is a trace of a successful copy operation:

     0.000090 openat(AT_FDCWD, "/tmp/a/zfs_test/test00008", O_RDONLY|O_CLOEXEC) = 3
     0.000130 openat(AT_FDCWD, "/tmp/b/zfs_test/test00008", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 6
     0.000124 lseek(3, 0, SEEK_DATA)    = 0
     0.000041 lseek(3, 0, SEEK_HOLE)    = 10000
     0.000029 copy_file_range(3, [0], 6, [0], 10000, 0) = 10000
     0.000289 lseek(3, 10000, SEEK_DATA) = -1 ENXIO (No such device or address)
     0.000202 lseek(3, 0, SEEK_END)     = 10000
     0.000054 ftruncate(6, 10000)       = 0
     0.000064 close(6)                  = 0
     0.000064 close(3)                  = 0

And here is the immediately following copy, which failed:

     0.000110 openat(AT_FDCWD, "/tmp/a/zfs_test/test00009", O_RDONLY|O_CLOEXEC) = 3
     0.000117 openat(AT_FDCWD, "/tmp/b/zfs_test/test00009", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 6
     0.000147 lseek(3, 0, SEEK_DATA)    = -1 ENXIO (No such device or address)
     0.000067 lseek(3, 0, SEEK_END)     = 1000000
     0.000044 lseek(6, 1000000, SEEK_SET) = 1000000
     0.000043 ftruncate(6, 1000000)     = 0
     0.000058 close(6)                  = 0
     0.000081 close(3)                  = 0

abrasive · 2019-05-28T12:27:16Z

I should note, for the record, that the test script in question creates 1000 files in a directory sequentially, then copies them all, so they've all been closed and they stat() as the correct size.

abrasive · 2019-05-28T12:50:41Z

Here is a pure C repro of the lseek(..., SEEK_DATA) results:

https://gist.github.com/abrasive/f062f967cc41c797bd08ec11f30f8fbf

As before I find I need to be loading the pool to reliably trigger the bug; scrubbing remains particularly effective.

abrasive · 2019-05-28T21:05:32Z

Tested also on:

Gentoo
zfs/spl 0.8.0-r0-gentoo
kernel 4.9.76-gentoo-r1
amd64
spinning rust (2x2 striped mirror)
Reproduces instantly, even on small files, without loading eg. scrubbing the system.

Raspbian
zfs/spl 0.7.11-1
kernel 4.14.71-v7+
armv7l
spinning rust (single disk via USB)
Unable to reproduce so far. (Terrible machine but my only <0.8.0 system.)

richardelling · 2019-05-29T00:59:45Z

What happens if you change https://github.com/zfsonlinux/zfs/wiki/ZFS-on-Linux-Module-Parameters#zfs_dmu_offset_next_sync ?

abrasive · 2019-05-29T16:20:28Z

@richardelling I see no change, the bug still occurs.

behlendorf · 2019-05-29T19:21:26Z

@abrasive thank you for the reproducer, with it I was able to reproduce the issue. Reverting commit ec4f9b8 should resolve the issue.

gyakovlev · 2019-05-29T22:19:26Z

zfs-kmod-0.8.0-r1 pushed to gentoo
gentoo/gentoo@abf3bbd

Issue: openzfs/zfs#8816 Issue: openzfs/zfs#8778 Bug: https://bugs.gentoo.org/635002 Package-Manager: Portage-2.3.67, Repoman-2.3.12 Signed-off-by: Georgy Yakovlev <[email protected]>

behlendorf · 2019-05-29T23:16:19Z

I've opened #8834 which reverts commit ec4f9b8. @abrasive would you mind if I adapted your C test slightly and included it in the PR so it could be part of the ZFS Test Suite to help detect this kind of problem.

abrasive · 2019-05-30T00:11:19Z

Thanks @behlendorf! Reverting ec4f9b8 works for me. The test code is all yours to use as you wish. How quickly did it tend to reproduce for you? On my NVMe system it would sometimes take 20+ runs, which misled me when I tried doing a bit of bisection myself.

behlendorf · 2019-05-30T00:30:38Z

@abrasive thanks. Surprisingly quickly in my test environment, I was able to reproduce the issue usually in under a minute but it would take multiple runs.

misterhsp · 2019-05-31T07:57:43Z

Do I have to recreate the pools afterwards or can I leave it like this? I just reinstalled with 0.8.0 before I saw it?

Thanks...

behlendorf · 2019-06-01T01:40:57Z

@misterhsp this issue would not cause any damage to your pool. What could happen, is that immediately after performing a write there was a race where a call to lseek(.. , SEEK_DATA) could incorrectly return ENXIO. This indicated that the file offset was past the end of the file and could result in data not being copied. Gentoo's portage was particularly good and hitting this, and they've already patched their packages.

misterhsp · 2019-06-01T06:12:08Z

Am 01.06.19 um 03:41 schrieb Brian Behlendorf:

@misterhsp <https://github.com/misterhsp> this issue would not cause any damage to your pool. What could happen, is that /immediately/ after performing a write there was a race where a call to |lseek(.. , SEEK_DATA)| could incorrectly return |ENXIO|. This indicated that the file offset was past the end of the file and could result in data not being copied. Gentoo's portage was particularly good and hitting this, and they've already patched their packages. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#8816>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AKGJLRNP4KIELO5RCBHGVOLPYHHUHANCNFSM4HPVUHRA>.

Debian already has this patch. * Revert ec4f9b8 to avoid potential dataloss. (See: Github#8816) If I understood everything correctly, can I leave my pools like this even though they were created with the Debian packages that didn't have this patch yet?

abrasive · 2019-06-01T10:09:43Z

@misterhsp Yes, your pools should be left alone.

The effect of this defect is, when using sparse-aware tools to copy/move newly created files from a ZFS dataset, that the copied files in the destination might be empty. This combination of circumstances is uncommon, but if it did trigger, then those destination files are just regular empty files.

There is no corruption of the pool data structures.

misterhsp · 2019-06-01T10:21:18Z

Am 01.06.19 um 12:10 schrieb James Laird-Wah:

@misterhsp <https://github.com/misterhsp> Yes, your pools should be left alone. The effect of this defect is, when using sparse-aware tools to copy/move newly created files from a ZFS dataset, that the copied files in the destination might be empty. This combination of circumstances is uncommon, but if it did trigger, then those destination files are just regular empty files. There is no corruption of the pool data structures. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#8816>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AKGJLRJOIIRQLLF2DW6Y74DPYJDH5ANCNFSM4HPVUHRA>.

Thank you, now i get it. ...

This reverts commit ec4f9b8 which introduced a narrow race which can lead to lseek(, SEEK_DATA) incorrectly returning ENXIO. Resolve the issue by revering this change to restore the previous behavior which depends solely on checking the dirty list. Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8816 Closes #8834

GregorKopka · 2019-08-05T20:21:40Z

Only slightly related comment regarding

Setting zfs_nopwrite_enabled=0 appears to have no effect.
dedup=off, checksum=on, sync=standard on all datasets.

NOP write needs a strong checksum, having checksum=on is (on non-deduped datasets, as the default there is fletcher4) effectively disabling this feature (see https://github.com/zfsonlinux/zfs/wiki/Checksums) on the dataset level.

Issue: openzfs/zfs#8816 Issue: openzfs/zfs#8778 Bug: https://bugs.gentoo.org/635002 Package-Manager: Portage-2.3.67, Repoman-2.3.12 Signed-off-by: Georgy Yakovlev <[email protected]>

behlendorf added Type: Defect Incorrect behavior (e.g. crash, hang) Type: Regression Indicates a functional regression labels May 29, 2019

behlendorf mentioned this issue May 29, 2019

Revert "Report holes when there are only metadata changes" #8834

Merged

12 tasks

behlendorf closed this as completed in 2531ce3 May 31, 2019

rincebrain mentioned this issue Mar 8, 2020

make buildworld on a ZoL-originated pool results in invalid binaries zfsonfreebsd/ZoF#153

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Move to ZFS volume creates correctly sized files filled with \0 #8816

Move to ZFS volume creates correctly sized files filled with \0 #8816

abrasive commented May 26, 2019

abrasive commented May 26, 2019

fling- commented May 26, 2019 •

edited

Loading

bunder2015 commented May 26, 2019

abrasive commented May 26, 2019

gyakovlev commented May 26, 2019

abrasive commented May 26, 2019 •

edited

Loading

beren12 commented May 28, 2019

abrasive commented May 28, 2019

abrasive commented May 28, 2019 •

edited

Loading

abrasive commented May 28, 2019

abrasive commented May 28, 2019 •

edited

Loading

abrasive commented May 28, 2019

richardelling commented May 29, 2019

abrasive commented May 29, 2019

behlendorf commented May 29, 2019

gyakovlev commented May 29, 2019

behlendorf commented May 29, 2019

abrasive commented May 30, 2019 via email

behlendorf commented May 30, 2019

misterhsp commented May 31, 2019

behlendorf commented Jun 1, 2019

misterhsp commented Jun 1, 2019 via email

abrasive commented Jun 1, 2019

misterhsp commented Jun 1, 2019 via email

GregorKopka commented Aug 5, 2019

Move to ZFS volume creates correctly sized files filled with \0 #8816

Move to ZFS volume creates correctly sized files filled with \0 #8816

Comments

abrasive commented May 26, 2019

System information

Describe the problem you're observing

Describe how to reproduce the problem

Include any warning/errors/backtraces from the system logs

abrasive commented May 26, 2019

fling- commented May 26, 2019 • edited Loading

bunder2015 commented May 26, 2019

abrasive commented May 26, 2019

gyakovlev commented May 26, 2019

abrasive commented May 26, 2019 • edited Loading

beren12 commented May 28, 2019

abrasive commented May 28, 2019

abrasive commented May 28, 2019 • edited Loading

abrasive commented May 28, 2019

abrasive commented May 28, 2019 • edited Loading

abrasive commented May 28, 2019

richardelling commented May 29, 2019

abrasive commented May 29, 2019

behlendorf commented May 29, 2019

gyakovlev commented May 29, 2019

behlendorf commented May 29, 2019

abrasive commented May 30, 2019 via email

behlendorf commented May 30, 2019

misterhsp commented May 31, 2019

behlendorf commented Jun 1, 2019

misterhsp commented Jun 1, 2019 via email

abrasive commented Jun 1, 2019

misterhsp commented Jun 1, 2019 via email

GregorKopka commented Aug 5, 2019

fling- commented May 26, 2019 •

edited

Loading

abrasive commented May 26, 2019 •

edited

Loading

abrasive commented May 28, 2019 •

edited

Loading

abrasive commented May 28, 2019 •

edited

Loading