
Move to ZFS volume creates correctly sized files filled with \0 #8816

Closed

abrasive opened this issue May 26, 2019 · 25 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang) Type: Regression Indicates a functional regression

Comments

@abrasive
Contributor

System information

Type Version/Name
Distribution Name Gentoo
Distribution Version n/a
Linux Kernel 5.0.0-rc1+
Architecture amd64
ZFS Version 0.8.0-r0-gentoo
SPL Version 0.8.0-r0-gentoo

Describe the problem you're observing

Since upgrading to 0.8.0 and attempting to upgrade system packages, I began to find newly installed files filled with nulls, but with the expected sizes.
On this system, packages are compiled on one dataset (compression=lz4), before being installed to other datasets (compression=lz4 and compression=gzip).

The pattern of nulling is not consistent, but it is bursty, i.e. several related files with similar inode numbers, which were copied sequentially, tend to be affected.

The behaviour is similar to that observed in #3125. Setting zfs_nopwrite_enabled=0 appears to have no effect.

dedup=off, checksum=on, sync=standard on all datasets.

This problem was not observed on 0.7.13, which I had been running for quite some time. Portage has not changed, so I think this is triggered by a ZFS change. Unfortunately, like an optimist, I already upgraded the pool, and since feature@spacemap_v2 is now active I can't go back to test 0.7.13 again.

Disabling Portage's native-extensions USE flag, which stops Portage from using its native kernel-assisted routines for faster file copying, prevents the problem from occurring.

Examining a nulled file with zdb -ddddd:

Dataset zwei/sys/debug/src [ZPL], ID 233, cr_txg 18372, 2.68G, 259475 objects, rootbp DVA[0]=<0:a14af22000:2000> DVA[1]=<0:d98c456000:2000> [L0 DMU objset] fletcher4 uncompressed unencrypted LE contiguous unique double size=800L/800P birth=603513L/603513P fill=259475 cksum=11fe4fc52e:136c39a331f8:d4758e007904d:6e79435bbf1f4df

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
    390796    1   128K    32K      0     512    32K    0.00  ZFS plain file
                                               168   bonus  System attributes
	dnode flags: USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED 
	dnode maxblkid: 0
	path	/sys-power/iasl-20190215/acpica-unix-20190215/source/compiler/aslfold.c
	uid     0
	gid     0
	atime	Sun May 26 08:55:58 2019
	mtime	Sat Feb 16 04:38:40 2019
	ctime	Sun May 26 08:49:46 2019
	crtime	Sun May 26 08:49:46 2019
	gen	603441
	mode	100644
	size	32675
	parent	370335
	links	1
	pflags	40800000004
Indirect blocks:

Describe how to reproduce the problem

Choosing iasl as a small package that has exhibited the problem, we can repeatedly reinstall it, without rebuilding it every time, until the problem occurs:

cd /usr/portage/sys-power/iasl
# Reinstall the already-built package repeatedly until a merged file turns up null-filled.
while [[ -z $stop ]]; do
  sudo FEATURES=noclean ebuild iasl-20190215.ebuild install qmerge
  for f in $(equery --no-color files --filter=obj,conf,cmd,doc,man,info sys-power/iasl 2>/dev/null); do
    # Flag any non-empty file that contains nothing but NUL bytes.
    if [ -s "$f" ] && ! (tr -d '\0' < "$f" | read -n 1); then
      stop=1
      echo "$f: null"
    fi
  done
done

This will typically cause a failure within 2-3 cycles.

I have attempted to write a test script to copy files using Portage's native extension, but have been unable to reproduce the issue (recursively copying a directory of files from one dataset to another, or generating a single directory of files and then copying those).

Include any warning/errors/backtraces from the system logs

No warnings or errors observed.

@abrasive
Contributor Author

I suspect the origin files have to be very fresh to reproduce this bug.

The most commonly affected files are those which were touched last. For iasl, as I have Portage set up to install the source code as part of the merge, the install phase ends by rsync'ing the source files into the tree of files to install, and those files are then the ones most often affected.

If I split the install stage, which builds the tree of files to install, from the qmerge stage, which copies them to the target filesystem, I still reproduce the problem, albeit less often:

    sudo FEATURES=noclean ebuild iasl-20190215.ebuild install
    sudo FEATURES=noclean ebuild iasl-20190215.ebuild qmerge

If I insert sleep 5 in between, I can't reproduce the issue.

If I sync or echo 1 > /proc/sys/vm/drop_caches, the issue still occurs.

If I set sync=always on the /var/tmp dataset, the install phase slows down significantly, and I can't reproduce the issue.

@fling-
Contributor

fling- commented May 26, 2019

What features are enabled on the pool?

I had the same problem in #6931 but got it fixed with 454365b.

@bunder2015
Contributor

Reported upstream to gentoo https://bugs.gentoo.org/635002#c18

@abrasive
Contributor Author

@fling-

Pool features
zwei  feature@async_destroy          enabled                        local
zwei  feature@empty_bpobj            active                         local
zwei  feature@lz4_compress           active                         local
zwei  feature@multi_vdev_crash_dump  enabled                        local
zwei  feature@spacemap_histogram     active                         local
zwei  feature@enabled_txg            active                         local
zwei  feature@hole_birth             active                         local
zwei  feature@extensible_dataset     active                         local
zwei  feature@embedded_data          active                         local
zwei  feature@bookmarks              enabled                        local
zwei  feature@filesystem_limits      enabled                        local
zwei  feature@large_blocks           enabled                        local
zwei  feature@large_dnode            enabled                        local
zwei  feature@sha512                 enabled                        local
zwei  feature@skein                  enabled                        local
zwei  feature@edonr                  enabled                        local
zwei  feature@userobj_accounting     active                         local
zwei  feature@encryption             enabled                        local
zwei  feature@project_quota          enabled                        local
zwei  feature@device_removal         enabled                        local
zwei  feature@obsolete_counts        enabled                        local
zwei  feature@zpool_checkpoint       enabled                        local
zwei  feature@spacemap_v2            active                         local
zwei  feature@allocation_classes     enabled                        local
zwei  feature@resilver_defer         enabled                        local
zwei  feature@bookmark_v2            enabled                        local

I did turn the new autotrim option on initially, as this is on an SSD, then turned it off once the issue became apparent; all of the subsequent tracking-down has been done with autotrim=off.

@gyakovlev
Contributor

I have plenty of systems running 0.8.0 on Gentoo, but most of them are running non-upgraded pools. No issues so far.

I've been hit by a previous iteration of this, but can't observe it with 0.8.0 yet.

I've run manual trim a couple of times, but no autotrim yet:

zroot  autotrim                       off                            default
zroot  feature@async_destroy          enabled                        local
zroot  feature@empty_bpobj            active                         local
zroot  feature@lz4_compress           active                         local
zroot  feature@multi_vdev_crash_dump  enabled                        local
zroot  feature@spacemap_histogram     active                         local
zroot  feature@enabled_txg            active                         local
zroot  feature@hole_birth             active                         local
zroot  feature@extensible_dataset     active                         local
zroot  feature@embedded_data          active                         local
zroot  feature@bookmarks              enabled                        local
zroot  feature@filesystem_limits      enabled                        local
zroot  feature@large_blocks           enabled                        local
zroot  feature@large_dnode            active                         local
zroot  feature@sha512                 enabled                        local
zroot  feature@skein                  enabled                        local
zroot  feature@edonr                  enabled                        local
zroot  feature@userobj_accounting     active                         local
zroot  feature@encryption             enabled                        local
zroot  feature@project_quota          disabled                       local
zroot  feature@device_removal         disabled                       local
zroot  feature@obsolete_counts        disabled                       local
zroot  feature@zpool_checkpoint       disabled                       local
zroot  feature@spacemap_v2            disabled                       local
zroot  feature@allocation_classes     disabled                       local
zroot  feature@resilver_defer         disabled                       local
zroot  feature@bookmark_v2            enabled                        local

@abrasive
Contributor Author

abrasive commented May 26, 2019

I have managed to reproduce this without installing packages (though it still requires Portage's Python libraries to be installed).

Python3 script here: https://gist.github.com/abrasive/dd85c7bb4686200927df660a2b4f1d93
Usage, e.g.:

mkdir ~/scratch/test1 ~/scratch/test2
while python zfs_test_null.py ~/scratch/test1 ~/scratch/test2; do echo .; done

It also reproduces the issue when copying within the same dataset; the cross-dataset aspect is a red herring.

The repro works if a scrub is running on the pool at the time. I've tried loading the pool in other ways (e.g. reading all the files on the filesystem sequentially in the background) but have not had a measurable repro rate with those methods.

@beren12
Contributor

beren12 commented May 28, 2019

Does it still do it if you disable native-extensions for the build?

@abrasive
Contributor Author

@beren12 no, with native-extensions disabled the problem does not occur.

@abrasive
Contributor Author

abrasive commented May 28, 2019

I have managed to observe the issue at the syscall level. The problem is that the copy routine is sparsity-aware, so it looks for the first data chunk by calling lseek(fd_in, 0, SEEK_DATA).
On a newly written file, particularly a large one, this occasionally returns ENXIO instead; that tells the copy routine that the entire file is a hole, and it writes the output accordingly.

For example, here is a trace of a successful copy operation:

     0.000090 openat(AT_FDCWD, "/tmp/a/zfs_test/test00008", O_RDONLY|O_CLOEXEC) = 3
     0.000130 openat(AT_FDCWD, "/tmp/b/zfs_test/test00008", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 6
     0.000124 lseek(3, 0, SEEK_DATA)    = 0
     0.000041 lseek(3, 0, SEEK_HOLE)    = 10000
     0.000029 copy_file_range(3, [0], 6, [0], 10000, 0) = 10000
     0.000289 lseek(3, 10000, SEEK_DATA) = -1 ENXIO (No such device or address)
     0.000202 lseek(3, 0, SEEK_END)     = 10000
     0.000054 ftruncate(6, 10000)       = 0
     0.000064 close(6)                  = 0
     0.000064 close(3)                  = 0

And here is the immediately following copy, which failed:

     0.000110 openat(AT_FDCWD, "/tmp/a/zfs_test/test00009", O_RDONLY|O_CLOEXEC) = 3
     0.000117 openat(AT_FDCWD, "/tmp/b/zfs_test/test00009", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 6
     0.000147 lseek(3, 0, SEEK_DATA)    = -1 ENXIO (No such device or address)
     0.000067 lseek(3, 0, SEEK_END)     = 1000000
     0.000044 lseek(6, 1000000, SEEK_SET) = 1000000
     0.000043 ftruncate(6, 1000000)     = 0
     0.000058 close(6)                  = 0
     0.000081 close(3)                  = 0
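
To make the failure concrete, the loop below is a minimal sketch, in plain C, of the kind of sparsity-aware copy the traces above correspond to. It is not Portage's actual implementation; the function name, the error handling, and the assumption that copy_file_range(2) is available (glibc >= 2.27) are all illustrative. The routine walks the source with SEEK_DATA/SEEK_HOLE and copies only the data segments, so a spurious ENXIO on the very first probe makes it conclude the whole file is a hole and simply ftruncate() the destination to size:

/* Minimal sketch of a sparsity-aware copy, assuming Linux with
 * copy_file_range(2) available.  Not Portage's actual code; the
 * function name and error handling are illustrative only. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

static int sparse_copy(int fd_in, int fd_out)
{
    off_t end = lseek(fd_in, 0, SEEK_END);
    off_t pos = 0;

    for (;;) {
        /* Find the start of the next data segment.  On a freshly written
         * ZFS file this is where the bug strikes: ENXIO comes back even
         * though the file contains data, so nothing below ever runs. */
        off_t data = lseek(fd_in, pos, SEEK_DATA);
        if (data == (off_t)-1) {
            if (errno == ENXIO)
                break;              /* "no more data": rest of the file is a hole */
            return -1;
        }
        off_t hole = lseek(fd_in, data, SEEK_HOLE);
        if (hole == (off_t)-1)
            return -1;

        /* Copy the data segment [data, hole) to the same offsets. */
        loff_t off_in = data, off_out = data;
        while (off_in < hole) {
            ssize_t n = copy_file_range(fd_in, &off_in, fd_out, &off_out,
                                        hole - off_in, 0);
            if (n <= 0)
                return -1;
        }
        pos = hole;
    }

    /* Give the destination the same apparent size, preserving a trailing
     * hole.  If ENXIO was (wrongly) returned at offset 0, this yields a
     * correctly sized file containing only zeros. */
    return ftruncate(fd_out, end);
}

The two traces map onto this loop directly: the successful copy finds data at offset 0, copies 10000 bytes, then gets the expected ENXIO at EOF; the failed copy gets ENXIO on the first probe, skips the loop body entirely, and falls straight through to the final ftruncate().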

@abrasive
Contributor Author

I should note, for the record, that the test script in question creates 1000 files in a directory sequentially, then copies them all, so they've all been closed and they stat() as the correct size.

@abrasive
Contributor Author

abrasive commented May 28, 2019

Here is a pure C repro of the lseek(..., SEEK_DATA) results:

https://gist.github.com/abrasive/f062f967cc41c797bd08ec11f30f8fbf

As before, I find I need to be loading the pool to reliably trigger the bug; scrubbing remains particularly effective.
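
For reference, the sketch below shows roughly the shape of that reproducer; the gist is the authoritative version, and the file naming, NFILES and FSIZE values here are assumptions based on the test described earlier in this thread. It writes a batch of files, closes them so they stat() at full size, then immediately probes each one with lseek(fd, 0, SEEK_DATA) and reports any that claim to be entirely hole:

/* Rough sketch of the probe performed by the reproducer above -- the
 * linked gist is the real thing; NFILES, FSIZE and the file naming are
 * assumptions.  Typically needs concurrent pool load (e.g. a running
 * scrub) to trigger the spurious ENXIO. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define NFILES 1000
#define FSIZE  (1000 * 1000)

int main(int argc, char **argv)
{
    const char *dir = argc > 1 ? argv[1] : ".";
    static char buf[FSIZE];
    char path[4096];
    int i, bad = 0;

    memset(buf, 'x', sizeof(buf));

    /* Phase 1: write the files and close them, so they stat() at full size. */
    for (i = 0; i < NFILES; i++) {
        snprintf(path, sizeof(path), "%s/test%05d", dir, i);
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0 || write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
            perror(path);
            return 1;
        }
        close(fd);
    }

    /* Phase 2: immediately probe each file the way a sparse-aware copy would. */
    for (i = 0; i < NFILES; i++) {
        snprintf(path, sizeof(path), "%s/test%05d", dir, i);
        int fd = open(path, O_RDONLY);
        if (fd < 0) {
            perror(path);
            return 1;
        }
        if (lseek(fd, 0, SEEK_DATA) == (off_t)-1 && errno == ENXIO) {
            /* The file has data, yet SEEK_DATA claims it is one big hole. */
            printf("%s: lseek(0, SEEK_DATA) -> ENXIO\n", path);
            bad++;
        }
        close(fd);
    }
    printf("%d of %d files affected\n", bad, NFILES);
    return bad != 0;
}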

@abrasive
Contributor Author

Tested also on:

Gentoo
zfs/spl 0.8.0-r0-gentoo
kernel 4.9.76-gentoo-r1
amd64
spinning rust (2x2 striped mirror)
Reproduces instantly, even on small files, without loading (e.g. scrubbing) the system.

Raspbian
zfs/spl 0.7.11-1
kernel 4.14.71-v7+
armv7l
spinning rust (single disk via USB)
Unable to reproduce so far. (Terrible machine but my only <0.8.0 system.)

@richardelling
Contributor

@abrasive
Contributor Author

@richardelling I see no change; the bug still occurs.

@behlendorf
Contributor

@abrasive thank you for the reproducer; with it I was able to reproduce the issue. Reverting commit ec4f9b8 should resolve it.

@gyakovlev
Contributor

zfs-kmod-0.8.0-r1 pushed to gentoo
gentoo/gentoo@abf3bbd

pvdabeel pushed a commit to pvdabeel/gentoo that referenced this issue May 29, 2019
Issue: openzfs/zfs#8816
Issue: openzfs/zfs#8778
Bug: https://bugs.gentoo.org/635002
Package-Manager: Portage-2.3.67, Repoman-2.3.12
Signed-off-by: Georgy Yakovlev <[email protected]>
behlendorf added the Type: Defect (Incorrect behavior, e.g. crash, hang) and Type: Regression (Indicates a functional regression) labels on May 29, 2019
@behlendorf
Contributor

I've opened #8834, which reverts commit ec4f9b8. @abrasive would you mind if I adapted your C test slightly and included it in the PR, so it could be part of the ZFS Test Suite to help detect this kind of problem?

@abrasive
Contributor Author

abrasive commented May 30, 2019 via email

@behlendorf
Contributor

@abrasive thanks. It reproduced surprisingly quickly in my test environment; I was usually able to hit the issue in under a minute, though it could take multiple runs.

@misterhsp

Do I have to recreate the pools afterwards, or can I leave them as they are? I had just reinstalled with 0.8.0 before I saw this.

Thanks...

@behlendorf
Contributor

@misterhsp this issue would not cause any damage to your pool. What could happen is that, immediately after performing a write, there was a race where a call to lseek(..., SEEK_DATA) could incorrectly return ENXIO. This indicated that the file offset was past the end of the file and could result in data not being copied. Gentoo's Portage was particularly good at hitting this, and they've already patched their packages.

@misterhsp

misterhsp commented Jun 1, 2019 via email

@abrasive
Contributor Author

abrasive commented Jun 1, 2019

@misterhsp Yes, your pools should be left alone.

The effect of this defect is that, when sparse-aware tools are used to copy or move newly created files from a ZFS dataset, the copied files at the destination might be empty. This combination of circumstances is uncommon, but when it did trigger, those destination files are just regular empty files.

There is no corruption of the pool data structures.

@misterhsp

misterhsp commented Jun 1, 2019 via email

behlendorf added a commit that referenced this issue Jun 7, 2019
This reverts commit ec4f9b8 which introduced a narrow race which
can lead to lseek(..., SEEK_DATA) incorrectly returning ENXIO.  Resolve
the issue by reverting this change to restore the previous behavior
which depends solely on checking the dirty list.

Reviewed-by: Olaf Faaland <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8816 
Closes #8834
@GregorKopka
Contributor

An only slightly related comment regarding:

Setting zfs_nopwrite_enabled=0 appears to have no effect.
dedup=off, checksum=on, sync=standard on all datasets.

NOP write needs a strong checksum; having checksum=on (on non-deduped datasets, where the default is fletcher4) effectively disables this feature at the dataset level (see https://github.com/zfsonlinux/zfs/wiki/Checksums).

gentoo-repo-qa-bot pushed a commit to gentoo-mirror/linux-be that referenced this issue Oct 6, 2019
Issue: openzfs/zfs#8816
Issue: openzfs/zfs#8778
Bug: https://bugs.gentoo.org/635002
Package-Manager: Portage-2.3.67, Repoman-2.3.12
Signed-off-by: Georgy Yakovlev <[email protected]>