ztest failures #989

dechamps · 2012-09-26T09:48:46Z

Testing on 37abac6, with the reguid test disabled because of #939, over 23 instances of ztest -P 1200 -T 3600, roughly 50% (12/23) of the instances failed. Here are the 12 errors:

lt-ztest: ../../cmd/ztest/ztest.c:5087: Assertion `0 == spa_import(newname, config, ((void *)0), 0) (0x0 == 0x18)' failed.
ztest: attach (/tmp/zfstest/2012-09-25_155309_12730/zfstest_2012-09-25_155309_12730.1a 70778880, /tmp/zf 64344436, 1) returned 0, expected 75
ztest: attach (/tmp/zfstest/2012-09-25_155309_12730/zfstest_2012-09-25_155309_12730.59a 80216064, /tmp/zf 70778880, 0) returned 24, expected 95
lt-ztest: ../../cmd/ztest/ztest.c:5087: Assertion `0 == spa_import(newname, config, ((void *)0), 0) (0x0 == 0x18)' failed.
lt-ztest: ../../cmd/ztest/ztest.c:5087: Assertion `0 == spa_import(newname, config, ((void *)0), 0) (0x0 == 0x18)' failed.
lt-ztest: ../../cmd/ztest/ztest.c:5087: Assertion `0 == spa_import(newname, config, ((void *)0), 0) (0x0 == 0x18)' failed.
ztest: attach (/tmp/zfstest/2012-09-25_155309_12730/zfstest_2012-09-25_155309_12730.21a 70778880, /tmp/zf 64344436, 1) returned 0, expected 75
lt-ztest: ../../module/zfs/vdev.c:2678: vdev_stat_update: Assertion `spa_sync_pass(spa) == 1' failed.
ztest: attach (/tmp/zfstest/2012-09-25_155309_12730/zfstest_2012-09-25_155309_12730.4a 70778880, /tmp/zb 64344436, 1) returned 0, expected 75
ztest: attach (/tmp/zfstest/2012-09-25_155309_12730/zfstest_2012-09-25_155309_12730.10a 62390272, /tmp/zb 90701824, 0) returned 24, expected 95
ztest: attach (/tmp/zfstest/2012-09-25_155309_12730/zfstest_2012-09-25_155309_12730.26a 80216064, /tmp/zb 72923694, 1) returned 0, expected 75

I have no idea if this is a regression or not. I'll try to bisect and/or isolate the corrupting test if there is one, but this is likely to be very time-consuming: if there's a 50% chance of success, it means I have to get 5 consecutive successes (= 5 hours without errors) to be about 97% sure that the code being tested is good.


dechamps · 2012-09-26T14:46:14Z

Some investigation on the error messages themselves. There are four distinct errors:

ztest.c:5087: Assertion `0 == spa_import(newname, config, ((void *)0), 0) (0x0 == 0x18)' failed.

Means that spa_import() failed with EMFILE.

attach (/tmp/...) returned 24, expected 95

Means that spa_vdev_attach() returned EMFILE, but ztest expected it to return EOPNOTSUPP.

attach (/tmp/...) returned 0, expected 75

Means that spa_vdev_attach() completed successfully, but ztest expected it to fail with EOVERFLOW. A quick look through the code indicates that spa_vdev_attach() should return EOVERFLOW when replacing, if the new vdev isn't large enough:

/*
 * Make sure the new device is big enough.
 */
if (newvd->vdev_asize < vdev_get_min_asize(oldvd))
    return (spa_vdev_exit(spa, newrootvd, txg, EOVERFLOW));

vdev.c:2678: vdev_stat_update: Assertion `spa_sync_pass(spa) == 1' failed.

This indicates that vdev_stat_update() was called with a repair write leaf vdev ZIO as part of a TXG with the ZIO_FLAG_SCAN_THREAD flag set. The assertion failure means that this is not the first SPA sync pass, which either means that spa_sync() wasn't running when the assertion fired, or that the assertion fired after spa_sync() did its first pass.

I think we're probably dealing with multiple separate issues here.

The first two issues seem to be caused by EMFILE (too many open files) errors, which should be simple to diagnose. Note that the following code is run before the test begins (in ztest main()):

struct rlimit rl = { 1024, 1024 };
(void) setrlimit(RLIMIT_NOFILE, &rl);

On my system this is a no-op, because:

$ ulimit -n -S
1024
$ ulimit -n -H
1024

The EMFILE errors might be caused by differences in the number of open files when ztest runs on Linux versus Solaris. It's unclear why it needs more than 1024 files though, since we're dealing with approx. 60 file vdevs at any given time. Maybe we're forgetting to close some fds, so they leak and we end up with EMFILE?

I will continue investigating tomorrow.

dechamps · 2012-09-27T07:41:09Z

This seems really hard to reproduce. I had my 50% failure rate while running 5 ztests at the same time on different trees on a single VM. Now I tested again using 5 different VMs (one for rc7, rc8, rc9, rc10 and rc11) and I only got one failure over 17 runs on each VM (6%), and each time it was spa_import() returning EMFILE. Maybe I should increase the pass duration.

behlendorf · 2012-09-27T08:24:45Z

If you suspect a resource leak, you might try running ztest through the clang static analysis tool. It might flag a file handle leak. It also sounds like this is a long-standing issue.

dechamps · 2012-09-27T08:27:17Z

With a one-hour pass, all my ztest instances failed nearly simultaneously after 40 minutes with the following message:

lt-ztest: ../../lib/libzpool/kernel.c:181: Assertion `pthread_create(&kt->t_tid, &attr, &zk_thread_helper, kt) == 0 (0xb == 0x0)' failed.

Investigating. #36 comes to mind, but I'm running this on a 64-bit system.

dechamps · 2012-09-27T10:52:55Z

I heavily suspect that the pthread_create() issue is due to a leakage of pthread thread objects. The problem is that a lot of threads are neither detached nor joined. ztest correctly uses pthread_join() on its own worker threads, but most ZFS teardown functions (e.g. those called as part of kernel_fini()) don't. This means that a number of "zombie threads" linger around, and their number increases rapidly until the thread resource pool is exhausted, at which point pthread_create() returns EAGAIN.

I added some debugging printfs and they seem to indicate that ztest consistently fails when more than 33305 pthread objects are lingering around.

In addition, I noticed this in the original Solaris sources:

kthread_t *
zk_thread_create(void (*func)(), void *arg)
{
    thread_t tid;

    VERIFY(thr_create(0, 0, (void *(*)(void *))func, arg, THR_DETACHED,
        &tid) == 0);

    return ((void *)(uintptr_t)tid);
}

Notice the THR_DETACHED flag. ZFS On Linux doesn't do this, hence the zombie threads.

This should be easy to fix. I'm on it.

(Apparently, this isn't an issue in kernel space because kthreads always run detached. Can someone confirm?)

Currently, thread_create(), when called in userspace, creates a joinable (i.e. not detached) thread. This is the pthread default. Unfortunately, this does not reproduce kthread behavior (kthreads are always detached). In addition, this contradicts the original Solaris code, which creates userspace threads in detached mode. These joinable threads are never joined, which leads to a leakage of pthread thread objects ("zombie threads"). This in turn results in excessive resource consumption, and possible resource exhaustion in extreme cases (e.g. long ztest runs). This patch fixes the issue by creating userspace threads in detached mode. The only exception is ztest worker threads, which are meant to be joinable. See issue openzfs#989.

dechamps · 2012-09-27T12:44:24Z

Testing confirms that #991 fixes the thread issue.

In the meantime, I got another error message related to EMFILE. It seems we are definitely leaking file descriptors somewhere:

 ztest: can't open /tmp/zfstest/2012-09-27_135915_7730/zfstest_2012-09-27_135915_7730.spares.0: Too many open files

Currently, for unknown reasons, VOP_CLOSE() is a no-op in userspace. This causes file descriptor leaks. This is especially problematic with long ztest runs, since zpool.cache is opened repeatedly and never closed, resulting in resource exhaustion (EMFILE errors). This patch fixes the issue by making VOP_CLOSE() do what it is supposed to do. See issue openzfs#989.

dechamps · 2012-09-27T14:12:06Z

Regarding EMFILE: the issue is easy to reproduce by changing the setrlimit() call to use 128 instead of 1024, which makes the run fail much faster (< 1 minute). Seems like all leaking fds point to zpool.cache, as evidenced by this result in the middle of a ztest run (with max 1024):

# lsof -p 32026 | grep zpool.cache
lt-ztest 32026 root   14w   REG   0,17     2928 165775 /tmp/zpool.cache (deleted)
lt-ztest 32026 root   65w   REG   0,17     2928 173295 /tmp/zpool.cache (deleted)
lt-ztest 32026 root   66w   REG   0,17     2928 173296 /tmp/zpool.cache (deleted)
lt-ztest 32026 root   67w   REG   0,17     2928 164692 /tmp/zpool.cache (deleted)
lt-ztest 32026 root   70w   REG   0,17     2928 164693 /tmp/zpool.cache (deleted)
lt-ztest 32026 root   71w   REG   0,17     2928 164694 /tmp/zpool.cache (deleted)
(...)
# lsof -p 30256 | grep zpool.cache | wc -l
833

(and climbing)

I found the culprit: spa_config_write() closes the zpool.cache fd using VOP_CLOSE(), which happens to be... a no-op. Here's the WTF:

#define VOP_CLOSE(vp, f, c, o, cr, ct)  0

Fixed in #992.

Two issues remaining (the EOVERFLOW issue and the vdev_stat_update() issue). I'll run a series of tests again to make sure these two weren't caused by the ones I just fixed. Or maybe I won't be able to reproduce them, since they seem very rare (happened only one time over tens of hours).

dechamps · 2012-09-28T08:07:12Z

Great news, everyone: ZFS rc11 with the fixes in #991 and #992 (and the reguid test disabled) now reliably passes 32 1-hour passes of ztest. Seems like the other issues I mentioned disappeared with these two fixes. I'll close this as soon as the fixes get merged.

dechamps · 2012-09-28T14:11:45Z

Got another one, but it seems to be extremely rare (happened only once over dozens of hours of ztesting):

lt-ztest: ../../module/zfs/vdev.c:584: vdev_free: Assertion `!list_link_active(&vd->vdev_state_dirty_node)' failed.

This has already been reported upstream a week ago (Illumos #3212), so it's not specific to ZFS On Linux.

behlendorf · 2012-09-28T18:41:58Z

Great work. I'll get all of the changes you've been working on reviewed and merged next week.

dechamps · 2012-10-01T08:03:07Z

Doh, my tests over the weekend failed, even with #991, #992, #994, #995 and #997 applied. Got an 18% failure rate (4/22) with three-hour passes. Here are the errors:

child died with signal 11

After 50 minutes. Classic segfault.

ztest: attach (/tmp/zfstest/2012-09-28_164921_24802/zfstest_2012-09-28_164921_24802.52a 115867648, /tmp/zfstest/2012-09-28_164921_24802/zfstest_2012-09-28_164921_24802.spares.0 79167488, 1) returned 0, expected 75

After nearly 3 hours. We already got this one. I thought it was gone, turns out it isn't.

lt-ztest: ../../lib/libzpool/kernel.c:272: Assertion `mp->m_magic == 0x9522f51362a6e326ull (0x0 == 0x9522f51362a6e326)' failed.

After nearly 3 hours. Seems like some sort of mutex corruption.

lt-ztest: ../../module/zfs/arc.c:2872: arc_read: Assertion `!refcount_is_zero(&pbuf->b_hdr->b_refcnt)' failed.

After 2 hours.

@behlendorf: considering these failures seem to only occur on long runs (> 30 minutes), you should be fine if you stick with a default 5-minute ztest in your testing chain, for now.

dechamps · 2012-10-01T14:53:37Z

While ztest was running, I took a look at the files in /tmp and was surprised to see a file named /tmp/zt, which seemed wrong. I investigated and came up with #1001. This is likely to fix some of the remaining issues, especially the "attach() returned 0, expected 75" issue.

dechamps · 2012-10-03T08:05:21Z

Still needs more testing, but right now ztest has been running happily for 25 hours (8 three-hour passes) without a hitch. Judging from other tests I don't think it's 100% stable quite yet, but it's very close.

Currently, thread_create(), when called in userspace, creates a joinable (i.e. not detached) thread. This is the pthread default. Unfortunately, this does not reproduce kthread behavior (kthreads are always detached). In addition, this contradicts the original Solaris code, which creates userspace threads in detached mode. These joinable threads are never joined, which leads to a leakage of pthread thread objects ("zombie threads"). This in turn results in excessive resource consumption, and possible resource exhaustion in extreme cases (e.g. long ztest runs). This patch fixes the issue by creating userspace threads in detached mode. The only exception is ztest worker threads, which are meant to be joinable. Signed-off-by: Brian Behlendorf <[email protected]> Issue #989

Currently, for unknown reasons, VOP_CLOSE() is a no-op in userspace. This causes file descriptor leaks. This is especially problematic with long ztest runs, since zpool.cache is opened repeatedly and never closed, resulting in resource exhaustion (EMFILE errors). This patch fixes the issue by making VOP_CLOSE() do what it is supposed to do. Signed-off-by: Brian Behlendorf <[email protected]> Issue #989

Currently, in several instances (but not all), ztest generates vdev file paths using a statement similar to this: snprintf(path, sizeof (path), ztest_dev_template, ...); This worked fine until 40b84e7, which changed path to be a pointer to the heap instead of an array allocated on the stack. Before this change, sizeof(path) would return the size of the array; now, it returns the size of the pointer instead. As a result, the aforementioned snprintf statement uses the wrong size and truncates the vdev file path to the first 4 or 8 bytes (depending on the architecture, and including the terminating NUL). Typically, with default settings, the file path will become "/tmp/zt" instead of "/tmp/ztest.XXX". This issue only exists in ztest_vdev_attach_detach() and ztest_fault_inject(), which explains why ztest doesn't fail right away. Signed-off-by: Brian Behlendorf <[email protected]> Issue #989

behlendorf · 2012-10-03T22:45:13Z

With the above patches merged, ztest is again running fairly reliably for short durations, so I've re-enabled it in my test suite. However, there are still a few recent upstream Illumos ztest fixes we'll want to port.

illumos/illumos-gate@9253d63
illumos/illumos-gate@cd1c8b8

illumos-gate/commit/9253d63df408bb48584e0b1abfcc24ef2472382e Illumos changeset: 13840:97fd5cdf328a 3145 single-copy arc 3212 ztest: race condition between vdev_online() and spa_vdev_remove() Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Eric Schrock <[email protected]> Reviewed by: Justin T. Gibbs <[email protected]> Approved by: Eric Schrock <[email protected]> Ported-by: Brian Behlendorf <[email protected]> Issue openzfs#989 Issue openzfs#1137

3145 single-copy arc 3212 ztest: race condition between vdev_online() and spa_vdev_remove() Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Eric Schrock <[email protected]> Reviewed by: Justin T. Gibbs <[email protected]> Approved by: Eric Schrock <[email protected]> References: illumos-gate/commit/9253d63df408bb48584e0b1abfcc24ef2472382e illumos changeset: 13840:97fd5cdf328a https://www.illumos.org/issues/3145 https://www.illumos.org/issues/3212 Ported-by: Brian Behlendorf <[email protected]> Closes openzfs#989 Closes openzfs#1137


dechamps mentioned this issue Sep 26, 2012

TRIM/UNMAP/DISCARD support for vdevs #924

Closed

dechamps mentioned this issue Sep 27, 2012

Create threads in detached state in userspace #991

Closed

dechamps mentioned this issue Sep 27, 2012

Fix VOP_CLOSE() in userspace #992

Closed

behlendorf mentioned this issue Dec 10, 2012

Port Illumos Gate #3212: ztest: race condition between vdev_online() and spa_vdev_remove() #1137

Closed

behlendorf mentioned this issue Dec 21, 2012

Illumos #3145, #3212 #1160

Closed

behlendorf closed this as completed in 1eb5bfa Jan 8, 2013

kernelOfTruth mentioned this issue Jan 16, 2016

[testing] ABD2: linear/scatter dual typed buffer for ARC ([upstream] rebase master January 13th 2016) #4225

Closed
