ztest failures #989

Closed
dechamps opened this issue Sep 26, 2012 · 14 comments

Labels
Component: Test Suite (Indicates an issue with the test framework or a test case)

Comments

@dechamps (Contributor)

Testing on 37abac6, with the reguid test disabled because of #939: out of 23 instances of ztest -P 1200 -T 3600 (20-minute passes, one-hour total run time), roughly 50% (12/23) failed. Here are the errors:

lt-ztest: ../../cmd/ztest/ztest.c:5087: Assertion `0 == spa_import(newname, config, ((void *)0), 0) (0x0 == 0x18)' failed.
ztest: attach (/tmp/zfstest/2012-09-25_155309_12730/zfstest_2012-09-25_155309_12730.1a 70778880, /tmp/zf 64344436, 1) returned 0, expected 75
ztest: attach (/tmp/zfstest/2012-09-25_155309_12730/zfstest_2012-09-25_155309_12730.59a 80216064, /tmp/zf 70778880, 0) returned 24, expected 95
lt-ztest: ../../cmd/ztest/ztest.c:5087: Assertion `0 == spa_import(newname, config, ((void *)0), 0) (0x0 == 0x18)' failed.
lt-ztest: ../../cmd/ztest/ztest.c:5087: Assertion `0 == spa_import(newname, config, ((void *)0), 0) (0x0 == 0x18)' failed.
lt-ztest: ../../cmd/ztest/ztest.c:5087: Assertion `0 == spa_import(newname, config, ((void *)0), 0) (0x0 == 0x18)' failed.
ztest: attach (/tmp/zfstest/2012-09-25_155309_12730/zfstest_2012-09-25_155309_12730.21a 70778880, /tmp/zf 64344436, 1) returned 0, expected 75
lt-ztest: ../../module/zfs/vdev.c:2678: vdev_stat_update: Assertion `spa_sync_pass(spa) == 1' failed.
ztest: attach (/tmp/zfstest/2012-09-25_155309_12730/zfstest_2012-09-25_155309_12730.4a 70778880, /tmp/zb 64344436, 1) returned 0, expected 75
ztest: attach (/tmp/zfstest/2012-09-25_155309_12730/zfstest_2012-09-25_155309_12730.10a 62390272, /tmp/zb 90701824, 0) returned 24, expected 95
ztest: attach (/tmp/zfstest/2012-09-25_155309_12730/zfstest_2012-09-25_155309_12730.26a 80216064, /tmp/zb 72923694, 1) returned 0, expected 75

I have no idea if this is a regression or not. I'll try to bisect and/or isolate the corrupting test if there is one, but this is likely to be very time-consuming: with a 50% failure rate per run, I need 5 consecutive successes (= 5 hours without errors) to be ~97% sure (1 - 0.5^5 ≈ 0.97) that the code under test is good.

@dechamps (Contributor Author)

Some investigation into the error messages themselves. There are four distinct errors:

ztest.c:5087: Assertion `0 == spa_import(newname, config, ((void *)0), 0) (0x0 == 0x18)' failed.

This means spa_import() failed with EMFILE (0x18 = 24, "too many open files").

attach (/tmp/...) returned 24, expected 95

This means spa_vdev_attach() returned EMFILE (24), while ztest expected it to return EOPNOTSUPP (95).

attach (/tmp/...) returned 0, expected 75

This means spa_vdev_attach() completed successfully, while ztest expected it to fail with EOVERFLOW (75). A quick look through the code shows that spa_vdev_attach() should return EOVERFLOW when replacing a vdev if the new vdev isn't large enough:

/*
 * Make sure the new device is big enough.
 */
if (newvd->vdev_asize < vdev_get_min_asize(oldvd))
    return (spa_vdev_exit(spa, newrootvd, txg, EOVERFLOW));
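
For context, these "returned X, expected Y" failures come from ztest's predict-then-verify pattern: the test computes the errno it expects before making the call, then aborts on any mismatch. A self-contained toy illustration of the pattern (my own sketch, not the actual ztest code):

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Toy stand-in for spa_vdev_attach(): fails with EOVERFLOW when the
 * replacement vdev is too small, mirroring the check quoted above.
 */
static int
fake_attach(int new_vdev_too_small)
{
    return (new_vdev_too_small ? EOVERFLOW : 0);
}

int
main(void)
{
    /* Predict the errno first... */
    int expected_error = EOVERFLOW;
    /* ...then verify that the call agrees. */
    int error = fake_attach(1);

    if (error != expected_error) {
        (void) fprintf(stderr, "attach returned %d, expected %d\n",
            error, expected_error);
        exit(1);
    }
    return (0);
}
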
vdev.c:2678: vdev_stat_update: Assertion `spa_sync_pass(spa) == 1' failed.

This indicates that vdev_stat_update() was called on a repair-write leaf-vdev ZIO, as part of a TXG, with the ZIO_FLAG_SCAN_THREAD flag set. The assertion failure means this was not the first SPA sync pass: either spa_sync() wasn't running at all when the assertion fired, or it had already completed its first pass.

I think we're probably dealing with multiple separate issues here.

The first two issues seem to be caused by EMFILE ("too many open files") errors, which should be simple to diagnose. Note that the following code runs before the test begins (in ztest's main()):

/* Cap the soft and hard open-file limits at 1024. */
struct rlimit rl = { 1024, 1024 };
(void) setrlimit(RLIMIT_NOFILE, &rl);

On my system this is a no-op, because:

$ ulimit -n -S
1024
$ ulimit -n -H
1024

The EMFILE errors might be caused by a difference in the number of open files when ztest runs on Linux versus Solaris. It's unclear why it would need more than 1024 files, though, since we're dealing with roughly 60 file vdevs at any given time. Maybe we're forgetting to close some fds, so they leak until we hit EMFILE? One quick way to check would be to periodically log the process's fd count, as in the sketch below.
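
A minimal sketch of such a check, assuming a Linux /proc filesystem (this is a debugging aid of my own, not ztest code):

#include <dirent.h>
#include <stddef.h>

/*
 * Count this process's open file descriptors by listing /proc/self/fd.
 * Logging this periodically during a ztest run would show whether the
 * count climbs toward the 1024 limit over time.
 */
static int
count_open_fds(void)
{
    DIR *dir = opendir("/proc/self/fd");
    struct dirent *de;
    int n = 0;

    if (dir == NULL)
        return (-1);
    while ((de = readdir(dir)) != NULL)
        n++;
    (void) closedir(dir);

    /* Don't count ".", ".." or the fd used by opendir() itself. */
    return (n - 3);
}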

I will continue investigating tomorrow.

@dechamps (Contributor Author)

This seems really hard to reproduce. I got my 50% failure rate while running 5 ztests simultaneously on different trees in a single VM. I have now retested using 5 separate VMs (one each for rc7, rc8, rc9, rc10 and rc11), and only got one failure per VM over 17 runs each (1/17 ≈ 6%); each time it was spa_import() returning EMFILE. Maybe I should increase the pass duration.

@behlendorf (Contributor)

If you suspect a resource leak, you might try running ztest through clang's static analysis tool; it might flag a file-handle leak. It also sounds like this is a long-standing issue.

@dechamps (Contributor Author)

With a one-hour pass, all my ztest instances failed nearly simultaneously after 40 minutes with the following message:

lt-ztest: ../../lib/libzpool/kernel.c:181: Assertion `pthread_create(&kt->t_tid, &attr, &zk_thread_helper, kt) == 0 (0xb == 0x0)' failed.

Investigating. The return value 0xb is 11, i.e. EAGAIN. #36 comes to mind, but I'm running this on a 64-bit system.

@dechamps (Contributor Author)

I strongly suspect that the pthread_create() issue is due to a leak of pthread thread objects. The problem is that a lot of threads are neither detached nor joined. ztest correctly uses pthread_join() on its own worker threads, but most ZFS teardown functions (e.g. those called as part of kernel_fini()) don't join the threads they spawned. As a result, "zombie threads" linger around, and their number grows rapidly until the thread resource pool is exhausted, at which point pthread_create() returns EAGAIN.

I added some debugging printfs and they seem to indicate that ztest consistently fails when more than 33305 pthread objects are lingering around.

In addition, I noticed this in the original Solaris sources:

kthread_t *
zk_thread_create(void (*func)(), void *arg)
{
    thread_t tid;

    VERIFY(thr_create(0, 0, (void *(*)(void *))func, arg, THR_DETACHED,
        &tid) == 0);

    return ((void *)(uintptr_t)tid);
}

Notice the THR_DETACHED flag. ZFS On Linux doesn't do this, hence the zombie threads.

This should be easy to fix. I'm on it.
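
The pthreads equivalent of THR_DETACHED is a detach-state attribute on pthread_create(). A minimal sketch of the intended fix (illustrative only; the helper name is mine, not the actual patch):

#include <pthread.h>

/*
 * Create a detached pthread, mirroring Solaris' THR_DETACHED. A
 * detached thread's resources are reclaimed as soon as it exits, so
 * it never lingers as a "zombie" waiting to be joined.
 */
static int
create_detached(pthread_t *tid, void *(*func)(void *), void *arg)
{
    pthread_attr_t attr;
    int rc;

    (void) pthread_attr_init(&attr);
    (void) pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    rc = pthread_create(tid, &attr, func, arg);
    (void) pthread_attr_destroy(&attr);

    return (rc);
}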

(Apparently, this isn't an issue in kernel space because kthreads always run detached. Can someone confirm?)

dechamps added a commit to dechamps/zfs that referenced this issue Sep 27, 2012
Currently, thread_create(), when called in userspace, creates a
joinable (i.e. not detached) thread. This is the pthread default.

Unfortunately, this does not reproduce kthreads behavior (kthreads
are always detached). In addition, this contradicts the original
Solaris code which creates userspace threads in detached mode.

These joinable threads are never joined, which leads to a leakage of
pthread thread objects ("zombie threads"). This in turn results in
excessive resource consumption, and possible resource exhaustion in
extreme cases (e.g. long ztest runs).

This patch fixes the issue by creating userspace threads in detached
mode. The only exception is ztest worker threads which are meant to be
joinable.

See issue openzfs#989.
@dechamps (Contributor Author)

Testing confirms that #991 fixes the thread issue.

In the meantime, I got another error message related to EMFILE. It seems we are definitely leaking file descriptors somewhere:

 ztest: can't open /tmp/zfstest/2012-09-27_135915_7730/zfstest_2012-09-27_135915_7730.spares.0: Too many open files

dechamps added a commit to dechamps/zfs that referenced this issue Sep 27, 2012
Currently, for unknown reasons, VOP_CLOSE() is a no-op in userspace.
This causes file descriptor leaks. This is especially problematic with
long ztest runs, since zpool.cache is opened repeatedly and never
closed, resulting in resource exhaustion (EMFILE errors).

This patch fixes the issue by making VOP_CLOSE() do what it is supposed
to do.

See issue openzfs#989.
@dechamps (Contributor Author)

Regarding EMFILE: the issue is easy to reproduce by lowering the limit in the setrlimit() call from 1024 to 128, which makes the run fail much faster (< 1 minute). All the leaked fds seem to point to zpool.cache, as evidenced by this result from the middle of a ztest run (with the limit at 1024):

# lsof -p 32026 | grep zpool.cache
lt-ztest 32026 root   14w   REG   0,17     2928 165775 /tmp/zpool.cache (deleted)
lt-ztest 32026 root   65w   REG   0,17     2928 173295 /tmp/zpool.cache (deleted)
lt-ztest 32026 root   66w   REG   0,17     2928 173296 /tmp/zpool.cache (deleted)
lt-ztest 32026 root   67w   REG   0,17     2928 164692 /tmp/zpool.cache (deleted)
lt-ztest 32026 root   70w   REG   0,17     2928 164693 /tmp/zpool.cache (deleted)
lt-ztest 32026 root   71w   REG   0,17     2928 164694 /tmp/zpool.cache (deleted)
(...)
# lsof -p 30256 | grep zpool.cache | wc -l
833

(and climbing)

I found the culprit: spa_config_write() closes the zpool.cache fd using VOP_CLOSE(), which happens to be... a no-op. Here's the WTF:

#define VOP_CLOSE(vp, f, c, o, cr, ct)  0
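
The fix should just need to forward the call to a real close helper. A hedged sketch, assuming libzpool provides a vn_close(vp) that closes the vnode's underlying fd (that helper and its signature are an assumption on my part; #992 has the actual change):

/*
 * Hypothetical fixed version: actually close the file instead of
 * discarding the call. vn_close() is assumed, not verified.
 */
#define VOP_CLOSE(vp, f, c, o, cr, ct)  vn_close(vp)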

Fixed in #992.

Two issues remain (the EOVERFLOW issue and the vdev_stat_update() issue). I'll run another series of tests to make sure these two weren't caused by the ones I just fixed. Or maybe I won't be able to reproduce them at all, since they seem very rare (they happened only once over tens of hours).

@dechamps (Contributor Author)

Great news, everyone: rc11 with the fixes from #991 and #992 (and the reguid test disabled) now reliably survives 32 one-hour ztest passes. The other issues I mentioned seem to have disappeared with these two fixes. I'll close this as soon as the fixes are merged.

@dechamps (Contributor Author)

Got another one, but it seems to be extremely rare (happened only once over dozens of hours of ztesting):

lt-ztest: ../../module/zfs/vdev.c:584: vdev_free: Assertion `!list_link_active(&vd->vdev_state_dirty_node)' failed.

This was already reported upstream a week ago (Illumos #3212), so it's not specific to ZFS On Linux.

@behlendorf (Contributor)

Great work. I'll get all of the changes you've been working on reviewed and merged next week.

@dechamps (Contributor Author)

dechamps commented Oct 1, 2012

Doh, my tests over the weekend failed, even with #991, #992, #994, #995 and #997 applied. I got an 18% failure rate (4/22) with three-hour passes. Here are the errors:

child died with signal 11

After 50 minutes. Classic segfault.

ztest: attach (/tmp/zfstest/2012-09-28_164921_24802/zfstest_2012-09-28_164921_24802.52a 115867648, /tmp/zfstest/2012-09-28_164921_24802/zfstest_2012-09-28_164921_24802.spares.0 79167488, 1) returned 0, expected 75

After nearly 3 hours. We've seen this one already; I thought it was gone, but it turns out it isn't.

lt-ztest: ../../lib/libzpool/kernel.c:272: Assertion `mp->m_magic == 0x9522f51362a6e326ull (0x0 == 0x9522f51362a6e326)' failed.

After nearly 3 hours. Seems like some sort of mutex corruption: the magic field reads 0x0 instead of the expected value, which suggests the mutex was used after being destroyed or before being initialized (see the sketch after this list of errors).

lt-ztest: ../../module/zfs/arc.c:2872: arc_read: Assertion `!refcount_is_zero(&pbuf->b_hdr->b_refcnt)' failed.

After 2 hours.
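
For context on the mutex assertion above: libzpool's mutex wrappers stamp each mutex with a magic value on init and verify it on every operation, so a magic of 0x0 points at use-after-destroy (or use-before-init). A simplified sketch of the pattern, with names inferred from the assertion text rather than taken verbatim from the source:

#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define MTX_MAGIC 0x9522f51362a6e326ULL

typedef struct kmutex {
    uint64_t m_magic;           /* MTX_MAGIC while initialized, 0 otherwise */
    pthread_mutex_t m_lock;
} kmutex_t;

static void
mutex_enter(kmutex_t *mp)
{
    /* Fires with m_magic == 0x0 if the mutex was destroyed or never set up. */
    assert(mp->m_magic == MTX_MAGIC);
    (void) pthread_mutex_lock(&mp->m_lock);
}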

@behlendorf: considering these failures seem to occur only on long runs (> 30 minutes), you should be fine if you stick with the default 5-minute ztest in your testing chain for now.

@dechamps (Contributor Author)

dechamps commented Oct 1, 2012

While ztest was running, I took a look at the files in /tmp and was surprised to see a file named /tmp/zt, which seemed wrong. I investigated and came up with #1001. This is likely to fix some of the remaining issues, especially the "attach() returned 0, expected 75" issue.

@dechamps (Contributor Author)

dechamps commented Oct 3, 2012

Still needs more testing, but right now ztest has been running happily for 25 hours (8 three-hour passes) without a hitch. Judging from other tests I don't think it's 100% stable quite yet, but it's very close.

behlendorf pushed a commit that referenced this issue Oct 3, 2012
Currently, thread_create(), when called in userspace, creates a
joinable (i.e. not detached) thread. This is the pthread default.

Unfortunately, this does not reproduce kthreads behavior (kthreads
are always detached). In addition, this contradicts the original
Solaris code which creates userspace threads in detached mode.

These joinable threads are never joined, which leads to a leakage of
pthread thread objects ("zombie threads"). This in turn results in
excessive resource consumption, and possible resource exhaustion in
extreme cases (e.g. long ztest runs).

This patch fixes the issue by creating userspace threads in detached
mode. The only exception is ztest worker threads which are meant to be
joinable.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue #989
behlendorf pushed a commit that referenced this issue Oct 3, 2012
Currently, for unknown reasons, VOP_CLOSE() is a no-op in userspace.
This causes file descriptor leaks. This is especially problematic with
long ztest runs, since zpool.cache is opened repeatedly and never
closed, resulting in resource exhaustion (EMFILE errors).

This patch fixes the issue by making VOP_CLOSE() do what it is supposed
to do.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue #989
behlendorf pushed a commit that referenced this issue Oct 3, 2012
Currently, in several instances (but not all), ztest generates vdev
file paths using a statement similar to this:

    snprintf(path, sizeof (path), ztest_dev_template, ...);

This worked fine until 40b84e7, which
changed path to be a pointer to the heap instead of an array allocated
on the stack. Before this change, sizeof(path) would return the size of
the array; now, it returns the size of the pointer instead.

As a result, the aforementioned snprintf statement uses the wrong size
and truncates the vdev file path to the first 4 or 8 bytes (depending
on the architecture). Typically, with default settings, the file path
will become "/tmp/zt" instead of "/test/ztest.XXX".

This issue only exists in ztest_vdev_attach_detach() and
ztest_fault_inject(), which explains why ztest doesn't fail right away.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue #989
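
A standalone illustration of the sizeof pitfall this commit describes (my own example, not from the patch):

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    char on_stack[64];
    char *on_heap = malloc(64);

    /* sizeof an array yields the buffer size... */
    (void) printf("%zu\n", sizeof (on_stack));  /* 64 */
    /* ...but sizeof a pointer yields only the pointer width. */
    (void) printf("%zu\n", sizeof (on_heap));   /* 8 on 64-bit, 4 on 32-bit */

    /* So snprintf() truncates after 7 characters plus the NUL. */
    (void) snprintf(on_heap, sizeof (on_heap), "/tmp/ztest.0a");
    (void) printf("%s\n", on_heap);             /* prints "/tmp/zt" */

    free(on_heap);
    return (0);
}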
@behlendorf (Contributor)

With the above patches merged, ztest is again running fairly reliably for short durations, so I've re-enabled it in my test suite. However, there are still a few recent upstream Illumos ztest fixes we'll want to port:

illumos/illumos-gate@9253d63
illumos/illumos-gate@cd1c8b8

behlendorf added a commit to behlendorf/zfs that referenced this issue Dec 21, 2012
illumos-gate/commit/9253d63df408bb48584e0b1abfcc24ef2472382e
Illumos changeset: 13840:97fd5cdf328a

3145 single-copy arc
3212 ztest: race condition between vdev_online() and spa_vdev_remove()

Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Adam Leventhal <[email protected]>
Reviewed by: Eric Schrock <[email protected]>
Reviewed by: Justin T. Gibbs <[email protected]>
Approved by: Eric Schrock <[email protected]>

Ported-by: Brian Behlendorf <[email protected]>
Issue openzfs#989
Issue openzfs#1137
unya pushed a commit to unya/zfs that referenced this issue Dec 13, 2013
3145 single-copy arc
3212 ztest: race condition between vdev_online() and spa_vdev_remove()

Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Adam Leventhal <[email protected]>
Reviewed by: Eric Schrock <[email protected]>
Reviewed by: Justin T. Gibbs <[email protected]>
Approved by: Eric Schrock <[email protected]>

References:
  illumos-gate/commit/9253d63df408bb48584e0b1abfcc24ef2472382e
  illumos changeset: 13840:97fd5cdf328a
  https://www.illumos.org/issues/3145
  https://www.illumos.org/issues/3212

Ported-by: Brian Behlendorf <[email protected]>
Closes openzfs#989
Closes openzfs#1137