Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XS-specific test failure, WeakMap vs GC? #2670

Closed
warner opened this issue Mar 17, 2021 · 8 comments · Fixed by #2689
Closed

XS-specific test failure, WeakMap vs GC? #2670

warner opened this issue Mar 17, 2021 · 8 comments · Fixed by #2689
Assignees
Labels
bug Something isn't working

Comments

@warner
Copy link
Member

warner commented Mar 17, 2021

Describe the bug

We're seeing an intermittent failure in the test:xs target. The most reliable appearance appears in https://github.com/Agoric/agoric-sdk/pull/2669/checks?check_run_id=2133636064#step:7:830 , in test/unitTests/zcf/test-zcf.js, during the userSeat.getCurrentAllocation from zcf.makeEmptySeatKit test. When present, the STR is to:

  • cd packages/zoe
  • yarn test:xs test/unitTests/zcf/test-zcf.js

and you'll see something like:

TEST ONLY: capturing test data function [""] (){[native code]}
{
  status: 'not ok',
  id: 168,
  message: 'should be deep equal: {"actual":{"length":0,"keys":[]},"expected":{"length":1,"keys":["A"]}}',
  filename: 'test/unitTests/zcf/test-zcf.js',
  label: ''
}
{
  status: 'not ok',
  id: 169,
F test/unitTests/zcf/test-zcf.js userSeat.getCurrentAllocation from zcf.makeEmptySeatKit
  message: 'should be deep equal: {"actual":{"length":0,"keys":[]},"expected":{"length":2,"keys":["A","B"]}}',
  filename: 'test/unitTests/zcf/test-zcf.js',
  label: ''
}
{
  status: 'not ok',
  id: 170,
  message: 'should be deep equal: {"actual":{"length":2,"keys":["A","B"]},"expected":{"length":0,"keys":[]}}',
  filename: 'test/unitTests/zcf/test-zcf.js',
  label: ''
}

When it happens, it's fairly sticky, but random other code changes can make it go away. I strongly suspect a GC bug in XS. I can make it go away by:

  • introduce more memory allocations before the test runs, by adding the ollowing to the first test in the test-zcf.js file:
  for (let i=0; i < 800; i++) {
    const o = harden({});
  }

(allocating 790 or fewer objects preserves the failure, allocating 800 or more makes the failure go away)

  • delete all test functions after the failing one
    • however retaining the immediately-following test, and deleting all the subsequent ones, retains the failure

Our hypothesis is that XS WeakMaps sometimes lose their keys when a GC event happens. @dckc imported a Moddable update which addressed something involving WeakMaps (perhaps where a WeakMap held another WeakMap, or the retention path was root -> key, key2 = weakmap1.get(key1), target = weakmap2.get(key2)), but it didn't seem to fix the problem.

This might suggest a place to explore: build a structure involving weakmaps, allocate enough objects to force a GC event (or uncomment the facility for adding a gc() function to the global object), then inspect the weakmap again.

Platform Environment

I've seen this locally on linux, and on the CI machines (which are also linux).

@warner warner added the bug Something isn't working label Mar 17, 2021
@warner
Copy link
Member Author

warner commented Mar 17, 2021

This appears to be a short reproduction:

// eslint-disable-next-line import/no-extraneous-dependencies
import '@agoric/zoe/tools/prepare-test-env';
// eslint-disable-next-line import/no-extraneous-dependencies
import test from 'ava';

function setup(wm1, wm2) {
  const key1 = harden({});
  const key2 = harden({});
  wm1.set(key1, key2);
  wm2.set(key2, 'target');
  return key1;
}

test(`weakmap`, async t => {
  const wm1 = new WeakMap();
  const wm2 = new WeakMap();
  const key1 = setup(wm1, wm2);

  // try to force a GC
  for (let i=0; i < 10000; i++) {
    const o = harden({});
  }
  t.is(wm2.get(wm1.get(key1)), 'target');
});

@warner
Copy link
Member Author

warner commented Mar 17, 2021

In that example, 4045 loops are necessary to trigger the GC at the right time, 4044 loops allow the test to pass.

@warner
Copy link
Member Author

warner commented Mar 17, 2021

I'm trying a stripped down version that only uses @agoric/install-ses instead of prepare-test-ava and ava, and I can't make it fail yet.

@warner
Copy link
Member Author

warner commented Mar 17, 2021

This started happening on trunk when I landed #2632 , which must have increased some code size beyond some critical threshold: the PR is correct, but it just randomly happens to tickle this XS bug. Now that we have a somewhat-reduced test case, I'm inclined to leave that PR in place (instead of reverting it), and hope that Moddable can fix it soon. While we're in this state, CI is likely to show a failure in the XS test, however this will not prevent those PRs from landing (the XS test was marked as optional). Once we get it fixed, we might change CI to make the XS test mandatory, at least until we run into the next XS bug.

@dckc
Copy link
Member

dckc commented Mar 17, 2021

@warner try out the Makefile in https://gist.github.com/dckc/7aaa585903730744bb92a64950116fd8 ?

@warner
Copy link
Member Author

warner commented Mar 17, 2021

I get status: 'not ok' from that Makefile, looks like a good STR.

@warner
Copy link
Member Author

warner commented Mar 17, 2021

BTW I can remove all the harden() calls and still make it fail, but I have to increase the loop count to a million.

@dckc
Copy link
Member

dckc commented Mar 19, 2021

I checked that the moddable update addresses this.
Moddable-OpenSource/moddable#608 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants