Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_scheduler failure on travis #5732

Closed
wisechengyi opened this issue Apr 20, 2018 · 4 comments
Closed

test_scheduler failure on travis #5732

wisechengyi opened this issue Apr 20, 2018 · 4 comments
Assignees

Comments

@wisechengyi
Copy link
Contributor

wisechengyi commented Apr 20, 2018

travis seems to be broken consistently with https://api.travis-ci.org/v3/job/369223653/log.txt

pyprep/sources/71f273c4ba5f86077e1318bd8762a2add637f5c1/pants_test/engine/test_scheduler.py FAILED [ 57%]
tests/python/pants_test/engine/test_scheduler.py::SchedulerTest::test_managed_resolve <- ../../../../../../usr/lib/python2.7/unittest/case.py SKIPPED [ 71%]
tests/python/pants_test/engine/test_scheduler.py::SchedulerTest::test_scheduler_visualize <- pyprep/sources/71f273c4ba5f86077e1318bd8762a2add637f5c1/pants_test/engine/test_scheduler.py PASSED [ 85%]
tests/python/pants_test/engine/test_scheduler.py::SchedulerTraceTest::test_trace_includes_rule_exception_traceback <- pyprep/sources/71f273c4ba5f86077e1318bd8762a2add637f5c1/pants_test/engine/test_scheduler.py PASSED [100%]

==================== FAILURES ====================
______ SchedulerTest.test_descendant_specs _______

self = <pants_test.engine.test_scheduler.SchedulerTest testMethod=test_descendant_specs>

   def test_descendant_specs(self):
     """Test that Addresses are produced via recursive globs of the 3rdparty/jvm directory."""
     specs = self.parse_specs('3rdparty/jvm::')
     build_request = self.scheduler.execution_request([BuildFileAddresses], [specs])
     ((subject, _), root), = self.build(build_request)
   
     # Validate the root.
     self.assertEqual(specs, subject)
>     self.assertEqual(BuildFileAddresses, type(root.value))
E     AttributeError: 'Throw' object has no attribute 'value'

pants_test/engine/test_scheduler.py:199: AttributeError
generated xml file: /home/travis/build/pantsbuild/pants/.pants.d/test/pytest/tests.python.pants_test.engine.scheduler/junitxml/TEST-tests.python.pants_test.engine.scheduler.xml 

Related change: 305f1db

Specific error:

E   AssertionError: u'Conflicting values produced for' not found in 'Snapshot failed: Throw(Error making env for store at "/Users/stuhood/.cache/pants/lmdb_store/files/e": Cannot allocate memory, "Traceback (no traceback):\\n  <pants native internals>\\nException: Error making env for store at \\"/Users/stuhood/.cache/pants/lmdb_store/files/e\\": Cannot allocate memory")'
@stuhood stuhood self-assigned this Apr 20, 2018
@baroquebobcat
Copy link
Contributor

I've hit this locally too

@stuhood
Copy link
Member

stuhood commented Apr 20, 2018

My current hypothesis is that this is because we're exhausting virtual memory by creating multiple schedulers in the unit tests by trying to claim 1 TB of virtual memory per scheduler. If you run any of the tests individually they succeed, so I don't think that this is likely to affect end users with or without pantsd, since they will only have 1 live store per run.

But the unit test(s) appear to be 1) lazily dropping schedulers, 2) possibly not deallocating the Store when the scheduler is dropped. I'm going to attempt to fix forward rather than reverting... if I can't figure it out by end of day, I'll revert.

@stuhood
Copy link
Member

stuhood commented Apr 20, 2018

Alright, after fixing the lazy dropping of schedulers (1), the next issue (2) seems to be because Scheduler holds an Arc<Core>: Nodes end up with references to the Arc<Core> in their context... ie, we have a cycle between Core -> Graph -> Node -> Context -> Core.

I'll see whether breaking this is feasible.

@stuhood
Copy link
Member

stuhood commented Apr 20, 2018

Breaking it is feasible, and it looks like that was the only cycle, because the Core is dropped immediately without that one.

The only downside is that this patch looks difficult to cherry-pick. I don't really see an alternative other than cherry-picking it though.

On the bright side, the difficult part of the cherry-pick is 100% in the rust code, and thus easily detected as a compile error.

EDIT: Found a much more cherry-pickable alternative. Updated #5733.

stuhood pushed a commit that referenced this issue Apr 21, 2018
### Problem

`Core` instances are currently being leaked in cases where `Nodes` have not completed running, and thus hold a `Context` in the closure for their `Future`. The cycle is:
```
Core -> Graph -> Node -> Context -> Core
```

### Solution

Clear all `Node` states when we `Drop` a `Scheduler`, breaking their cycles with the `Core`.

### Result

Fixes #5732 by ensuring that the `Store` held via `Scheduler -> Core -> Store` is dropped. It should also unblock #5611.
stuhood pushed a commit that referenced this issue Apr 21, 2018
### Problem

`Core` instances are currently being leaked in cases where `Nodes` have not completed running, and thus hold a `Context` in the closure for their `Future`. The cycle is:
```
Core -> Graph -> Node -> Context -> Core
```

### Solution

Clear all `Node` states when we `Drop` a `Scheduler`, breaking their cycles with the `Core`.

### Result

Fixes #5732 by ensuring that the `Store` held via `Scheduler -> Core -> Store` is dropped. It should also unblock #5611.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants