Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port BaseTest to v2 engine (attempt two) #5611

Merged
merged 12 commits into from
May 26, 2018

Conversation

stuhood
Copy link
Member

@stuhood stuhood commented Mar 19, 2018

This is a revert of the revert of #4867.

The only interesting differences here are in the topmost commit, which hides a few more APIs that we want to deprecate, and fixes the flakiness that caused the revert (which turned out to be leaked references to the scheduler causing "Too many open file handle" issues).

@stuhood stuhood changed the title WIP: Port BaseTast to v2 engine (attempt two) WIP: Port BaseTest to v2 engine (attempt two) Mar 19, 2018
@stuhood stuhood force-pushed the stuhood/test-base-round-2 branch 3 times, most recently from b0aafa8 to bdc4688 Compare March 19, 2018 05:34
@stuhood stuhood force-pushed the stuhood/test-base-round-2 branch 4 times, most recently from 28ad9c1 to 6347674 Compare April 5, 2018 21:04
@stuhood stuhood changed the title WIP: Port BaseTest to v2 engine (attempt two) Port BaseTest to v2 engine (attempt two) Apr 5, 2018
@stuhood
Copy link
Member Author

stuhood commented Apr 5, 2018

This is now reviewable.

Copy link
Contributor

@baroquebobcat baroquebobcat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok to me, though I'd prefer if the src changes that seem unrelated to updating the test class were broken out.

From looking at the rest of the patch, it seems like some of those changes are to bring the error messages in line with the existing assertions, but there are some moves that look like they could be broken out.

@@ -285,7 +285,8 @@ def test_stitch_deps_remote_existing_rev_respected(self):
pkg='prod',
rev='v1.2.3')
pre_execute_files = self.stitch_deps_remote(materialize=True)
self.build_graph.reset() # Force targets to be loaded off disk
self.reset_build_graph(reset_build_files=True) # Force targets to be loaded off disk
print('>>> {}'.format(self.buildroot_files()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xx print?

@@ -253,7 +253,7 @@ def _raise_incorrect_address_error(self, spec_path, wrong_target_name, addresses

if not addresses:
raise self.EmptyBuildFileError(
'{was_not_found_message}, because that directory contains no BUILD files defining addressable entities.'
'{was_not_found_message}, because that directory does not contain any BUILD files defining addressable entities.'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it's unrelated to this change, but I do like the updated wording better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's related to the change because a bunch of tests are swapping from being run in v1 to being run in v2.

'TargetMacro.Factory instances that construct more than one type are no longer supported. '
'Consider using a `context_aware_object_factory, which can construct any number of '
'different objects.'
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a separate change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, but I no longer remember which test I was fixing when I changed it, so I'd rather land it here.

@@ -319,27 +319,12 @@ def spec_to_globs(address_mapper, specs):
elif type(spec) is AscendantAddresses:
patterns.update(join(f, pattern)
for pattern in address_mapper.build_patterns
for f in _recursive_dirname(spec.directory))
for f in recursive_dirname(spec.directory))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method is now used in file invalidation in test_base.

@@ -0,0 +1,573 @@
# coding=utf-8
# Copyright 2014 Pants project contributors (see CONTRIBUTORS.md).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this have an updated year? Or is it primarily a copy of 2014 code? Looks like the latter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mm. Good question. Might as well refresh.

Copy link
Contributor

@baroquebobcat baroquebobcat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

Copy link
Member

@kwlzn kwlzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

stuhood pushed a commit that referenced this pull request Apr 21, 2018
### Problem

`Core` instances are currently being leaked in cases where `Nodes` have not completed running, and thus hold a `Context` in the closure for their `Future`. The cycle is:
```
Core -> Graph -> Node -> Context -> Core
```

### Solution

Clear all `Node` states when we `Drop` a `Scheduler`, breaking their cycles with the `Core`.

### Result

Fixes #5732 by ensuring that the `Store` held via `Scheduler -> Core -> Store` is dropped. It should also unblock #5611.
stuhood pushed a commit that referenced this pull request Apr 21, 2018
### Problem

`Core` instances are currently being leaked in cases where `Nodes` have not completed running, and thus hold a `Context` in the closure for their `Future`. The cycle is:
```
Core -> Graph -> Node -> Context -> Core
```

### Solution

Clear all `Node` states when we `Drop` a `Scheduler`, breaking their cycles with the `Core`.

### Result

Fixes #5732 by ensuring that the `Store` held via `Scheduler -> Core -> Store` is dropped. It should also unblock #5611.
@stuhood stuhood force-pushed the stuhood/test-base-round-2 branch 4 times, most recently from f383188 to 2e99612 Compare May 11, 2018 04:58
@stuhood stuhood force-pushed the stuhood/test-base-round-2 branch from 2e99612 to 5984fde Compare May 22, 2018 19:56
@stuhood
Copy link
Member Author

stuhood commented May 23, 2018

This now depends on #5859.

@stuhood
Copy link
Member Author

stuhood commented May 23, 2018

There are two remaining failures in this job, both of which (to be confirmed shortly) are caused by SIGSEGVs in tests, which should now be more cleanly exposed by #5859. Searching for the phrase FAILURE: Test was killed in the logs should show which.

As far as I can tell, the failures occur in stable locations, although I'm unable to reproduce either of them locally:

  1. tests/python/pants_test/backend/python/tasks: These are unit tests using TestBase, so it might potentially make sense to see a failure here due to the usage of the new scheduler.
  2. contrib/go/tests/python/pants_test/contrib/go/tasks:integration: These are integration tests (extending PantsRunIntegrationTest), and thus should not actually have been affected by this PR at all (integration tests have run with v2 for a while).

At some point in the past I was briefly able to reproduce these failures locally, and running the tests with ./pants test $target -- -s (to disable all pytest output capturing) exposed a panic due to having run out of file handles. But unfortunately, I have not had luck attempting to reproduce that error today (and it's possible that it has changed to a virtual memory error at this point... unclear).

Having typed all of this out, I think I have one more thing I want to try, which is to see whether we might be able to generically catch panics in the scheduler via https://doc.rust-lang.org/std/panic/fn.catch_unwind.html and convert them directly to error messages. EDIT: I went ahead and did this here: master...twitter:stuhood/catch-unwind ... it's not pretty, because using catch_unwind safely is apparently quite challenging, but it's possible that it will allow us to get the errors in a more useful context?

illicitonion pushed a commit that referenced this pull request May 23, 2018
…5859)

### Problem

While debugging #5611, I determined that when a test was killed by a signal (and thus had a negative return value from `poll()`), `TestRunnerTaskMixin` incorrectly interpreted the failure as a still-living and hung process, and would later try to kill it. There were a few broken tests allowing for this, but primarily: `test_pytest_run_timeout_cant_terminate` was not waiting long enough for the tested-test to start up, and so was killing it before it had its signal handler in place.

Additionally, the usage of a `Timer` + `poll()` to implement test termination meant that we were guaranteed to wait the full `timeout_terminate_wait` time (10 seconds by default) before `poll()`ing to see whether the test had exited.

### Solution

1. Interpret `poll() == None` as a still running process, and `poll() < 0` as a process killed by a signal.
2. Report processes killed by signals before our initial attempt to kill them (which should expose a SIGSEV in #5611).
3. Rather than a timer, use `wait()` between `terminate()` and `kill()`, which avoids unnecessary sleeping.

### Result

Improved debuggability for tests that exit abnormally.
@stuhood stuhood force-pushed the stuhood/test-base-round-2 branch from e930cb7 to 78cd420 Compare May 24, 2018 20:49
@stuhood
Copy link
Member Author

stuhood commented May 26, 2018

The two failing shards are due to the Yarn outage earlier. Going to go for it.

@stuhood stuhood merged commit 2e65f46 into pantsbuild:master May 26, 2018
@stuhood stuhood deleted the stuhood/test-base-round-2 branch May 26, 2018 04:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants