Avoid capturing Snapshots for previously digested codegen outputs #7241

Merged

stuhood
Member

@stuhood stuhood commented Feb 13, 2019

Problem

As described in #7229, re-capturing Snapshots on noop runs in SimpleCodegenTask caused a performance regression for larger codegen use cases.

Solution

Remove features from the Python-exposed Snapshot API that would prevent a Snapshot from being roundtrippable via a Digest (preservation of canonical paths, and preservation of literal ordering, i.e. #5802). Add support for optimistically loading a Snapshot from a Digest, then reuse the code that dumps/loads a Digest for the codegen directories to skip Snapshot capturing in cases where the Digest has already been stored.
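A minimal sketch of the optimistic-load idea, assuming a toy in-memory content-addressed store. `ToyStore`, `capture`, `load`, and `load_or_capture` are hypothetical names for illustration and are not the Pants API; a real Digest also covers file contents and directory structure, which this sketch reduces to a sorted list of paths.

```python
import hashlib
import json


class ToyStore:
    """Hypothetical content-addressed store: digest -> sorted tuple of paths."""

    def __init__(self):
        self._by_digest = {}
        self.capture_count = 0  # how many times the expensive path ran

    def capture(self, paths):
        """Expensive path: build a snapshot from scratch and store it."""
        self.capture_count += 1
        snapshot = tuple(sorted(paths))  # canonical order, so it roundtrips
        digest = hashlib.sha256(json.dumps(snapshot).encode("utf-8")).hexdigest()
        self._by_digest[digest] = snapshot
        return digest, snapshot

    def load(self, digest):
        """Cheap path: return the stored snapshot, or None if it is unknown."""
        return self._by_digest.get(digest)


def load_or_capture(store, saved_digest, list_paths):
    """Reuse a previously stored snapshot when its digest is already known."""
    if saved_digest is not None:
        snapshot = store.load(saved_digest)
        if snapshot is not None:
            return saved_digest, snapshot
    # Either no digest was saved or the store does not have it: capture.
    return store.capture(list_paths())


def codegen_outputs():
    # Stand-in for listing the files in a codegen results directory.
    return ["gen/b.py", "gen/a.py"]


store = ToyStore()
digest, first = load_or_capture(store, None, codegen_outputs)  # first run captures
same_digest, second = load_or_capture(store, digest, codegen_outputs)  # noop run reuses
```

In this sketch the second call never invokes `capture`, which is the cost the noop run is meant to avoid. Where the saved digest is persisted between runs is left out here.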

Result

Runtime for a very large codegen noop use case was reduced from ~15.2 seconds to ~3.05 seconds. Fixes #7229, and fixes #5802.

@stuhood stuhood force-pushed the stuhood/restore-snapshots-from-digests branch 2 times, most recently from aa70f07 to f4118f8 Compare February 13, 2019 06:49
Member Author

stuhood commented Feb 13, 2019

I haven't added any new tests yet, but this should be reviewable. The individual commits should be useful to review independently, and mostly follow the plan from #7229.

@stuhood stuhood force-pushed the stuhood/restore-snapshots-from-digests branch from f4118f8 to 588c6ad Compare February 13, 2019 07:45
Contributor

@illicitonion illicitonion left a comment

Currently Snapshots are ordering-sensitive, and this is relied on in a few places (specifically wire protobuf generation, IIRC). This Digest -> Snapshot hydration doesn't preserve order, which may lead to sneaky bugs.

I think that means we should probably block merging this on fixing #5802?

Otherwise, looks great :)

digest: Digest,
f: F,
) -> BoxFuture<Vec<T>, String> {
let f = Arc::new(Mutex::new(f));
Contributor

I don't think you actually need a Mutex here as F is just a Fn; would need a mutex if it's a FnMut or a FnOnce.

Member Author

Needed one more bound on F to remove the Mutex: F: Sync, but that indeed works. Thanks!

.map(move |dir_node| {
let subdir_digest = try_future!(dir_node.get_digest().into());
let path = path_so_far.join(dir_node.get_name());
store.walk_helper(subdir_digest, path, f.clone(), accumulator.clone())
Contributor

Because this is all futuresy, accumulator may be filled in in a random order, right? Would be worth writing down either what ordering guarantees we do make, or explicitly stating that we make no ordering guarantees. It looks like all of the consumers are fine with no ordering guarantees, but we should make that explicit.

Member Author

@stuhood stuhood Feb 14, 2019

Yep. And I believe that was already the case for the existing methods. Will expand the comment and stabilize all callers.
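The ordering concern can be sketched in Python, assuming a nested dict stands in for a directory tree and a thread pool stands in for the futures-based walk. `walk` and `visit` are hypothetical names, not the Rust `walk_helper` from this PR; the point is only that results accumulate in completion order, so callers that need stability sort afterwards.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def walk(tree, visit):
    """Apply `visit` to every directory path in a nested dict, concurrently.

    Results are appended as tasks complete, so the returned order is not
    guaranteed. Callers that need a stable order must sort the result.
    """
    # Flatten the tree into directory paths first (cheap and sequential).
    paths = []
    stack = [("", tree)]
    while stack:
        prefix, node = stack.pop()
        paths.append(prefix)
        for name, child in node.items():
            stack.append((prefix + "/" + name if prefix else name, child))

    results = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(visit, path) for path in paths]
        for future in as_completed(futures):
            results.append(future.result())
    return results


tree = {"src": {"java": {}, "python": {}}, "tests": {}}
visited = walk(tree, lambda path: path.upper())  # order may vary between runs
stable = sorted(visited)  # explicit stabilization at the call site
```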

@@ -230,7 +234,8 @@ def execute(self):
 with self.context.new_workunit(name='execute', labels=[WorkUnitLabel.MULTITOOL]):
   vts_to_sources = OrderedDict()
   for vt in invalidation_check.all_vts:
-    synthetic_target_dir = self.synthetic_target_dir(vt.target, vt.results_dir)
+    synthetic_target_dir = self.synthetic_target_dir(vt.target, vt.current_results_dir)
Contributor

What's the difference between results_dir and current_results_dir? Why is this changing?

Member Author

There was a bug in this area, since go was actually overriding synthetic_target_dir. d934b2b makes the usage and explanation-of-usage of these methods more consistent.

@stuhood stuhood force-pushed the stuhood/restore-snapshots-from-digests branch from 588c6ad to ca28196 Compare February 14, 2019 00:59
Member Author

stuhood commented Feb 14, 2019

I was having trouble running GraphTest locally to actually run/fix/remove the ordering-specific tests, so this now depends on #7243.

@stuhood stuhood force-pushed the stuhood/restore-snapshots-from-digests branch from ca28196 to 7170956 Compare February 14, 2019 22:01
Member Author

stuhood commented Feb 14, 2019

#7243 has landed. I added one more commit here to remove Snapshot literal-ordering (explicitly sorting PathStats during Snapshot creation instead to stabilize them).

The commits are still independently reviewable.
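As a rough illustration of sorting PathStats during Snapshot creation, the sketch below assumes a toy snapshot that holds only path names. `ToySnapshot` is a hypothetical stand-in, and its digest is a plain SHA-256 over the sorted names rather than the real Digest format.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ToySnapshot:
    """Hypothetical stand-in for a Snapshot: a digest plus sorted paths."""

    digest: str
    path_stats: Tuple[str, ...]

    @classmethod
    def from_path_stats(cls, path_stats):
        # Sort up front so the literal input order never leaks into the result.
        ordered = tuple(sorted(path_stats))
        digest = hashlib.sha256("\n".join(ordered).encode("utf-8")).hexdigest()
        return cls(digest=digest, path_stats=ordered)


unsorted_path_stats = ["b.proto", "c.proto", "a.proto"]
sorted_path_stats = ("a.proto", "b.proto", "c.proto")

snapshot = ToySnapshot.from_path_stats(unsorted_path_stats)
reordered = ToySnapshot.from_path_stats(reversed(unsorted_path_stats))
```

Because both inputs are sorted before hashing, `snapshot` and `reordered` compare equal and share a digest, which is what lets a snapshot rebuilt from a digest match one captured from disk.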

@stuhood stuhood force-pushed the stuhood/restore-snapshots-from-digests branch from 7170956 to 47cb0ee Compare February 15, 2019 05:21
Contributor

@illicitonion illicitonion left a comment

Looks great :) Thanks for putting this together!

relative_sources.add(relative_source)
return fast_relpath(source, source_root.path)

if target.payload.get_field_value('ordered_sources'):
Contributor

It feels like we should probably not have this option, and just treat wire as always having ordered_sources=True; this just feels like a footgun.

Member Author

The issue with that is that it will fail for something like globs('*'), where we cannot apply ordering.

Member Author

My hope is that most consumers of this tool use it in ways that don't care about ordering. If I'm wrong there, can reevaluate.
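One way to picture the trade-off in this thread, as a sketch only: literal source lists have an order worth preserving, while glob matches do not. `order_sources`, `declared`, and `matched` are hypothetical names, and the glob check is a simplification that is not taken from the Pants implementation.

```python
def order_sources(declared, matched, ordered_sources=False):
    """Return sources in literal order if requested, else in sorted order.

    `declared` is the list as written by the user; `matched` is the set of
    files those entries expanded to. Both are assumptions for this sketch.
    """
    if not ordered_sources:
        # Default: the order carries no meaning, so stabilize by sorting.
        return sorted(matched)
    if any(ch in entry for entry in declared for ch in "*?["):
        # A glob has no literal order to preserve, so refuse it.
        raise ValueError("ordered_sources needs literal paths, got a glob")
    return [entry for entry in declared if entry in matched]


declared = ["b.proto", "a.proto"]
matched = {"a.proto", "b.proto"}

literal_order = order_sources(declared, matched, ordered_sources=True)
sorted_order = order_sources(declared, matched)
```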

@@ -535,7 +601,7 @@ mod tests {
       .unwrap(),
       232,
     ),
-    path_stats: unsorted_path_stats,
+    path_stats: sorted_path_stats,
Contributor

unsorted_path_stats is now unused, I think

Member Author

It's used in the from_path_stats constructor to validate that they end up sorted.

@stuhood stuhood merged commit fedc91c into pantsbuild:master Feb 15, 2019
@stuhood stuhood deleted the stuhood/restore-snapshots-from-digests branch February 15, 2019 18:01
stuhood pushed a commit that referenced this pull request Feb 18, 2019
Eric-Arellano added a commit to Eric-Arellano/pants that referenced this pull request Feb 27, 2019
commit efaae09
Author: Eric Arellano <[email protected]>
Date:   Tue Feb 26 23:41:55 2019 -0700

    Add debugging to release.sh for linux ucs2

    It looks like the bootstrap part now works completely as intended! It's consistently using UCS2.

    But the release script is failing for some reason. Turn on debugging to wake up to hopefully some insight tomorrow morning..

commit 4cb6cae
Author: Eric Arellano <[email protected]>
Date:   Tue Feb 26 18:27:22 2019 -0700

    Squashed commit of the following:

    commit 9c754dc
    Merge: 3f30d39 7819724
    Author: Eric Arellano <[email protected]>
    Date:   Tue Feb 26 17:17:34 2019 -0700

        Merge branch 'master' of github.com:pantsbuild/pants into pex-interpreter-constraints

    commit 3f30d39
    Author: Eric Arellano <[email protected]>
    Date:   Tue Feb 26 17:12:01 2019 -0700

        Fix issue with compatibility_or_constraints() returning a tuple

        add_interpreter_constraints() expects a str, so we must unpack the tuple when calling it.

    commit ff17f73
    Author: Eric Arellano <[email protected]>
    Date:   Mon Feb 25 22:46:16 2019 -0700

        Revert "Constrain ci.sh to the exact Python interpreter version"

        This reverts commit 887a8ef.

        This change is necessary to fix the original motivation for this PR, but it does not really belong in this PR anymore. Instead, it should be in the Py2 ABI PR (7235). This PR should be kept more generic, and there is no logical connection to the changes being made with ci.sh, beyond that original motivating problem.

    commit 6b07abd
    Author: Eric Arellano <[email protected]>
    Date:   Mon Feb 25 21:49:52 2019 -0700

        Remove bad import

        My bad for not catching this before pushing.

    commit 2c6fdb0
    Author: Eric Arellano <[email protected]>
    Date:   Mon Feb 25 21:37:41 2019 -0700

        Generify solution by using compatibility_or_constraints()

        Instead of applying a bandaid for only `./pants binary`, John proposed fixing the issue with our PexBuilderWrapper itself. So, we use `compatibility_or_constraints()`, which will first try to return the target's compatibility, else will return the Python Setup subsystem's value.

        The wrapper still is not ideal and John proposes killing add_interpreter_constraint() and add_interpreter_constraints_from() to instead automatically be setting the interpreter constraints from the targets graph. This PR does not make that change for the scope, but this should be noted.

    commit b71f164
    Author: Eric Arellano <[email protected]>
    Date:   Mon Feb 25 21:03:46 2019 -0700

        Fix typo

    commit 3bca020
    Author: Eric Arellano <[email protected]>
    Date:   Mon Feb 25 20:31:14 2019 -0700

        Add global interpreter constraints to Python binary creation

    commit 887a8ef
    Author: Eric Arellano <[email protected]>
    Date:   Fri Feb 22 21:05:50 2019 -0700

        Constrain ci.sh to the exact Python interpreter version

        Earlier we allowed the patch version to float. We discovered in pantsbuild#7235 with the CI run https://travis-ci.org/pantsbuild/pants/jobs/497208431#L891 that PEX was building with 2.7.10 but running with 2.7.13.

        The fix will require having Pants pass interpreter constraints to Pex. Even with that change though, the CI shard would still have the issue because the constraint would be floating.

        Now, we have the constraint be exact.

commit 0d25bcc
Merge: 373ffee 7819724
Author: Eric Arellano <[email protected]>
Date:   Tue Feb 26 17:41:10 2019 -0700

    Merge branch 'master' of github.com:pantsbuild/pants into py2-wheels-abi-specified

commit 373ffee
Author: Eric Arellano <[email protected]>
Date:   Tue Feb 26 17:39:25 2019 -0700

    Configure PEX_PYTHON on linux UCS2

    It looks like ./pants binary now completely builds the PEX with 2.7.15 / UCS2! But then when trying to run `./pants.pex -V`, it resolves the runtime interpreter to 2.7.13 :/

    I'm not sure how the runtime interpreter selection is supposed to work, but there is an env var PEX_PYTHON that allows passing a path to the value you always want to use. So, we use this for now.

commit 7819724
Author: Alex Schmitt <[email protected]>
Date:   Tue Feb 26 13:42:36 2019 -0800

    Allow tasks to opt-in to target filtering (pantsbuild#7283)

    Followup to pantsbuild#7275 following [discussion](pantsbuild#7275 (comment)) that the target filter was being applied to tasks that do not support it (e.g. tasks that don't access targets via `get_targets()`)

    This adds a class property to `Task` that allows subclasses to effectively opt-in to the new behavior - and sets that flag to `True` for `fmt` and `lint` tasks.

commit 3bf2d28
Author: Eric Arellano <[email protected]>
Date:   Tue Feb 26 12:11:17 2019 -0700

    Add Pyenv back to Travis path

    Even though we directly pass $PY, later processes expect the 2.7.15 interpreter to be discoverable so Pyenv must be on the path.

commit 2166efe
Author: Eric Arellano <[email protected]>
Date:   Tue Feb 26 09:52:10 2019 -0700

    Set $PY to disambiguate which Py2.7 version to use

    Interpreter constraints don't work, as previously noted.

commit 8dd215d
Author: Eric Arellano <[email protected]>
Date:   Tue Feb 26 09:43:42 2019 -0700

    Allow user to set $PY in ci.sh

    If not set, will resolve to the Python version being used. We should allow the user to set it though in cases like this PR, where we may have to set $PY to a very specific interpreter path.

commit 49fe576
Author: Eric Arellano <[email protected]>
Date:   Tue Feb 26 09:41:35 2019 -0700

    Stop hardcoding interpreter constraints

    Let ci.sh determine them based on the $PY value. While working on the Linux UCS2 shard, it became clear that what really matters is which interpreter $PY (i.e. `python2`) resolves to. Setting the interpreter constraints will not impact what this resolves to nor how we bootstrap Pants. So, we should focus on setting $PY and let the interpreter constraints be resolved accordingly.

commit c3dd843
Author: Eric Arellano <[email protected]>
Date:   Tue Feb 26 08:49:44 2019 -0700

    Improve wording.

commit f810849
Author: Eric Arellano <[email protected]>
Date:   Tue Feb 26 08:43:42 2019 -0700

    Move interpreter constraints to Docker env entry

    Docker does not pull in external env vars. Instead, we must specify this in the Dockerfile.

    This change has added benefit that it moves all of the Py2 logic into the Dockerfile out of .travis.yml, and leaves .travis.yml solely to call the Dockerfile.

commit b9efbf0
Author: Eric Arellano <[email protected]>
Date:   Tue Feb 26 01:03:21 2019 -0700

    Also set PANTS_PYTHON_SETUP_INTERPRETER_CONSTRAINTS for OSX

    Even though it was resolving correctly already, explicit is better than implicit.

commit 8402c1f
Author: Eric Arellano <[email protected]>
Date:   Tue Feb 26 01:01:03 2019 -0700

    Ensure Linux UCS2 always uses Py2.7.15 (UCS2)

    It was not enough to install 2.7.15 and use Pyenv global. The 2.7.13 (UCS2) interpreter was still being recognized.

commit 225f153
Author: Eric Arellano <[email protected]>
Date:   Tue Feb 26 00:53:56 2019 -0700

    Remove bad merge lines

commit 1be9e90
Author: Eric Arellano <[email protected]>
Date:   Mon Feb 25 22:41:04 2019 -0700

    Squashed commit of the following:

    commit 6b07abd
    Author: Eric Arellano <[email protected]>
    Date:   Mon Feb 25 21:49:52 2019 -0700

        Remove bad import

        My bad for not catching this before pushing.

    commit 2c6fdb0
    Author: Eric Arellano <[email protected]>
    Date:   Mon Feb 25 21:37:41 2019 -0700

        Generify solution by using compatibility_or_constraints()

        Instead of applying a bandaid for only `./pants binary`, John proposed fixing the issue with our PexBuilderWrapper itself. So, we use `compatibility_or_constraints()`, which will first try to return the target's compatibility, else will return the Python Setup subsystem's value.

        The wrapper still is not ideal and John proposes killing add_interpreter_constraint() and add_interpreter_constraints_from() to instead automatically be setting the interpreter constraints from the targets graph. This PR does not make that change for the scope, but this should be noted.

    commit b71f164
    Author: Eric Arellano <[email protected]>
    Date:   Mon Feb 25 21:03:46 2019 -0700

        Fix typo

    commit 3bca020
    Author: Eric Arellano <[email protected]>
    Date:   Mon Feb 25 20:31:14 2019 -0700

        Add global interpreter constraints to Python binary creation

    commit 887a8ef
    Author: Eric Arellano <[email protected]>
    Date:   Fri Feb 22 21:05:50 2019 -0700

        Constrain ci.sh to the exact Python interpreter version

        Earlier we allowed the patch version to float. We discovered in pantsbuild#7235 with the CI run https://travis-ci.org/pantsbuild/pants/jobs/497208431#L891 that PEX was building with 2.7.10 but running with 2.7.13.

        The fix will require having Pants pass interpreter constraints to Pex. Even with that change though, the CI shard would still have the issue because the constraint would be floating.

        Now, we have the constraint be exact.

commit f530843
Author: Eric Arellano <[email protected]>
Date:   Mon Feb 25 22:39:29 2019 -0700

    Move unit tests above wheel building shards

    Now that we have 4 wheel building shards (soon 6)—and 2 of them require bootstrapping Pants—we move unit tests above to get more immediate feedback on if the PR is good or not. We still keep them high up relative to others because several major workflows require wheel building output.

commit 223541e
Author: Eric Arellano <[email protected]>
Date:   Mon Feb 25 22:35:02 2019 -0700

    Fix Linux UCS2 using 2.7.13 with UCS4 instead of UCS2 sometimes

    There were two versions of 2.7.13 installed on the system, so Pants would sometimes choose an unintended version and would be inconsistent.

commit 8428376
Author: Eric Arellano <[email protected]>
Date:   Mon Feb 25 21:03:22 2019 -0700

    Fix typo from squashed pex-constraints

commit 48ef4dd
Author: Eric Arellano <[email protected]>
Date:   Mon Feb 25 20:44:37 2019 -0700

    Squashed commit of the following:

    commit 3bca020
    Author: Eric Arellano <[email protected]>
    Date:   Mon Feb 25 20:31:14 2019 -0700

        Add global interpreter constraints to Python binary creation

    commit 887a8ef
    Author: Eric Arellano <[email protected]>
    Date:   Fri Feb 22 21:05:50 2019 -0700

        Constrain ci.sh to the exact Python interpreter version

        Earlier we allowed the patch version to float. We discovered in pantsbuild#7235 with the CI run https://travis-ci.org/pantsbuild/pants/jobs/497208431#L891 that PEX was building with 2.7.10 but running with 2.7.13.

        The fix will require having Pants pass interpreter constraints to Pex. Even with that change though, the CI shard would still have the issue because the constraint would be floating.

        Now, we have the constraint be exact.

commit 78a1aa9
Merge: 04c4ee0 26b0179
Author: Eric Arellano <[email protected]>
Date:   Mon Feb 25 20:44:15 2019 -0700

    Merge branch 'master' of github.com:pantsbuild/pants into py2-wheels-abi-specified

commit 26b0179
Author: Danny McClanahan <[email protected]>
Date:   Mon Feb 25 16:23:46 2019 -0800

    try defining algebraic Executables in the native backend to compose more readable toolchains (pantsbuild#6855)

    ### Problem

    As can be seen in `native_toolchain.py` in e.g. pantsbuild#6800, it is often difficult to follow changes to the native backend, especially changes which modify the order of resources such as library and include directories for our linkers and compilers. This is because we have been patching together collections of these resources "by hand", without applying any higher-level structure (explicitly constructing each `path_entries` and `library_dirs` field for every executable, every time, for example). This was done to avoid creating abstractions that might break down due to the rapidly evolving code. We can now take the step of more clearly defining the relationships between the toolchains we construct hierarchically.

    ### Solution

    - Add an `ExtensibleAlgebraic` mixin which allows declaring list fields which can be immutably modified one at a time with `prepend_field` and `append_field`, or all at once with `sequence`.
    - Add a `for_compiler` method to `BaseLinker` to wrap the specific process required to prep our linker for a specific compiler.
    - Apply all of the above in `native_toolchain.py`.

    ### Result

    The compilers and linkers provided by `@rule`s in `native_toolchain.py` are composed with consistent verbs from `ExtensibleAlgebraic`, leading to increased readability.

commit cd4c773
Author: Nora Howard <[email protected]>
Date:   Mon Feb 25 17:22:08 2019 -0700

    [zinc-compile] fully adopt enum based switches for hermetic/not; test coverage (pantsbuild#7268)

    @cosmicexplorer wrote this as part of pantsbuild#7227. This patch is pulling out just the Zinc changes, with a few differences. I also added a new test for hermetic failures and some additional assertions to ensure that the right message is being communicated on failures, while doing that I discovered that hermetic/non-hermetic appear to produce error messages on different streams.

commit c095f3b
Author: Alex Schmitt <[email protected]>
Date:   Mon Feb 25 10:49:57 2019 -0800

    Update TargetFiltering args for applying criteria (pantsbuild#7280)

    Update the class to take the criteria in the constructor, and helper methods take the targets against which to apply said criteria.

    From suggestion https://github.com/pantsbuild/pants/pull/7275#discussion_r259554586

commit a87a01b
Author: Danny McClanahan <[email protected]>
Date:   Fri Feb 22 18:53:52 2019 -0800

    don't do a pants run on osx (pantsbuild#7278)

    ### Problem

    Fixes pantsbuild#7247, catching a case that was otherwise missed.

    ### Solution

    - Don't do a `./pants run` on osx using the gnu toolchain in testing, as it doesn't work yet.

    ### Result

    As noted in pantsbuild#7249, it's strange that that PR passes but the nightly job fails -- it may be nondeterministic.

commit a86639e
Author: Alex Schmitt <[email protected]>
Date:   Fri Feb 22 16:52:32 2019 -0800

    Add filtering subsystem to permit skipping targets by tags (pantsbuild#7275)

    This subsystem is responsible for handling options meant to exclude targets from specific tasks

    The application of the logic itself is contained in the TargetFiltering class - which currently only handles excluding targets with provided tags and can be expanded upon for additional filtering options.

commit b34d66f
Author: John Sirois <[email protected]>
Date:   Fri Feb 22 16:46:32 2019 -0800

    Prepare the 1.15.0.dev1 release. (pantsbuild#7277)

commit 8069653
Author: Danny McClanahan <[email protected]>
Date:   Fri Feb 22 11:41:58 2019 -0800

    cache python tools in ~/.cache/pants (pantsbuild#7236)

    ### Problem

    This runs for (on my laptop) about 16 seconds every time I do a `clean-all`:
    ```
    22:27:23 00:02   [native-compile]
    22:27:23 00:02     [conan-prep]
    22:27:23 00:02       [create-conan-pex]
    22:27:39 00:18     [conan-fetch]
    ```

    It doesn't seem like we need to be putting this tool in the task workdir as the python requirements list is pretty static. Conan in particular will be instantiated by invoking almost every goal, and it is a nontrivial piece of software to resolve each time.

    Also, we aren't mixing in interpreter identity to the generated pex filename, which is a bug that has so far gone undetected: see pantsbuild#7236 (comment).

    ### Solution

    - Take the `stable_json_sha1()` of the requirements of each python tool generated by `PythonToolPrepBase` to generate a fingerprinted pex filename.
    - Stick it in the pants cachedir so it doesn't get blown away by a clean-all.
    - Add an `--interpreter-constraints` option to pex tools (where previously the repo's `--python-setup-interpreter-constraints` were implicitly used).
    - Ensure the selected interpreter identity is mixed into the fingerprinted filename.
    - Add a test for the pex filename fingerprinting and that the pex can be successfully executed for python 2 and 3 constraints.

    ### Result

    A significant amount of time spent waiting after clean builds is removed, and pex tools can have their own interpreter constraints as necessary.

commit 04c4ee0
Author: Eric Arellano <[email protected]>
Date:   Fri Feb 22 12:36:56 2019 -0700

    Move debugging to proper location

    It's failing before the release.sh script is even called. The bootstrap command is what's failing.

commit e502f58
Author: Eric Arellano <[email protected]>
Date:   Fri Feb 22 12:24:33 2019 -0700

    Fix linux ucs4 stage being overridden to cron instead of test

commit 2cd72e4
Author: Eric Arellano <[email protected]>
Date:   Fri Feb 22 09:47:16 2019 -0700

    Add back logging to debug osx ucs4

commit c9e1650
Merge: 7da092b 4097052
Author: Eric Arellano <[email protected]>
Date:   Fri Feb 22 09:40:38 2019 -0700

    Merge branch 'master' of github.com:pantsbuild/pants into py2-wheels-abi-specified

commit 4097052
Author: Stu Hood <[email protected]>
Date:   Thu Feb 21 13:49:16 2019 -0800

    Prepare 1.14.0rc3 (pantsbuild#7274)

commit ea33c36
Author: Nora Howard <[email protected]>
Date:   Wed Feb 20 12:43:32 2019 -0700

    [jvm-compile] fix typo: s/direcotry/directory/ (pantsbuild#7265)

    Fix a typo in `jvm_compile.py`

commit 761849e
Author: Danny McClanahan <[email protected]>
Date:   Wed Feb 20 11:38:39 2019 -0800

    Fix nightly cron ctypes enum failure (pantsbuild#7249)

    ### Problem

    Resolves pantsbuild#7247. `ToolchainVariant('gnu')` does not in fact `== 'gnu'`.

    ### Solution

    - Use `.resolve_for_enum_variant()` instead of comparing with equality in that one failing test (I missed this in pantsbuild#7226, I fixed the instance earlier in the file though).
    - Raise an error when trying to use `==` on an enum to prevent this from happening again.
    - Note that in Python 3 it appears that `__hash__` must be explicitly implemented whenever `__eq__` is overridden, and this appears undocumented.

    ### Result

    The nightly cron job should be fixed, and enums are now a little more difficult to screw up.

    # Open Questions
    It's a little unclear why this didn't fail in CI -- either the test was cached, or some but not all travis osx images are provisioned with the correct dylib, causing a nondeterministic error, or something else?

commit 0e6a144
Author: Daniel Wagner-Hall <[email protected]>
Date:   Wed Feb 20 04:14:23 2019 +0000

    Node is Display (pantsbuild#7264)

    Use standard traits, rather than our own methods which happen to do the same thing.

commit 904e3f3
Author: Ekaterina Tyurina <[email protected]>
Date:   Wed Feb 20 01:03:42 2019 +0000

    Allow passing floating point numbers from rust to python (pantsbuild#7259)

    This PR allows passing floating point numbers from Rust to Python.
    ```
    externs::store_f64(v: f64)
    ```

commit 7da092b
Author: Eric Arellano <[email protected]>
Date:   Tue Feb 19 17:03:59 2019 -0700

    Fix platform default shards bootstrapping

    - OSX UCS2 shard no longer was setting RUN_PANTS_FROM_PEX anywhere
    - Linux UCS4 had its before_script entry being overridden by travis_image.

commit 38e1cf7
Author: Eric Arellano <[email protected]>
Date:   Tue Feb 19 15:35:15 2019 -0700

    Remove unnecessary RUN_PANTS_FROM_PEX=0

    Now that we don't set it in the base_build_wheels_env, we don't need to set this.

    This was actually causing a failure. ./pants only checks if the env var is set, and not what its value is.

commit 621137e
Author: Eric Arellano <[email protected]>
Date:   Tue Feb 19 15:16:04 2019 -0700

    Fix improper call to {osx,linux}_config_env

    They don't exist apparently.

commit 0da5f91
Author: Eric Arellano <[email protected]>
Date:   Tue Feb 19 15:13:11 2019 -0700

    Stop pulling down PEX

    Use the {osx,linux}_config images rather than {osx,linux}_test_config images.

commit 513cd50
Author: Eric Arellano <[email protected]>
Date:   Tue Feb 19 14:35:19 2019 -0700

    release.sh still needs to run from PEX

    Change how we handle env var to not use PEX when first bootstrapping, then use it in the followup release.sh command.

commit dc36d94
Author: Eric Arellano <[email protected]>
Date:   Tue Feb 19 14:29:19 2019 -0700

    Also deduplicate Pyenv env vars for OSX

    Realized this is a better design while working on pantsbuild#7261.

commit ad45b2d
Author: Eric Arellano <[email protected]>
Date:   Tue Feb 19 14:25:28 2019 -0700

    Revert "Turn on PEX_VERBOSE for OSX ucs4 shard"

    This reverts commit 28b9e8b.

commit fb9ef9b
Author: Eric Arellano <[email protected]>
Date:   Tue Feb 19 14:24:21 2019 -0700

    Revert "Run PEX with -v*9"

    This reverts commit edf81ef.

commit ab534e2
Author: Eric Arellano <[email protected]>
Date:   Tue Feb 19 14:23:28 2019 -0700

    Bootstrap Pants when using new Python install

    We can't use the PEX from AWS because the Python versions do not match up.

commit 0b59e46
Author: Eric Arellano <[email protected]>
Date:   Tue Feb 19 12:36:08 2019 -0700

    Fix gcc no input file issues by passing dummy file

    For Linux UCS2, the build was failing due to gcc complaining it could not find any files. This reproduced locally when running `./pants3 setup-py --run="bdist_wheel --py-limited-api cp36" src/python/pants:pants-packaged` on OSX.

    John suggested and gave the code snippet to pass a dummy file so this no longer happens. Thanks John!

commit edf81ef
Author: Eric Arellano <[email protected]>
Date:   Tue Feb 19 09:33:12 2019 -0700

    Run PEX with -v*9

    PEX_VERBOSE only impacts runtime output. -vvv... impacts build time output.

commit 7c17c0a
Merge: 6ecb550 222bc11
Author: Eric Arellano <[email protected]>
Date:   Mon Feb 18 17:47:13 2019 -0700

    Merge branch 'master' of github.com:pantsbuild/pants into py2-wheels-abi-specified

commit 222bc11
Author: Daniel Wagner-Hall <[email protected]>
Date:   Mon Feb 18 21:12:30 2019 +0000

    Revert remote execution from tower to grpcio (pantsbuild#7256)

    We're seeing weird broken connection errors with tower.

    We'll probably just chuck some retries in and be happy, but for now, let's get back to a more stable time.

    * Revert "Remove unused operation wrapper (pantsbuild#7194)"

    This reverts commit 9400024.

    * Revert "Switch operation getting to tower (pantsbuild#7108)"

    This reverts commit 0375b30.

    * Revert "Remote execution uses tower-grpc to start executions (pantsbuild#7049)"

    This reverts commit 28683c7.

commit 3da2165
Author: Ekaterina Tyurina <[email protected]>
Date:   Mon Feb 18 17:16:48 2019 +0000

    Scheduler returns metrics as a dictionary instead of a tuple of tuples (pantsbuild#7255)

    ### Problem
    The scheduler returns metrics as a tuple of (key, value) tuples, and this tuple is later transformed into a dictionary.
    We are considering using metrics to return zipkin span info to the Python side, and a dict type will be more convenient for that.

    ### Solution
    Scheduler_metrics returns a store_dict instead of a store_tuple.

commit 6ecb550
Author: Eric Arellano <[email protected]>
Date:   Sat Feb 16 19:57:25 2019 -0800

    Fix Dockerfile copyright year and add comment

commit 8d69dc2
Author: Eric Arellano <[email protected]>
Date:   Sat Feb 16 19:18:45 2019 -0800

    Improve naming of Build Wheels shards

    Make it more explicit how each shard is configured and which wheel building config it has. Whereas for most shards we specify whether they run with Py36 or Py27 in parentheses, it is very important that we make the wheel building config explicit, as it impacts which wheels we end up producing.

commit 170e9c8
Author: Eric Arellano <[email protected]>
Date:   Sat Feb 16 19:10:24 2019 -0800

    Install OpenSSL on OSX UCS4 shard

    The shard was failing when trying to build cryptography from an sdist because it could not find openssl. So, we now explicitly install it with Brew and modify the env vars to expose it. This is identical to how we install Py3 on OSX.

commit e87f567
Author: Eric Arellano <[email protected]>
Date:   Sat Feb 16 18:14:57 2019 -0800

    Fix typo in Dockerfile path

    The folder path is py27, not py2!

commit cb50136
Author: Stu Hood <[email protected]>
Date:   Fri Feb 15 20:04:46 2019 -0800

    Prepare 1.14.0.rc2 instead. (pantsbuild#7251)

commit 28b9e8b
Author: Eric Arellano <[email protected]>
Date:   Fri Feb 15 13:38:21 2019 -0800

    Turn on PEX_VERBOSE for OSX ucs4 shard

    I can't reproduce the same failure locally. John was suspicious why for the problematic dependencies the sdist isn't being used to build the wheel when the bdist is not released for cp27mu. Hopefully this provides some insight.

commit f7472aa
Merge: 1e83f37 1ece461
Author: Eric Arellano <[email protected]>
Date:   Fri Feb 15 13:15:26 2019 -0800

    Merge branch 'master' of github.com:pantsbuild/pants into py2-wheels-abi-specified

commit 1ece461
Author: Stu Hood <[email protected]>
Date:   Fri Feb 15 10:07:23 2019 -0800

    Prepare 1.14.0 (pantsbuild#7246)

commit fedc91c
Author: Stu Hood <[email protected]>
Date:   Fri Feb 15 10:01:40 2019 -0800

    Avoid capturing Snapshots for previously digested codegen outputs (pantsbuild#7241)

    ### Problem

    As described in pantsbuild#7229, re-capturing `Snapshots` on noop runs in `SimpleCodegenTask` caused a performance regression for larger codegen usecases.

    ### Solution

    Remove features from the python-exposed `Snapshot` API that would prevent them from being roundtrippable via a `Digest` (including preservation of canonical paths, and preservation of literal ordering... ie. pantsbuild#5802), add support for optimistically loading a `Snapshot` from a `Digest`, and then reuse code to dump/load a `Digest` for the codegen directories to skip `Snapshot` capturing in cases where the `Digest` had already been stored.

    ### Result

    Very large codegen noop usecase runtimes reduced from `~15.2` seconds to `~3.05` seconds. Fixes pantsbuild#7229, and fixes pantsbuild#5802.

commit 594f91f
Author: Ekaterina Tyurina <[email protected]>
Date:   Fri Feb 15 01:54:51 2019 +0000

    Add checks if values of flags zipkin-trace-id and zipkin-parent-id are valid (pantsbuild#7242)

    ### Problem
    When pants is called with the flags zipkin-trace-id and zipkin-parent-id, an assertion error is raised if the flag values are in the wrong format. The error is not informative.

    ### Solution
    Checks of the values of the flags zipkin-trace-id and zipkin-parent-id are added, with a better error explanation. Users are asked to use a 16-character or 32-character hex string.
    Also, tests are added for these checks.
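
A minimal sketch of the kind of check described above, assuming (as the commit message states) that both flags accept a 16-character or 32-character hex string. The function name `validate_zipkin_id` and the error wording are hypothetical, not the names used in pants.

```python
import re

# Hypothetical pattern: exactly 16 or exactly 32 hexadecimal characters.
_HEX_ID = re.compile(r"[0-9a-fA-F]{16}|[0-9a-fA-F]{32}")


def validate_zipkin_id(flag_name, value):
    """Return `value` unchanged if it is a valid id, else raise a descriptive error."""
    if not isinstance(value, str) or _HEX_ID.fullmatch(value) is None:
        raise ValueError(
            "Value of the flag {} must be a 16-character or 32-character hex string. "
            "Got {!r}.".format(flag_name, value)
        )
    return value
```

Validating up front lets the error name the offending flag, rather than surfacing later as a bare assertion failure.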

commit bc0536c
Author: Stu Hood <[email protected]>
Date:   Thu Feb 14 13:59:26 2019 -0800

    Remove deprecated test classes (pantsbuild#7243)

    ### Problem

    `BaseTest` and the v1-aware `TaskTestBase` are long deprecated. Additionally, the `GraphTest` classes used v2 APIs that existed before `TestBase` came around.

    ### Solution

    Delete deprecated classes, and port `GraphTest` to `TestBase`.

commit e4456fd
Author: Danny McClanahan <[email protected]>
Date:   Tue Feb 12 18:09:35 2019 -0800

    fix expected pytest output for pytest integration after pinning to 3.0.7 (pantsbuild#7240)

    ### Problem

    pantsbuild#7238 attempted to fix an upstream pytest issue (and therefore unbreak our CI) by pinning the default pytest version in our pytest subsystem to `pytest==3.0.7`. This worked, but broke a few of our other tests which relied on specific pytest output, and master is broken now (sorry!).

    I also hastily merged pantsbuild#7226, which introduced another test failure, which I have fixed. These are the only failing tests, and these all now pass locally on my laptop.

    ### Solution

    - Fix expected pytest output in pytest runner testing.

    ### Result

    I think it's still a good idea to string match pytest output unless we suddenly have to change pytest versions drastically like this again.

commit e382541
Author: Danny McClanahan <[email protected]>
Date:   Tue Feb 12 12:54:49 2019 -0800

    Canonicalize enum pattern matching for execution strategy, platform, and elsewhere (pantsbuild#7226)

    ### Problem

    In pantsbuild#7092 we added [`NailgunTask#do_for_execution_strategy_variant()`](https://github.com/cosmicexplorer/pants/blob/70977ef064305b78406a627e07f4dae3a60e4ae4/src/python/pants/backend/jvm/tasks/nailgun_task.py#L31-L43), which allowed performing more declarative execution strategy-specific logic in nailgunnable tasks. Further work with rsc will do even more funky things with our nailgunnable task logic, and while we will eventually have a unified story again for nailgun and subprocess invocations with the v2 engine (see pantsbuild#7079), for now, having this check that we have performed the logic we expect for all execution strategy variants is very useful.

    This PR puts that pattern matching logic into `enum()`: https://github.com/pantsbuild/pants/blob/84cf9a75dbf68cf7126fe8372ab9b2f48720464d/src/python/pants/util/objects.py#L173-L174, among other things.

    **Note:** `TypeCheckError` and other exceptions are moved up from further down in `objects.py`.

    ### Solution

    - add `resolve_for_enum_variant()` method to `enum` which does the job of the previous `do_for_execution_strategy_variant()`
    - make the native backend's `Platform` into an enum.
    - stop silently converting a `None` argument to the enum's `create()` classmethod into its `default_value`.
    - add `register_enum_option()` helper method to register options based on enum types.

    ### Result

    We have a low-overhead way to convert potentially-tricky conditional logic into a checked pattern matching-style interface with `enum()`, and it is easier to register enum options.
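
The checked pattern-matching idea can be sketched as follows. This is an illustration built on the standard library `enum` module, not pants' own `enum()` helper. The `ExecutionStrategy` members and the free function `resolve_for_variant` are hypothetical stand-ins for the `resolve_for_enum_variant()` method named above.

```python
from enum import Enum


class ExecutionStrategy(Enum):
    # Hypothetical variants, for illustration only.
    NAILGUN = "nailgun"
    SUBPROCESS = "subprocess"
    HERMETIC = "hermetic"


def resolve_for_variant(value, mapping):
    """Return mapping[value], but only if `mapping` has exactly one entry per variant.

    Failing on missing or unknown keys is what makes the match "checked": a newly
    added variant breaks every call site that has not been updated to handle it.
    """
    expected = set(type(value))
    provided = set(mapping)
    if expected != provided:
        raise ValueError(
            "pattern matching must cover exactly the variants {}; got {}".format(
                sorted(str(v) for v in expected), sorted(str(v) for v in provided)
            )
        )
    return mapping[value]
```

A call site passes one result per variant, for example `resolve_for_variant(strategy, {ExecutionStrategy.NAILGUN: a, ExecutionStrategy.SUBPROCESS: b, ExecutionStrategy.HERMETIC: c})`.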

commit d0432df
Author: Danny McClanahan <[email protected]>
Date:   Tue Feb 12 12:50:59 2019 -0800

    Pin pytest version to avoid induced breakage from more-itertools transitive dep (pantsbuild#7238)

    ### Problem

    A floating transitive dependency of pytest, `more-itertools`, dropped support for python 2 in its 6.0.0 release -- see pytest-dev/pytest#4770. This is currently breaking our and our users' CI: see https://travis-ci.org/pantsbuild/pants/jobs/492004734. We could pin that dep, but as mentioned in pytest-dev/pytest#4770 (comment), pinning transitive deps of pytest would impose requirement constraints on users of pytest in pants.

    ### Solution

    - Pin `pytest==3.0.7` for now.

    ### Result

    python tests should no longer be broken.

commit 3d7a295
Author: Danny McClanahan <[email protected]>
Date:   Mon Feb 11 22:02:21 2019 -0800

    add a TypedCollection type constraint to reduce boilerplate for datatype tuple fields (pantsbuild#7115)

    ### Problem

    *Resolves pantsbuild#6936.*

    There's been a [TODO in `pants.util.objects.Collection`](https://github.com/pantsbuild/pants/blob/c342fd3432aa0d73e402d2db7e013ecfcc76e9c8/src/python/pants/util/objects.py#L413) for a while to typecheck datatype tuple fields.

    pantsbuild#6936 has some thoughts on how to do this, but after realizing I could split out `TypeConstraint` into a base class and then introduce `BasicTypeConstraint` for type constraints which only act on the type, I think that ticket is invalidated as this solution is much cleaner.

    ### Solution

    - Split out logic for basic type checking (without looking at the object itself) into a `BasicTypeConstraint` class, which `Exactly` and friends inherit from.
    - Create the `TypedCollection` type constraint, which checks that its argument is iterable and then validates each element of the collection with a `BasicTypeConstraint` constructor argument.
      - Note that `TypedCollection` is a `TypeConstraint`, but not a `BasicTypeConstraint`, as it has to inspect the actual object to determine whether each element matches the provided `BasicTypeConstraint`.
    - Move `pants.util.objects.Collection` into `src/python/pants/engine/objects.py`, as it is specifically for engine objects.
    - Use `TypedCollection` for the `dependencies` field of the datatype returned by `Collection.of()`.

    ### Result

    - `datatype` consumers and creators no longer have to have lots of boilerplate when using collections arguments, and those arguments can now be typechecked and made hashable for free!

    ### TODO in followup: `wrapper_type`

    See pantsbuild#7172.
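
The split between type-only constraints and `TypedCollection` can be sketched as below. The class names follow the commit message, but the method names (`satisfied_by_type`, `satisfied_by`) and the bodies are simplified assumptions, not the code in `pants.util.objects`.

```python
class BasicTypeConstraint:
    """A constraint that can be decided from the type alone (simplified sketch)."""

    def __init__(self, *types):
        self._types = types

    def satisfied_by_type(self, obj_type):
        raise NotImplementedError


class Exactly(BasicTypeConstraint):
    """Satisfied only by one of the listed types, with no subclass matching."""

    def satisfied_by_type(self, obj_type):
        return obj_type in self._types


class TypedCollection:
    """Satisfied by an iterable whose elements all satisfy a BasicTypeConstraint.

    Unlike a BasicTypeConstraint, this has to inspect the object itself.
    """

    def __init__(self, constraint):
        self._constraint = constraint

    def satisfied_by(self, obj):
        try:
            elements = iter(obj)
        except TypeError:
            return False  # not iterable at all
        return all(self._constraint.satisfied_by_type(type(el)) for el in elements)
```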

commit 1e83f37
Author: Eric Arellano <[email protected]>
Date:   Mon Feb 11 17:37:31 2019 -0700

    Setup UCS2 vs UCS4 travis shards

    We must now build pantsbuild.pants with both unicode versions for Py2. So, we introduce Py2 to do this.

    We use Pyenv to install interpreter with the relevant encoding where necessary. See https://stackoverflow.com/questions/38928942/build-python-as-ucs-4-via-pyenv.

commit a02dde1
Author: Eric Arellano <[email protected]>
Date:   Thu Feb 7 11:33:53 2019 -0700

    Add ext_modules to BUILD entry

    This is going to be necessary to release Py3 with abi3.

    The line however results in the issue that this PR is going to aim to fix: now Python 2 will be built with abi `cp27m` or `cp27mu`, whereas earlier it was `none`. To test, try running `./pants setup-py --run="bdist_wheel --python-tag cp27 --plat-name=linux_x86_64" src/python/pants:pants-packaged` followed by `ls -l dist/pantsbuild.pants-1.14.0rc0/dist/`.

    Note also that ext_modules is deprecated in favor of distutils.Extension. We use this now as a temporary workaround until we add support for Extension.

commit 874ce34
Author: Chris Livingston <[email protected]>
Date:   Mon Feb 11 13:27:50 2019 -0500

    Validate and maybe prune interpreter cache run over run (pantsbuild#7225)

    * Purge stale interpreters from Interpreter Cache

commit 5d28cf8
Author: Daniel Wagner-Hall <[email protected]>
Date:   Fri Feb 8 14:58:23 2019 +0000

    Prep for 1.15.0.dev0 (pantsbuild#7230)

commit 84cf9a7
Author: Danny McClanahan <[email protected]>
Date:   Wed Feb 6 13:56:35 2019 -0800

    deprecate implicit usage of binary_mode=True and mode='wb' in dirutil methods (pantsbuild#7120)

    ### Problem

    *Resolves pantsbuild#6543. See also [the python 3 migration project](https://github.com/pantsbuild/pants/projects/10).*

    There has been [a TODO](https://github.com/pantsbuild/pants/blob/6fcd7f7d0f8787910cfac01ec2895cdbd5cee66f/src/python/pants/util/dirutil.py#L109) pointing to pantsbuild#6543 to deprecate the `binary_mode` argument to `pants.util.dirutil.safe_file_dump()`, which wasn't canonicalized with a `deprecated_conditional`. This is because `binary_mode` doesn't quite make sense the way it does with file read methods `read_file()` and `maybe_read_file()`, because a file can be appended to as well as truncated (as opposed to reads).

    Separately, defaulting `binary_mode=True` for file read methods means more explicit conversions to unicode in a python 3 world.

    ### Solution

    - Deprecate the `binary_mode` argument to `safe_file_dump()`, as well as not explicitly specifying the `mode` argument.
      - `safe_file_dump()` now also defaults `payload=''`.
      - Also deprecate not specifying the `mode='wb'` argument in `safe_file_dump()`.
    - Deprecate not explicitly specifying the `binary_mode` argument in `{maybe_,}read_file()` and `temporary_file()` so that it can be given a default of unicode when pants finishes [migrating to python 3](https://github.com/pantsbuild/pants/projects/10) -- see pantsbuild#7121.
    - Update usages of `safe_file_dump()` across the repo.

    ### Result

    Pants plugins will see a deprecation warning if they fail to explicitly specify the `binary_mode` for file read methods in preparation for switching the default to unicode for [the python 3 switchover](https://github.com/pantsbuild/pants/projects/10). Several ambiguities in the `safe_file_dump()` method are alleviated.

    pantsbuild#7121 covers the eventual switchover to a default of `binary_mode=False` after the python 3 migration completes.

commit 224c2a0
Author: Borja Lorente <[email protected]>
Date:   Wed Feb 6 19:21:10 2019 +0000

    Make Resettable lazy again (pantsbuild#7222)

    ### Problem

    In the context of pantsbuild#6817, there is a logging issue that manifests when the daemon forks. In particular, in `fork_context`, both the daemon and the client reset some services that implement `Resettable`. Some of those services log at startup, when the client hasn't had time to reconfigure its logging to stderr, and therefore all these startup logs are intermingled in the `pantsd.log` file.

    ### Solution

    If we make `Resettable` lazy again, then because we are guaranteed to only enter `fork_context` under a lock, the logging can only happen once the client has had time to configure its loggers.

    ### Result

    `Resettable` is now lazy. `Resettable::get()` is now implemented in terms of `Resettable::with`.

commit f281642
Author: Danny McClanahan <[email protected]>
Date:   Wed Feb 6 10:26:07 2019 -0800

    fix _raise_deferred_exc() (pantsbuild#7008)

    ### Problem

    The `_raise_deferred_exc(self)` method in `daemon_pants_runner.py` hasn't ever been tested. As a result, it causes an easily fixable error: the issue can be easily reproduced if you register an option twice in the same task and then run with pantsd enabled (you get the wrong exception, because `exc_type` isn't needed to construct the exception again, that's what `exc_value` is for).

    ### Solution

    - Appropriately destructure `sys.exc_info()` (if that was what was used to populate `self._deferred_exception`) and re-raise the exception with its original traceback.

    ### Result

    This error is fixed, but not yet tested -- see pantsbuild#7220.
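
A minimal sketch of the re-raise described in the solution, assuming Python 3 and assuming the deferred value is either a `sys.exc_info()` tuple or a plain exception instance. The helper names `capture_failure` and `raise_deferred` are hypothetical, not the methods in `daemon_pants_runner.py`.

```python
import sys


def capture_failure():
    """Simulate deferring an exception by returning the sys.exc_info() tuple."""
    try:
        raise KeyError("option registered twice")
    except KeyError:
        return sys.exc_info()


def raise_deferred(deferred):
    """Re-raise a deferred exception, keeping its original traceback if there is one."""
    if isinstance(deferred, tuple):
        # exc_type is not needed to rebuild the exception: exc_value already is it.
        _exc_type, exc_value, tb = deferred
        raise exc_value.with_traceback(tb)
    raise deferred
```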

commit f0a1a9f
Author: Stu Hood <[email protected]>
Date:   Wed Feb 6 07:17:10 2019 -0800

    Prepare 1.14.0rc1 (pantsbuild#7221)

commit b6f045d
Author: Daniel Wagner-Hall <[email protected]>
Date:   Wed Feb 6 10:47:30 2019 +0000

    Resolve all platforms from all python targets (pantsbuild#7156)

    Don't just use the default configured targets.

    This means that _all_ transitive 3rdparty python will need to be
    resolvable in _all_ platforms in any target in the graph. This is not
    ideal (we really want to be doing per-root resolves), but because we
    currently do one global resolve, this is a decent fit.

commit b08c1fd
Author: Ekaterina Tyurina <[email protected]>
Date:   Wed Feb 6 10:40:35 2019 +0000

    Add flag reporting-zipkin-sample-rate (pantsbuild#7211)

    ### Problem
    In the current implementation, every time the pants command is run with zipkin tracing turned on, all the zipkin traces are collected. This is inconvenient when the number of runs is very large.

    ### Solution
    Making the sample rate configurable allows us to keep the number of traces within the constraints of the Zipkin server.

    ### Result
    A flag `reporting-zipkin-sample-rate` was added that sets the sample rate at which to sample Zipkin traces. If flags `reporting-zipkin-trace-id` and `reporting-zipkin-parent-id` are set the sample rate will always be 100.0 (no matter what is set in `reporting-zipkin-sample-rate` flag).
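
The sampling rule in the result above can be sketched as follows, assuming the rate is a percentage between 0.0 and 100.0. The function `should_sample` and its injectable `rng` parameter are hypothetical and only illustrate the decision, not how pants or its Zipkin client implement it.

```python
import random


def should_sample(sample_rate, trace_id=None, parent_id=None, rng=random.random):
    """Decide whether a run should be traced.

    When both a trace id and a parent id are supplied, the run belongs to an
    existing trace, so it is always sampled (an effective rate of 100.0).
    """
    if trace_id is not None and parent_id is not None:
        return True
    if not 0.0 <= sample_rate <= 100.0:
        raise ValueError("sample rate must be between 0.0 and 100.0, got {}".format(sample_rate))
    # rng() is in [0.0, 1.0), so a rate of 0.0 never samples and 100.0 always does.
    return rng() * 100.0 < sample_rate
```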

commit 95638d3
Author: Stu Hood <[email protected]>
Date:   Tue Feb 5 16:19:17 2019 -0800

    Only lint the direct sources of a linted target. (pantsbuild#7219)

    ### Problem

    The thrift linter currently redundantly lints the transitive dependencies of each target, leading to repetitive errors, and larger tool invokes than necessary.

    ### Solution

    Lint only the directly owned sources of a target, and expand unit tests.

commit 121f98c
Author: Stu Hood <[email protected]>
Date:   Tue Feb 5 16:05:11 2019 -0800

    Do not render the coursier workunit unless it will run. (pantsbuild#7218)

    ### Problem

    Currently the `bootstrap-coursier` workunit is rendered repeatedly, although it only actually runs once.

    ### Solution

    Only render the workunit if it will run.

commit b2f5a49
Author: Marcin Podolski <[email protected]>
Date:   Wed Feb 6 00:17:09 2019 +0100

    documentation for grpcio (pantsbuild#7155)

    ### Problem

    Documentation for grpcio generation tool

commit f73f112
Author: Daniel Wagner-Hall <[email protected]>
Date:   Tue Feb 5 15:33:43 2019 +0000

    Skip flaky test (pantsbuild#7209)

    Relates to pantsbuild#7199

commit f0bb0da
Author: Stu Hood <[email protected]>
Date:   Mon Feb 4 20:51:34 2019 -0800

    Only run master-dependent commithooks on master (pantsbuild#7214)

    ### Problem

    See pantsbuild#7213: some commit hooks are only valid in a context where master is present (and should otherwise be skipped).

    ### Solution

    Move more hooks that reference `master` under the check for the presence of `master`.

    ### Result

    Fixes pantsbuild#7213, and unblocks further iteration on the `1.14.x` stable branch.
stuhood added a commit that referenced this pull request Jun 2, 2019
### Problem

Capturing digests for `coursier` and `ivy` artifacts is currently optional, presumably due to the performance impact of not being cached run over run.

### Solution

#7241 added support for stashing `Digest`s next to digested files, and support for providing a `Digest` hint that will skip snapshotting if the `Digest` is already stored. We use that support here to always-snapshot 3rdparty inputs (and to skip re-snapshotting if a `Digest` was stashed).

### Result

One fewer option, and slightly better performance when re-running.
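
The stash-and-skip idea can be sketched as below. This is a simplified illustration, not the pants implementation: a SHA-256 over file paths and contents stands in for capturing a `Snapshot`, the `.digest` sibling file and every function name are assumptions, and the sketch does not check that the stashed digest's contents are still held in a store, which the real code would need to do before trusting the hint.

```python
import hashlib
import os


def capture_digest(directory):
    """Expensive step: hash every file under `directory` in a deterministic order."""
    hasher = hashlib.sha256()
    for root, dirs, files in os.walk(directory):
        dirs.sort()  # fix the traversal order so the digest is reproducible
        for name in sorted(files):
            path = os.path.join(root, name)
            hasher.update(os.path.relpath(path, directory).encode("utf-8"))
            with open(path, "rb") as f:
                hasher.update(f.read())
    return hasher.hexdigest()


def digest_path_for(directory):
    """Hypothetical stash location: a sibling file next to the digested directory."""
    return directory.rstrip(os.sep) + ".digest"


def load_or_capture(directory):
    """Return (digest, was_captured), reusing a stashed digest when one exists."""
    stash = digest_path_for(directory)
    if os.path.isfile(stash):
        with open(stash, "r") as f:
            return f.read().strip(), False
    digest = capture_digest(directory)
    with open(stash, "w") as f:
        f.write(digest)
    return digest, True
```

On a noop run the second call reads one small file instead of walking and hashing the whole directory, which is where the saving comes from.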