./pants --changed ..
does not work for deleted targets
#382
Comments
I had some initial groundwork done for this in the Twitter Internal ticket. Will add you as a watcher there. |
I agree with John's analysis. Mode #1 (using the Does two things: generates a list of changed files (which works fine in the case of deleted files), and for each changed file, tries to map it to a target via Where in a loop, it is analyzing the path of the changed file, and trying to find the nearest BUILD file. Once a candidate BUILD file has been selected, it calls This is the method that examines the actual files on disk, and maps them to a target. As John stated, we can either attempt some git-fu to allow |
I just thought of an interesting hack. What if, pants tracked all of the deleted files separately. Then, before this call: pants could |
I don't find that hack any less scary than the hack I described of rolling back, checking and rolling forward. The predicate mode is currently defeated only by pants/base/validation.py:13 afaict which turns a Fileset into a list, throwing away any predicate mode that might be added to Fileset. |
I think if we made globs lazy, we could do this more easily. If globs / target sources were lazy, then targets would have a list of the globs they use for looking up files. We could then compare those globs to the git changed file list to derive the targets that had changed. |
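To make the lazy-globs idea concrete, here is a minimal sketch. It assumes each target can expose the glob patterns it was declared with; the `target_globs` mapping, the `owners_of` helper and the target addresses are hypothetical and not part of pants. The changed file list (deleted files included) is matched against those patterns without consulting the filesystem.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List, Set


def owners_of(changed_paths: Iterable[str],
              target_globs: Dict[str, List[str]]) -> Set[str]:
    """Return targets whose glob patterns select any of the changed paths.

    No filesystem access happens here, so paths that were deleted in the
    change still map to the targets that used to own them.
    """
    owners: Set[str] = set()
    for path in changed_paths:
        for target, patterns in target_globs.items():
            # Caveat: fnmatch's `*` also matches `/`, so these patterns behave
            # more like rglobs than globs. A real implementation would need
            # matching semantics identical to the BUILD file glob functions.
            if any(fnmatchcase(path, pattern) for pattern in patterns):
                owners.add(target)
    return owners


# Hypothetical targets and patterns, for illustration only.
target_globs = {
    "src/python/foo:lib": ["src/python/foo/*.py"],
    "src/python/bar:lib": ["src/python/bar/*.py"],
}
# A path reported by the SCM as deleted; it no longer exists on disk.
changed = ["src/python/foo/deleted_module.py"]
affected = owners_of(changed, target_globs)
```

The nested loop is linear in targets times patterns per changed file, which is fine for a sketch; indexing patterns by directory prefix would be the obvious next step.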
Another issue with the deleted / changed files detection is that if a BUILD file has a target removed, what changed doesn't know to check for dependees of the targets that no longer exist to ensure they are still valid. |
Seems we have 2 different cases:
For 1 I like Nick's proposal: make globs/rglobs lazy and use globs in LazySourceMapper instead of mapping from existing files to targets. For 2 it seems we need to walk the build graph anyway. If a BUILD file was changed, we should automatically mark all targets in it as changed and walk the graph to find targets that have dependencies on targets that were removed from the BUILD file. We should add them to ChangeCalculator#_directly_changed_targets. Changes that need to be implemented:
|
Does change #3 require walking the entire build graph? |
@areitz only in case of changes to BUILD files. I don't see any other way to find targets that depend on a removed one. |
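A rough sketch of that graph walk, under the assumption that the set of target addresses in a BUILD file before and after the change is available, along with the declared dependencies of every target in the current graph. The function name, the dict-based graph and the addresses are all hypothetical, not pants APIs.

```python
from typing import Dict, List, Set


def dependents_of_removed(dependencies: Dict[str, List[str]],
                          removed: Set[str]) -> Set[str]:
    """Return targets that still declare a dependency on a removed target.

    This scans every target in the graph, which is why it is only worth
    doing when a BUILD file itself has changed.
    """
    return {
        target
        for target, deps in dependencies.items()
        if any(dep in removed for dep in deps)
    }


# Hypothetical addresses parsed from one BUILD file at the two revisions.
before = {"src/foo:old", "src/foo:kept"}
after = {"src/foo:kept"}
removed = before - after

# Hypothetical current build graph: target -> declared dependencies.
dependencies = {
    "src/bar:bin": ["src/foo:old", "src/foo:kept"],
    "src/baz:lib": ["src/foo:kept"],
    "src/foo:kept": [],
}
broken = dependents_of_removed(dependencies, removed)
```

Targets found this way would be the ones to treat as directly changed, so that anything depending on a removed target fails loudly instead of being skipped.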
Addressed 1 and 2 in #1944 |
Title changed from "./pants goal changed does not work for deleted files" to "./pants goal changed does not work for deleted targets"
Title changed from "./pants goal changed does not work for deleted targets" to "./pants --changed .. does not work for deleted targets"
I've updated the title to account for the current state of the world. This is somewhat mitigated by a workaround that forces the equivalent of A holistic solution to this problem is likely possible in the new engine (and perhaps not even that hard), by adding an explicit mode of operation that applies a filesystem overlay for the length of a request... that would allow us to delta the graph and... do something with it. TBD. |
### Problem

#5579 broke detection of deleted targets.

### Solution

As described in #382 (mega classic!), it might be possible to more deeply integrate change detection into the v2 engine itself to compute a delta on the graph. But for now we defer a deeper solution and simply ensure that we fail for deleted targets by transitively expanding targets.

Adds a test to cover the behaviour.

### Result

Due to fully hydrating targets, this represents a linear performance regression from #5579: the runtime of `--changed-include-dependees=transitive` for a small set of roots used to be slightly lower than the runtime for `./pants list ::`, because the operation that occurred "on the entire graph" was computing `HydratedStructs`, rather than computing `TransitiveHydratedTargets`. The impact for exactly that step is constant, and fairly high:

```
# before:
DEBUG] Root Select(Collection(dependencies=(DescendantAddresses(directory=u''),)), =Collection.of(Struct)) completed.
DEBUG] computed 1 nodes in 1.709688 seconds. there are 8858 total nodes.

# after:
DEBUG] Root Select(Collection(dependencies=(DescendantAddresses(directory=u''),)), =TransitiveHydratedTargets) completed.
DEBUG] computed 1 nodes in 2.989497 seconds. there are 15916 total nodes.
```

... but the impact on overall runtime is dependent on the count of targets that are transitively affected, because for all affected targets, we're going to need to compute transitive roots anyway.

So for the example change from #5579 which affects 567 targets:

```
time ./pants --changed-diffspec=22ca0604b1c6ce8de019214b821e922aac66b026^..22ca060 --changed-include-dependees=transitive list | wc -l

# before:
real 0m4.877s
user 0m4.081s
sys 0m1.068s

# after
real 0m5.294s
user 0m4.487s
sys 0m1.142s
```

For a change impacting only 14 targets the difference is slightly more pronounced:

```
$ time ./pants --changed-diffspec=f35e1e6fb1cdf45fcb5080cfe567bdbae8060125^..f35e1e6 --changed-include-dependees=transitive list | wc -l

# before:
real 0m4.279s
user 0m3.376s
sys 0m1.011s

# after:
real 0m4.954s
user 0m4.284s
sys 0m1.120s
```
So, after #7581, I've come to the conclusion that using While we would not report a deleted target in changed, we would successfully detect any dependents of a deleted target and error for them. And I think that that is all that can be reasonably expected for this mode of operation. |
The WhatChanged task has 2 modes:
Mode 2 uses the data from mode 1 and, for each changed file, looks for the target that owns the file. This check is done in the current workspace, so no target can ever own the deleted files from mode 1. It seems the task either needs to back up to the prior change safely, check, then roll forward (scary), or else have a way to run payload source globbing against path names without real fs path checks. In other words, globs would have to run in a predicate mode, "do you select this path", as opposed to the current mode of selecting all matching paths from the filesystem.
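As an illustration of what a predicate mode could look like, here is a small sketch. `GlobPredicate` and `glob_to_regex` are hypothetical names, not existing pants classes, and the glob dialect is an assumption: `*` stays within one path segment and `**/` spans any number of directories. The point is only that the question "do you select this path" can be answered from the pattern alone.

```python
import re
from typing import List


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a simplified glob into a compiled regex (assumed dialect)."""
    parts: List[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")   # zero or more whole directories
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")      # anything except a path separator
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


class GlobPredicate:
    """Answers "do you select this path?" without reading the filesystem."""

    def __init__(self, *patterns: str) -> None:
        self._regexes = [glob_to_regex(p) for p in patterns]

    def selects(self, path: str) -> bool:
        return any(regex.fullmatch(path) for regex in self._regexes)


flat = GlobPredicate("src/*.py")
recursive = GlobPredicate("src/**/*.py")
# Both calls work for a path that no longer exists on disk.
flat_hit = flat.selects("src/deleted.py")
deep_hit = recursive.selects("src/pkg/sub/deleted.py")
```

Because nothing here stats a file, the same predicate gives the same answer before and after the deletion, which avoids the roll back, check and roll forward dance.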