`./pants --changed ..` does not work for deleted targets #382

jsirois · 2014-07-23T17:13:15Z

The WhatChanged task has 2 modes:

list changed files
list the targets owning the changed files (what targets were affected by the change)

Mode 2 uses the data from mode 1 and for each changed file looks for the target that owns the file. This check is done in the current workspace and so no target can ever own deleted files from 1. It seems the task either needs to back up to the prior change safely, check, then roll forward - scary, or else have a way to run payload source globbing against path names without real fs path checks. In otherwords globs would have to run in a predicate mode, "do you select this path" as opposed to the current mode of select all matching paths from the filesystem.

tejal29 · 2014-07-23T17:55:06Z

I had some initial groundwork done for this in the Twitter Internal ticket. Will add you as a watcher there.

areitz · 2015-03-11T04:42:16Z

I agree with John's analysis. Mode #1 (using the --changed-files argument) works fine in the face of deleted files. It is mode #2 which is the issue. In more detail, this line of code:

https://github.com/pantsbuild/pants/blob/master/src/python/pants/backend/core/tasks/what_changed.py#L61

Does two things: generates a list of changed files (which works fine in the case of deleted files), and for each changed file, tries to map it to a target via self._mapper.target_addresses_for_source(changedfile). Digging into this one, pants goes into _find_owners():

https://github.com/pantsbuild/pants/blob/master/src/python/pants/base/lazy_source_mapper.py#L72

Where in a loop, it is analyzing the path of the changed file, and trying to find the nearest BUILD file. Once a candidate BUILD file has been selected, it calls target.sources_relative_to_buildroot():

https://github.com/pantsbuild/pants/blob/master/src/python/pants/base/lazy_source_mapper.py#L100

This is the method that examines the actual files on disk, and maps them to a target.

As John stated, we can either attempt some git-fu to allow target.sources_relative_to_buildroot() to run in the previous SHA (before the file was deleted), or we can create some new machinery to thread the actual changed file path into the code that is looking at the sources that are attached to a target.

areitz · 2015-03-11T05:07:24Z

I just thought of an interesting hack. What if, pants tracked all of the deleted files separately. Then, before this call:

https://github.com/pantsbuild/pants/blob/master/src/python/pants/backend/core/tasks/what_changed.py#L61

pants could touch all deleted files back into existence, run the check, and on the other side, re-delete them again. It would be doing some unnecessary work (adding 0-byte files & then removing them), but it would avoid changing the guts of lazy_source_mapper.py.

jsirois · 2015-03-12T18:44:31Z

I don't find that hack any less scary than the hack I described of rolling back, checking and rolling forward. The predicate mode is currently defeated only by pants/base/validation.py:13 afaict which turns a Fileset into a list, throwing away any predicate mode that might be added to Fileset.

baroquebobcat · 2015-03-17T21:34:06Z

I think if we made globs lazy, we could do this more easily. If globs / target sources were lazy, then targets would have a list of the globs they use for looking up files. We could then compare those globs to the git changed file list to derive the targets that had changed.

baroquebobcat · 2015-07-24T15:52:42Z

Another issue with the deleted / changed files detection is that if a BUILD file has a target removed, what changed doesn't know to check for dependees of the targets that no longer exist to ensure they are still valid.

fkorotkov · 2015-08-07T15:44:12Z

Seems we have 2 different cases:

Source/Resource files was changed/removed
BUILD file was changed/removed

For 1 I like Nick's proposal. Make globs/rglobs lazy and use globs in LazySourceMapper instead of mapping from existing files to targets.

For 2 it seems we need to walk the build graph anyway. If a BUILD file was change we should automatically mark all targets in it as changed and walk the graph to find targets that have dependencies on targets that were removed from the BUILD file. We should add them to ChangeCalculator#_directly_changed_targets.

Changes that needs to be implemented:

Make globs lazy
Use globs in LazySourceMapper(Ask a glob if it claims a changed file)
Respect removals in BUILD files(Check build graph for absent dependencies to the BUILD file)

areitz · 2015-08-07T16:36:49Z

Does change #3 require walking the entire build graph?

fkorotkov · 2015-08-07T16:37:59Z

@areitz only in case of changes to BUILD files. I don't see other way to find targets that depend on a removed one.

fkorotkov · 2015-08-10T18:17:09Z

Addressed 1 and 2 in #1944

stuhood · 2018-03-28T01:22:15Z

I've updated the title to account for the current state of the world. This is somewhat mitigated by a workaround that forces the equivalent of ./pants list :: in order to compute --changed-include-dependees. This has the effect of detecting dependents of deleted targets and failing, but without --changed-include-dependees, no changes will be detected at all.

A holistic solution to this problem is likely possible in the new engine (and perhaps not even that hard), by adding an explicit mode of operation that applies a filesystem overlay for the length of a request... that would allow us to delta the graph and... do something with it. TBD.

### Problem #5579 broke detection of deleted targets. ### Solution As described in #382 (mega classic!), it might be possible to more deeply integrate change detection into the v2 engine itself to compute a delta on the graph. But for now we defer a deeper solution and simply ensure that we fail for deleted targets by transitively expanding targets. Adds a test to cover the behaviour. ### Result Due to fully hydrating targets, this represents a linear performance regression from #5579: the runtime of `--changed-include-dependees=transitive` for a small set of roots used to be slightly lower than the runtime for `./pants list ::`, because the operation that occurred "on the entire graph" was computing `HydratedStructs`, rather than computing `TransitiveHydratedTargets`. The impact for exactly that step is constant, and fairly high: ``` # before: DEBUG] Root Select(Collection(dependencies=(DescendantAddresses(directory=u''),)), =Collection.of(Struct)) completed. DEBUG] computed 1 nodes in 1.709688 seconds. there are 8858 total nodes. # after: DEBUG] Root Select(Collection(dependencies=(DescendantAddresses(directory=u''),)), =TransitiveHydratedTargets) completed. DEBUG] computed 1 nodes in 2.989497 seconds. there are 15916 total nodes. ``` ... but the impact on overall runtime is dependent on the count of targets that are transitively affected, because for all affected targets, we're going to need to compute transitive roots anyway. So for the example change from #5579 which affects 567 targets: ``` time ./pants --changed-diffspec=22ca0604b1c6ce8de019214b821e922aac66b026^..22ca060 --changed-include-dependees=transitive list | wc -l # before: real 0m4.877s user 0m4.081s sys 0m1.068s # after real 0m5.294s user 0m4.487s sys 0m1.142s ``` For a change impacting only 14 targets the difference is slightly more pronounced: ``` $ time ./pants --changed-diffspec=f35e1e6fb1cdf45fcb5080cfe567bdbae8060125^..f35e1e6 --changed-include-dependees=transitive list | wc -l # before: real 0m4.279s user 0m3.376s sys 0m1.011s # after: real 0m4.954s user 0m4.284s sys 0m1.120s ```

As described in pantsbuild#382 (mega classic!), it might be possible to more deeply integrate change detection into the v2 engine itself to compute a delta on the graph. But for now we defer a deeper solution and simply ensure that we fail for deleted targets by transitively expanding targets. Adds a test to cover the behaviour. Due to fully hydrating targets, this represents a linear performance regression from pantsbuild#5579: the runtime of `--changed-include-dependees=transitive` for a small set of roots used to be slightly lower than the runtime for `./pants list ::`, because the operation that occurred "on the entire graph" was computing `HydratedStructs`, rather than computing `TransitiveHydratedTargets`. The impact for exactly that step is constant, and fairly high: ``` DEBUG] Root Select(Collection(dependencies=(DescendantAddresses(directory=u''),)), =Collection.of(Struct)) completed. DEBUG] computed 1 nodes in 1.709688 seconds. there are 8858 total nodes. DEBUG] Root Select(Collection(dependencies=(DescendantAddresses(directory=u''),)), =TransitiveHydratedTargets) completed. DEBUG] computed 1 nodes in 2.989497 seconds. there are 15916 total nodes. ``` ... but the impact on overall runtime is dependent on the count of targets that are transitively affected, because for all affected targets, we're going to need to compute transitive roots anyway. So for the example change from pantsbuild#5579 which affects 567 targets: ``` time ./pants --changed-diffspec=22ca0604b1c6ce8de019214b821e922aac66b026^..22ca060 --changed-include-dependees=transitive list | wc -l real 0m4.877s user 0m4.081s sys 0m1.068s real 0m5.294s user 0m4.487s sys 0m1.142s ``` For a change impacting only 14 targets the difference is slightly more pronounced: ``` $ time ./pants --changed-diffspec=f35e1e6fb1cdf45fcb5080cfe567bdbae8060125^..f35e1e6 --changed-include-dependees=transitive list | wc -l real 0m4.279s user 0m3.376s sys 0m1.011s real 0m4.954s user 0m4.284s sys 0m1.120s ```

stuhood · 2019-05-03T00:14:18Z

So, after #7581, I've come to the conclusion that using --changed-include-dependees=transitive makes this a non-issue, due to that codepath now being eager, and due to test coverage on that fact.

While we would not report a deleted target in changed, we would successfully detect any dependents of a deleted target and error for them. And I think that that is all that can be reasonably expected for this mode of operation.

stuhood changed the title ~~./pants goal changed does not work for deleted files~~ ./pants goal changed does not work for deleted targets Mar 28, 2018

stuhood changed the title ~~./pants goal changed does not work for deleted targets~~ ./pants --changed .. does not work for deleted targets Mar 28, 2018

stuhood mentioned this issue Mar 28, 2018

Fail for deleted-but-depended-on targets in changed #5636

Merged

stuhood closed this as completed May 3, 2019

stuhood mentioned this issue Oct 12, 2021

Should a deleted file still cause its original target generator to run with --changed-since? #13232

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`./pants --changed ..` does not work for deleted targets #382

`./pants --changed ..` does not work for deleted targets #382

jsirois commented Jul 23, 2014

tejal29 commented Jul 23, 2014

areitz commented Mar 11, 2015

areitz commented Mar 11, 2015

jsirois commented Mar 12, 2015

baroquebobcat commented Mar 17, 2015

baroquebobcat commented Jul 24, 2015

fkorotkov commented Aug 7, 2015

areitz commented Aug 7, 2015

fkorotkov commented Aug 7, 2015

fkorotkov commented Aug 10, 2015

stuhood commented Mar 28, 2018 •

edited

Loading

stuhood commented May 3, 2019

./pants --changed .. does not work for deleted targets #382

./pants --changed .. does not work for deleted targets #382

Comments

jsirois commented Jul 23, 2014

tejal29 commented Jul 23, 2014

areitz commented Mar 11, 2015

areitz commented Mar 11, 2015

jsirois commented Mar 12, 2015

baroquebobcat commented Mar 17, 2015

baroquebobcat commented Jul 24, 2015

fkorotkov commented Aug 7, 2015

areitz commented Aug 7, 2015

fkorotkov commented Aug 7, 2015

fkorotkov commented Aug 10, 2015

stuhood commented Mar 28, 2018 • edited Loading

stuhood commented May 3, 2019

`./pants --changed ..` does not work for deleted targets #382

`./pants --changed ..` does not work for deleted targets #382

stuhood commented Mar 28, 2018 •

edited

Loading