Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a --query option for unifying target selection #7350

Conversation

cosmicexplorer
Copy link
Contributor

@cosmicexplorer cosmicexplorer commented Mar 10, 2019

Problem

It would be neat to be able to do away with splitting target selection across multiple options and tasks (--owner-of, --changes-since, dependencies, filter, ...), and it would also be great to avoid having to perform multiple pants invocations to extract the final desired target list when running these target selection commands in our internal CI. #7346 was made to discuss one way this could end up looking like.

Please make sure to see #6501 for the impetus and prior discussion of unifying target selection.

Solution

  • Create a v2 ruleset for converting ChangedRequest into an OwnersRequest to allow for a single engine invocation instead of multiple session.product_request()s.
  • Create query.py with QueryComponent objects that have rulesets resolving them to BuildFileAddresses and know how to parse themselves from shlexed strings (the parsing method is subject to change, see ./pants --query for unifying target selection mechanisms to improve performance #7346 (comment)).
  • Add a global --query option and parse it into QueryComponents.
  • Create owner-of, changes-since, and changes-from-diffspec query functions to mimic the behavior of --owner-of and --changes-* options.

Result

Instead of running ./pants --changes-since=HEAD list, the user can run ./pants --query='changes-since HEAD' list, similarly for owner-of. The first step of the plan being discussed in #7346 is laid out, laying the groundwork for composing multiple queries in a followup PR.

Copy link
Contributor

@illicitonion illicitonion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks exciting :) Most important question here is around @rule caching. I also have some composition questions, but I'll take those to #7346 :)

src/python/pants/engine/legacy/query.py Outdated Show resolved Hide resolved
src/python/pants/engine/legacy/graph.py Outdated Show resolved Hide resolved
src/python/pants/scm/subsystems/changed.py Outdated Show resolved Hide resolved
@stuhood
Copy link
Member

stuhood commented Mar 14, 2019

Thanks! Will try to take a look before the weekend.

Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat! I don't have any objections to this, although I think that actually implementing a form of pipelining will be necessary to make it truly useful. Would recommend getting the second PR "mostly ready" before landing this one.

src/python/pants/scm/subsystems/changed.py Outdated Show resolved Hide resolved
src/python/pants/scm/subsystems/changed.py Outdated Show resolved Hide resolved
src/python/pants/option/global_options.py Outdated Show resolved Hide resolved
@cosmicexplorer
Copy link
Contributor Author

cosmicexplorer commented Apr 5, 2019

Would recommend getting the second PR "mostly ready" before landing this one.

@stuhood like this? #7356

@stuhood
Copy link
Member

stuhood commented Apr 5, 2019

Would recommend getting the second PR "mostly ready" before landing this one.

@stuhood like this? #7356

Heh, yes, roughly. Out of draft and with no showstoppers would be good.

@cosmicexplorer
Copy link
Contributor Author

Sounds like a plan!

@cosmicexplorer cosmicexplorer force-pushed the query-option-for-unifying-target-selection branch 2 times, most recently from acfba59 to 81efd62 Compare May 20, 2019 05:30
@cosmicexplorer cosmicexplorer force-pushed the query-option-for-unifying-target-selection branch from 81efd62 to 811be9a Compare August 6, 2019 02:58
@stuhood stuhood mentioned this pull request Aug 15, 2019
@cosmicexplorer cosmicexplorer force-pushed the query-option-for-unifying-target-selection branch 4 times, most recently from 06cd7f2 to 073db8e Compare October 28, 2019 16:48
@cosmicexplorer cosmicexplorer force-pushed the query-option-for-unifying-target-selection branch 2 times, most recently from ead735d to dff3194 Compare November 9, 2019 23:19
wisechengyi pushed a commit that referenced this pull request Nov 10, 2019
### Problem

We would like to be able to provide binary versions of all dependencies of specified root targets to an IDE with `export-classpath`, without providing binary versions of the target roots themselves (because the IDE will be indexing those itself, as source files). `--query` from #7350 might help here, but until that's developed a little further, it would be great to have this functionality now.

@olafurpg requested this change to help integrate pants with metals.

### Solution

- Add `--transitive-only` option to the `export-classpath` task which avoids publishing jars for target roots, and instead only pulls in their dependencies!
@cosmicexplorer cosmicexplorer force-pushed the query-option-for-unifying-target-selection branch from dff3194 to 0929282 Compare November 27, 2019 10:08
@cosmicexplorer cosmicexplorer force-pushed the query-option-for-unifying-target-selection branch from 666974d to 8d29b35 Compare January 8, 2020 00:04
@cosmicexplorer cosmicexplorer force-pushed the query-option-for-unifying-target-selection branch from 407f9cf to 2126b73 Compare March 24, 2020 08:00
Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks: this continues to be a really interesting patch, particularly now that the new Target API has begun to take form (cc @Eric-Arellano). Adjusting this to consume that API rather than target adaptors or the legacy target API is probably the last blocker from my perspective.

src/python/pants/engine/query.py Outdated Show resolved Hide resolved
src/python/pants/init/specs_calculator.py Outdated Show resolved Hide resolved
@cosmicexplorer cosmicexplorer force-pushed the query-option-for-unifying-target-selection branch 3 times, most recently from cfa310f to ac79df0 Compare July 1, 2020 00:10
@cosmicexplorer cosmicexplorer force-pushed the query-option-for-unifying-target-selection branch from ac79df0 to 0cec7af Compare July 21, 2020 06:09
raise InvalidSpecConstraint(
# TODO: centralize the error messaging for when an SCM is required, and describe what SCMs
# are supported!
"{} are not available without a recognized SCM (usually git).".format(entity)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe related: the sole SCM that Pants supports right now is Git. I think we'll have better error messages if we reflect that reality and solely mention Git. Users don't need to see the word SCM.

We can keep the actual class SCM of course. And if/when we add a second SCM, we can update these error messages to reflect that new reality. I imagine we will then search for all uses of get_scm() + "git", so we'll find this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree with "users don't need to see the word SCM" -- if a user tries to use pants on their mercurial repo and it fails, their next response is not likely to be to open an issue against the pants repo and expect help if we assume git.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have modified the error message to specify that we currently only support git.

Comment on lines 154 to 174
if exprs:
# TODO: this should only be necessary for the `changed-since`/etc queries! This can be done by
# returning a dummy ScmWrapper if no `changed-*` queries are used!
scm = scm("The --query option")
# TODO(#7346): deprecate --owner-of and --changed-* in favor of --query versions, allow
# pipelining of successive query expressions with the command-line target specs as the initial
# input!
if len(exprs) > 1:
raise ValueError(
"Only one --query argument is currently supported! Received: {}.".format(exprs)
)

scm_wrapper = UncachedScmWrapper.create(scm)
(expr_addresses,) = session.product_request(
QueryAddresses, [Params(scm_wrapper, exprs[0])]
)
logger.debug("expr addresses: %s", expr_addresses)
dependencies = tuple(
SingleAddress(a.spec_path, a.target_name) for a in expr_addresses.addresses
)
return Specs(
address_specs=AddressSpecs(
dependencies=dependencies, exclude_patterns=exclude_patterns, tags=tags
),
filesystem_specs=FilesystemSpecs(filesystem_specs),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd want to move this up above all this SCM stuff, and after if not changed_options.provided:.

This file was refactored recently, and now it follows the high level flow of:

if not changed_options.provided:
  # assume it's direct file/target args, and return those directly
# all the SCM logic. Note that there's no else clause because the above `if` early returned.

I'm suggesting putting if exprs: in between those two parts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored this file in between your comments -- please let me know if you like the result.

@coveralls
Copy link

coveralls commented Jul 21, 2020

Coverage Status

Coverage remained the same at 0.0% when pulling cc87ef4 on cosmicexplorer:query-option-for-unifying-target-selection into 45ca5b4 on pantsbuild:master.

@cosmicexplorer cosmicexplorer force-pushed the query-option-for-unifying-target-selection branch from 0b03c48 to 0e56010 Compare July 21, 2020 08:36
msgs.len(),
msgs.join("\n\n ")
))
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using @_uncacheable_rule with changed_files() in this PR would fail with Exception: Rules with errors: 0 without this change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... fishy. Would you mind opening a ticket about that?

@cosmicexplorer cosmicexplorer force-pushed the query-option-for-unifying-target-selection branch 2 times, most recently from eabbb61 to ae3a2c3 Compare July 21, 2020 08:39
use the ast module to parse query expressions!!

[ci skip-rust-tests]

answer some review comments. per-file targets are breaking it now

[ci skip-rust-tests]

add do_not_generate_subtargets workaround for subtargets

[ci skip-rust-tests]

review feedback

[ci skip-rust-tests]

attempt to use @_uncacheable_rule

[ci skip-rust-tests]

fix @_uncacheable_rule

fix error about git as RootRule

add a lengthy TODO to fix query_rules() initialization

[ci skip-rust-tests]
@cosmicexplorer cosmicexplorer force-pushed the query-option-for-unifying-target-selection branch from ae3a2c3 to cc87ef4 Compare July 21, 2020 08:42
Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Although it remains to be seen what we will do with @_uncacheable_rule, AFAICT this should be correct.

Comment on lines +156 to +160
# TODO: allow returning @union types to avoid this double synchronous engine invocation!
exprs = session.product_request(
QueryComponentWrapper, [QueryParseInput(s) for s in query_expr_strings]
)
exprs = [ex.underlying for ex in exprs]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, you might consider not using the engine for parsing? It looks like the parsing is a pure function, so it would likely be cheaper not to bounce in and out like this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Beyond being cheaper perf wise, it would likely be simpler code as pure functions are easier to work with and test than rules.

default=[],
metavar="<query-expr>",
help="A list of query expressions which process the input target specs in a "
"pipeline in order.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably include a link to a page (maybe a section of https://www.pantsbuild.org/docs/advanced-target-selection or https://www.pantsbuild.org/docs/project-introspection) on the docsite that explains how to use query.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend this go into Advanced Target Selection. Maybe have a tooltip on Project Introspection that cross-references the other page.

You'll want to use https://www.pantsbuild.org/v2.0/docs/advanced-target-selection for now because this feature isn't being added to Pants 1.30.

msgs.len(),
msgs.join("\n\n ")
))
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... fishy. Would you mind opening a ticket about that?

Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also, if the other reviewers are ready to land this, it would probably be reasonable to add a few tests. I can understand having held off while things were up in the air.)

@illicitonion
Copy link
Contributor

I'm not likely to have enough time in the near future to give this particularly meaningful review, but it looks like my one raised. concern of broken caching behaviour has been resolved, so I'll remove myself as a reviewer :) Thanks for pushing forwards with this very exciting feature!

@illicitonion illicitonion requested review from illicitonion and removed request for illicitonion July 21, 2020 21:47
@Eric-Arellano Eric-Arellano dismissed illicitonion’s stale review July 22, 2020 01:04

He said he's dismissing the review, only didn't click the appropriate button.

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! Could you please update the PR description? It looks like it has some stale parts like --owner-of.

It'd be really helpful to expand the number of examples to demonstrate things like how you specify the dependees behavior of changed_since.

Comment on lines +59 to +66
class ChangedFiles(Collection[str]):
pass


@dataclass(frozen=True)
class ChangedFilesRequest:
options: 'ChangedOptions'
git: Git
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: if you move this below ChangedOptions, you can remove the quotes, and the dataclass definitions will also be closer to the rule where they're directly used.

@@ -221,7 +222,7 @@ async def find_owners(owners_request: OwnersRequest) -> Owners:
if bfa.rel_path not in sources_set and not matching_files:
continue
deleted_files_matched = bool(set(matching_files) - all_source_files)
if deleted_files_matched:
if deleted_files_matched or owners_request.do_not_generate_subtargets:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally fine to do this for now. I've found generated subtargets unusually confusing to reason about, and I'm proposing in #10423 that we seriously consider removing the (mis)feature.

If we do stick with generated subtargets, do you have any thoughts on how they should integrate with --query? Generally, we've said that we want to use generated subtargets when there's enough precision to indicate that the user wants it, such as file arguments or --changed-since. The tricky part with --query from what I can tell is that you will sometimes have files via --changed-since, but sometimes have normal addresses. So possibly we should always use normal addresses?

Comment on lines +54 to +69
@dataclass(frozen=True)
class OwnerOf(QueryComponent):
files: Tuple[str]

function_name = 'owner_of'

@classmethod
def parse_from_args(cls, *args):
return cls(files=tuple([str(f) for f in args]))


@rule
async def owner_of_request(owner_of: OwnerOf) -> QueryAddresses:
request = OwnersRequest(sources=owner_of.files, do_not_generate_subtargets=True)
owners = await Get(Owners, OwnersRequest, request)
return QueryAddresses(owners)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--owner-of was replaced via file arguments. I'm wondering if we should still have this as a part of --query. Is the motivation so that you could pipe to other parts of the query, whereas file args wouldn't let you do that?

src/python/pants/init/extension_loader.py Show resolved Hide resolved
Comment on lines +97 to +98
# Parse --query expressions into objects which can be resolved into BuildFileAddresses via
# v2 rules.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale - We now use Address and it's redundant to say "v2".

default=[],
metavar="<query-expr>",
help="A list of query expressions which process the input target specs in a "
"pipeline in order.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend this go into Advanced Target Selection. Maybe have a tooltip on Project Introspection that cross-references the other page.

You'll want to use https://www.pantsbuild.org/v2.0/docs/advanced-target-selection for now because this feature isn't being added to Pants 1.30.

Comment on lines +166 to +167
@_uncacheable_rule
def changed_files(request: ChangedFilesRequest) -> ChangedFiles:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty cool that it's now a rule!

In the future, I think your TODO is saying that we should refactor git to not use subprocess anymore, right? To confirm, it's safe for now to use subprocess because of @_uncacheable_rule?

@@ -33,6 +35,7 @@ class DependeesOption(Enum):
class ChangedRequest:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little confusing how this is different from ChangedFilesRequest. Would you mind please renaming this to ChangedAddresessRequest?

else:
raise QueryParseError(
f'Query function with name {name} not found (in expr {s})! '
f'The known functions are: {known}.')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice.

function_name = 'changes_for_diffspec'

@classmethod
def parse_from_args(cls, diffspec, dependees=DependeesOption.NONE):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused how the user would specify dependees. Could you please add an example to the PR description?

Base automatically changed from master to main March 19, 2021 19:20
@Eric-Arellano
Copy link
Contributor

Closing as stale, which we're doing for all changes that haven't been touched in 1+ years to simplify project management.

This would still be a really neat feature, though. Do feel free to reopen. Thank you for showing what --query could look like for Pants!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants