-
-
Notifications
You must be signed in to change notification settings - Fork 636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a --query option for unifying target selection #7350
Add a --query option for unifying target selection #7350
Conversation
4c5db44
to
02890ce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks exciting :) Most important question here is around @rule
caching. I also have some composition questions, but I'll take those to #7346 :)
tests/python/pants_test/engine/legacy/test_query_integration.py
Outdated
Show resolved
Hide resolved
Thanks! Will try to take a look before the weekend. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat! I don't have any objections to this, although I think that actually implementing a form of pipelining will be necessary to make it truly useful. Would recommend getting the second PR "mostly ready" before landing this one.
Sounds like a plan! |
acfba59
to
81efd62
Compare
81efd62
to
811be9a
Compare
06cd7f2
to
073db8e
Compare
ead735d
to
dff3194
Compare
### Problem We would like to be able to provide binary versions of all dependencies of specified root targets to an IDE with `export-classpath`, without providing binary versions of the target roots themselves (because the IDE will be indexing those itself, as source files). `--query` from #7350 might help here, but until that's developed a little further, it would be great to have this functionality now. @olafurpg requested this change to help integrate pants with metals. ### Solution - Add `--transitive-only` option to the `export-classpath` task which avoids publishing jars for target roots, and instead only pulls in their dependencies!
dff3194
to
0929282
Compare
666974d
to
8d29b35
Compare
407f9cf
to
2126b73
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks: this continues to be a really interesting patch, particularly now that the new Target API has begun to take form (cc @Eric-Arellano). Adjusting this to consume that API rather than target adaptors or the legacy target API is probably the last blocker from my perspective.
cfa310f
to
ac79df0
Compare
ac79df0
to
0cec7af
Compare
raise InvalidSpecConstraint( | ||
# TODO: centralize the error messaging for when an SCM is required, and describe what SCMs | ||
# are supported! | ||
"{} are not available without a recognized SCM (usually git).".format(entity) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe related: the sole SCM that Pants supports right now is Git. I think we'll have better error messages if we reflect that reality and solely mention Git. Users don't need to see the word SCM.
We can keep the actual class SCM of course. And if/when we add a second SCM, we can update these error messages to reflect that new reality. I imagine we will then search for all uses of get_scm()
+ "git", so we'll find this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree with "users don't need to see the word SCM" -- if a user tries to use pants on their mercurial repo and it fails, their next response is not likely to be to open an issue against the pants repo and expect help if we assume git.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have modified the error message to specify that we currently only support git.
if exprs: | ||
# TODO: this should only be necessary for the `changed-since`/etc queries! This can be done by | ||
# returning a dummy ScmWrapper if no `changed-*` queries are used! | ||
scm = scm("The --query option") | ||
# TODO(#7346): deprecate --owner-of and --changed-* in favor of --query versions, allow | ||
# pipelining of successive query expressions with the command-line target specs as the initial | ||
# input! | ||
if len(exprs) > 1: | ||
raise ValueError( | ||
"Only one --query argument is currently supported! Received: {}.".format(exprs) | ||
) | ||
|
||
scm_wrapper = UncachedScmWrapper.create(scm) | ||
(expr_addresses,) = session.product_request( | ||
QueryAddresses, [Params(scm_wrapper, exprs[0])] | ||
) | ||
logger.debug("expr addresses: %s", expr_addresses) | ||
dependencies = tuple( | ||
SingleAddress(a.spec_path, a.target_name) for a in expr_addresses.addresses | ||
) | ||
return Specs( | ||
address_specs=AddressSpecs( | ||
dependencies=dependencies, exclude_patterns=exclude_patterns, tags=tags | ||
), | ||
filesystem_specs=FilesystemSpecs(filesystem_specs), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'd want to move this up above all this SCM stuff, and after if not changed_options.provided:
.
This file was refactored recently, and now it follows the high level flow of:
if not changed_options.provided:
# assume it's direct file/target args, and return those directly
# all the SCM logic. Note that there's no else clause because the above `if` early returned.
I'm suggesting putting if exprs:
in between those two parts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored this file in between your comments -- please let me know if you like the result.
0b03c48
to
0e56010
Compare
msgs.len(), | ||
msgs.join("\n\n ") | ||
)) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using @_uncacheable_rule
with changed_files()
in this PR would fail with Exception: Rules with errors: 0
without this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm... fishy. Would you mind opening a ticket about that?
eabbb61
to
ae3a2c3
Compare
use the ast module to parse query expressions!! [ci skip-rust-tests] answer some review comments. per-file targets are breaking it now [ci skip-rust-tests] add do_not_generate_subtargets workaround for subtargets [ci skip-rust-tests] review feedback [ci skip-rust-tests] attempt to use @_uncacheable_rule [ci skip-rust-tests] fix @_uncacheable_rule fix error about git as RootRule add a lengthy TODO to fix query_rules() initialization [ci skip-rust-tests]
ae3a2c3
to
cc87ef4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Although it remains to be seen what we will do with @_uncacheable_rule
, AFAICT this should be correct.
# TODO: allow returning @union types to avoid this double synchronous engine invocation! | ||
exprs = session.product_request( | ||
QueryComponentWrapper, [QueryParseInput(s) for s in query_expr_strings] | ||
) | ||
exprs = [ex.underlying for ex in exprs] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, you might consider not using the engine for parsing? It looks like the parsing is a pure function, so it would likely be cheaper not to bounce in and out like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Beyond being cheaper perf wise, it would likely be simpler code as pure functions are easier to work with and test than rules.
default=[], | ||
metavar="<query-expr>", | ||
help="A list of query expressions which process the input target specs in a " | ||
"pipeline in order.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably include a link to a page (maybe a section of https://www.pantsbuild.org/docs/advanced-target-selection or https://www.pantsbuild.org/docs/project-introspection) on the docsite that explains how to use query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend this go into Advanced Target Selection. Maybe have a tooltip on Project Introspection that cross-references the other page.
You'll want to use https://www.pantsbuild.org/v2.0/docs/advanced-target-selection for now because this feature isn't being added to Pants 1.30.
msgs.len(), | ||
msgs.join("\n\n ") | ||
)) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm... fishy. Would you mind opening a ticket about that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Also, if the other reviewers are ready to land this, it would probably be reasonable to add a few tests. I can understand having held off while things were up in the air.)
I'm not likely to have enough time in the near future to give this particularly meaningful review, but it looks like my one raised. concern of broken caching behaviour has been resolved, so I'll remove myself as a reviewer :) Thanks for pushing forwards with this very exciting feature! |
He said he's dismissing the review, only didn't click the appropriate button.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! Could you please update the PR description? It looks like it has some stale parts like --owner-of
.
It'd be really helpful to expand the number of examples to demonstrate things like how you specify the dependees behavior of changed_since.
class ChangedFiles(Collection[str]): | ||
pass | ||
|
||
|
||
@dataclass(frozen=True) | ||
class ChangedFilesRequest: | ||
options: 'ChangedOptions' | ||
git: Git |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: if you move this below ChangedOptions
, you can remove the quotes, and the dataclass definitions will also be closer to the rule where they're directly used.
@@ -221,7 +222,7 @@ async def find_owners(owners_request: OwnersRequest) -> Owners: | |||
if bfa.rel_path not in sources_set and not matching_files: | |||
continue | |||
deleted_files_matched = bool(set(matching_files) - all_source_files) | |||
if deleted_files_matched: | |||
if deleted_files_matched or owners_request.do_not_generate_subtargets: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally fine to do this for now. I've found generated subtargets unusually confusing to reason about, and I'm proposing in #10423 that we seriously consider removing the (mis)feature.
If we do stick with generated subtargets, do you have any thoughts on how they should integrate with --query
? Generally, we've said that we want to use generated subtargets when there's enough precision to indicate that the user wants it, such as file arguments or --changed-since
. The tricky part with --query
from what I can tell is that you will sometimes have files via --changed-since
, but sometimes have normal addresses. So possibly we should always use normal addresses?
@dataclass(frozen=True) | ||
class OwnerOf(QueryComponent): | ||
files: Tuple[str] | ||
|
||
function_name = 'owner_of' | ||
|
||
@classmethod | ||
def parse_from_args(cls, *args): | ||
return cls(files=tuple([str(f) for f in args])) | ||
|
||
|
||
@rule | ||
async def owner_of_request(owner_of: OwnerOf) -> QueryAddresses: | ||
request = OwnersRequest(sources=owner_of.files, do_not_generate_subtargets=True) | ||
owners = await Get(Owners, OwnersRequest, request) | ||
return QueryAddresses(owners) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--owner-of
was replaced via file arguments. I'm wondering if we should still have this as a part of --query. Is the motivation so that you could pipe to other parts of the query, whereas file args wouldn't let you do that?
# Parse --query expressions into objects which can be resolved into BuildFileAddresses via | ||
# v2 rules. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stale - We now use Address
and it's redundant to say "v2".
default=[], | ||
metavar="<query-expr>", | ||
help="A list of query expressions which process the input target specs in a " | ||
"pipeline in order.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend this go into Advanced Target Selection. Maybe have a tooltip on Project Introspection that cross-references the other page.
You'll want to use https://www.pantsbuild.org/v2.0/docs/advanced-target-selection for now because this feature isn't being added to Pants 1.30.
@_uncacheable_rule | ||
def changed_files(request: ChangedFilesRequest) -> ChangedFiles: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty cool that it's now a rule!
In the future, I think your TODO is saying that we should refactor git
to not use subprocess
anymore, right? To confirm, it's safe for now to use subprocess
because of @_uncacheable_rule
?
@@ -33,6 +35,7 @@ class DependeesOption(Enum): | |||
class ChangedRequest: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little confusing how this is different from ChangedFilesRequest
. Would you mind please renaming this to ChangedAddresessRequest
?
else: | ||
raise QueryParseError( | ||
f'Query function with name {name} not found (in expr {s})! ' | ||
f'The known functions are: {known}.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice.
function_name = 'changes_for_diffspec' | ||
|
||
@classmethod | ||
def parse_from_args(cls, diffspec, dependees=DependeesOption.NONE): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused how the user would specify dependees
. Could you please add an example to the PR description?
Closing as stale, which we're doing for all changes that haven't been touched in 1+ years to simplify project management. This would still be a really neat feature, though. Do feel free to reopen. Thank you for showing what |
Problem
It would be neat to be able to do away with splitting target selection across multiple options and tasks (
--owner-of
,--changes-since
,dependencies
,filter
, ...), and it would also be great to avoid having to perform multiple pants invocations to extract the final desired target list when running these target selection commands in our internal CI. #7346 was made to discuss one way this could end up looking like.Please make sure to see #6501 for the impetus and prior discussion of unifying target selection.
Solution
ChangedRequest
into anOwnersRequest
to allow for a single engine invocation instead of multiplesession.product_request()
s.query.py
withQueryComponent
objects that have rulesets resolving them toBuildFileAddresses
and know how to parse themselves from shlexed strings (the parsing method is subject to change, see ./pants --query for unifying target selection mechanisms to improve performance #7346 (comment)).--query
option and parse it intoQueryComponent
s.owner-of
,changes-since
, andchanges-from-diffspec
query functions to mimic the behavior of--owner-of
and--changes-*
options.Result
Instead of running
./pants --changes-since=HEAD list
, the user can run./pants --query='changes-since HEAD' list
, similarly forowner-of
. The first step of the plan being discussed in #7346 is laid out, laying the groundwork for composing multiple queries in a followup PR.