Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Macros should be able to consume the engine to request files/processes #7022

Closed
stuhood opened this issue Jan 3, 2019 · 19 comments · Fixed by #14075
Closed

Macros should be able to consume the engine to request files/processes #7022

stuhood opened this issue Jan 3, 2019 · 19 comments · Fixed by #14075
Labels

Comments

@stuhood
Copy link
Member

stuhood commented Jan 3, 2019

Multiple problems stem from the v1 macro system (context_aware_object_factory) not being able to access the engine. We should likely replace it with an engine-aware macro system, and/or generally add support in the engine APIs for generating multiple targets from one target/macro.


python_requirements is a context_aware_object_factory that declares multiple targets by reading a requirements.txt file adjacent to its BUILD file.

While its invalidation was recently (#6405) improved, the fact that execution of that macro means that BUILD file parsing depends on the requirements.txt file is still not declared (rather, we declare that any targets that the macro defines depend on the requirements.txt file, which is also true). This means that adding a new requirement to the requirements.txt file does not trigger re-parsing of the BUILD file to discover the new target that should be created.

Fixing this bug relates to a host of other issues:


In #6998, @illicitonion mentioned accessing @rules via BUILD file parsing. While this is not currently possible, it definitely suggests a possible solution to both this bug and to #3561: namely, that we could design an API to replace macros/CAOFs/objects with @rules that produce one or more BUILD symbols. Meanwhile, the use of @rules for this case might also inspire a solution to #6449.

@stuhood stuhood changed the title python_requirements() does not fully declare its file dependencies. python_requirements() does not fully declare its dependencies. Jan 31, 2019
@stuhood stuhood changed the title python_requirements() does not fully declare its dependencies. Macros cannot fully declare their dependencies. Jan 31, 2019
@stuhood
Copy link
Member Author

stuhood commented Jan 31, 2019

I've updated the title of this one, because it applies to

  1. all macros
  2. more than just files: it's also the case that if a macro consumes an option value (via a global subsystem), that will not trigger a restart

@stuhood stuhood added the pantsd label Jan 31, 2019
@TansyArron
Copy link
Contributor

I believe I've just hit this:

  • Add a new requirement (dataclasses) to requirements.txt
  • Add the new requirement to a targets dependencies (3rdparty/python:dataclasses)
  • Run ./pants test <target with new dep>
  • Get a resolve_error - pants.engine.mapper.ResolveError: "dataclasses" was not found in namespace "3rdparty/python"

Running ./pants clean-all fixed this for me.

Full output below.

/c/1toolchain ❯❯❯ ./pants test src/python/toolchain/satresolver::                                                                                     

Exception caught: (pants.build_graph.address_lookup_error.AddressLookupError)
  File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.15.0rc0_py36/lib/python3.6/site-packages/pants/bin/daemon_pants_runner.py", line 325, in post_fork_child
    runner.run()
  File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.15.0rc0_py36/lib/python3.6/site-packages/pants/bin/local_pants_runner.py", line 164, in run
    self._run()
  File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.15.0rc0_py36/lib/python3.6/site-packages/pants/bin/local_pants_runner.py", line 232, in _run
    goal_runner_result = self._maybe_run_v1(run_tracker, reporting)
  File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.15.0rc0_py36/lib/python3.6/site-packages/pants/bin/local_pants_runner.py", line 181, in _maybe_run_v1
    return goal_runner_factory.create().run()
  File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.15.0rc0_py36/lib/python3.6/site-packages/pants/bin/goal_runner.py", line 126, in create
    goals, context = self._setup_context()
  File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.15.0rc0_py36/lib/python3.6/site-packages/pants/bin/goal_runner.py", line 95, in _setup_context
    self._root_dir
  File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.15.0rc0_py36/lib/python3.6/site-packages/pants/init/engine_initializer.py", line 227, in create_build_graph
    for _ in graph.inject_roots_closure(target_roots):
  File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.15.0rc0_py36/lib/python3.6/site-packages/pants/engine/legacy/graph.py", line 219, in inject_roots_closure
    for address in self._inject_specs(target_roots.specs):
  File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.15.0rc0_py36/lib/python3.6/site-packages/pants/engine/legacy/graph.py", line 272, in _inject_specs
    subjects)
  File "/Users/tansypants/.pyenv/versions/3.6.6/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.15.0rc0_py36/lib/python3.6/site-packages/pants/engine/legacy/graph.py", line 239, in _resolve_context
    'Build graph construction failed: {} {}'.format(type(e).__name__, str(e))

Exception message: Build graph construction failed: ExecutionError 1 Exception encountered:
Computing Select(Specs(dependencies<Exactly(tuple)>=(DescendantAddresses(directory='src/python/toolchain/satresolver'),), matcher<Exactly(SpecsMatcher)>=SpecsMatcher(tags<Exactly(tuple)>=(), exclude_patterns<Exactly(tuple)>=())), TransitiveHydratedTargets)
  Computing Task(transitive_hydrated_targets(), Specs(dependencies<Exactly(tuple)>=(DescendantAddresses(directory='src/python/toolchain/satresolver'),), matcher<Exactly(SpecsMatcher)>=SpecsMatcher(tags<Exactly(tuple)>=(), exclude_patterns<Exactly(tuple)>=())), TransitiveHydratedTargets, true)
    Computing Task(transitive_hydrated_target(), src/python/toolchain/satresolver:satresolver, TransitiveHydratedTarget, true)
      Computing Task(transitive_hydrated_target(), 3rdparty/python:dataclasses, TransitiveHydratedTarget, true)
        Computing Task(hydrate_target(), 3rdparty/python:dataclasses, HydratedTarget, true)
          Computing Task(hydrate_struct(), 3rdparty/python:dataclasses, TargetAdaptorContainer, true)
            Computing Task(resolve_unhydrated_struct(), 3rdparty/python:dataclasses, UnhydratedStruct, true)
              Throw("dataclasses" was not found in namespace "3rdparty/python". Did you mean one of:
  :Django
  :PyYAML
  :ansicolors
  :attrdict
  :boto3
  :bs4
  :coloredlogs
  :contexttimer
  :django-extensions
  :django-model-mixins
  :django-polymorphic
  :django-query-logger
  :django-queryinspect
  :django-redshift-backend
  :future
  :gunicorn
  :jinja2
  :mock
  :moto
  :networkx
  :pantsbuild.pants
  :pantsbuild.pants.contrib.codeanalysis
  :pexpect
  :pkginfo
  :plyvel
  :protobuf
  :psycopg2-binary
  :pystache
  :pytest
  :pytest-django
  :pytest-pythonpath
  :requests
  :requirements.txt
  :setuptools
  :six
  :social-auth-app-django
  :srvlookup
  :termcolor
  :whitenoise)

@stuhood
Copy link
Member Author

stuhood commented Mar 27, 2019

@TansyArron : Were you using pantsd? It should only be possible with pantsd.

EDIT: Ah, yes: according to the stack, you are. Cool! So, the workaround for this one is something like:

# Whenever filesystem events are seen for files in these glob paths, the daemon
# will restart itself.
pantsd_invalidation_globs: +[
    '3rdparty/**/requirements.txt',
  ]

stuhood pushed a commit that referenced this issue Apr 22, 2019
The invalidation globs for pantsd (improved in #5567) can be manually configured to cover everything that is known to cause pantsd to need to restart. But many of the things that should trigger a restart are already known to pants. A partial list:
  1. The content of any configured pants.ini files (due to #7022)
  2. The pythonpath of pants itself (since changes to loose-source plugins mean the code that pantsd is running might have changed)

We should automatically include these values (and likely others!) in the values that we use for --invalidation-globs. In cases where the options values point to files that exist outside of the buildroot, we should consider logging a warning (unless that ends up being too noisy).

Fixes #7595.
stuhood added a commit to stuhood/pants that referenced this issue May 22, 2020
…ld#7022.

# Delete this line to force CI to run Clippy and the Rust tests.
[ci skip-rust-tests]
# Delete this line to force CI to run the JVM tests.
[ci skip-jvm-tests]
stuhood added a commit to stuhood/pants that referenced this issue May 28, 2020
@stuhood
Copy link
Member Author

stuhood commented Jun 2, 2020

This continues to affect pantsd, but I'm going to get a PR out that bakes in the workaround mentioned in #7022 (comment) to avoid it being a launch blocker.

stuhood added a commit to stuhood/pants that referenced this issue Jun 2, 2020
stuhood added a commit that referenced this issue Jun 3, 2020
…9946)

### Problem

As explained in #7022, macros that open files can have untracked file dependencies that `pantsd` cannot account for. There is only one (relatively common) case of this: the `python_requirements` macro (which creates a target per line in a `requirements.txt` file).

### Solution

We need to address #7022 in a fundamental way before 2.0. But for now we can ameliorate the only known case by defaulting to invalidating for `requirements.txt` files.

[ci skip-rust-tests]
[ci skip-jvm-tests]
stuhood added a commit that referenced this issue Jun 3, 2020
…9946)

### Problem

As explained in #7022, macros that open files can have untracked file dependencies that `pantsd` cannot account for. There is only one (relatively common) case of this: the `python_requirements` macro (which creates a target per line in a `requirements.txt` file).

### Solution

We need to address #7022 in a fundamental way before 2.0. But for now we can ameliorate the only known case by defaulting to invalidating for `requirements.txt` files.

[ci skip-rust-tests]
[ci skip-jvm-tests]
@stuhood
Copy link
Member Author

stuhood commented Mar 17, 2021

Moving discussion from #11724 (cc @jsirois, @Eric-Arellano):


Perhaps that will change, as CAOFs have major limitations. Iiuc we want to redesign those to use the engine. But for now, I'm trying to unblock a contributor with #10723 and avoiding scope creep of redesigning CAOFs and adding "target gen". The CAOF macros work, even if not very elegant.

Yep. Mentioned this on the linked ticket: #10723. Macros implemented as @rules (as mentioned in the description here) would allow a macro to safely read a file to create multiple targets.

As it stands, finishing this ticket would allow for fixing the requirements.txt hack (#9946), and would generalize setup-py argument plugins to make a custom plugin API unnecessary.

@jsirois
Copy link
Contributor

jsirois commented Mar 17, 2021

That's great for extending Pants more easily? But I'm stuck on a different issue. Rules consume field sets, field sets currently come from targets parsed from BUILD files by rules. There is no reason I can see we can't parse more file types to produce field sets for rules to consume. Notice there is no mention of or need for macros here.

@stuhood
Copy link
Member Author

stuhood commented Mar 17, 2021

"synthetic targets"

I did not mention targets except in that they were middlemen we currently parse from BUILD files in rules to provide fieldsets for other rules to consume. Presumably new rules to parse new file types would eliminate the middleman.

FieldSets are always associated with a particular Address, and addresses are currently only (originally) produced by definitions in BUILD files. A FieldSet that was not associated with an Address, or which was associated with an Address that didn't actually have a home on disk would be akin to a "synthetic target" as they used to exist.

EDIT: So... yes, I think that we're on the same page. The challenge is seemingly just knowing "which noun/identifier you hang the FieldSet off of".

@stuhood
Copy link
Member Author

stuhood commented Mar 18, 2021

I did not mention targets except in that they were middlemen we currently parse from BUILD files in rules to provide fieldsets for other rules to consume. Presumably new rules to parse new file types would eliminate the middleman.

FieldSets are always associated with a particular Address, and addresses are currently only (originally) produced by definitions in BUILD files. A FieldSet that was not associated with an Address, or which was associated with an Address that didn't actually have a home on disk would be akin to a "synthetic target" as they used to exist.

EDIT: So... yes, I think that we're on the same page. The challenge is seemingly just knowing "which noun/identifier you hang the FieldSet off of".

To continue the thought from above... one way to hook in dynamic "creation" of targets/FieldSets would be to dynamically create them when find_owners fails to find a Target owner for a file. Note though that the "Owners" logic currently only runs for CLI arguments: we require that files parsed from dependencies lists in BUILD files are already valid (file) Addresses (after running resolve_address, we have an Address which bakes in the name of the owning target: possibly the default target).

@jsirois
Copy link
Contributor

jsirois commented Mar 18, 2021

OK. What I hear from all that is yes, we can parse files and produce field sets for rules to consume (since we already do). There is no engine change, there are only rule changes. Our current rules are too restrictive in this sense and so that may make the change harder than it otherwise might be - but its fundamentally doable.

@stuhood stuhood changed the title Macros cannot fully declare their dependencies. Macros should be able to consume the engine to request files/processes Jul 26, 2021
@stuhood
Copy link
Member Author

stuhood commented Jul 26, 2021

Another usecase arose recently: that of invoking go to generate multiple external targets for a go_module target. With a macro system that allowed invoking processes (i.e., allowed using the engine), the go_module target could become a macro that generate multiple go_external_module targets.

@Eric-Arellano
Copy link
Contributor

I'm helping a friend set up Pants in their monorepo(!), and to repeat what's known: this is a real pain point. It tripped me up for a solid 2 minutes why Pants wasn't seeing a change to <subdir>/requirements.txt, and I was already well aware of this issue.

@Eric-Arellano
Copy link
Contributor

@tdyas and I had a strawman design session on this today in the context of the Go plugin and agreed for the need for proper "target generation" as a first class citizen of the engine.

Encapsulation:

  • To most consumers, synthetic targets vs. explicit BUILD targets should be encapsulated away.
    • When you run ./pants list ::, you get all targets back, regardless of derivation.
    • Rules requesting Addresses and Targets get back all targets, regardless of derivation.
  • You can use ./pants peek to see metadata about the synthetic target that we inject via the Target API, e.g. if there was a parent target that generated the new target.

Timing of generation:

  • Synthetic target generation would happen early in the rule graph, e.g. in between Specs -> Addresses.

Capabilities needed for generation rules:

  • Access to the filesystem and process APIs.
  • Rules to generate targets need to be able to see explicit BUILD targets, e.g. to see if a target is already explicitly defined.
    • We could achieve this by adding something like ExplicitBuildTargets, which most consumers won't use outside of these target gen rules.
  • Some synthetic targets will need to generate other synthetic targets.
    • For example, Tom was explaining that in Go, we might dynamically generate a go_external_module(?) target, and then we want it to generate a couple more targets for each package(?) contained within.

--

We also discussed that this change could unlock some amazing benefits for Pants, especially boilerplate reduction! We could essentially implement ./pants tailor, but without needing to run as an explicit step checked in to disk. You only create targets when you need to override metadata. And we can do that without needing a fundamental rework of all of our rules.

And we can do that without needing a fundamental rework of all of our rules.

But worth pointing out that that may still be valuable. Rather than leaning into target gen, perhaps a redesign of targets is more important.

@tdyas
Copy link
Contributor

tdyas commented Aug 14, 2021

Some synthetic targets will need to generate other synthetic targets.

To be more precise, I meant that some synthetic targets may be usable as input to target generation rules, not that a synthetic target necessarily itself generates another synthetic target.

Using the Go plugin as an example, target generation might happen as follows:

  • The first rule would scan for go.mod files and generate go_module and corresponding go_external_module targets.
  • Another rule would next scan all of the go_external_module targets (whether explicit or just generated synthetically) and generate _go_ext_mod_package targets to represent the packages within an external module. This rule would be agnostic to what created a go_external_module target used as input.

There is no explicit ordering between those rules right now. A basic implemention could run an iterative algorithm to keep applying target generation rules until no more targets are added.

@Eric-Arellano
Copy link
Contributor

Using the Go plugin as an example, target generation might happen as follows:

Ah got it. To check my understanding: if the user explicitly specified a go_external_module but not _go_ext_mod_package targets, we need to autogenerate _go_ext_mod_package targets. If they did not generate go_external_module, we need to generate both that and it's _go_ext_mod_package.

I think that's possible w/o needing to add special ordering. You'd have two rules for each target gen, and the go_external_module invokes the rule for _go_ext_mod_package. In pseudocode:

@rule 
def generate_go_ext_mod_packages(request: GenGoExtModPackages) -> SyntheticTargets:
    # invoke go code
    return SyntheticTargets([_GoExtModPackage(...), ...])

@rule 
def generate_go_external_module(request: GenGoExternalModule) -> SyntheticTargets:
    external_module_tgt = GoExternalModule(...)
    mod_packages = await Get(SyntheticTargets, GenGoExtModPackages(external_module))
    return SyntheticTargets([external_module_tgt, *mod_packages])

The generic "target gen" code will invoke both rules based on what it detects from what's on disk, as both will be registered with unions.

No need for special ordering, and it's very explicit in each rule what will be generated. The key insight is having a return type of SyntheticTargets which is a heterogenous collection of Target objects.

@tdyas
Copy link
Contributor

tdyas commented Aug 15, 2021

To check my understanding: if the user explicitly specified a go_external_module but not _go_ext_mod_package targets, we need to autogenerate _go_ext_mod_package targets. If they did not generate go_external_module, we need to generate both that and it's _go_ext_mod_package.

Correct. If the user wants to supply a go_external_module target themselves, the Go plugin would still be able to do the package analysis of the module to produce the _go_ext_mod_package targets.

tdyas pushed a commit that referenced this issue Aug 19, 2021
)

Go modules consist of multiple packages (which are compiled one package at a time). In order to support building third-party module source code, the Go plugin needs to understand the dependencies between packages in those modules and dependencies on packages not in those sources.

First-party Go sources already have an associated `go_package` target type that can have dependencies on other first-party code. This PR adds a `_go_ext_mod_package` target type to similarly represent packages in third-party sources (and allow those external packages to be dependencies in the build graph of both first-party and third-party Go sources).

This new target type is generated by `./pants tailor` for now, but will become an implementation detail not visible to users once synthetic target generation is implemented. (See #7022 for full discussion of synthetic target generation ideas.)

This PR does not add dependency inference on packages in third-party modules. That will be in the immediate follow-up PR.
@benjyw benjyw removed engine labels Sep 9, 2021
@stuhood
Copy link
Member Author

stuhood commented Oct 26, 2021

@Eric-Arellano added a new implementation for macros based around target generation! Am going to close this one in favor of #12915, which discusses using the new macros to fix the problem with python_requirements from this issue's description.

@stuhood stuhood closed this as completed Oct 26, 2021
Eric-Arellano added a commit that referenced this issue Jan 14, 2022
…acro in favor of target generation (#14075)

Closes #12915.

## Benefits of switching to target generation

* Closes #7022.
* Target generators show up in `./pants help`.
* Consistent mechanism for target generation, as explained at https://www.pantsbuild.org/docs/targets.

## Deprecation plan

This PR adds a new global option `--use-deprecated-python-macros` with a default of `True`. When True, `parser.py` is taught to load the old macros; else it loads the new targets.

The PR deprecates the default of `--use-deprecated-python-macros` so that it will default to `False` in Pants 2.11. Then, we'll deprecate the option outright and remove it in Pants 2.12, meaning the macros will be removed then too.

## How users upgrade

```
❯ ./pants
17:06:21.30 [WARN] DEPRECATED: the option `--use-deprecated-python-macros` defaulting to true will be removed in version 2.11.0.dev0.

In Pants 2.11, the default for the global option `--use-deprecated-python-macros` will change to false.

To fix this deprecation, explicitly set `use_deprecated_python_macros = true` in the `[GLOBAL]` section of `pants.toml`. Or, When you are ready to upgrade to the improved target generation mechanism, follow these steps:

  1. Run `./pants update-build-files --fix-python-macros`
  2. Check the logs for an ERROR log to see if you have to manually add `name=` anywhere.
  3. Set `use_deprecated_python_macros = false` in `[GLOBAL]` in pants.toml.

(Why upgrade from the old macro mechanism to target generation? Among other benefits, it makes sure that the Pants daemon is properly invalidated when you change `requirements.txt` and `pyproject.toml`.)
```
Eric-Arellano added a commit that referenced this issue Mar 25, 2022
…et generators (#14842)

The last step in #7022 :D 

[ci skip-build-wheels]
[ci skip-rust]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
6 participants