Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking Issue for proc_macro::{tracked_env, tracked_path} #99515

Open
3 of 9 tasks
m-ou-se opened this issue Jul 20, 2022 · 28 comments
Open
3 of 9 tasks

Tracking Issue for proc_macro::{tracked_env, tracked_path} #99515

m-ou-se opened this issue Jul 20, 2022 · 28 comments
Labels
A-proc-macros Area: Procedural macros C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@m-ou-se
Copy link
Member

m-ou-se commented Jul 20, 2022

Feature gate: #![feature(proc_macro_tracked_env, track_path)]

This is a tracking issue for proc_macro::tracked*, to allow adding files and environment variables to the build system's dependency tracking.

Public API

// proc_macro

mod tracked_env {
    pub fn var<K: AsRef<OsStr> + AsRef<str>>(key: K) -> Result<String, VarError>;
}

mod tracked_path {
    pub fn path<P: AsRef<str>>(path: P);
}

Steps / History

Unresolved Questions

  • The full API design.
    • The implementaiton PRs were mostly focussed on the implementation details behind the scenes of proc_macro, but there's barely been any discussion on the public interface.
@m-ou-se m-ou-se added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. A-proc-macros Area: Procedural macros labels Jul 20, 2022
@m-ou-se
Copy link
Member Author

m-ou-se commented Jul 20, 2022

This feature is not in a great state. The original PRs were mostly about the implementation details, with almost no discussion on the public API. There's still many unanswered design questions, such as naming, whether path should return something (a File? a String? a TokenStream? a PathBuf? nothing?), how it should be documented, how these functions should relate to each other, whether they should report errors through panics or through Result, whether it should take AsRef<OsStr> or AsRef<str>, and so on.

Instead of discussing every single detail separately, it'd probably be best to start with a more complete proposal for how these APIs should look, and then discuss that.

(Usually most of this happens before an unstable feature is merged and tracked, but this has already been merged, hence this tracking issue.)

@mitsuhiko
Copy link
Contributor

Considering these APIs effectively mirror the cargo:rerun-if-changed and cargo:rerun-if-env-changed functionality from build scripts I think it's reasonable to assume they don't return anything but just register interest.

Right now the main reason you run into this issue is because someone already has functionality in a build script and wants to convert it into a proc_macro. I would also strongly recommend for it to not report an error if the file does not exist as absence and presence of a file is both legitimate for proc macros the same way as it works for build scripts again.

@bjorn3
Copy link
Member

bjorn3 commented Aug 28, 2022

Considering these APIs effectively mirror the cargo:rerun-if-changed and cargo:rerun-if-env-changed functionality from build scripts I think it's reasonable to assume they don't return anything but just register interest.

I don't think that is a good argument. It is relatively common to forget to register interest with build scripts I believe. In addition it is pretty common to write helper functions that register interest and at the same time read the file/env var to prevent forgetting this. The reason that no such function exists by default for build scripts is because there is no way for cargo to provide a crate to build scripts with said functions. Furthermore returning the value is essential for making proc macros work with virtual filesystems and with compiler session local env vars.

@mitsuhiko
Copy link
Contributor

It is relatively common to forget to register interest with build scripts I believe

Maybe? But that is probably a very different argument to begin with. Today where I'm standing there is functionality available with build.rs that you cannot have with proc_macro and the same arguments for the use of it with build.rs are equally appying to proc macros.

In addition it is pretty common to write helper functions that register interest and at the same time read the file/env var to prevent forgetting this.

By coupling these things together this has a high change of turning into a never ending argument because there will always be a disagreement about how this should work. In fact, even today you will find no agreement on how to best use the build script equivalents and if we had to come to an agreement about his this should work there, we probably wouldn't have solution there today with build scripts.

If there is a need for such an API it can always be added at a later point.

The reason that no such function exists by default for build scripts is because there is no way for cargo to provide a crate to build scripts with said functions.

Of course there is. It would always have been possible for cargo to ship a utility library that is linked against build scripts the same way as proc_macro exposes itself.

Furthermore returning the value is essential for making proc macros work with virtual filesystems and with compiler session local env vars.

That same argument can be brought up for build scripts.

@bjorn3
Copy link
Member

bjorn3 commented Aug 28, 2022

If there is a need for such an API it can always be added at a later point.

The interest registering version can be trivially implemented in terms of the version that returns the value by discarding the return value, but not the other way around. In addition the interest registering versions may become entirely useless if we start sandboxing proc macros depending on the way we do this.

That same argument can be brought up for build scripts.

For build scripts it isn't all that important that changing a file in an editor without saving it reruns them. It is essential for proc macros. In addition rust-analyzer is already having a lot of trouble with proc macros going outside of their implied (unenforced) sandbox, while it doesn't have any issues with build scripts as those run in the same environment as with a regular cargo build due to rust-analyzer using cargo for invoking them.

@mitsuhiko
Copy link
Contributor

I want to leave another piece of input here that I believe are relevant: rerun-if-changed today can track a directory and will in fact track changes to the contents. My expectation as a user would be that I can replicate the same functionality for proc_macros. This is in fact right now what I care about was I'm trying to build something that bundles folders.

The interest registering version can be trivially implemented in terms of the version that returns the value by discarding the return value, but not the other way around.

If returning contents turns out to be useful it can always be added as an additional API. I think coupling these things will make this feature stay in not stable land for a lot longer than if those discussions are separated.

@est31
Copy link
Member

est31 commented Dec 12, 2022

I think tracked_path::path should be redesigned to also open read access to the specified path, as suggested by RFC 3200, because the current API leaves that question unresolved, which basically begs for the proc macro to use outside methods to access files, which would be a break of current policy that this is unsupported.

@wdanilo
Copy link

wdanilo commented Feb 18, 2023

Just a note here that the current implementation does not work with relative paths. It works if I use absolute ones though.

@ehuss
Copy link
Contributor

ehuss commented Feb 18, 2023

@wdanilo Can you put together a reproduction? It seems to work fine for me. The path will be relative to cwd where rustc is run, which for Cargo is the workspace root (which matches the way a proc-macro would access relative paths).

@wdanilo
Copy link

wdanilo commented Feb 18, 2023

@ehuss Not this kind of relative path, sorry for not being clear enough here. In my case it was something along the lines of /some/absolute/path/.../.../../../some/other/file and it did not work until I run canonicalize on my PathBuf here. Could you tell me if that was enough for you to repro it, or should I try creating a minimal scene, please? :)

@ehuss
Copy link
Contributor

ehuss commented Feb 19, 2023

Yes, please create a minimal reproduction if you can. Also, please post it as a new issue (and cc me). Tracking issues are not a good way to discuss individual issues.

@usamoi
Copy link
Contributor

usamoi commented Mar 29, 2024

Can I imitate the behaviors of these two APIs by adding const _: Option<&str> = option_env!("tracked_env"); and const _: &str = include_bytes!("tracked_path"); to returned token streams? Is it guaranteed that the env and the path are tracked even if they do not really make a difference to the program?

@programmerjake
Copy link
Member

Can I imitate the behaviors of these two APIs by adding const _: Option<&str> = option_env!("tracked_env"); and const _: &str = include_bytes!("tracked_path"); to returned token streams? Is it guaranteed that the env and the path are tracked even if they do not really make a difference to the program?

first, include_bytes! doesn't return &str, it returns &[u8; N].

second, afaict this would make rustc recompile the output of the proc-macro, not necessarily rerun the proc-macro itself.

@djc
Copy link
Contributor

djc commented Mar 29, 2024

Can I imitate the behaviors of these two APIs by adding const _: Option<&str> = option_env!("tracked_env"); and const _: &str = include_bytes!("tracked_path"); to returned token streams? Is it guaranteed that the env and the path are tracked even if they do not really make a difference to the program?

second, afaict this would make rustc recompile the output of the proc-macro, not necessarily rerun the proc-macro itself.

I'm pretty sure the macros get rerun if the file used for include_bytes!() is changed. I've been using this as a trick to avoid having to have build scripts that explicitly rerun the build when template files change in Askama.

@programmerjake
Copy link
Member

I'm pretty sure the macros get rerun if the file used for include_bytes!() is changed. I've been using this as a trick to avoid having to have build scripts that explicitly rerun the build when template files change in Askama.

I wouldn't depend on that since rustc in incremental mode likes to cache stuff on disk and it's plausible to cache proc-macro outputs.

@jannic
Copy link
Contributor

jannic commented May 28, 2024

I'm pretty sure the macros get rerun if the file used for include_bytes!() is changed. I've been using this as a trick to avoid having to have build scripts that explicitly rerun the build when template files change in Askama.

I wouldn't depend on that since rustc in incremental mode likes to cache stuff on disk and it's plausible to cache proc-macro outputs.

I don't think we need to worry about that: include_bytes, via load_binary_file, adds the file to a list of dependencies:

let file = self.new_source_file(path.to_owned().into(), text);

So as I understand it, the file is tracked for changes explicitly, and any relevant caches (if any) should be invalidated automatically. Everything else would break incremental builds fundamentally.

@programmerjake
Copy link
Member

programmerjake commented May 28, 2024

I wouldn't depend on that since rustc in incremental mode likes to cache stuff on disk and it's plausible to cache proc-macro outputs.

I don't think we need to worry about that: include_bytes, via load_binary_file, adds the file to a list of dependencies:

So as I understand it, the file is tracked for changes explicitly, and any relevant caches (if any) should be invalidated automatically.

the problem isn't that include_bytes won't be rerun, but that whatever macro expanded to include_bytes may not be rerun, since that macro that produced include_bytes didn't tell the compiler that it depends on the file, and not merely whatever macros are in its output. The compiler is allowed to cache the intermediate output of running the proc macro and just rerun include_bytes itself, instead of being required to rerun both the proc-macro and include_bytes. Idk if the compiler currently behaves that way, but I see no reason it shouldn't be allowed to.

@jannic
Copy link
Contributor

jannic commented May 28, 2024

the problem isn't that include_bytes won't be rerun, but that whatever macro expanded to include_bytes may not be rerun, since that macro that produced include_bytes didn't tell the compiler that it depends on the file, and not merely whatever macros are in its output.

Ok, that may be true, I have no idea how the compiler tracks which files affect the results of a proc macro. Wouldn't that affect cargo:rerun-if-changed in the same way? Or does that implicitly invalidate all cached proc macro outputs whenever any of the files change?

@bjorn3
Copy link
Member

bjorn3 commented May 28, 2024

Apart from tracked_path and rustc does not track which files affect the proc macro output at all. It currently reruns all proc macros every time rustc runs, though in the future it will probably start to cache proc macros in the incremental compilation cache. And cargo reruns rustc only when any of the files mentioned in the dep info file change. tracked_path adds an entry for the given file to the dep info file.

@djc
Copy link
Contributor

djc commented May 28, 2024

though in the future it will probably start to cache proc macros in the incremental compilation cache.

Do you know if this is tracked somewhere? Would be great to have.

@bjorn3
Copy link
Member

bjorn3 commented May 28, 2024

I don't know of a tracking issue for this, but #125356 would one step towards this (caching decl macros) There have been discussions in a variety of places including the internals forum and zulip.

@futile
Copy link
Contributor

futile commented Jul 30, 2024

Hi, I have some time, and wanted to see if I can help out on this effort to cache proc macro expansions :)

Having seen https://www.coderemote.dev/blog/faster-rust-compiler-macro-expansion-caching/ (their changes are closed source though), which found that up to 40% of incremental build times are sometimes due to proc macro expansion, I also did some profiling on a project of mine that uses bevy (which has a lot of custom derives), and found that proc macro expansion also makes up for 40% of my incremental build time.

Additionally there was around 20% of time spent in codegen afterwards, which I thought might also go down a bit with proc macro caching, but I have not measured this(!). I think the 40% are worth it anyway, even if no additional (codegen) benefits show up in the end.

Ongoing Progress I found:

Possible Progress I see:

I did not yet see a way to "opt into" this dependency tracking feature for proc macros (please let me know if I missed it). Cargo has the common cargo::rerun-if-changed=build.rs-pattern for build scripts to signal that only the build script itself is relevant for change detection (link to cargo docs). I think a similar mechanism might be needed for proc macros to opt into tracking/caching, because I think we can't change the behavior of existing/unadapted macros.

Maybe something like proc_macro::track_input_tokens() (name up for bikeshedding) could serve that purpose? In that case, could that also be a first step for stabilization? I.e., as a first step we allow proc macros to opt-in to signaling that they only depend on their input tokens (and their own definition, of course), and anything that relies on files or env-vars will simply not opt-in for now. This would make progress possible and would allows questions such as #99515 (comment) to be solved independently, hopefully making it easier to arrive at an ergonomic solution and API.

Given that this might substantially improve build times for incremental builds, I think it would make sense to try and drive a path forwards here, and would be happy to help/implement something for that :)

@Skgland
Copy link
Contributor

Skgland commented Jul 30, 2024

I think tracked_path::path should be redesigned to also open read access to the specified path, as suggested by RFC 3200, because the current API leaves that question unresolved, which basically begs for the proc macro to use outside methods to access files, which would be a break of current policy that this is unsupported.

What about macros like include_dir! that would want to enumerate folders and track the folder content for new files?

@futile
Copy link
Contributor

futile commented Jul 31, 2024

Ok, taking a closer look at proc-macro caching, it seems like #125356 (mentioned in #99515 (comment)) definitely makes sense as a first step, since the basic machinery for caching decl macros will probably also be relevant for proc-macro caching.

So would be happy to help with that as well, but not sure about the PR status, so I'll inquire there 👍

@est31
Copy link
Member

est31 commented Aug 1, 2024

@Skgland that's actually a good question. It's a limitation to the entire tracked_path feature I'd say: right now it assumes that it gets files passed to it, instead of arbitrary paths that are either files or directories. The correct behaviour when passed a directory would be, at each build, to walk that directory and all of its children and if any child had modifications, or the list of children is different, then mark it as dirty.

The issue is .d files, which probably don't support this. So integration into build systems like make or others would be poor.

Maybe we should rename path to file to make it clear that we only support files?

In classical unix terminology, directories are special types of "file"s, but we already have the is_file function in our API (where the "file" is an abbreviation for "regular file") so there is precedent already.

@Skgland
Copy link
Contributor

Skgland commented Aug 1, 2024

Based on rust-lang/rfcs#3200 (comment) I would have though directories are handled as expected.

FWIW, the tracked_path API now supports tracking any changes within a directory. The change happened alongside the buildscript rerun-if getting that functionality to replace the older, mostly useless behavior of watching the directory entry itself for (i.e. metadata) changes.

Also isn't this already something cargo needs to track e.g. cargo build needs to know if new build targets are added i.e. a src/main.rs is added to a previously library only package, or a new file in src/bin.
Though that's probably a concern at a different level.

Edit:

I think that comment was referring to rust-lang/cargo#8973 though not sure how that effects tracked_path as I don't see a corresponding PR for rust itself.

@est31
Copy link
Member

est31 commented Aug 1, 2024

not sure how that effects tracked_path as I don't see a corresponding PR for rust itself.

rustc outputs .d files (write_out_deps function in compiler/rustc_interface/src/passes.rs), which then get read by cargo.

Cargo then calls the mtime_recursive function:

https://github.com/rust-lang/cargo/blob/11dccd1df09b0f50706d72f6e886b67f1f00b732/src/cargo/core/compiler/fingerprint/mod.rs#L1928

So yeah, you are right, directories are handled (and handled recursively).

@Skgland
Copy link
Contributor

Skgland commented Aug 1, 2024

This would need to be properly handled by rust as well in case proc macro outputs are cached in the future (#99515 (comment)), so that the cache is invalidated if the directory content changes. Doesn't help if cargo reruns rustc as it detects the change, but rustc doesn't detect the change and reuses a cached result.

I think this might be a new case to handle for rustc as currently I don't know of anything that enumerates folders. There is module_name.rs vs. module_name/mod.rs, but all I can think of results in a finite list of files that are checked, rather than enumerating a folder.

bors added a commit to rust-lang-ci/rust that referenced this issue Aug 6, 2024
Make declarative macro expansion a part of query system (cont. of rust-lang#125356)

This is a continuation of the effort to make declarative macro expansion a part of the query system from rust-lang#125356 by `@SparrowLii.`

#### Description from that PR:

> This is an attempt to make the macro expansion a part of incremental compilation.
>
> Processing of procedural macros is difficult since they may involve interactions with the external environment. Declaring macros is a good start.
>
> **It is not yet possible to test the effect of this PR on incremental compilation since the new query is declared as eval_always.**

#### Status of this PR:

* It is rebased against a much more recent `master` commit
* It contains the original commits from rust-lang#125356 (in a rebased form)
* It adds the missing implementation for retrying macro matching that provides (better) error diagnostics
  * This was done by refactoring `rustc_expand::mbe::diagnostics::failed_to_match_macro()` to only require a `ParseSess` instead of an `ExtCtxt`. Otherwise, `ExtCtxt` would need to be in the interface of `TcxMacroExpander`, which is not possible, because `ExtCtxt` is part of `rustc_expand`, which depends on the crate that contains `TcxMacroExpander`, `rustc_middle`, and thus would introduce a circular dependency.
  * This refactoring moved the retrying down into the `impl TcxMacroExpander for MacroRulesMacroExpander` (this is just a change compared to the original PR, otherwise not important to know).
* This PR passes `./x test tests/ui`, which produced errors before that were all due to the missing implementation of retry macro matching.
* This PR does **not** yet contain changes for the open discussions from rust-lang#125356. I wanted to fix the tests first before tackling them, and I also need to figure out what the best solutions for them are. I'd welcome any help/support with this, e.g., opening up a corresponding discussion on this PR with a summary/the final decided fix if available :)

In general, I'm new to working on rustc, so would be thankful for a bit more background/explanations in discussions and comments :) Thanks! :)

(tangentially relevant: rust-lang#99515)
bors added a commit to rust-lang-ci/rust that referenced this issue Aug 14, 2024
…ng, r=<try>

Experimental: Add Derive Proc-Macro Caching

# On-Disk Caching For Derive Proc-Macro Invocations

This PR adds on-disk caching for derive proc-macro invocations using rustc's query system to speed up incremental compilation.

The implementation is (intentionally) a bit rough/incomplete, as I wanted to see whether this helps with performance before fully implementing it/RFCing etc.

I did some ad-hoc performance testing.

## Rough, Preliminary Eval Results:

Using a version built through `DEPLOY=1 src/ci/docker/run.sh dist-x86_64-linux` (which I got from [here](https://rustc-dev-guide.rust-lang.org/building/optimized-build.html#profile-guided-optimization)).

### [Some Small Personal Project](https://github.com/futile/ultra-game):

```console
# with -Zthreads=0 as well
$ touch src/main.rs && cargo +dist check
```

Caused a re-check of 1 crate (the only one).

Result:
| Configuration | Time (avg. ~5 runs) |
|--------|--------|
| Uncached | ~0.54s |
| Cached | ~0.54s |

No visible difference.

### [Bevy](https://github.com/bevyengine/bevy):

```console
$ touch crates/bevy_ecs/src/lib.rs && cargo +dist check
```

Caused a re-check of 29 crates.

Result:
| Configuration | Time (avg. ~5 runs) |
|--------|--------|
| Uncached | ~6.4s |
| Cached | ~5.3s |

Roughly 1s, or ~17% speedup.

### [Polkadot-Sdk](https://github.com/paritytech/polkadot-sdk):

Basically this script (not mine): https://github.com/coderemotedotdev/rustc-profiles/blob/d61ad38c496459d82e35d8bdb0a154fbb83de903/scripts/benchmark_incremental_builds_polkadot_sdk.sh

TL;DR: Two full `cargo check` runs to fill the incremental caches (for cached & uncached). Then 10 repetitions of `touch $some_file && cargo +uncached check && cargo +cached check`.

```console
$ cargo update # `time` didn't build because compiler too new/dep too old
$ ./benchmark_incremental_builds_polkadot_sdk.sh # see above
```

_Huge_ workspace with ~190 crates. Not sure how many were re-built/re-checkd on each invocation.

Result:
| Configuration | Time (avg. 10 runs) |
|--------|--------|
| Uncached | 99.4s |
| Cached | 67.5s |

Very visible speedup of 31.9s or ~32%.

---

**-> Based on these results I think it makes sense to do a rustc-perf run and see what that reports.**

---

## Current Limitations/TODOs

I left some `FIXME(pr-time)`s in the code for things I wanted to bring up/draw attention to in this PR. Usually when I wasn't sure if I found a (good) solution or when I knew that there might be a better way to do something; See the diff for these.

### High-Level Overview of What's Missing For "Real" Usage:

* [ ] Add caching for `Bang`- and `Attr`-proc macros (currently only `Derive`).
  * Not a big change, I just focused on `derive`-proc macros for now, since I felt like these should be most cacheable and are used very often in practice.
* [ ] Allow marking specific macros as "do not cache" (currently only all-or-nothing).
  * Extend the unstable option to support, e.g., `-Z cache-derive-macros=some_pm_crate::some_derive_macro_fn` for easy testing using the nightly compiler.
  * After Testing: Add a `#[proc_macro_cacheable]` annotation to allow proc-macro authors to "opt-in" to caching (or sth. similar). Would probably need an RFC?
  * Might make sense to try to combine this with rust-lang#99515, so that external dependencies can be picked up and be taken into account as well.

---

So, just since you were in the loop on the attempt to cache declarative macro expansions:

r? `@petrochenkov`

Please feel free to re-/unassign!

Finally: I hope this isn't too big a PR, I'll also show up in Zulip since I read that that is usually appreciated. Thanks a lot for taking a look! :)

(Kind of related/very similar approach, old declarative macro caching PR: rust-lang#128747)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-proc-macros Area: Procedural macros C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

13 participants