New rustc and Cargo options to allow path sanitisation by default #3127

Merged on May 13, 2023 (32 commits)
Changes from 23 commits (32 commits in total):

- 139d7f0 Initial version of trim-path RFC (cbeuw, May 23, 2021)
- bcbd131 Update text/3127-trim-path.md (cbeuw, May 30, 2021)
- 97e4104 Update text/3127-trim-path.md (cbeuw, Jun 1, 2021)
- d8344ef Use plural (cbeuw, Jun 1, 2021)
- 408dc50 Add `--remap-scope` proposal (cbeuw, Aug 31, 2021)
- f92a321 Fix typos (cbeuw, Oct 3, 2021)
- 2bd2792 Rename flag to --remap-path-scope and typo fixes (cbeuw, Dec 4, 2021)
- 998ecf4 Add scoped mapping discussion (cbeuw, Dec 4, 2021)
- ba6b2d8 Elaborate on linkers for separate debuginfo (cbeuw, Dec 5, 2021)
- d790948 Add split-debuginfo-path scope (cbeuw, Dec 5, 2021)
- d33e029 Rename split-debuginfo-path to split-debuginfo-file (cbeuw, Apr 24, 2022)
- 0688580 Typo fixes (cbeuw, Apr 25, 2022)
- aee42a6 Add `unsplit-debuginfo` scope (cbeuw, Apr 25, 2022)
- 8e33a46 Use names instead of numbers for trim-paths options (cbeuw, Apr 25, 2022)
- 0b59e5c Replace debuginfo with split-debuginfo option (cbeuw, Apr 25, 2022)
- 604dcb0 Add scope alias as a future possibility (cbeuw, Apr 25, 2022)
- 2d49c09 Document the ambiguity of comma separated scopes (cbeuw, Apr 26, 2022)
- 23394fc Add object and all as alias scopes (cbeuw, Apr 26, 2022)
- e15308c Clarify the effects of multiple --remap-path-scope (cbeuw, Apr 26, 2022)
- dec1901 Improve wordings (cbeuw, Apr 28, 2022)
- bb079f4 List options specifically for cargo (cbeuw, Apr 28, 2022)
- 7c533d3 Change it back to `split-debuginfo-path` (cbeuw, May 26, 2022)
- 6e45014 Make the Cargo option effective to all compile modes (cbeuw, May 26, 2022)
- 34d4386 Clarify the sysroot situation (cbeuw, Jul 9, 2022)
- 785c229 Simplify possible scopes for Cargo (cbeuw, Jul 12, 2022)
- a357827 Add CARGO_TRIM_PATHS (cbeuw, Jul 17, 2022)
- 5286008 Add example usages (cbeuw, Jul 17, 2022)
- 8ba3510 Clarify that not all options are intended to be stabilized (cbeuw, Oct 14, 2022)
- 3f59b7b Add option rationales (cbeuw, Feb 8, 2023)
- 16edfb2 Change CARGO_TRIM_PATHS to the profile option (cbeuw, Feb 8, 2023)
- cbcb1df Current working directory -> current package (cbeuw, Feb 20, 2023)
- 451f163 Update tracking issue (ehuss, May 13, 2023)
261 changes: 261 additions & 0 deletions text/3127-trim-paths.md
@@ -0,0 +1,261 @@
- Feature Name: trim-paths
- Start Date: 2021-05-24
- RFC PR: [rust-lang/rfcs#3127](https://github.com/rust-lang/rfcs/pull/3127)
- Rust Issue: N/A

# Summary
[summary]: #summary

Cargo should have a [profile setting](https://doc.rust-lang.org/cargo/reference/profiles.html#profile-settings) named `trim-paths`
to sanitise absolute paths introduced during compilation that may be embedded in the compiled binary executable or library.

`cargo build` with the default `release` profile should not embed any host-filesystem-dependent paths in the binary executable or library. However,
it will retain these paths inside a separate debug symbols file, if one exists, to help debuggers and profilers locate the source files.

To facilitate this, a new flag named `--remap-path-scope` should be added to `rustc` to control the behaviour of `--remap-path-prefix`, allowing fine-grained
control over the contexts (macro expansion, debuginfo or diagnostics) in which paths should or shouldn't be remapped.

# Motivation
[motivation]: #motivation

## Sanitising local paths that are currently embedded
Currently, executables and libraries built by Rust and Cargo contain many embedded absolute paths. They most frequently appear in debug information and
panic messages (which point to the source file of the panic location). As an example, consider the following package:

`Cargo.toml`:

```toml
[package]
name = "rfc"
version = "0.1.0"
edition = "2018"

[dependencies]
rand = "0.8.0"
```
`src/main.rs`:

```rust
use rand::prelude::*;

fn main() {
    let r: f64 = rand::thread_rng().gen();
    println!("{}", r);
}
```

Then run

```bash
$ cargo build --release
$ strings target/release/rfc | grep $HOME
```

We will see some absolute paths pointing to dependency crates downloaded by Cargo, containing our username:

```
could not initialize thread_rng: /home/username/.cargo/registry/src/github.com-1ecc6299db9ec823/rand-0.8.3/src/rngs/thread.rs
/home/username/.cargo/registry/src/github.com-1ecc6299db9ec823/rand_chacha-0.3.0/src/guts.rsdescription() is deprecated; use Display
/home/username/.cargo/registry/src/github.com-1ecc6299db9ec823/getrandom-0.2.2/src/util_libc.rs
```

This is undesirable for the following reasons:

1. **Privacy**. `release` binaries may be distributed, and anyone could then see the builder's local OS account username.
Additionally, some CI systems (such as [GitLab CI](https://docs.gitlab.com/runner/best_practice/#build-directory)) check out the repository under a path that
includes non-public information. Without sanitising paths by default, this information may be leaked inadvertently.
2. **Build reproducibility**. We would like to make it easier to reproduce binary-equivalent builds. While reproducibility across different environments
is not required, removing environment-sensitive information from the build increases tolerance of the inevitable differences between
environments. This helps with build verification, as well as with producing deterministic builds when using a distributed build
system.

## Handling sysroot paths
At the moment, paths to the source files of the standard and core libraries, even when those files are present, always begin with a virtual prefix of the form
`/rustc/[SHA1 hash]/library`. This is not an issue when the source files are not present (i.e. when the `rust-src` component is not installed), but
when a user installs `rust-src` they may want the path to their local copy of the source files to be visible. Hence the default behaviour when `rust-src`
is installed should be to use the local path. These local paths should then be affected by path remappings in the usual way.
Member:

The precompiled standard library libraries would use the virtual prefix even if rust-src is installed, so you have to add the remap for debuggers either way.

Contributor Author:

When rust-src is installed, rustc internally does a hack to map the virtual prefix to the corresponding real path on the user's local fs: https://github.com/rust-lang/rust/blob/d1a9a9551741c3e888d350d8d4f4821a5addccb2/compiler/rustc_metadata/src/rmeta/decoder.rs#L1477-L1478

Though this real path is not emitted because it wasn't affected by --remap-path-prefix, which makes reproducible builds impossible: rust-lang/rust#73167.

However, some people want to see the real path for debugging: rust-lang/rust#85463.

The point here is to satisfy both use cases: without rust-src you don't have a choice but to see the virtual path. With rust-src you'll see the real path if you don't do anything else, but if you want build reproducibility you can use --remap-path-prefix to remap them.

@ghost (May 26, 2022):

@cbeuw i think rust-analyzer requires rust-src. does this mean people using rust-analyzer will not have sanitization by default which i thought would happen by default from other comments

Contributor Author:

Sanitisation by default will happen when you do cargo build --release (or cargo test --release), regardless of the presence of rust-src. For default debug builds people with rust-src will now see the local paths.

Comment:

ok sounds like i was mixing things up then, sorry.

so regardless of rust-analyzer/rust-src if someone is doing --release then its still clean. just reiterating to confirm i have things straight since the discussion was confusing for me.

Contributor Author:

> so regardless of rust-analyzer/rust-src if someone is doing --release then its still clean.

Correct.

Member:

> The precompiled standard library libraries would use the virtual prefix even if rust-src is installed, so you have to add the remap for debuggers either way.

I think what @bjorn3 is referring to is that the virtual prefixes already embedded in the debuginfo of the standard library, that is, rustc won't have access to them anymore and does not have a way of remapping those paths. Only the linker will touch that debuginfo.


## Preserving debuginfo to help debuggers
At the moment, `--remap-path-prefix` causes paths to source files in debuginfo to be remapped. On platforms where the debuginfo resides in a
separate file from the distributable binary, this may be unnecessary, and it prevents debuggers from finding the source. Hence `rustc`
should support finer-grained control over the contexts in which paths are remapped.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

## The rustc book: Command-line arguments

### `--remap-path-scope`: configure the scope of path remapping

When the `--remap-path-prefix` option is passed to rustc, source path prefixes in all output are affected by default.
The `--remap-path-scope` argument can be used in conjunction with `--remap-path-prefix` to select the output contexts in which paths are affected.
This flag accepts a comma-separated list of values and may be specified multiple times, in which case the scopes are aggregated. The valid scopes are:

- `macro` - apply remappings to the expansion of the `std::file!()` macro. This is where paths in embedded panic messages come from.
Member:

This is not really a descriptive name.

Contributor Author:

panic is what most users care about, but it's not really accurate for those using std::file!() directly or indirectly for other things. So file-macro maybe?

Contributor:

file!() isn't the only way to get a source location anymore (I don't know if its even still used for panics), we now also have intrinsics::caller_location (including stable wrappers).

I don't have any good naming suggestions (maybe intrinsics?), but I believe it would be good to also mention the intrinsic or its wrappers explicitly in the RFC.

Member:

expansion maybe?

- `diagnostics` - apply remappings to printed compiler diagnostics
Member:

Is this useful on its own? It can't prevent the original path from ending up in the crate metadata without also requiring all other remappings.

Contributor Author:

rust-lang/rust#88363

> The issue is fixed by making --remap-path-prefix remap diagnostic messages again
> ...
> In the future we might want to give more fine-grained control over this behavior via compiler flags

Maybe people won't use it on its own, but it can't be merged with other options either so it has to be there

@ghost (May 26, 2022):

since analyzer is popular that means nothing is stripped by default then? will it be a single option and not a mess of remap-prefix options then to strip the paths? otherwise it will be a choice of privacy/reproducibility or the usefulness of analyzer

Contributor Author:

--remap-path-prefix is a stable option and will do the same things as documented after this RFC (i.e. remap everything everywhere). You only need --remap-path-scope if you want finer grained control.

- `unsplit-debuginfo` - apply remappings to debug information only when they are written to compiled executables or libraries, but not when they are in split debuginfo files
Member:

@davidtwco, do you know if we can actually implement this (when using the LLVM backend)? I.e. can we control what paths show up in what context? As far as I know, we produce a single LLVM metadata description and then LLVM takes care of splitting things apart, right?

- `split-debuginfo` - apply remappings to debug information only when they are written to split debug information files, but not in compiled executables or libraries
- `split-debuginfo-path` - apply remappings to the paths pointing to split debug information files. Does nothing when these files are not generated.
- `object` - an alias for `macro,unsplit-debuginfo,split-debuginfo-path`. This ensures all paths in compiled executables or libraries are remapped, but not elsewhere.
- `all` and `true` - an alias for all of the above, also equivalent to supplying only `--remap-path-prefix` without `--remap-path-scope`.
Member:

I don't think there is a need for all those different options. I think having "codegen", "debuginfo" and "all" as the only three options would be fine. "codegen" would mean that stripping an executable, cdylib or staticlib of its debuginfo (if any) will remove all unmapped paths and thus would be a good option when shipping programs without debuginfo. "debuginfo" will mean that both the executable, cdylib or staticlib and its debuginfo only contain mapped paths and thus would be a good option when shipping programs with debuginfo. "all" would mean that all intermediate artifacts (like rlibs or rust dylibs), rustc diagnostics and everything only contain mapped paths and would be a good option for cloud builds or rust's CI.

note: For rust dylibs the all scope would be necessary as they contain crate metadata which needs to contain unmapped paths if diagnostics use unmapped paths.

Member:

I think we can reasonably experiment with having all of these available, but I think that when we go to stabilize this, we should consider just a subset rather than full granularity.


Debug information is written to split files when the codegen option `-C split-debuginfo` is set to `packed` or `unpacked` (whether by default or explicitly).
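The alias and aggregation rules above can be modelled with a small sketch. It is not rustc's implementation: the `Scope` enum and the `parse_scope_value`/`effective_scopes` helpers are hypothetical names, used only to show how `object` and `all`/`true` expand, how repeated `--remap-path-scope` flags are unioned, and how the absence of the flag means every scope.

```rust
use std::collections::BTreeSet;

/// Hypothetical representation of the scopes proposed for `--remap-path-scope`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Scope {
    Macro,
    Diagnostics,
    UnsplitDebuginfo,
    SplitDebuginfo,
    SplitDebuginfoPath,
}

/// Expands one comma-separated flag value into concrete scopes,
/// resolving the `object` and `all`/`true` aliases.
fn parse_scope_value(value: &str) -> Result<BTreeSet<Scope>, String> {
    use Scope::*;
    let mut scopes = BTreeSet::new();
    for item in value.split(',') {
        match item.trim() {
            "macro" => scopes.extend([Macro]),
            "diagnostics" => scopes.extend([Diagnostics]),
            "unsplit-debuginfo" => scopes.extend([UnsplitDebuginfo]),
            "split-debuginfo" => scopes.extend([SplitDebuginfo]),
            "split-debuginfo-path" => scopes.extend([SplitDebuginfoPath]),
            "object" => scopes.extend([Macro, UnsplitDebuginfo, SplitDebuginfoPath]),
            "all" | "true" => scopes.extend([
                Macro,
                Diagnostics,
                UnsplitDebuginfo,
                SplitDebuginfo,
                SplitDebuginfoPath,
            ]),
            other => return Err(format!("unknown scope: {other}")),
        }
    }
    Ok(scopes)
}

/// Unions every `--remap-path-scope` value given; with no flag at all,
/// `--remap-path-prefix` applies to every scope.
fn effective_scopes(flags: &[&str]) -> Result<BTreeSet<Scope>, String> {
    if flags.is_empty() {
        return parse_scope_value("all");
    }
    let mut scopes = BTreeSet::new();
    for flag in flags {
        scopes.extend(parse_scope_value(flag)?);
    }
    Ok(scopes)
}

fn main() {
    // Models `--remap-path-scope=macro --remap-path-scope=diagnostics,split-debuginfo-path`.
    let scopes = effective_scopes(&["macro", "diagnostics,split-debuginfo-path"]).unwrap();
    println!("{:?}", scopes);
}
```

Using a set makes duplicate scopes (for example `object,macro`) harmless, which matches the "aggregated" wording above.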

## Cargo

`trim-paths` is a profile setting which enables and controls the sanitisation of file paths in compilation outputs. It corresponds to the `--remap-path-scope` flag of rustc and accepts any scope, or combination of scopes, that `--remap-path-scope` accepts, in addition to `none` or `false`, which disables path sanitisation completely. Possible values are:

- `none` and `false` - disable path sanitisation
- `macro` - sanitise paths in the expansion of the `std::file!()` macro. This is where paths in embedded panic messages come from
- `diagnostics` - sanitise paths in printed compiler diagnostics
- `unsplit-debuginfo` - sanitise paths in debug information in compiled executables or libraries. Does nothing if debug information is in split files
- `split-debuginfo` - sanitise paths in debug information in split debuginfo files. Does nothing if debug information is in compiled executables or libraries
- `split-debuginfo-path` - sanitise paths pointing to split debug information files. Does nothing if these files are not generated.
- `object` - an alias for `macro,unsplit-debuginfo,split-debuginfo-path`. This ensures all paths in compiled executables or libraries are sanitised, but not elsewhere.
- `all` and `true` - an alias for all of the above

It defaults to `none` for debug profiles and `object` for release profiles. You can override it manually by specifying this option in `Cargo.toml`:
```toml
[profile.dev]
trim-paths = "all"

[profile.release]
trim-paths = "none"
```

The default release profile setting (`object`) sanitises only the paths in emitted executable or library files. It always affects paths from macros, such as those in panic messages. It affects paths in debug information
only if the debug information is embedded in the binary (the default on ELF platforms such as Linux, and on windows-gnu),
and leaves them untouched if they are in separate files (the default on Windows MSVC and macOS). The paths to these separate files are, however, sanitised.

If `trim-paths` is not `none` or `false`, then the following paths are sanitised if they appear in a selected scope:

1. Paths to the source files of the standard and core libraries (sysroot) will begin with `/rustc/[rustc commit hash]`.
E.g. `/home/username/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/result.rs` ->
`/rustc/fe72845f7bb6a77b9e671e6a4f32fe714962cec4/library/core/src/result.rs`
2. The path to the working directory will be stripped. E.g. `/home/username/crate/src/lib.rs` -> `src/lib.rs`.
3. Paths to packages outside of the working directory will be replaced with `[package name]-[version]`. E.g. `/home/username/deps/foo/src/lib.rs` -> `foo-0.1.0/src/lib.rs`
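The three rules can be sketched with `Path::strip_prefix`, reproducing the three examples above. This is an illustration under stated assumptions: the caller is assumed to already know the local `rust-src` directory, the working directory and the roots of dependency packages, and the `Package` struct and `sanitise` function are hypothetical. The mechanism this RFC actually proposes passes `--remap-path-prefix` arguments to rustc rather than rewriting paths in Cargo.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical description of a package located outside the working directory.
struct Package<'a> {
    name: &'a str,
    version: &'a str,
    root: &'a Path,
}

/// Applies the three sanitisation rules, in order, to one absolute path.
fn sanitise(
    path: &Path,
    rust_src: &Path, // assumed to be `<sysroot>/lib/rustlib/src/rust`
    commit_hash: &str,
    working_dir: &Path,
    packages: &[Package<'_>],
) -> PathBuf {
    // Rule 1: sysroot sources become `/rustc/<commit hash>/...`.
    if let Ok(rest) = path.strip_prefix(rust_src) {
        return Path::new("/rustc").join(commit_hash).join(rest);
    }
    // Rule 2: the working directory is stripped, leaving a relative path.
    if let Ok(rest) = path.strip_prefix(working_dir) {
        return rest.to_path_buf();
    }
    // Rule 3: other packages become `<name>-<version>/...`.
    for pkg in packages {
        if let Ok(rest) = path.strip_prefix(pkg.root) {
            return PathBuf::from(format!("{}-{}", pkg.name, pkg.version)).join(rest);
        }
    }
    // Anything else is left untouched.
    path.to_path_buf()
}

fn main() {
    let rust_src = Path::new(
        "/home/username/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust",
    );
    let hash = "fe72845f7bb6a77b9e671e6a4f32fe714962cec4";
    let working_dir = Path::new("/home/username/crate");
    let deps = [Package { name: "foo", version: "0.1.0", root: Path::new("/home/username/deps/foo") }];

    for p in [
        "/home/username/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/result.rs",
        "/home/username/crate/src/lib.rs",
        "/home/username/deps/foo/src/lib.rs",
    ] {
        let out = sanitise(Path::new(p), rust_src, hash, working_dir, &deps);
        println!("{} -> {}", p, out.display());
    }
}
```

`strip_prefix` compares whole path components, so `/home/username/crate2` is not mistaken for a file under `/home/username/crate`.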
Comment:

name-version is only unique - at best - within a single registry. And for code coming from git, the specific commit id is probably important to encode, since it can change without version changes.

Ideally it would be nice to encode a hash of the source file itself somewhere, but that probably doesn't fit well into this scheme. (I think Dwarf has a way to encode this, so it can be done in a case by case basis.)

Member:

Maybe it should use the full package id as used by cargo? Package id's look like libc 0.2.126 (registry+https://github.com/rust-lang/crates.io-index). The remapping could then be /cargo/libc 0.2.126 (registry+https://github.com/rust-lang/crates.io-index)/src/lib.rs.


When a path to the source files of the standard and core libraries is *not* in scope for sanitisation, the emitted path depends on whether the `rust-src` component
is present. If it is, the real path pointing to the copy of the source files on your file system will be emitted; if it isn't, the path will
show up as `/rustc/[rustc commit hash]/library/...` (just as when it is selected for sanitisation). Paths to all other source files will not be affected.
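The resulting behaviour for sysroot paths amounts to a small decision table. The function below is a hypothetical illustration, not rustc code: `local_rust_src` stands for the directory where `rust-src` is installed (if any), and `in_scope` for whether the relevant scope is selected by `trim-paths`.

```rust
/// Hypothetical model of which path is emitted for a standard library source file.
/// `relative` is the path below `library/`, e.g. `core/src/result.rs`.
fn emitted_sysroot_path(
    relative: &str,
    commit_hash: &str,
    local_rust_src: Option<&str>,
    in_scope: bool,
) -> String {
    match (in_scope, local_rust_src) {
        // Not selected for sanitisation and `rust-src` is installed: the real local path.
        (false, Some(local)) => format!("{local}/library/{relative}"),
        // Selected for sanitisation, or `rust-src` absent: the virtual path.
        _ => format!("/rustc/{commit_hash}/library/{relative}"),
    }
}

fn main() {
    let hash = "fe72845f7bb6a77b9e671e6a4f32fe714962cec4";
    let local = Some("/home/username/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust");
    for (rust_src, in_scope) in [(local, false), (local, true), (None, false)] {
        println!("{}", emitted_sysroot_path("core/src/result.rs", hash, rust_src, in_scope));
    }
}
```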

This will not affect any hard-coded paths in the source code, such as in strings.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

## `trim-paths` implementation in Cargo
Member:

Reading this entire section, it doesn't seem to me that this functionality needs to live in Cargo. rustc could directly provide a -C trim-paths option with this behavior, since AFAICT rustc has all the information needed to do so. Cargo can then just pass through trim-paths to rustc.

Contributor Author:

Is rustc aware of the path to the current package? This is relevant to dependencies living under $HOME/.cargo/registry. Currently there is also the possibility (mentioned in unresolved questions) to encode more information about the package to the sanitised path, such as registry name and git commit hash

Member:

Hmmm, fair point. I thought that rustc had the necessary information, but perhaps not.

It might make sense to pass the requisite information into rustc, but then let rustc make the decision for how to use that information. I'd rather not have cargo making that decision.

Contributor Author (@cbeuw, Apr 26, 2022):

rustc always knows the absolute path to files, but only the build tool can tell which part of it is environment-sensitive. E.g. rustc knows it's using /home/cbeuw/.cargo/registry/src/github.com-1ecc6299db9ec823/rand_core-0.6.3/src/lib.rs, but the /home/cbeuw/.cargo/registry/src/github.com-1ecc6299db9ec823/rand_core-0.6.3 part is up to Cargo, and it would be different if a different registry were used, or the structure of that path may be completely different for Bazel dependencies - but rustc doesn't know that.

This "requisite information" is already included in provided mappings in --remap-path-prefix. I can't see how a separate argument that lets the build tool pass in "the sensitive part of the current path" would look like. It'd probably be either too build tool-specific, requiring rustc to know about things like Cargo registries or Bazel repositories, or too "free", ultimately being the same as the existing --remap-path-prefix


If `trim-paths` is `none` (`false`), no extra flag is supplied to `rustc`.

If `trim-paths` is anything else, then its value is supplied directly to `rustc`'s `--remap-path-scope` option, along with two `--remap-path-prefix` arguments:
- From the path of the local sysroot to `/rustc/[commit hash]`.
- If the compilation unit is under the working directory, from the absolute path of the working directory to an empty string.
  If it's outside the working directory, from the absolute path of the package root to `[package name]-[package version]`.

The default value of `trim-paths` is `object` for the release profile. As a result, panic messages (which are always embedded) are sanitised. If debug information is embedded, it is sanitised; if it is split, it is left untouched, but the paths to the split files are sanitised.
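A sketch of how Cargo might assemble these arguments for one compilation unit. The `Unit` struct, the function names and the `--flag=value` spelling are assumptions for illustration; `local_sysroot` stands for whichever local directory Cargo maps to `/rustc/[commit hash]`, and the profile default is simplified to the `dev`/`release` split described above.

```rust
use std::path::Path;

/// Hypothetical information Cargo has about one compilation unit.
struct Unit<'a> {
    name: &'a str,
    version: &'a str,
    package_root: &'a str,
}

/// Profile defaults from this RFC: `object` for release, `none` otherwise.
/// (Simplified: custom profiles inheriting from `release` are ignored here.)
fn default_trim_paths(profile: &str) -> &'static str {
    match profile {
        "release" => "object",
        _ => "none",
    }
}

/// Extra rustc arguments for a given `trim-paths` value.
fn trim_paths_args(
    trim_paths: &str,
    local_sysroot: &str,
    commit_hash: &str,
    working_dir: &str,
    unit: &Unit<'_>,
) -> Vec<String> {
    if trim_paths == "none" || trim_paths == "false" {
        // Sanitisation disabled: no extra flags at all.
        return Vec::new();
    }
    let mut args = vec![
        format!("--remap-path-scope={trim_paths}"),
        format!("--remap-path-prefix={local_sysroot}=/rustc/{commit_hash}"),
    ];
    if Path::new(unit.package_root).starts_with(working_dir) {
        // Under the working directory: strip it, leaving relative paths.
        args.push(format!("--remap-path-prefix={working_dir}="));
    } else {
        // Elsewhere (e.g. the registry cache): replace the package root.
        args.push(format!(
            "--remap-path-prefix={}={}-{}",
            unit.package_root, unit.name, unit.version
        ));
    }
    args
}

fn main() {
    let rand = Unit {
        name: "rand",
        version: "0.8.3",
        package_root: "/home/username/.cargo/registry/src/github.com-1ecc6299db9ec823/rand-0.8.3",
    };
    let value = default_trim_paths("release");
    // "/path/to/sysroot" and "<commit hash>" are placeholders.
    for arg in trim_paths_args(value, "/path/to/sysroot", "<commit hash>", "/home/username/rfc", &rand) {
        println!("{arg}");
    }
}
```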

Some interactions with compiler-intrinsic macros need to be considered:
1. The path (of the current file) introduced by [`file!()`](https://doc.rust-lang.org/std/macro.file.html) *will* be remapped. **Things may break** if
the code interacts with its own source file at runtime by using this macro.
2. The path of a file included via [`include!()`](https://doc.rust-lang.org/std/macro.include.html) *will* be remapped wherever it is embedded (for example, in a panic raised inside the included file), provided that the included file is under
the current working directory or a dependency package.
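The kind of code that item 1 warns about looks roughly like the hypothetical sketch below (not taken from any particular crate). With the working directory remapped to an empty string, `file!()` expands to a relative path such as `src/main.rs`, and for a dependency to something like `foo-0.1.0/src/lib.rs`, so the read only succeeds if that path happens to resolve from the process's current directory.

```rust
use std::fs;

/// Reads this source file at runtime via `file!()`. This is fragile under
/// `trim-paths`: the expanded path may be relative or virtual, so the read
/// can fail. Failure is reported instead of panicking.
fn read_own_source() -> Option<String> {
    let path = file!();
    match fs::read_to_string(path) {
        Ok(contents) => Some(contents),
        Err(err) => {
            eprintln!("cannot read {path}: {err}");
            None
        }
    }
}

fn main() {
    match read_own_source() {
        Some(src) => println!("read {} bytes from {}", src.len(), file!()),
        None => println!("{} is not reachable from the current directory", file!()),
    }
}
```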
Comment:

I'm not sure what you mean by "path introduced by include!() will be remapped". The path that's actually included is not subject to remapping. Do you mean a file!() macro in an include!()ed file will be remapped? If so, yeah, I think that would be the expected behaviour. Presumably if the path is out of the scope of any of the remappings (include!("/tmp/randomcode.rs")) then it will be left as-is.

Contributor Author:

When a panic is raised inside a file that's been include!()ed, it's the file that directly contains the panic that gets embedded and printed, not the includer. This sentence in the RFC means that the includee path is sanitised

e.g.

`a.rs`:

```rust
include!("b.rs");

fn main() {
    bar();
}
```

`b.rs`:

```rust
fn bar() {
    panic!("bar");
}
```

```
$ rustc -g a.rs
$ ./a
thread 'main' panicked at 'bar', b.rs:2:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

Member:

I think it is somewhat redundant. The remapping happens for the file corresponding to the source span. include!() puts the included file in the source span, and only keeps the file which contained the include!() as macro expansion source. Other macros may make other choices.


If the user further supplies custom `--remap-path-prefix` arguments via `RUSTFLAGS`
or similar mechanisms, they will take precedence over the ones supplied by `trim-paths`. This means that the user-defined remapping arguments must be
supplied *after* Cargo's own remapping.
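The precedence rule can be modelled as follows: mappings are tried from last to first and the first match wins, so a mapping supplied later on the command line overrides an earlier, overlapping one. This is a simplified, hypothetical model of `--remap-path-prefix` using component-wise prefix matching via `Path::strip_prefix`, not rustc's own code.

```rust
use std::path::{Path, PathBuf};

/// Applies `(from, to)` mappings given in command-line order:
/// the last mapping that matches wins.
fn remap(path: &Path, mappings: &[(PathBuf, PathBuf)]) -> PathBuf {
    for (from, to) in mappings.iter().rev() {
        if let Ok(rest) = path.strip_prefix(from) {
            return to.join(rest);
        }
    }
    path.to_path_buf()
}

fn main() {
    let mappings = vec![
        // Supplied first by Cargo's `trim-paths` (working directory -> empty string).
        (PathBuf::from("/home/username/crate"), PathBuf::new()),
        // Supplied later by the user, e.g. via RUSTFLAGS.
        (PathBuf::from("/home/username/crate/src"), PathBuf::from("/build/src")),
    ];
    for p in ["/home/username/crate/src/lib.rs", "/home/username/crate/build.rs"] {
        println!("{} -> {}", p, remap(Path::new(p), &mappings).display());
    }
}
```

If the user's mapping were placed before Cargo's instead, Cargo's broader mapping would match first and the user's mapping would never apply to files under `src`.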

## Changing handling of sysroot path in `rustc`

The virtualisation of sysroot files to `/rustc/[commit hash]/library/...` is done during compiler bootstrapping, specifically when
`remap-debuginfo = true` in `config.toml`. This is done for Rust distributions on all channels.

At `rustc` runtime (i.e. compiling some code), we try to correlate this virtual path to a real path pointing to the file on the local file system.
Currently the result is represented internally as if the path was remapped by a `--remap-path-prefix`, from local `rust-src` path to the virtual
path.
Only the virtual name is ever emitted for metadata or codegen. We want to change this behaviour such that, when `rust-src` source files can be
discovered, the virtual path is discarded and therefore the local path will be embedded, unless there is a `--remap-path-prefix` that causes this
local path to be remapped in the usual way.

## Split Debuginfo

When debug information is not embedded in the binary (i.e. `split-debuginfo` is not `off`), absolute paths to the various files containing debug
information are embedded into the binary instead, such as the absolute path to the `.pdb` file (MSVC, `packed`), `.dwo` files (ELF, `unpacked`),
and `.o` files (ELF, `packed`). This can be undesirable. As such, the `split-debuginfo-path` scope is made specifically for these embedded paths.

On macOS and ELF platforms, these paths are introduced by `rustc` during codegen. With MSVC, however, the path to the `.pdb` file is generated and
embedded into the binary by the linker `link.exe`. The linker has a `/PDBALTPATH` option that allows the embedded path written to the
binary to be changed, and `rustc` could supply it.

# Drawbacks
[drawbacks]: #drawbacks

The user will not be able to `Ctrl+click` on any paths in panic messages or backtraces that point outside the working directory. However,
this should not cause confusion, as the combination of package name and version can be used to pinpoint the file.

As mentioned above, `trim-paths` may break code that relies on `std::file!()` to evaluate to an accessible path to the file. Hence enabling
it by default for release builds may be a technically breaking change. Occurrences of such use should be extremely rare but should be investigated
via a Crater run. In case this breakage is unacceptable, `trim-paths` can be made an opt-in option rather than default in any build profile.
Member:

Rust-analyzer uses expect-test which does exactly this to update snapshot tests. Most of the time tests would run in debug mode I think though.

Contributor Author (@cbeuw, May 26, 2022):

test profile is inherited from dev so the default behaviour of cargo test won't change. The line "We only need to change the behaviour for Test and Build compile modes." means other compile modes like cargo check can simply ignore the new trim-paths, though the wording is a bit ambiguous.


# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

There has been an issue (https://github.com/rust-lang/rust/issues/40552) asking for path sanitisation to be implemented and enabled by default for
release builds. It has, over the past 4 years, gained a decent amount of popular support. The remapping rule proposed here is very simple to
implement.

Paths to sysroot crates are handled specially by `rustc`. As a result, all such paths are currently virtualised.
Although this is good for privacy and reproducibility, some people find it a hindrance for debugging: https://github.com/rust-lang/rust/issues/85463.
Hence the user should be given control over whether they see the virtual or the local path.

An alternative is to extend the syntax accepted by `--remap-path-prefix` or add a new option called `--remap-path-prefix-scoped` which allows
scoping rules to be explicitly applied to each remapping. This can co-exist with `--remap-path-scope` so it will be discussed further in
[Future possibilities](#future-possibilities) section.

# Prior art
[prior-art]: #prior-art

The name `trim-paths` came from the [similar feature](https://golang.org/cmd/go/#hdr-Compile_packages_and_dependencies) in Go. An alternative name
`sanitize-paths` was first considered but the spelling of "sanitise" differs across the pond and down under. It is also not as short and concise.

Go does not enable this by default. Since Go does not distinguish between debug and release builds, removing absolute paths for all builds would be
a hassle for debugging. However, this is not an issue for Rust, as we have a separate debug build profile.

GCC and Clang both have a flag equivalent to `--remap-path-prefix`, but both also have two separate flags, one for macro expansion only and
the other for debuginfo only: https://reproducible-builds.org/docs/build-path/. This is the origin of the `--remap-path-scope` idea.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

- Should we use a slightly more complex remapping rule, like distinguishing packages from registry, git and path, as proposed in
[Issue #40552](https://github.com/rust-lang/rust/issues/40552)?
- With debug information in separate files, debuggers and Rust's own backtrace rely on the path embedded in the binary to find these files to display
source code lines, columns and symbols etc. If we sanitise these paths to relative paths, then debuggers and backtrace must be invoked
in specific directories for these paths to work. [For instance](https://github.com/rust-lang/rust/issues/87825#issuecomment-920693005), if the
absolute path to the `.pdb` file is sanitised to the relative `target/release/foo.pdb`, then the binary must be invoked under the crate root as
`target/release/foo` to allow the correct backtrace to be displayed.
- Should we treat the current working directory the same as other packages? We could have one fewer remapping rule by remapping all
package roots to `[package name]-[version]`. A minor downside to this is not being able to `Ctrl+click` on paths to files the user is working
on from panic messages.
- Will these cover all potentially embedded paths? Have we missed anything?

# Future possibilities
[future-possibilities]: #future-possibilities

## Per-mapping scope control
If it turns out that we want to enable finer grained scoping control on each individual remapping, we could use a `scopes:from=to` syntax.
E.g. `split-debuginfo,unsplit-debuginfo,diagnostics:/path/to/src=src` will remove all references to `/path/to/src` from compiler diagnostics and debug information, but
they are retained in panic messages.

What exactly this new syntax will look like is, of course, up to further discussion. Using a comma as the separator for scopes may look ambiguous, as `macro,diagnostics:/path/from=to` could be interpreted as `macro`
and `diagnostics:/path/from=to`.

This syntax can be used with either a brand new `--remap-path-prefix-scoped` option, or we could extend the
existing `--remap-path-prefix` option to take in this new syntax.

If we were to extend the existing `--remap-path-prefix`, there may be ambiguity as to whether `:` is a separator between the scope list and the mapping
or part of the path; if the first `:` supplied belongs to the path, then it would have to be escaped. This could be technically breaking.

In any case, future inclusion of this new syntax will not affect the `--remap-path-scope` flag introduced in this RFC. Scopes specified in `--remap-path-scope`
will be used as the default for all mappings, and explicit scopes for an individual mapping will take precedence for that mapping.
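To make the `:` ambiguity and the fallback rule concrete, here is a hypothetical parser for the `scopes:from=to` syntax. It splits on the first `:` and then on the last `=`; the `ScopedMapping` type and both functions are illustrative assumptions, since the syntax itself is still undecided.

```rust
/// Hypothetical result of parsing `scopes:from=to` (or a plain `from=to`).
#[derive(Debug, PartialEq)]
struct ScopedMapping<'a> {
    /// `None` means "use the scopes given by `--remap-path-scope`".
    scopes: Option<Vec<&'a str>>,
    from: &'a str,
    to: &'a str,
}

/// Splits on the first `:` and then on the last `=`. A plain mapping whose path
/// contains `:` (e.g. a Windows drive letter) is therefore misread, which is the
/// ambiguity that would require escaping.
fn parse_scoped(arg: &str) -> Option<ScopedMapping<'_>> {
    let (scopes, mapping) = match arg.split_once(':') {
        Some((scopes, mapping)) => (Some(scopes.split(',').collect::<Vec<_>>()), mapping),
        None => (None, arg),
    };
    let (from, to) = mapping.rsplit_once('=')?;
    Some(ScopedMapping { scopes, from, to })
}

/// Explicit per-mapping scopes take precedence; otherwise the defaults apply.
fn effective_scopes<'a>(mapping: &ScopedMapping<'a>, defaults: &[&'a str]) -> Vec<&'a str> {
    mapping.scopes.clone().unwrap_or_else(|| defaults.to_vec())
}

fn main() {
    let defaults = ["object"];
    for arg in [
        "split-debuginfo,unsplit-debuginfo,diagnostics:/path/to/src=src",
        "/path/to/src=src",
        r"C:\src=src",
    ] {
        match parse_scoped(arg) {
            Some(m) => println!("{arg}: {:?} {} -> {}", effective_scopes(&m, &defaults), m.from, m.to),
            None => println!("{arg}: not a mapping"),
        }
    }
}
```

The third input shows the problem: the drive letter `C` is taken as a scope list, and the mapping becomes `\src=src`.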