Non-reproducible -C metadata=hash passed to rustc depending on the compiling OS #8140
Comments
This seems to come from some information about
In particular: cargo/src/cargo/core/compiler/context/compilation_files.rs Lines 672 to 675 in 3dcfdef
I also wonder why the host hashing is skipped in the following cases: cargo/src/cargo/core/compiler/context/compilation_files.rs Lines 658 to 664 in 3dcfdef
cc @ehuss who added this Given that my tests are with |
There's more discussion about this in #7873. It will likely be difficult to remove the host from the hash due to proc macros and build dependencies. One possible approach discussed in that PR is to keep the filenames different, but keep the symbols the same. I'm still uncertain if the filenames (for rlibs at least) leak into the resulting binaries. |
If I understand correctly, the first concern is the following (#7873 (comment)).
And the second one (#7873 (comment)).
In our case:
So if I understood well the concerns, the fact that we provide an explicit But if our use case is too specific, could there at least be a flag to tell Cargo to only hash the |
FWIW at least from an abstract point of view I believe that we should fix this issue. If you're using the same source code for the entire toolchain (e.g. same rust version, same linker version, etc), then any host should compile the same result for a particular target. That being said what goes into filenames/metadata/etc is pretty tricky and nuanced, so this may not be an easy "flip the switch" fix. It's a good goal to work towards, but I suspect it will require even more investigation beyond what's already at the top of @ehuss's head right now. |
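One way to check the property described here (same toolchain and target, same output on any host) is to hash the final artifact on each host and compare the digests. The following is a minimal sketch, assuming either `sha256sum` or `shasum` is installed; `artifact_digest` and `same_artifact` are hypothetical helper names, not part of Cargo.

```shell
# Print the SHA-256 digest of a file, using whichever tool is installed
# (sha256sum is common on Linux, shasum on macOS).
artifact_digest() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  else
    shasum -a 256 "$1" | cut -d ' ' -f 1
  fi
}

# Succeed only if both files have the same digest.
same_artifact() {
  [ "$(artifact_digest "$1")" = "$(artifact_digest "$2")" ]
}
```

In a CI setup, each host could upload its digest and a final job could compare them, so the binaries themselves never need to be copied between machines.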
Definitely makes sense. I was trying to understand what the blockers were, and indeed it seems trickier than I thought. I'm still wondering whether it would be possible and make sense to have some experimental/unstable flag in Cargo to remove the host from the metadata hash, so that we can experiment with it? For example, it would help understand whether the metadata hash is the only blocker left to have reproducible builds across hosts, or if there are also other things to investigate. I'm not familiar with developing Cargo, but would be happy to help adding such a flag if it makes sense. In any case, being able to have reproducible builds for a given host compiler is already a very nice milestone, so thanks to everyone who made it possible! |
I'm also staring at this; we compile our project for many architectures via many cross compile toolchains (such as for Android). We use a single "reference" copy of the compiler from a specific revision of clang to target all platforms (we are currently using the one from the Android NDK for this to make it very easy for everyone to download and install it and so that "doctoring it" would require messing with a very large number of downstream developers and the NDK itself is open source so people can get the source code and compile it from scratch), as well as reproducible sysroots (manually extracted by downloading packages from distribution repositories such as MinGW/CentOS or simply "use a specific known copy of Xcode", etc.) in order to ensure that people can reproduce all of our build artifacts.
With respect to this issue, one of the tests that we do--as in, this is actually part of our CI system (something that apparently is also what @gendx was working on)--is to compile the code on Linux and on macOS for the same target and verify that the resulting binary is the same. Since I recently started using a library written in Rust, it was a bit frustrating trying to figure out how to get our build to be reproducible correctly again. (The symptom I was running into was that because all of the symbols had hashes in them, they were being put in random orders, which caused the imports and string constants they caused to end up in slightly different orders.)
FWIW, I did figure out a workaround for the metadata issue that is working just barely enough for me, and maybe it will work for some other people: what I'm doing is setting RUSTC_WRAPPER to a shell script that iterates the command line arguments, finds the metadata=* argument, and replaces it with metadata=path/to/file.rs (which is still needed at minimum as sometimes the same crate is included multiple times).
I think this technique could be used for google/OpenSK#94, if they are still trying to go down that road.
#!/bin/bash
set -e
# Remember the first argument located under CARGO_HOME, with that prefix stripped.
file=
for arg in "$@"; do
    if [[ ${arg} == "${CARGO_HOME}"/* && -z ${file} ]]; then
        file=${arg##"${CARGO_HOME}"}
    fi
done
# Rebuild the command line, replacing the metadata=<hash> argument
# with metadata=<path found above>.
args=()
for arg in "$@"; do
    if [[ ${arg} == metadata=* ]]; then
        args+=(metadata="${file}")
    else
        args+=("${arg}")
    fi
done
# As a RUSTC_WRAPPER, the first argument is the rustc executable to run.
exec "${args[@]}"
It isn't clear to me that this is really the only problem, though... the object files I'm getting out are internally in a different order, with high-level symbol groups staying together but being shuffled. I only happen to care about the reproducibility of the final stripped release production binary, and so this happens to be working well enough for me right now, but other people with different requirements--particularly if you care about the output from Cargo itself, as you are shipping the library you get in a package instead of linking it with a more deterministic linker and shipping that--might see the .a files you get being different as unacceptable.
To explain what I mean by "high-level symbol groups", I'm going to give an example; for reference, the crate I'm trying to use and compile right now is boringtun. One of the smaller files being output in libboringtun.a is boringtun.boringtun.3a1fbbbh-cgu.1.rcgu.o, and the symbols I'm getting when compiled from a Linux machine are clumped into the following order, but when compiled on macOS the symbols are in the order C B E A D. Maybe this might mean that rust-lang/rust#71361 will need to be re-opened?
Group A:
Group B:
Group C:
Group D:
Group E:
|
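The "same groups, shuffled order" observation above can be checked mechanically. A minimal sketch, assuming the symbol names from each host have already been written to text files, one name per line (for instance extracted from `nm` output); `compare_symbol_order` is a hypothetical helper name.

```shell
# Compare two symbol listings (one symbol name per line).
# Prints "identical", "reordered" (same symbols, different order), or "different".
compare_symbol_order() {
  if cmp -s "$1" "$2"; then
    echo identical
  elif [ "$(sort "$1")" = "$(sort "$2")" ]; then
    echo reordered
  else
    echo different
  fi
}
```

A result of "reordered" would point at ordering nondeterminism rather than at differing symbol names, which helps separate the metadata-hash problem from any remaining compiler-side issue.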
Interesting analysis @saurik! Yes I agree that this is worth re-opening rust-lang/rust#71361. Regarding google/OpenSK#94, for now we've agreed to have two sets of reproduced binaries, those built on Linux and those built on MacOS. Ideally it would be preferable for us to have this issue fixed upstream in Cargo rather than having to fiddle with the compilation scripts (as it may be more brittle). Also, we only tried to reproduce binaries in a stripped form obtained via https://github.com/tock/elf2tab (no ELF packaging, no symbols, only relevant sections are extracted). |
OK, so now I'm failing to replicate the symbol/section shuffling, so maybe I did something unclean there while doing a bunch of local tests... (if so, I'm sorry)? Though I also don't remember what target I was using to test that at the time; I have noticed there is a difference I'm getting with Android (which is what I thought I was using before) as the path of jnienv.rs is ending up inside of libboringtun.a, but that is something I bet I can fix with some remap flags that I've so far not had to use. I thereby actually have thankfully failed so far to find any issue in Rust itself (I figured since I had invested some time into this already I'd go and come up with some simple reproduction for the other closed issue thread, even though it wasn't affecting me, but then thankfully failed ;P).
Note that I am getting .o filename changes in the resulting .a files; again, thankfully, this doesn't affect me as I only care about the final linked binary... but this would totally affect other people who are intending to ship the .a file, and having all of these intermediate files be different definitely makes debugging reproduction issues harder (and I will make that same argument about having the host toolchain hashed into extra-filename also: that should just be differentiated by the parent folder under target). Regardless, putting people in the position where they can start trying to find the more subtle issues by removing the host toolchain type from the hash (even if it is only behind some temporary unstable flag) seems very useful. |
It might be nice (for use cases like Tock's) to just have a cargo flag that would disable |
Is this closed by #14107? |
In my testing, #14107 only partially addresses this issue. I am still seeing that proc macros and build scripts leak their host-specific metadata into wasm-target metadata. This affects projects that use e.g. wasm-bindgen (via its build script and its wasm-bindgen-macros dependency). As a naive strawman, this diff excludes such dependencies and results in the same metadata for a crate that depends on wasm-bindgen, when compiled on either macOS or Linux:
I am sure that is not an acceptable solution as it excludes any information at all about dependencies that deviate from the "current" compilation kind, but if someone could give me guidance on what would be acceptable I am willing to work toward a patch. Tested with cargo fa64658 |
I think that might actually be along the lines of what to do here? It's been awhile for me and @ehuss would probably know better, but IIRC there's a decoupling of Regardless I think something like that is the way to go, perhaps with some comments around the comparison and maybe only dropping |
Problem
Tock is an operating system written in Rust for embedded platforms (e.g. thumbv7em-none-eabi target). We've been trying to make builds reproducible, with good progress (tock/tock#1666). In particular:
- --remap-path-prefix,
- cargo rustc to avoid passing custom linker arguments (that include paths on the filesystem) to the dependencies.
With these we've managed to make the builds reproducible across various Linux machines.
However, the builds are different on Linux (nightly-2020-02-03-x86_64-unknown-linux-gnu) and MacOS (nightly-2020-02-03-x86_64-apple-darwin), as evidenced by CI on a GitHub workflow (google/OpenSK#94 (comment)).
In particular, the issue seems to stem from a different -C metadata= (and -C extra-filename=) passed to rustc (while the steps that I mentioned above make sure that the same metadata hash is passed across Linux machines for each crate in the project).
These are built with the --verbose parameter passed to cargo, and an example rustc invocation is the following (tock_cells crate of the Tock project).
rustc --crate-name tock_cells libraries/tock-cells/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts --crate-type lib --emit=dep-info,metadata,link -C opt-level=z -C panic=abort -C debuginfo=2 -C metadata=92487154152022d3 -C extra-filename=-92487154152022d3 --out-dir /home/runner/work/OpenSK/OpenSK/third_party/tock/target/thumbv7em-none-eabi/release/deps --target thumbv7em-none-eabi -L dependency=/home/runner/work/OpenSK/OpenSK/third_party/tock/target/thumbv7em-none-eabi/release/deps -L dependency=/home/runner/work/OpenSK/OpenSK/third_party/tock/target/release/deps -C link-arg=-Tlayout.ld -C linker=rust-lld -C linker-flavor=ld.lld -C relocation-model=dynamic-no-pic -C link-arg=-zmax-page-size=512 -C link-arg=-icf=all --remap-path-prefix=/home/runner/work/OpenSK/OpenSK/third_party/tock/= -D warnings
rustc --crate-name tock_cells libraries/tock-cells/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts --crate-type lib --emit=dep-info,metadata,link -C opt-level=z -C panic=abort -C debuginfo=2 -C metadata=9fc982d890c0358d -C extra-filename=-9fc982d890c0358d --out-dir /Users/runner/runners/2.169.0/work/OpenSK/OpenSK/third_party/tock/target/thumbv7em-none-eabi/release/deps --target thumbv7em-none-eabi -L dependency=/Users/runner/runners/2.169.0/work/OpenSK/OpenSK/third_party/tock/target/thumbv7em-none-eabi/release/deps -L dependency=/Users/runner/runners/2.169.0/work/OpenSK/OpenSK/third_party/tock/target/release/deps -C link-arg=-Tlayout.ld -C linker=rust-lld -C linker-flavor=ld.lld -C relocation-model=dynamic-no-pic -C link-arg=-zmax-page-size=512 -C link-arg=-icf=all --remap-path-prefix=/Users/runner/runners/2.169.0/work/OpenSK/OpenSK/third_party/tock/= -D warnings
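To compare invocations like the two above without reading them by eye, the `-C metadata=` value can be pulled out of each command line and diffed across hosts. A minimal sketch using `sed`; `extract_metadata` is a hypothetical helper name, and it assumes the command line is passed as a single string containing one `-C metadata=` option.

```shell
# Print the value of the `-C metadata=<hash>` option in a rustc command line.
# Prints nothing if the option is absent.
extract_metadata() {
  printf '%s\n' "$1" | sed -n 's/.*-C metadata=\([^ ]*\).*/\1/p'
}
```

Applied to the two invocations above, this would yield 92487154152022d3 and 9fc982d890c0358d respectively, making the host-dependent difference explicit.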
Steps
git clone https://github.com/tock/tock
cd tock/boards/nordic/nrf52840dk
V=1 make
This will invoke cargo with the following parameters:
RUSTFLAGS="-C link-arg=-Tlayout.ld -C linker=rust-lld -C linker-flavor=ld.lld -C relocation-model=dynamic-no-pic -C link-arg=-zmax-page-size=512 -C link-arg=-icf=all --remap-path-prefix=/path/to/tock/= " cargo rustc --verbose --target=thumbv7em-none-eabi --package nrf52840dk --bin nrf52840dk --release -- -C link-arg=-L/path/to/tock/boards/nordic/nrf52840dk
One of the resulting rustc invocations is detailed above.
The exact /path/to/tock/ depends on the machine, but this has no impact across the Linux machines we've tested.
More detailed steps are available in the GitHub workflow (https://github.com/google/OpenSK/pull/94/checks?check_run_id=602439620, https://github.com/google/OpenSK/pull/94/checks?check_run_id=602439631)
Possible Solution(s)
The various steps that we've taken have allowed us to obtain the same -C metadata=hash parameter across Linux machines, at which point the builds are reproducible across these machines. Given that when running cargo in --verbose mode we observe different metadata hashes, the computation of such hashes is likely the problem.
... (nightly-2020-02-03-x86_64-unknown-linux-gnu vs. nightly-2020-02-03-x86_64-apple-darwin)? I'd assume it should only depend on the nightly-2020-02-03 part and the --target thumbv7em-none-eabi, but not on the OS we compile from.
Notes
See also rust-lang/rust#71361 and google/OpenSK#94
Output of cargo version (GitHub workflow machines with actions-rs/toolchain@v1):