rustc generates invalid DWARF when LTO is enabled #66118
Comments
Some observations:
|
I saw a similar error in dwarf parsing tool I was working on when LTO did inlining across compilation units which the tool incorrectly assumed wasn't possible. Could it be something similar here @fitzgen ? |
I came across this problem recently try to use here is a piece of info by
the address According to the dwarf4 specification, two die link by BTW, will config |
Any update on this? |
Having this still, only in riscv64 architecture when using rust 1.43.0 when compiling mozjs (part of firefox):
Once again, setting |
I am starting to experience the same thing with Rust packages in Debian (grcov and the new version of fd-find) |
Exporting
|
@sylvestre: To workaround this option mismatch, you could try using Cargo's env vars to specify the LTO mode, which should prevent Cargo from emitting |
Nice to see you here @jryans ;) |
I have attempted to reproduce this issue following the steps in the dwarf-test sample repo, but everything seems to work correctly for me. I tested on Ubuntu 18.04 for x86_64 with both Rust stable (1.49.0) and nightly versions. Do the original steps still reproduce for you, @hannobraun? Are there additional steps or details I might be missing? |
Since the issue seems to focus on |
Thanks for looking into this, @jryans. I can confirm that the issue is no longer reproducible using the dwarf-test repo. I don't think there were any additional steps beyond what's documented in the README. I'm no longer involved in the project where the original issue (freezing GDB, see issue description) occurred, so I'm not set up to verify things there (it was a client project I was working on at the time, and I don't think it remains active). I assume that whatever fixed the DWARF errors fixed the original issue too. As far as I'm concerned, this issue can be closed, but maybe the others who posted here can confirm that the issue is fixed for them too. |
Hi, all. I created a not-so-minimized reproducing program:

use tokio::runtime::Builder;
use tonic::client::Grpc;
use tonic::codec::ProstCodec;
use tonic::codegen::http::uri::PathAndQuery;
use tonic::transport::Channel;
use tonic::IntoRequest;

#[derive(::prost::Message)]
pub struct Proto;

fn main() {
    let f = async {
        let client = unsafe { &mut *(0x1usize as *mut Grpc<Channel>) };
        let _: Result<tonic::Response<Proto>, tonic::Status> = client
            .unary(
                Proto.into_request(),
                PathAndQuery::from_static(""),
                ProstCodec::<Proto, Proto>::default(),
            )
            .await;
    };
    let rt = Builder::new_current_thread().build().unwrap();
    rt.block_on(f);
}

Dependency lock file can be found at https://github.com/sticnarf/dwarf-error. If compiled with Running gimli's dwarf-validate on the binary produced by Rust 1.55.0 on x86-64 Linux will output:
I can also confirm that the binary compiled from this repo on Linux ARM is affected by this issue. Because the binary uses prost, it includes lots of dependencies. But I cannot reproduce this issue in a simpler program. |
Starting to debug this:

RUSTFLAGS="--emit llvm-ir,obj" cargo build --release
../../gimli/target/debug/examples/dwarf-validate target/release/deps/dwarf_error-37590773ac18128b.o
DWARF error in target/release/deps/dwarf_error-37590773ac18128b.o: Invalid intra-unit reference in unit 0x0 from DIE 0x37ea9 to 0x352f3
etc
llc target/release/deps/dwarf_error-37590773ac18128b.ll -filetype=obj -O0 -o fail.o
../../gimli/target/debug/examples/dwarf-validate fail.o
DWARF error in fail.o: Invalid intra-unit reference in unit 0x0 from DIE 0x3bbaf to 0x38812
etc

So I need to work on reducing that LLVM-IR now. I expect that the only relevant parts of it will be the debuginfo, which I don't think bugpoint can help with, so this might take some time.

The original error in https://github.com/hannobraun/dwarf-test can be reproduced using older Rust versions (e.g. 1.38) and the DWARF error looks similar. |
This seems to be #46346 again. It has all the same symptoms (multiple CUs in the IR, and a function in one CU is being inlined in another CU). That bug occurred for ThinLTO, and we never fixed it in LLVM. Instead, we changed rust's ThinLTO to put all the IR in a single CU ( #46772). But in this issue, we still have multiple CUs, so the bug can still occur. I'm assuming that it doesn't make sense to try the same workaround again because these really are separate CUs. Additionally, this doesn't seem to be a problem for C++, so we must be emitting debuginfo differently from C++ to trigger this bug in LLVM. I suspect that part of the problem is that the function definitions are within namespaces, and if I recall correctly, C++ places the declaration in the namespace, but not the definition. The namespace can still be determined because |
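To make the symptom concrete, here is a minimal sketch of why a cross-unit reference that is encoded as a unit-relative one shows up as an "invalid intra-unit reference". It is a toy model, not gimli's or LLVM's actual code: the `Unit`, `Reference`, `RefError`, `unit` and `check_reference` names are all hypothetical. It only assumes the DWARF rule that forms such as DW_FORM_ref4 hold an offset from the start of the containing unit, while DW_FORM_ref_addr holds an offset from the start of .debug_info.

```rust
use std::collections::BTreeSet;

/// Toy model of one compilation unit in `.debug_info` (hypothetical, for illustration).
struct Unit {
    start: u64,                 // section offset of the unit header
    end: u64,                   // one past the last byte of the unit
    die_offsets: BTreeSet<u64>, // section offsets at which DIEs begin
}

/// The two reference encodings that matter for this bug.
enum Reference {
    /// Offset from the start of the containing unit (like DW_FORM_ref4).
    UnitRelative(u64),
    /// Offset from the start of `.debug_info` (like DW_FORM_ref_addr).
    SectionRelative(u64),
}

#[derive(Debug, PartialEq)]
enum RefError {
    /// A unit-relative reference does not land on a DIE of its own unit.
    InvalidIntraUnit { target: u64 },
    /// A section-relative reference does not land on any DIE.
    InvalidInterUnit { target: u64 },
}

/// Convenience constructor for the toy model.
fn unit(start: u64, end: u64, dies: &[u64]) -> Unit {
    Unit { start, end, die_offsets: dies.iter().copied().collect() }
}

/// Resolve `reference`, found in `units[from_unit]`, to a section offset.
fn check_reference(
    units: &[Unit],
    from_unit: usize,
    reference: &Reference,
) -> Result<u64, RefError> {
    match *reference {
        Reference::UnitRelative(rel) => {
            let u = &units[from_unit];
            let target = u.start + rel;
            // The target must be the start of a DIE inside the same unit.
            if target < u.end && u.die_offsets.contains(&target) {
                Ok(target)
            } else {
                Err(RefError::InvalidIntraUnit { target })
            }
        }
        Reference::SectionRelative(target) => {
            // Any DIE in any unit is an acceptable target.
            if units.iter().any(|u| u.die_offsets.contains(&target)) {
                Ok(target)
            } else {
                Err(RefError::InvalidInterUnit { target })
            }
        }
    }
}
```

In this model, a DIE in the second unit that wants to point at a DIE at offset 0x40 of the first unit only resolves correctly when the reference is section-relative. Encoding the same number as unit-relative makes it resolve inside the wrong unit, where it usually does not land on a DIE boundary.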
Simplified IR: https://gist.github.com/philipc/59f8c1acf6c55ffdbf917c2c95d80772

Neither of these have namespaces, so that isn't the problem, but I still think that the declaration vs definition distinction is likely to be significant. See https://reviews.llvm.org/D107076 (which reverts a patch that caused definitions to be shared for C++ and which was causing the same problem we are seeing), and https://reviews.llvm.org/D94976#2508262 (describes a bit about declarations and definitions). So I think this is a problem that upstream LLVM wants to fix too, but it's not easy to fix.

Rust reproduction

// lib.rs
pub struct Xxxxx;

impl Xxxxx {
    #[inline(never)]
    pub fn pppp() {
        // The debuginfo for this inlined function call expects `pppp`
        // to be in the same unit as the declaration of `qqqq`.
        qqqq();
    }
}

#[inline(always)]
fn qqqq() {
    String::from("s");
}

// main.rs
struct Yyyyy {
    // This moves the debuginfo for `Xxxxx` into this unit,
    // including its methods.
    _xxxx: lib::Xxxxx,
}

impl Yyyyy {
    fn rrrr() {
        lib::Xxxxx::pppp();
    }
}

fn main() {
    // Use `Yyyyy` and `Xxxxx::pppp`.
    Yyyyy::rrrr();
}

rustc --crate-name lib lib.rs --crate-type lib --edition=2018 -C codegen-units=1 -C debuginfo=2 --out-dir out
rustc --crate-name main main.rs --crate-type bin --edition=2018 -C lto -C codegen-units=1 -C debuginfo=2 --out-dir out --extern lib=out/liblib.rlib
llvm-dwarfdump -verify out/main |
…nagisa Work around invalid DWARF bugs for fat LTO This PR applies the same workaround in rust-lang#46772 to fat LTO. It seems to fix the bug reported in rust-lang#66118 (comment).
This seems to have been fixed in LLVM 16, as the reproducer has no errors with 1.70-beta or 1.71-nightly. However, I think that fix is also related to the new errors in #109730 and #109934 -> llvm/llvm-project#61932, which may lead to a revert in LLVM 16. But I have a rustc fix for that in #111167, and I confirmed that this reproducer still looks OK with that change too. So, depending on how that shakes out, we may be almost ready to close this... |
I think this should be done, but it could use a regression test. @rustbot label +E-needs-test |
Maybe #46772 is no longer necessary. |
See #111364 for an attempt to remove that workaround. |
I think #114760 is sufficient as a test case. |
When combining lto = true with debug = true in Cargo.toml, the compiler generates invalid DWARF, according to Gimli.

I've created a minimal example here: https://github.com/hannobraun/dwarf-test
If you follow the instructions in the example's README, you should see something like this:
I've seen this first in a larger project. There, the combination of lto = true and debug = true also causes GDB to freeze when reading the symbols from the binary. I wasn't able to reproduce this in my minimal example, but in the larger project, all binaries that caused GDB to freeze produced these DWARF errors, while I didn't see these errors in the binaries that worked normally.

So while I'm not 100% sure that these DWARF errors are a real problem, I suspect that they're the cause of the GDB freezes I was seeing.