Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't compile when using a workspace #117

Closed
Rahix opened this issue Nov 4, 2018 · 24 comments
Closed

Can't compile when using a workspace #117

Rahix opened this issue Nov 4, 2018 · 24 comments
Assignees
Labels
has-local-patch A patch exists but has not been applied to upstream LLVM has-reduced-testcase A small LLVM IR file exists that demonstrates the problem

Comments

@Rahix
Copy link

Rahix commented Nov 4, 2018

After upgrading to the new compiler, I can no longer build my project that is part of a cargo workspace. Building outside of a workspace works. I get the following error:

rustc: /home/rahix/.../rust/src/llvm/lib/IR/Constants.cpp:1512: static llvm::Constant* llvm::ConstantExpr::getCast(unsigned int, llvm::Constant*, llvm::Type*, bool): Assertion `CastInst::castIsValid(opc, C, Ty) && "Invalid constantexpr cast!"' failed.

I get this error even with a completely stripped down file like:

#![no_std]
#![no_main]
#![feature(lang_items, panic_handler)]

#[no_mangle]
pub extern fn main() {
    loop { }
}

pub mod std {
    #[lang = "eh_personality"]
    pub unsafe extern "C" fn rust_eh_personality(
        _state: (),
        _exception_object: *mut (),
        _context: *mut (),
    ) -> () {
    }

    #[panic_handler]
    fn panic(_info: &::core::panic::PanicInfo) -> ! {
        loop { }
    }
}

The only other observation I was able to make is that is does not happen when I remove lto = true from the root Cargo.toml. For reference:

[package]
name = "..."
version = "0.0.0"
authors = ["..."]

[workspace]
members = ["...", "example"]

[profile]

[profile.release]
codegen-units = 1
debug = false
lto = true  # Commenting out this line allows building
opt-level = "s"
@TimNN
Copy link

TimNN commented Nov 4, 2018

Assertion `CastInst::castIsValid(opc, C, Ty) && "Invalid constantexpr cast!"' failed

Ugh, I hit that assertion myself, but for me compiling with codegen-units = 1 fixed the issue. IIRC this issue also happens when compiling with --emit=llvm-ir which makes it more annoying to debug.

My guess (and it really is only a wild guess) would be that this could be something address space related. I'll try to reproduce this again tomorrow and get at least the types involved, which should hopefully help understanding what is going on.

@shepmaster
Copy link
Member

FWIW, it seems incredibly unlikely that workspaces have anything to do with the root of the problem. LTO is more likely, but I've compiled with LTO and not experienced it.

One thing that I have run into before is arbitrary reordering of pieces of the standard library cause LLVM to process things in a specific order and then choke on that. This can make reproducing a problem a very tricky business.

@dylanmckay
Copy link
Member

dylanmckay commented Nov 5, 2018

My guess (and it really is only a wild guess) would be that this could be something address space related.

This is my hunch too. After my initial patch that added support for functions in non-default address spaces (and the subsequent commit of an adaption of it by another LLVM developer), I've hit a few casting related problems in LLVM due to AVR's program memory address space now being set to 1 and that breaking various scattered (but easy to fix) assumptions within LLVM.

The usual culprit - PointerType::getUnqual, which gets the pointer type representing a pointer in the unqualified, default address space zero. Any latent code in LLVM that calls getUnqual which happen to sometimes execute on function pointers will likely bug out in AVR. I've fixed a bunch of these (and a few more that I haven't committed in clang).

@dylanmckay
Copy link
Member

@Rahix

If you have a debugger handy, in gdb you could execute rustc under gdb. LLVM assertion errors automatically break out into the debugger console.

In GDB, type

(gdb) up # go up from the assertion handler functions to the buggy code
(gdb) up
(gdb) print Ty->dump()

This will print the textual IR representation of the type being casted. If it looks like a function pointer, or it contains the text addrspace(1), or both, then @TimNN's wild guess it probably right

@Rahix
Copy link
Author

Rahix commented Nov 5, 2018

IIRC this issue also happens when compiling with --emit=llvm-ir which makes it more annoying to debug.

Yes, noticed that as well :(

FWIW, it seems incredibly unlikely that workspaces have anything to do with the root of the problem. LTO is more likely, but I've compiled with LTO and not experienced it.

I was also pretty irritated when I noticed this. The thing is, it worked when I recreated the project outside the workspace. I was then able to reliably reproduce it in the workspace project and fix it by removing the workspace ...

Maybe in some way related to the fact that you have to specify profiles in the workspace root instead of the crate's Cargo.toml?

If you have a debugger handy

I will try that and report back, thanks!

@TimNN
Copy link

TimNN commented Nov 5, 2018

(gdb) print Ty->dump()

Note that (from my experiences in the past) you may need an unoptimized LLVM for that to work (optimized + debug info is not enough since, IIRC, local variables may have been optimized or gdb may not find methods).

@TimNN
Copy link

TimNN commented Nov 5, 2018

In my case:

$ frame 4
$ p Ty->dump()
void (%"str::traits::{{impl}}::index::{{closure}}"*)*
$ p C->dump()
define internal fastcc void @"core::str::traits::<impl core::slice::SliceIndex<str> for core::ops::range::Range<usize>>::index::{{closure}}"(...) unnamed_addr addrspace(1) #2 { ... }
$ p opc
llvm::Instruction::BitCast
$ frame 8
$ p I.dump()
call fastcc addrspace(0) void bitcast (void (%"type 0x7fffdc629830"*)* @"core::str::traits::<impl core::slice::SliceIndex<str> for core::ops::range::Range<usize>>::index::{{closure}}" to void (%0*)*)(%"str::traits::{{impl}}::index::{{closure}}"* noalias nocapture nonnull dereferenceable(6) %4) #3, !noalias !38
$ bt
#4  llvm::ConstantExpr::getCast (oc=47, C=0x7fffdc78bcb8, Ty=0x7fffdc2c0e90, OnlyIfReduced=false) 
    at [...]/src/llvm/lib/IR/Constants.cpp:1512
#5  llvm::ConstantExpr::getWithOperands (this=0x7fffdc78d288, Ops=..., Ty=0x7fffdc2c0e90, OnlyIfReduced=false, SrcTy=0x0)
    at [...]/src/llvm/lib/IR/Constants.cpp:1206
#6  (anonymous namespace)::Mapper::mapValue (this=0x7fffdc003d40, V=0x7fffdc78d288) at [...]/src/llvm/lib/Transforms/Utils/ValueMapper.cpp:475
#7  (anonymous namespace)::Mapper::remapInstruction (this=0x7fffdc003d40, I=0x7fffdc793730)
    at [...]/src/llvm/lib/Transforms/Utils/ValueMapper.cpp:873
#8  (anonymous namespace)::Mapper::remapFunction (this=0x7fffdc003d40, F=...) 
    at [...]/src/llvm/lib/Transforms/Utils/ValueMapper.cpp:955
#9  (anonymous namespace)::Mapper::flush (this=0x7fffdc003d40) at [...]/src/llvm/lib/Transforms/Utils/ValueMapper.cpp:855
#10 (anonymous namespace)::FlushingMapper::~FlushingMapper (this=0x7fffe8cd9bb0, __in_chrg=<optimized out>)
    at [...]/src/llvm/lib/Transforms/Utils/ValueMapper.cpp:1073
#11 llvm::ValueMapper::mapValue (this=0x7fffe8cda280, V=...) 
    at [...]/src/llvm/lib/Transforms/Utils/ValueMapper.cpp:1098
#12 (anonymous namespace)::IRLinker::run (this=0x7fffe8cd9eb0) 
    at [...]/src/llvm/lib/Linker/IRMover.cpp:1355
#13 llvm::IRMover::move(std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >, llvm::ArrayRef<llvm::GlobalValue*>, std::function<void (llvm::GlobalValue&, std::function<void (llvm::GlobalValue&)>)>, bool) (this=0x555555a9ab60, Src=std::unique_ptr<llvm::Module> containing 0x0, ValuesToLink=..., AddLazyFor=..., IsPerformingImport=false) 
    at [...]/src/llvm/lib/Linker/IRMover.cpp:1476
#14 (anonymous namespace)::ModuleLinker::run (this=0x7fffe8cda510) 
    at [...]/src/llvm/lib/Linker/LinkModules.cpp:557
#15 llvm::Linker::linkInModule(std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >, unsigned int, std::function<void (llvm::Module&, llvm::StringSet<llvm::MallocAllocator> const&)>) (this=0x555555a9ab60, Src=std::unique_ptr<llvm::Module> containing 0x0, Flags=0, InternalizeCallback=...)
    at [...]/src/llvm/lib/Linker/LinkModules.cpp:579
#16 LLVMRustLinkerAdd (L=0x555555a9ab60, BC=<optimized out>, Len=<optimized out>)
    at ../rustllvm/Linker.cpp:64
#17 rustc_codegen_llvm::back::lto::fat_lto::{{closure}} ()

@TimNN
Copy link

TimNN commented Nov 5, 2018

A few things to note:

  • The -O3 libcore has some functions defined / referenced as addrspace(0) IR gist
  • All these involve fastcc
  • IIUC, the Rust compiler won't generate fastcc by itself
    • This is probably confirmed by the fact that a -O0 libcore has no fastcc
  • A -O0 libcore also has no addrspace(0) functions / calls
  • Which leads me to my guess: the fastcc optimization is messing up the address space

@TimNN
Copy link

TimNN commented Nov 5, 2018

Minimized:

target datalayout = "e-P1-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8"
target triple = "avr-unknown-unknown"

define internal { i16, i16 } @will_be_tail_call([0 x i8]*) addrspace(1) {
start:
  store [0 x i8]* %0, [0 x i8]** undef, align 1
  ret { i16, i16 } undef
}

define void @main() addrspace(1) personality i32 (...) addrspace(1)* @rust_eh_personality {
start:
  invoke addrspace(1) { i16, i16 } @will_be_tail_call([0 x i8]* undef)
          to label %bb29 unwind label %cleanup

bb29:
  unreachable

cleanup:
  landingpad { i8*, i32 }
          cleanup
  resume { i8*, i32 } undef
}

declare i32 @rust_eh_personality(...) addrspace(1)

If you run this code through opt -O1, you'll notice that will_be_tail_call will get addrspace(0).

@TimNN
Copy link

TimNN commented Nov 5, 2018

The usual culprit - PointerType::getUnqual

I've been skimming the uses of getUnqual in LLVM and found some potentially interesting things (haven't had the time yet to try fixing them).

@dylanmckay: Do you have a link to an example fix patch?

  • In DevirtModule::applyICallBranchFunnel: I don't things this is the one triggered by the repro above (which was minimized from libcore) but this may have caused my original project-repro from which I got the backtrace above.
  • Another suspicious one: InstCombiner::transformCallThroughTrampoline

@dylanmckay
Copy link
Member

Great work.

One way to confirm whether or not fastcc logic is the cause; run llc -print-after-all, when will dump the in-memory program representation after every single LLVM pass. I usually find the name of the function in question and grep the llc -print-after-all logs for it. Usually you'll get ~40 matches lines, or so, then it's just a matter of scrolling up the logs to the previous "*** IR Dump after pass: ***" line

@dylanmckay
Copy link
Member

@dylanmckay: Do you have a link to an example fix patch?

  • r344243 Generalize an IR verifier check to work with non-zero program address spaces
  • My local branch dylanmckay/clang@8e7e3d6 and the changes to clang's lib/CodeGen/CodeGenModule.cpp

@dylanmckay
Copy link
Member

dylanmckay commented Nov 6, 2018

I've been skimming the uses of getUnqual in LLVM and found some potentially interesting things (haven't had the time yet to try fixing them).

Yeah, I am a bit concerned that there's latent bugs hidden deep within LLVM caused by address space assumptions, but on the plus side, the LLVM IR verifier will always catch it and trigger an error and so it'll never silently break.

This is made a bit more likely due to AVR being an experimental backend and so only the experimental buildbots compile it. This means LLVM developers may still introduce uses of getUnqual and friends and break AVR without any feedback loop. Once the backend is no longer experimental, so long as our test suite is good enough, AVR would break the tests and would need to be fixed before the patch could be merged.

@dylanmckay
Copy link
Member

N.B. The only thing stopping me from submitting an RFC to remove the experimental status now is the lack of integration test suite. Once I've finished up the local branch I linked above, I should have integration tests working.

@TimNN
Copy link

TimNN commented Nov 6, 2018

One way to confirm whether or not fastcc logic is the cause; run llc -print-after-all,

Thanks for reminding me of this, I hadn't thought of -print-after-all. I'll try this this evening (CET).

@TimNN
Copy link

TimNN commented Nov 6, 2018

-print-after-all actually pointed me at DeadArgumentEliminationPass, and I think I fixed the issue there.

I'll see if there are other situation in which addrspace(0) is generated for libcore and post a patch afterwards.

Edit: One more in ArgumentPromotion. The culprit was Function::Create both times.

@TimNN
Copy link

TimNN commented Nov 6, 2018

So, I can compile libcore with -O3 now without having any addrspace(0) in there.

However, both issues I've found have two things in common:

(1) They happened due to Function::Create being used to transform a function.
(2) The were located somewhere in lib/Transforms/IPO.

I then did a brief search for Function::Create in lib/Transforms/IPO. Upon a cursory glance, it looks like none of them (except the once I fixed) set the Address Space correctly. I'm gonna assume that they are all wrong since, IIUC, there is no way for executable code to be in any place other than the program data address space. I'll look at them in more details and post a patch.


I took an even shorter look at other uses of Function::Create and saw more suspicious ones in, e.g. /Linker/*. I'll take a look at those as well.


Btw, the addrspace(0) free libcore fixed my repro of the originally reported issue.

@TimNN
Copy link

TimNN commented Nov 6, 2018

Here is the commit fixing all the uses in lib/Transforms/IPO: TimNN/llvm@fce7769

I didn't run any tests on this (yet), but it compiles and in general makes sense, as far as I can tell.

I fixed the calls to Function::Create by doing either of the following:

(1) Copying the address space from the function that was being replaced.
(2) Using the address space from the DataLayout when new function were made up.

@dylanmckay: Can you commit this directly to LLVM or should I post this to reviews.llvm.org?

Edit: I added some (GitHub) comments to the commit about things I noticed / wasn't 100% sure about.

@dylanmckay
Copy link
Member

I've realised that the verifier will not always catch instances of this; it may not come up if the addrspace(0) function is only used in one place, a call or invoke. When creating call or invoke IR instructions, I believe LLVM would have no problem because the address space of the destination will just be taken from the address space of the function, which will also be addrspace(0). I think the verifier will only catch bugs when the frontend generates IR that references an addrspace(1) function in something like a function pointer, like the C++ static constructor section and the array of function pointers included within. In that case, the verifier will notice that originally-valid frontend IR is now invalid because there's now a part of the IR that specifies the functions signature as addrspace(1), and a separate part introduced by LLVM that refers to the same function as addrspace(0). In this case, the verifier fails because the same function signature in two different address spaces is actually two completely separate, non-isomorphic types.

I wrote up LLVM bug PR39573 to track the creation of a purpose-built AVR-specific pass that ensures every single function is in addrspace(1).

@dylanmckay
Copy link
Member

I didn't run any tests on this (yet), but it compiles and in general makes sense, as far as I can tell.

Extremely low chance of regression; there is not a single in-tree LLVM backend that sets program address space to 1 except for AVR, so that for all official backends, it is guaranteed to use the same default behaviour of 0.

@dylanmckay: Can you commit this directly to LLVM or should I post this to reviews.llvm.org?

Either or. reviews.llvm.org is good for code reviews, but you cannot commit via it. If posted to reviews.llvm.org and accepted by somebody else or me, then you'd still need to find somebody with push access to commit it.

It looks like you've got four LLVM commits to your name; you could definitely get push access to LLVM trunk if you wanted it. I like to encourage this because it means your work will come up under your own name (as opposed to a "Patch by" line) and attached to your GitHub so you can win social points if that's your kind of thing. That way it doesn't just look like I did everybody's work at a glance.

Usually after a few LLVM patches, people send an email to Chris and he gives it. I think I got commit access after about 2-3 patches. Completely up to you though, I have no problem committing and/or reviewing the patches

$ git log | grep "Tim Neu"
    Patch by Peter Nimmervoll (@brainlag) and Tim Neumann (@TimNN)
    Patch by Tim Neumann
    Patch by Tim Neumann!
    Patch by Tim Neumann.

@TimNN
Copy link

TimNN commented Nov 7, 2018

Either or. reviews.llvm.org is good for code reviews, but you cannot commit via it.

I know that :) I was mostly asking: Are you comfortable with committing this patch without another review from someone else.

you could definitely get push access to LLVM trunk if you wanted it.

I'm fine with you committing this patch for now, I'll think about getting commit access.

@shepmaster shepmaster added has-reduced-testcase A small LLVM IR file exists that demonstrates the problem has-local-patch A patch exists but has not been applied to upstream LLVM labels Nov 10, 2018
@dylanmckay dylanmckay self-assigned this Dec 1, 2018
@dylanmckay
Copy link
Member

I added a few tests and upstreamed in r349469.

@dylanmckay
Copy link
Member

I guess that now we just need to update LLVM or cherry-pick the patch.

@dylanmckay
Copy link
Member

Cherry-picked the fix in a3b0834.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
has-local-patch A patch exists but has not been applied to upstream LLVM has-reduced-testcase A small LLVM IR file exists that demonstrates the problem
Projects
None yet
Development

No branches or pull requests

4 participants