migrate to the LLVM toolchain #9367

Closed
3 of 4 tasks
thestinger opened this issue Sep 20, 2013 · 53 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-enhancement Category: An issue proposing an enhancement or a PR with one. P-low Low priority

Comments

@thestinger
Contributor

Especially on Windows, we shouldn't have a dependency on either MinGW or MSVC++. LLVM's toolchain is not entirely mature, but they do have C/C++ working on Windows independently of other toolchains.

http://blog.llvm.org/2013/09/a-path-forward-for-llvm-toolchain-on.html

@brson
Contributor

brson commented Oct 1, 2013

nominating. At the least we need to switch our linker away from gcc to something else on windows in order to get away from requiring a mingw install just to run rustc.

@catamorphism
Contributor

1.0, high. Windows needs to work.

@alexcrichton
Member

I was doing some testing with this recently, and here's all the troubles I ran into:

  • We'd have to have a submodule for clang, compiler-rt, and lld (not that bad)
  • It looks like lld will only build with cmake (we'd have to introduce this dependency)
  • I could not get compiler-rt to build when not using cmake (it hit a build error)
  • I could not get compiler-rt to build when using cmake (find . -name '*compiler-rt*' came up with nothing interesting)
  • I could not get lld to link a hello-world binary on darwin (lots of complaints about undefined symbols)
  • I could build libcxxabi from LLVM on osx/linux, but it would probably require that we build clang as well. It requires a fairly new C++ compiler (above llvm 3.2 I think?) and some distros are still lagging behind on 3.0 as the official clang version. OSX has a new-enough clang by default, and it appears that freebsd does as well, so we could perhaps do some easy version detection to figure out whether we need to build clang or not.
  • On OSX, we can build libunwind out of the folder in libcxxabi (yay!)
  • On linux, it looks like we may be able to rely on libgcc_eh.a for _Unwind_Resume and friends

I have done little testing on windows, but I believe that the story will be very similar except for figuring out where _Unwind_Resume comes from. I'm also probably just not doing anything correctly, but right now I'm having a hard time figuring out what...

@vadimcn
Contributor

vadimcn commented Jan 11, 2014

FWIW, I've checked out a recent Windows build of lld (from llvm.org/builds), and it could not link objects produced by mingw:

  • in "gnu" mode it did not recognize the file format of COFF .obj files.
  • in "link" mode it asserted about not being able to process "IMAGE_SYM_DEBUG" section.

@vadimcn
Contributor

vadimcn commented Feb 2, 2014

@brson, @alexcrichton: Looks like I've succeeded in compiling compiler-rt on mingw. (I hope other platforms will be easier because it was actually intended to be built there). This opens up 2 possible avenues for further work:

  • migration of rustc dependencies from libgcc to compiler-rt,
  • using Address Sanitizer instead of (or in addition to) Valgrind.

Which one do you consider more important?

@vadimcn
Contributor

vadimcn commented Feb 2, 2014

cc #749

@alexcrichton
Member

Interesting! I think there's no question about moving to compiler-rt instead of libgcc. I'd be more than willing to help you integrate it with the current build system!

How does using compiler-rt instead of libgcc relate to valgrind vs address sanitizer?

@vadimcn
Contributor

vadimcn commented Feb 2, 2014

Address sanitizer runtime is a part of compiler-rt. Rust could use compiler-rt just for that, for starters.
Also, please note that we'll still be depending on libgcc for the unwinding library, at least on windows.

@alexcrichton
Member

Hm, if we could not compile the asan part of compiler-rt for now I think that may be the best way to go.

Depending on libgcc is fine for unwinding, moving to compiler-rt is the motivation for fixing bugs like #8449.

@vadimcn
Contributor

vadimcn commented Feb 2, 2014

asan compiles just fine actually. But using it will require some plumbing in rustc, such as emitting sanitize_address attributes on functions and running asan pass in llvm.

ok I'll look into migration from libgcc.

@alexcrichton
Member

Looks like this is essentially done except for lld. From what I've heard, LLVM will not be making much effort to have lld work on linux because ld.gold is quite excellent already, but they're working on getting it to work on windows and darwin which is what would be nice for us.

@alexcrichton
Member

Nominating for removal from the 1.0 milestone. I don't think there's much we can do about this except "wait and see"

@thestinger
Contributor Author

Agreed, the feasible part is already finished.

@pnkfelix
Member

Since work on this is gated on lld (which is moving fast, but nonetheless is not under our control), we cannot let it block the 1.0 milestone.

Removing from 1.0 milestone, and demoting to P-low (since the high priority items have now been addressed).

@pnkfelix pnkfelix added P-low and removed P-high labels Apr 10, 2014
@pnkfelix pnkfelix removed this from the 1.0 milestone Apr 10, 2014
@brson brson removed the I-nominated label Apr 10, 2014
@retep998
Member

Migrating away from MinGW completely is actually within the realm of possibility.
By using a combination of MSVC tools and libraries and LLVM tools it is possible to compile a basic Hello World application that is built completely without MinGW.
I can use clang for the C/C++ bits and llvm-ar to create libraries, but it seems lld still doesn't work correctly so I have to use link for the time being.
https://gist.github.com/retep998/e7569294e89c28dea9d2

@thestinger
Contributor Author

Rust can't redistribute link.exe and it's still an extra executable to spawn, so it would be a step backwards from MinGW-w64, not progress. The current toolchain is open-source and can be redistributed. It works fine with MSVC C libraries and Rust can't bind directly to C++ anyway.

@alexchandel

As shown in IRC earlier, LLD is now capable of dynamically linking Rust programs.

@sanxiyn sanxiyn added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Jan 25, 2015
@tamird
Contributor

tamird commented Jul 18, 2015

I briefly explored getting rustc built with LLD today, but found the current linker situation complicated enough to discourage me from further work. In summary:

  • In all cases except msvc:
    • rustc uses cc or gcc with -Wl to link stuff. This uses ld under the hood.
  • In the MSVC case:
    • rustc uses link.exe to link stuff, and this logic lives in a parallel universe from everything else.

Note that LLD intentionally implements several command-line interfaces, designed to emulate clang, gcc, and link.exe. These are selected with the -flavor flag. This means that moving to LLD can greatly simplify MSVC support and bring it in line with the other platforms.
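
To make the flavor idea concrete, here is a minimal sketch of how a target triple could be mapped to a `-flavor` value. The function name and the triple matching are hypothetical and are not rustc's actual logic; "gnu" and "link" are the modes mentioned in this thread, while "darwin" is assumed from lld's own driver naming.

```rust
/// Pick a value for lld's `-flavor` flag from a Rust target triple.
/// Illustrative assumption only: this is not how rustc selects a linker.
fn lld_flavor_for_target(triple: &str) -> &'static str {
    if triple.ends_with("-msvc") {
        // MSVC targets go through the link.exe-compatible interface.
        "link"
    } else if triple.contains("-apple-") {
        // Apple targets use the Mach-O linker interface (assumed name).
        "darwin"
    } else {
        // Everything else is treated as a GNU ld-style target.
        "gnu"
    }
}
```

With a mapping like this, the MSVC case becomes one more branch instead of a separate code path.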

There's a final snag:

  • LLVM is migrating to cmake. The "old" build system is deprecated, but still present.
  • rustc builds LLVM using the "old" build system (except in the msvc case).
  • LLD fails to build using LLVM's "old" build system.

This means that in order to use LLD, rustc must depend on cmake.

If anyone is interested in picking up this torch (maybe @alexcrichton or @alexchandel?), see https://github.com/tamird/rust/tree/use-lld.

@alexcrichton
Member

@tamird do you just need help getting lld integrated into our build system? Or do you need help getting it integrated into the compiler itself? You may be able to get by just installing lld into your PATH and having the compiler shell out to it. The compiler could possibly just have logic to pass different arguments in that situation.
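
A rough sketch of the "shell out and pass different arguments" idea, assuming an `lld` binary is on PATH. The `linker_command` helper and its `use_lld` switch are hypothetical; the sketch only builds the command and does not run it.

```rust
use std::process::Command;

/// Build (but do not spawn) a linker invocation.
/// `use_lld` is a hypothetical switch: when set, call `lld` directly with a
/// `-flavor` argument; otherwise keep going through the `cc` driver.
fn linker_command(use_lld: bool, objects: &[&str], output: &str) -> Command {
    let mut cmd = if use_lld {
        let mut c = Command::new("lld");
        c.arg("-flavor").arg("gnu");
        c
    } else {
        Command::new("cc")
    };
    // Inputs and output are passed the same way in both cases.
    cmd.args(objects).arg("-o").arg(output);
    cmd
}
```

Calling `.status()` on the returned `Command` would actually run the linker; the real argument differences between the two paths are larger than this sketch shows.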

@tamird
Contributor

tamird commented Jul 20, 2015

@alexcrichton yep, that's what I've done for now:

PATH=x86_64-apple-darwin/llvm/bin:$PATH time /usr/bin/make -j4 check-stage1

A bunch of tests are passing (but some are failing), so:

  • how do I get the build system to put lld somewhere rustc can find it? I thought tamird@44f6fb3 would do it, but apparently not
  • should I open a PR? It'd be cool to have a try build with lld.

EDIT: test run just finished, all the failures are: fatal runtime error: Could not unwind stack, error = 5

@tamird
Contributor

tamird commented Jul 20, 2015

@eddyb
Member

eddyb commented Aug 29, 2015

@tamird Was that before or after @alexcrichton's change to unwinding? The one that made -Z no-landing-pads not emit the panic handler, thus always causing that error when panicking at stage1.
Have you tried check-stage2?

@tamird
Contributor

tamird commented Aug 29, 2015

@eddyb that exact error happens on a snapshot (rustc 1.4.0-nightly (d503524 2015-08-29)). See repro instructions https://github.com/tamird/rust_lld

EDIT: those instructions include running ld next to lld, and ld doesn't produce that error, while lld does.

@lygstate
Contributor

Microsoft has already open-sourced its PDB/CodeView format and is preparing to implement it in LLVM; that would be great news for lld.

@japaric
Member

japaric commented Feb 15, 2016

My experience using lld to link Rust programs on Linux (spoiler: it didn't work at all):

I started with a hello world program:

$ cat hello.rs
fn main() {
    println!("Hello, world!");
}

Then produced an object file and got the linker arguments used by rustc:

$ rustc -C linker=/usr/bin/false hello.rs
"/usr/bin/false"
"-Wl,--as-needed"
"-Wl,-z,noexecstack"
"-m64"
"-L"
"/home/japaric/.multirust/toolchains/nightly/lib/rustlib/x86_64-unknown-linux-gnu/lib"
"hello.0.o"
"-o"
"hello"
"-Wl,--gc-sections"
"-pie"
"-nodefaultlibs"
"-L"
"/home/japaric/.multirust/toolchains/nightly/lib/rustlib/x86_64-unknown-linux-gnu/lib"
"-Wl,-Bstatic"
"-Wl,-Bdynamic"
"/home/japaric/.multirust/toolchains/nightly/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd-fd663c41.rlib"
"/home/japaric/.multirust/toolchains/nightly/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcollections-fd663c41.rlib"
"/home/japaric/.multirust/toolchains/nightly/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc_unicode-fd663c41.rlib"
"/home/japaric/.multirust/toolchains/nightly/lib/rustlib/x86_64-unknown-linux-gnu/lib/librand-fd663c41.rlib"
"/home/japaric/.multirust/toolchains/nightly/lib/rustlib/x86_64-unknown-linux-gnu/lib/liballoc-fd663c41.rlib"
"/home/japaric/.multirust/toolchains/nightly/lib/rustlib/x86_64-unknown-linux-gnu/lib/liballoc_jemalloc-fd663c41.rlib"
"/home/japaric/.multirust/toolchains/nightly/lib/rustlib/x86_64-unknown-linux-gnu/lib/liblibc-fd663c41.rlib"
"/home/japaric/.multirust/toolchains/nightly/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcore-fd663c41.rlib"
"-l"
"dl"
"-l"
"pthread"
"-l"
"gcc_s"
"-l"
"pthread"
"-l"
"c"
"-l"
"m"
"-l"
"rt"
"-l"
"compiler-rt"

These arguments don't work with lld, so I made the following changes:

  • First argument: -flavor gnu
  • Add /usr/lib to search path: -L /usr/lib, or lld won't be able to find C libraries like libc and libm
  • Change arguments that look like -Wl,$arg to just $arg
  • Add crt1.o and crti.o as input object files, or I get these errors: undefined symbol: _start and undefined symbol: _init
  • These flags are not accepted so I omitted them:
    • -m64: unknown emulation
    • -pie: unknown argument
    • -nodefaultlibs: unknown argument (but not required I think because lld doesn't link to any library by default)
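
The rewriting steps in the list above can be sketched as a small function. This is only an illustration: the name `cc_args_to_lld` is hypothetical, adding crt1.o and crti.o is left out, and the sketch splits the text after `-Wl,` at commas the way gcc does, which differs from a plain prefix strip that would leave `-z,noexecstack` as a single token.

```rust
/// Rewrite arguments meant for the `cc` driver into arguments for
/// `lld -flavor gnu`, following the manual steps listed above.
fn cc_args_to_lld(cc_args: &[&str]) -> Vec<String> {
    // Flags reported above as rejected by lld.
    const DROPPED: [&str; 3] = ["-m64", "-pie", "-nodefaultlibs"];

    // Select the GNU interface and add /usr/lib so libc and libm are found.
    let mut out: Vec<String> = ["-flavor", "gnu", "-L", "/usr/lib"]
        .iter()
        .map(|s| s.to_string())
        .collect();

    for arg in cc_args {
        if DROPPED.contains(arg) {
            continue;
        }
        match arg.strip_prefix("-Wl,") {
            // gcc splits the remainder at commas into separate linker arguments.
            Some(rest) => out.extend(rest.split(',').map(String::from)),
            // Anything else is passed through unchanged.
            None => out.push(arg.to_string()),
        }
    }
    out
}
```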

I ended with a lld call that looks like this:

target=x86_64-unknown-linux-gnu
sysroot=$(rustc --print sysroot)
libdir=${sysroot}/lib/rustlib/${target}/lib

lld \
  -flavor gnu \
  -L /usr/lib \
  --as-needed \
  -z,noexecstack \
  -L ${libdir} \
  hello.0.o \
  /usr/lib/crt1.o \
  /usr/lib/crti.o \
  -o hello \
  --gc-sections \
  -Bstatic \
  -Bdynamic \
  ${libdir}/libstd-fd663c41.rlib \
  ${libdir}/libcollections-fd663c41.rlib \
  ${libdir}/librustc_unicode-fd663c41.rlib \
  ${libdir}/librand-fd663c41.rlib \
  ${libdir}/liballoc-fd663c41.rlib \
  ${libdir}/liballoc_jemalloc-fd663c41.rlib \
  ${libdir}/liblibc-fd663c41.rlib \
  ${libdir}/libcore-fd663c41.rlib \
  -l dl \
  -l pthread \
  -l gcc_s \
  -l c \
  -l m \
  -l rt \
  -l compiler-rt

Then I got the following segfault with the above lld call:

#0 0x0000000000515f35 (/usr/x86_64-pc-linux-gnu/bin/lld+0x515f35)
#1 0x0000000000514006 (/usr/x86_64-pc-linux-gnu/bin/lld+0x514006)
#2 0x000000000051417f (/usr/x86_64-pc-linux-gnu/bin/lld+0x51417f)
#3 0x00007f7f68a41c70 __restore_rt (/usr/x86_64-pc-linux-gnu/bin/../lib/libpthread.so.0+0x10c70)
#4 0x00000000017ba7b3 (/usr/x86_64-pc-linux-gnu/bin/lld+0x17ba7b3)
#5 0x000000000177668e (/usr/x86_64-pc-linux-gnu/bin/lld+0x177668e)
#6 0x0000000001787d00 (/usr/x86_64-pc-linux-gnu/bin/lld+0x1787d00)
#7 0x00000000017dc22d (/usr/x86_64-pc-linux-gnu/bin/lld+0x17dc22d)
#8 0x00000000017651d8 (/usr/x86_64-pc-linux-gnu/bin/lld+0x17651d8)
#9 0x00000000004a137e _init (/usr/x86_64-pc-linux-gnu/bin/lld+0x4a137e)
#10 0x00000000017688b5 (/usr/x86_64-pc-linux-gnu/bin/lld+0x17688b5)
#11 0x00000000004a90f1 _init (/usr/x86_64-pc-linux-gnu/bin/lld+0x4a90f1)
#12 0x000000000045b5ce _init (/usr/x86_64-pc-linux-gnu/bin/lld+0x45b5ce)
#13 0x00007f7f67e21620 __libc_start_main (/usr/x86_64-pc-linux-gnu/bin/../lib/libc.so.6+0x20620)
#14 0x00000000004a8439 _init (/usr/x86_64-pc-linux-gnu/bin/lld+0x4a8439)
Stack dump:
0.      Program arguments: (...)

Interestingly, removing the --gc-sections flag "fixes" the segfault and produces a 2MB binary. However, running the binary generates a new segfault.


Another interesting bit is that rustc -g produces an object file that lld can't use:

$ rustc -g -C linker=/usr/bin/false hello.rs
$ lld -flavor gnu hello.0.o -o hello
SHF_MERGE section size must be a multiple of sh_entsize
$ echo $?
1
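
For readers unfamiliar with that error, the condition it names can be sketched as follows. This is not lld's code: `check_merge_section` is a hypothetical helper, and treating a zero `sh_entsize` as "nothing to check" is an assumption made to avoid dividing by zero.

```rust
/// ELF section flag marking a section whose entries may be merged.
const SHF_MERGE: u64 = 0x10;

/// Check the rule named in the error message: a mergeable section's size
/// must be a whole number of entries.
fn check_merge_section(sh_flags: u64, sh_size: u64, sh_entsize: u64) -> Result<(), &'static str> {
    // Sections without SHF_MERGE are not subject to the rule.
    if (sh_flags & SHF_MERGE) == 0 {
        return Ok(());
    }
    // Assumption: a zero entry size is skipped rather than rejected.
    if sh_entsize == 0 {
        return Ok(());
    }
    if sh_size % sh_entsize != 0 {
        return Err("SHF_MERGE section size must be a multiple of sh_entsize");
    }
    Ok(())
}
```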

Version

LLVM Linker Version: 3.9(/var/cache/paludis/distfiles/scm/lld e5f40891704022e82687c317c814377a485d73ba)

Can anyone reproduce these errors? Or perhaps someone can point out a fatal error I may have committed in the above steps?

@nagisa
Member

nagisa commented Feb 15, 2016

ELF lld is being rewritten and the old version is not maintained much anymore to my knowledge. I wouldn’t be surprised about it (either version) not quite working yet.

@japaric
Member

japaric commented Feb 15, 2016

Aha, that would explain the errors. I'll try again at a later date 😄.

@tamird
Contributor

tamird commented Feb 15, 2016

FWIW, I've updated https://github.com/tamird/rust_lld, and things seem to work as before against LLVM HEAD ($(brew --prefix llvm)/bin/llvm-config --version prints 3.9.0svn).

That is, "hello world" works, but panic doesn't (illegal instruction). I'm not sure what you were doing wrong @japaric but I'd be happy to collaborate if you want to use my work as a starting point.

@alexchandel

@nagisa It will be fixed eventually, right? It'd be a pity to only be able to link mach-o and windows binaries.

FWIW, by having a shell script called link.exe that adds the ./lib folder of ~/i686-pc-windows-msvc (itself a union of ~/i686-pc-windows-msvc-vc6 and ~/i686-pc-windows-msvc-w10) to the LIB env variable and then calls lld-link, (and doing a similar thing for cl.exe for projects with a build.rs,) I can cross-compile basically any Rust project to windows.

@nagisa
Member

nagisa commented Feb 15, 2016

@alexchandel they’re rewriting ELF backend from scratch.

@japaric
Member

japaric commented Feb 15, 2016

@tamird I think lld works in your case because you are using the mach-o backend (macos) instead of the elf backend (linux). I tried your scripts, I had to modify them to run on Linux, but lld is not being used at all. I think it's because rustc uses cc as a linker, cc calls collect2 (instead of ld) with all the linker arguments and collect2 then calls a prefixed ld (i.e. /usr/bin/x86_64-pc-linux-gnu-ld) on my system. Maybe collect2 is ignoring the -B . flag used in your script?

Anyhow, I intended to add a -Z use-lld flag to rustc that forced the compiler to use lld as a linker, as a way to start testing integration with lld, but without a way to test locally I won't be able to. If anyone wants to give that idea a try I can provide some pointers.

@tamird
Contributor

tamird commented Feb 15, 2016

I think it's because rustc uses cc as a linker, cc calls collect2 (instead of ld)...Maybe collect2 is ignoring the -B . flag used in your script?

You must be using GCC; -B is an undocumented flag respected by clang.

Also, note that some of the scripts invoke rustc and lld in two steps, so even on your system, those should be using lld. See https://github.com/tamird/rust_lld/blob/master/manual_dynamic.sh#L11 and https://github.com/tamird/rust_lld/blob/master/manual_static.sh#L11.

@retep998
Member

Just saying, if we had #30027 then we wouldn't depend on import libraries from mingw/msvc on Windows.

@alexchandel

@retep998 This would mean that, using LLD, you could target windows without a toolchain right? Or would you still need a single msvcrt.lib for the entry point?

@retep998
Member

If you don't care about statically linking with C/C++ code then you can easily write your own entry point and skip CRT initialization entirely.

@alexchandel

@retep998 So we could link against msvcrt.dll for memcpy/memmove/memcmp/memset/strlen using generated idata, because it maintains those symbols and does its own initialization, but skip CRT initialization in a pure rust executable? I wonder how microsoft got it so wrong, while apple got it so right; their entry point (and crt initialization i.e. crt1.o) is literally inside libSystem.dylib.

@nagisa They've finished rewriting and are working on feature parity now.

@nagisa
Member

nagisa commented Apr 1, 2016

working on feature parity now.

I’m not sure I would call that state finished yet, but sure.

@retep998
Member

retep998 commented Apr 1, 2016

@alexchandel How is that Microsoft getting it wrong? Is there a reason that statically linking the entry point might be a bad idea? On Windows every binary (exe or dll) has an entry point which is responsible for initializing all the statics and such in that binary. If you're using C/C++ in a given binary, then you need the CRT entry point. Since Rust doesn't have runtime static initializers, or any life outside of main for that matter, as long as you don't call certain CRT functions (such as atexit whose tables are module local!) then a pure Rust binary would have no need for the CRT entry point and could have its own minimal entry point. C/C++ code in DLLs will continue to work fine, since DLLs have their own entry points. If you link the CRT statically then you definitely need the CRT entry point since that brings in C/C++ with a bunch of statics and stuff directly into your binary and thus it needs to be initialized correctly.

Whether we link to msvcrt via generated idata or an import library is rather irrelevant to whether we skip the CRT entry point. As long as the CRT is dynamically linked and we don't have any statically linked C/C++ then we can skip the CRT entry point. Note that generating idata instead of linking to an import library is a move that needs to be considered very carefully. Imagine if we generated idata to some of the math functions and someone statically linked in some C/C++ code that calls those math functions and checks the result via errno. If it pulls in the math functions from the generated idata, but the errno from an import library, then they could very well end up being different CRTs and things would fail horribly. Also consider the case of someone trying to use Rust to make a UWP app instead of a regular desktop app. They need to link to onecore.lib instead of things like kernel32.lib so references can resolve to different DLLs. If libstd decides to generate idata for some of the system functions it uses it could end up generating them for the wrong DLL and things would fail horribly.
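
The rule above can be written down as a tiny predicate. This is only a sketch of the reasoning: the `CrtLinkage` type and the function name are hypothetical, and it ignores the caveat about module-local CRT functions such as atexit.

```rust
/// How the C runtime is linked into the final binary (hypothetical type).
#[derive(Clone, Copy, PartialEq, Debug)]
enum CrtLinkage {
    Dynamic,
    Static,
}

/// A binary can use its own minimal entry point only when the CRT is
/// dynamically linked and no C/C++ code is statically linked into it.
fn can_skip_crt_entry_point(crt: CrtLinkage, has_static_c_or_cpp: bool) -> bool {
    crt == CrtLinkage::Dynamic && !has_static_c_or_cpp
}
```

Note that the choice between generated idata and an import library does not appear as an input, which matches the point that it is irrelevant to this decision.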

@alexchandel

@retep998 There's more than just the entry point and static initializers in there, and the versioned vcruntime140.dll that the UCRT requires dynamically linked binaries to link against has all kinds of stuff, including exception handling, dispatch, and unwinding. Yet everything that Microsoft here claims must change between compiler versions, OSX has kept permanently stable since the introduction of libSystem by keeping it behind the libSystem facade. That's getting it wrong vs right. Rust executables on OS X, even those statically linked with C/C++ code, don't need anything other than a dynamic link to libSystem.dylib.

It sounds like Rust would have to reason whether any import libraries were included when deciding to generate idata. But in the case where C/C++ code is statically linked, a use case I often deal with on Windows, is it then necessary to use import libraries from a Windows toolchain?

@japaric
Member

japaric commented Sep 1, 2016

Update: #36120 is a PoC PR that embeds lld into rustc itself. It has only been tested with x86_64 ELFs though.

@alexcrichton
Member

This is a pretty old bug at this point, and the only remaining task, lld, is best tracked in a future issue (if necessary).

@alexchandel

Is it still the intention to eventually switch to using lld?

@alexcrichton
Member

If it makes sense to, sure!

@bstrie
Contributor

bstrie commented Feb 17, 2017

I've filed #39915 for tracking LLD integration.

17 participants