cargo >nightly-2018-09-30 crashes on IBM POWER9 ppc64le with Fedora 29 #57345

Closed
llebout opened this issue Jan 4, 2019 · 28 comments · Fixed by #58986
Labels
O-PowerPC Target: PowerPC processors T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.

Comments

@llebout

llebout commented Jan 4, 2019

Hello, cargo >nightly-2018-09-30 shipped through rustup crashes on startup on ppc64le.

A cargo binary compiled on my machine from source, on master or on the rust-1.32.0 branch, runs fine.

crash log

Program received signal SIGILL, Illegal instruction.
0x0000000100000004 in ?? ()
(gdb) bt
#0  0x0000000100000004 in ?? ()
#1  0x00000001008a9650 in rand_pool_acquire_entropy ()
#2  0x00000001008a8e48 in rand_drbg_get_entropy ()
#3  0x00000001008a6618 in RAND_DRBG_instantiate ()
#4  0x00000001008a7868 in drbg_setup ()
#5  0x00000001008a7988 in do_rand_drbg_init_ossl_ ()
#6  0x00007ffff7ef4e08 in __pthread_once_slow (once_control=0x100cfa6f8 <rand_drbg_init>, init_routine=0x1008a78e0 <do_rand_drbg_init_ossl_>)
    at pthread_once.c:116
#7  0x00000001008cc2d8 in CRYPTO_THREAD_run_once ()
#8  0x00000001008a7e30 in RAND_DRBG_get0_public ()
#9  0x00000001008a7f18 in drbg_bytes ()
#10 0x00000001008a9230 in RAND_bytes ()
#11 0x00000001007f0330 in SSL_CTX_new ()
#12 0x0000000100708ec8 in git_openssl_stream_global_init ()
#13 0x00000001006bc1d4 in init_once ()
#14 0x00007ffff7ef4e08 in __pthread_once_slow (once_control=0x100cf830c <_once_init>, init_routine=0x1006bc100 <init_once>) at pthread_once.c:116
#15 0x00000001006bc344 in git_libgit2_init ()
#16 0x0000000100695fd0 in std::sync::once::Once::call_once::{{closure}} ()
#17 0x00000001009f0df4 in std::sync::once::Once::call_inner () at src/libstd/sync/once.rs:387
#18 0x0000000100695f78 in libgit2_sys::init ()
#19 0x0000000100686c90 in git2::config::Config::open_default ()
#20 0x0000000100452de0 in cargo::ops::registry::http_proxy ()
#21 0x000000010045145c in cargo::ops::registry::needs_custom_http_transport ()
#22 0x00000001000c9e78 in cargo::main ()
#23 0x00000001000bf084 in std::rt::lang_start::{{closure}} ()
#24 0x00000001009f2a34 in std::rt::lang_start_internal::{{closure}}::{{closure}} () at src/libstd/rt.rs:49
#25 std::sys_common::backtrace::__rust_begin_short_backtrace () at src/libstd/sys_common/backtrace.rs:135
#26 0x00000001009f501c in std::rt::lang_start_internal::{{closure}} () at src/libstd/rt.rs:49
#27 std::panicking::try::do_call () at src/libstd/panicking.rs:297
#28 0x0000000100a03b94 in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:92
#29 0x00000001009f5ea8 in std::panicking::try () at src/libstd/panicking.rs:276
#30 std::panic::catch_unwind () at src/libstd/panic.rs:388
#31 std::rt::lang_start_internal () at src/libstd/rt.rs:48
#32 0x00000001000cc4d8 in main ()

My machine is a RaptorCS Talos II, it has an IBM POWER9 processor, my system is Fedora 29 with latest updates.
I would guess this is a build environment issue on your end; please indicate how I can help, thanks a lot for your time!

@nagisa
Member

nagisa commented Jan 4, 2019

Thanks for the report. Seems very similar to rust-lang/cargo#6320 that occurs on big endian...

@llebout
Author

llebout commented Jan 4, 2019

Thanks for the report. Seems very similar to rust-lang/cargo#6320 that occurs on big endian...

I saw that other issue before; the backtrace is largely different, and recompiling just works. I am confused as to how the official build could have gone wrong: I built mine with stable 1.30.1 and it's OK. I had to modify the source a little to allow use of experimental clippy features.

@nagisa
Member

nagisa commented Jan 4, 2019

recompiling just works

And so it does for the big endian. Why I think these are similar issues is because just like in the other report the pc is somewhere around 0x100000000 or some similar very-round number. Basically it feels to me that we simply embed a bad openssl for either PPC. It could be caused by an older toolchain, older openssl source or some other reason. I do agree that it is likely an infrastructure issue.

@nagisa nagisa added T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. O-PowerPC Target: PowerPC processors labels Jan 4, 2019
@mpe

mpe commented Jan 18, 2019

The SIGILL is happening because the program has branched somewhere that's not code. In fact it branches to 0 (relocated) which is the start of the ELF.

It seems to be coming from here:

   0x10088715c <rand_pool_acquire_entropy+124>     nop
   0x100887160 <rand_pool_acquire_entropy+128>     mr      r4,r3
   0x100887164 <rand_pool_acquire_entropy+132>     beq     cr3,0x1008871c0 <rand_pool_acquire_entropy+224>
   0x100887168 <rand_pool_acquire_entropy+136>     mr      r4,r31
   0x10088716c <rand_pool_acquire_entropy+140>     bl      0x100000000
  >0x100887170 <rand_pool_acquire_entropy+144>     nop

Which is also what we see in the objdump:

00000000008870e0 <rand_pool_acquire_entropy>:
  8870e0:       47 00 4c 3c     addis   r2,r12,71
  8870e4:       20 88 42 38     addi    r2,r2,-30688
  ...
  887158:       b1 f3 ff 4b     bl      886508 <rand_pool_add_begin+0x8>
  88715c:       00 00 00 60     nop
  887160:       78 1b 64 7c     mr      r4,r3
  887164:       5c 00 8e 41     beq     cr3,8871c0 <rand_pool_acquire_entropy+0xe0>
  887168:       78 fb e4 7f     mr      r4,r31
  88716c:       95 8e 77 4b     bl      0

So for some reason we have a branch to address 0, but address 0 is the start of the ELF file, not code. This looks like some sort of toolchain screwup.
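The `bl 0` shown by objdump can be checked by hand from the instruction bytes. Below is a minimal sketch in C, assuming the standard Power ISA I-form branch layout (primary opcode 18, a 24-bit word displacement, then the AA and LK bits). The function names are made up for illustration and are not part of any tool used in this thread.

```c
#include <stdint.h>

/* Assemble a 32-bit instruction word from the four bytes objdump prints
   for a little-endian (ppc64le) object, lowest address first. */
uint32_t insn_from_le_bytes(const uint8_t b[4]) {
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Decode an I-form branch (b, bl, ba, bla).
   Returns 1 and stores the branch target if insn has primary opcode 18,
   otherwise returns 0 and leaves *target untouched. */
int branch_target(uint64_t pc, uint32_t insn, uint64_t *target) {
    if ((insn >> 26) != 18)
        return 0;

    /* LI field with the two low zero bits appended: a 26-bit signed value. */
    int64_t disp = (int64_t)(insn & 0x03fffffcu);
    if (disp & 0x02000000)
        disp -= 0x04000000;          /* sign-extend from bit 25 */

    int absolute = (insn >> 1) & 1;  /* AA bit: absolute vs. pc-relative */
    *target = absolute ? (uint64_t)disp : pc + (uint64_t)disp;
    return 1;
}
```

Feeding it the bytes `95 8e 77 4b` at address 0x88716c gives a displacement of -0x88716c and therefore a target of 0, while `b1 f3 ff 4b` at 0x887158 gives 0x886508, which matches the two `bl` lines in the objdump above.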

If I build openssl and look at the objdump of ./crypto/rand/libcrypto-lib-rand_unix.o I think this matches up with this code (code-gen is not identical):

 230:   78 f3 c4 7f     mr      r4,r30
 234:   78 eb a3 7f     mr      r3,r29
 238:   01 00 00 48     bl      238 <rand_pool_acquire_entropy+0x88>
                        238: R_PPC64_REL24      rand_pool_add_begin
 23c:   00 00 00 60     nop
 240:   78 f3 c4 7f     mr      r4,r30
 244:   01 00 00 48     bl      244 <rand_pool_acquire_entropy+0x94>
                        244: R_PPC64_REL24      getentropy
 248:   00 00 00 60     nop

So it's the call to getentropy that's resulted in the bad branch. And interestingly getentropy is a weak symbol:

$ nm ./crypto/rand/libcrypto-lib-rand_unix.o | grep getentropy
                 w getentropy

And:

extern int getentropy(void *buffer, size_t length) __attribute__((weak));

That leads us to this issue:
openssl/openssl#6591

Which suggests the getentropy call requires GLIBC 2.25 or newer to work.
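For readers unfamiliar with the pattern, here is a minimal sketch of how a weak declaration like the one above is normally used: the address of the symbol is compared against NULL before the call, so that the same binary can run with or without the function being present. The wrapper name `try_getentropy` is hypothetical, and this is only an illustration of the idiom, not OpenSSL's actual code.

```c
#include <stddef.h>

/* Weak declaration: if no definition is found, the symbol's address
   is expected to resolve to NULL instead of causing a link error. */
extern int getentropy(void *buffer, size_t length) __attribute__((weak));

/* Hypothetical wrapper.
   Returns  1 if the buffer was filled by getentropy,
            0 if the symbol is not available,
           -1 if getentropy exists but reported an error. */
int try_getentropy(void *buf, size_t len) {
    if (getentropy == NULL)
        return 0;                 /* caller should fall back to another source */
    return getentropy(buf, len) == 0 ? 1 : -1;
}
```

The idiom only works if the NULL check and the call are resolved consistently by the toolchain; a call emitted for a symbol that ends up unresolved is a branch to address 0, which is the kind of instruction seen in the disassembly above.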

I don't actually know where or how the rust ppc64le binaries are built. But it looks like they are built against glibc 2.17:

$ readelf -V cargo
...
Version needs section '.gnu.version_r' contains 6 entries:
 Addr: 0x0000000000003458  Offset: 0x003458  Link: 5 (.dynstr)
  000000: Version: 1  File: libdl.so.2  Cnt: 1
  0x0010:   Name: GLIBC_2.17  Flags: none  Version: 9
  0x0020: Version: 1  File: libm.so.6  Cnt: 1
  0x0030:   Name: GLIBC_2.17  Flags: none  Version: 8
  0x0040: Version: 1  File: libgcc_s.so.1  Cnt: 3
  0x0050:   Name: GCC_3.3  Flags: none  Version: 7
  0x0060:   Name: GCC_3.0  Flags: none  Version: 6
  0x0070:   Name: GCC_4.2.0  Flags: none  Version: 5
  0x0080: Version: 1  File: libc.so.6  Cnt: 1
  0x0090:   Name: GLIBC_2.17  Flags: none  Version: 4
  0x00a0: Version: 1  File: libpthread.so.0  Cnt: 1
  0x00b0:   Name: GLIBC_2.17  Flags: none  Version: 3
  0x00c0: Version: 1  File: ld64.so.2  Cnt: 1
  0x00d0:   Name: GLIBC_2.17  Flags: none  Version: 2

Presumably building on a system with GLIBC 2.25 or newer would fix the issue.
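As a rough illustration of that comparison, the sketch below parses a version name as printed in the `readelf -V` output and checks it against a minimum. The helper names are hypothetical, and the 2.25 threshold is simply the figure quoted from the OpenSSL issue above.

```c
#include <stdio.h>

/* Parse a name of the form "GLIBC_<major>.<minor>".
   Returns 1 on success, 0 if the name has a different shape. */
int parse_glibc_version(const char *name, int *major, int *minor) {
    return sscanf(name, "GLIBC_%d.%d", major, minor) == 2;
}

/* Returns 1 if name is a GLIBC version at or above want_major.want_minor,
   0 otherwise (including names that are not GLIBC versions at all). */
int glibc_at_least(const char *name, int want_major, int want_minor) {
    int major, minor;
    if (!parse_glibc_version(name, &major, &minor))
        return 0;
    return major > want_major ||
           (major == want_major && minor >= want_minor);
}
```

With these helpers, `glibc_at_least("GLIBC_2.17", 2, 25)` is false for the cargo binary above, while `glibc_at_least("GLIBC_2.28", 2, 25)` is true.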

@llebout
Author

llebout commented Jan 20, 2019

I have GLIBC 2.28 over here, so that's probably it, thanks for the detailed investigation

@infinity0
Contributor

cc @jrtc27

<jrtc27> infinity0: ok I've found the problem
<jrtc27> https://sources.debian.org/src/openssl/1.1.1a-1/crypto/rand/rand_unix.c/?hl=127#L286-L287
<jrtc27> on whatever old build machine upstream's cargo was linked against, there was no getentropy symbol in their libc
<jrtc27> however, the linker has done stupid things
<jrtc27> getting the address of getentropy for the function pointer != NULL comparison has remained, as it's loading it from the TOC (PPC's version of the GOT), and there's a dynamic relocation there to fill in the entry
<jrtc27> so on stretch (or other older systems) the entry gets filled in as NULL at runtime and the condition gives false
<jrtc27> but on sid, the entry is filled in with the non-NULL address for the function pointer and the condition gives true, so it then goes to actually call getentropy
<jrtc27> however, the *call* to getentropy is done in the assembly as `bl getentropy`, ie not using the function pointer
<jrtc27> and for whatever reason, the linker didn't create a PLT stub for that, but instead went "oh, call to non-existent weak function, that's a call to 0, ie crash"
<jrtc27> so you get inconsistency
<jrtc27> infinity0: I can reproduce it with jessie's binutils 2.25 using ld.bfd
<jrtc27> but the linker bug appears to have been fixed by 2.28
<jrtc27> tried to find the relevant commit, but there are lots between 2.25 and 2.28 that mention undefined weak symbols, so I give up
<jrtc27> anyway, point is, rust upstream just needs to not use an ancient build environment

@nagisa
Member

nagisa commented Feb 10, 2019

cc @rust-lang/infra

Updating the docker images should be fairly easy, I think.

@Mark-Simulacrum
Member

cc @alexcrichton

I'm not sure if it's as easy as just updating our images since I suspect that'll break other use cases in practice (we're on them for a reason after all).

@mati865
Contributor

mati865 commented Feb 18, 2019

Upgrading only binutils usually doesn't break compatibility with older systems, but without running the result on an older PPC distro you can't really know.

@mpe

mpe commented Feb 19, 2019

Where/how are the builds done currently?

@alexcrichton
Member

If it's just a binutils upgrade, that's fine to land at any time. The builds are defined in the src/ci/docker/* files, which configure all our distribution builds.

ajdlinux added a commit to ajdlinux/rust that referenced this issue Mar 5, 2019
Update to recent binutils to avoid a linker bug that causes crashes when
linking against OpenSSL.

Closes: rust-lang#57345
Signed-off-by: Andrew Donnellan <[email protected]>
cuviper added a commit to cuviper/rust that referenced this issue Mar 7, 2019
Cargo powerpc64 and powerpc64le are seeing `SIGILL` crashes in openssl,
which was found to be a linking problem, fixed by newer binutils. See
<rust-lang#57345 (comment)>

For powerpc64 we're using crosstool-ng, which doesn't offer a newer
binutils version, but we can just compile it separately. On powerpc64le
we're already building binutils. Both are now updated to binutils 2.32.

Closes rust-lang/cargo#6320
Closes rust-lang#57345
Closes rust-lang/rustup#1620
kennytm added a commit to kennytm/rust that referenced this issue Mar 11, 2019
[CI] Update binutils for powerpc64 and powerpc64le

Cargo powerpc64 and powerpc64le are seeing `SIGILL` crashes in openssl,
which was found to be a linking problem, fixed by newer binutils. See
<rust-lang#57345 (comment)>

For powerpc64 we're using crosstool-ng, which doesn't offer a newer
binutils version, but we can just compile it separately. On powerpc64le
we're already building binutils. Both are now updated to binutils 2.32.

Closes rust-lang/cargo#6320
Closes rust-lang#57345
Closes rust-lang/rustup#1620
@cuviper
Member

cuviper commented Mar 21, 2019

I've confirmed that the official builds are fixed as of rustc 1.35.0-nightly (82e2f3ec2 2019-03-20).

@llebout
Author

llebout commented Mar 21, 2019

I confirm too, however, the stable builds are not, could you run a rebuild? Thanks.

@cuviper
Member

cuviper commented Mar 21, 2019

The official builds are only updated as part of the release process. I'll nominate my PR for beta and stable, but I'm not aware of any plans for a 1.33.z point release.

@llebout
Author

llebout commented Mar 21, 2019

Well, the builds don't work at all. This wouldn't be a new release, just a build system update, so couldn't you rebuild without changing the version? I don't know if that's OK with rustup.

@pietroalbini
Member

Our build system fetches its configuration from the source code it's building, so to get @cuviper's patch in we'd have to make an actual point release.

@llebout
Author

llebout commented Mar 21, 2019

@pietroalbini Okay then, approximately when should I expect the next stable release?

@cuviper
Member

cuviper commented Mar 21, 2019

You can see projected release dates on this page: https://forge.rust-lang.org/

@llebout
Author

llebout commented Mar 21, 2019

Thank you @cuviper
PowerPC will not have working stable releases from rustup until the 11th of April 2019, unless you make a point release. I'd appreciate if you did make one, if the same thing happened on x86, you'd make one.

Thanks for all the great work.

@cuviper
Member

cuviper commented Mar 21, 2019

if the same thing happened on x86, you'd make one.

Certainly, but x86 bootstraps everything else -- it cannot be left broken. There is also a notion of tiered support, where i686 and x86_64 are tier 1, and other arches are tier 2, "not guaranteed to produce a working build." (but also "patches are always welcome!" as I have provided in this case.)

Since you're on Fedora, is there a reason you can't just use its rust packages?
(I'm also the package maintainer there.)

@llebout
Author

llebout commented Mar 21, 2019

@cuviper
There's no platform with tier 1 support that can run without proprietary firmware, apart from old computers.

Fedora packages are not nightly and installing both rustup and Fedora packages is not supported. I often have to switch between stable and nightly for diverse projects with different requirements.

@mati865
Contributor

mati865 commented Mar 21, 2019

@Leo-LB it's not officially supported but you can point rustup to your system toolchain as explained here: https://github.com/rust-lang/rustup.rs#working-with-distribution-rust-packages

This way you can make the system toolchain the default one and use cargo +nightly build.

@llebout
Author

llebout commented Mar 21, 2019

@mati865 Oh cool! Thank you, that will come in handy. I had previously been looking for such a thing and did not find it.

@llebout
Author

llebout commented Mar 21, 2019

Now there's another problem: IntelliJ IDEA with Rust support won't install additional components automatically because the "system" toolchain does not support that.

But it's OK, I can code with nightly and compile with stable or nightly as needed.

@kgardas

kgardas commented Apr 11, 2019

Thank you @cuviper
PowerPC will not have working stable releases from rustup until the 11th of April 2019, unless you make a point release. I'd appreciate if you did make one, if the same thing happened on x86, you'd make one.

Unfortunately it looks like 1.34 also exhibits this issue.

@cuviper
Member

cuviper commented Apr 11, 2019

Correct, the infrastructure team declined the backport: #58986 (comment)

The current beta 1.35 and forward should be fine.

@llebout
Author

llebout commented Apr 11, 2019

Correct, the infrastructure team declined the backport: #58986 (comment)

The current beta 1.35 and forward should be fine.

It should have been fixed for 1.34.
It's not like a binutils update requires any kind of stabilization, especially in front of something that does not work.

@pietroalbini
Member

Explained the reason why the backport to 1.34 was rejected in a comment in another PR.
