This repository has been archived by the owner on Jun 16, 2021. It is now read-only.

OpenSSL 1.1.0i and later segfaults #92

Open
florian-vuillemot opened this issue Apr 4, 2020 · 12 comments

@florian-vuillemot
Contributor

The openssl dependency seems to have moved from https://www.openssl.org/source/openssl-1.0.1e.tar.gz to https://openssl.org/source/old/1.0.1/openssl-1.0.1.tar.gz.

ERROR '/home/user/Documents/shadow/shadow-plugin-tor/build/openssl-1.0.1e.tar.gz' is not a tarfile
2020-04-04 11:53:53,721 INFO returning code '-1'
@florian-vuillemot
Contributor Author

I provided a patch to allow users to continue working/building. But maybe we should think about migrating to a newer version of OpenSSL?

@robgjansen
Member

Yeah, it would be nice to use the new version of OpenSSL. I pushed a PR here: https://github.com/shadow/shadow-plugin-tor/pull/94/files

@florian-vuillemot could you test this new version of OpenSSL on your local machine to see if it works?

@robgjansen
Member

Hmm, OK it looks like that new version of OpenSSL failed to pass the CI tests:
#94

So maybe we should first accept the PR for the old version, and then figure out the issues with the newer version.

@robgjansen robgjansen changed the title Error on setup dependencies Migrate to latest version of OpenSSL Apr 6, 2020
@robgjansen
Member

I merged the URL change in #93, which continues to use openssl v1.0.1e. Let's use this issue to track updating to something newer, e.g., openssl v1.1.1f.

@sporksmith
Contributor

sporksmith commented Apr 7, 2020

Bumping OpenSSL alone as in #94 causes libevent to fail to compile. Bumping libevent as well (to 2.1.11) gets it to compile, but the GitHub CI run then segfaults. Locally I don't get a segfault, but the simulated processes abort.

Continuing to debug - going to try changing Shadow's emulated abort to abort for real so that I can get a core dump.

Alternatively, I suppose it'd be nice if we could get a debuggable core dump from the GitHub run. We'd need to grab the core dump itself, the compiled binaries, and any relevant compiled libraries.

@sporksmith
Contributor

gdb hangs trying to load the core.

Running shadow under gdb I was able to get a stack trace. gdb just gives addresses without symbols, but cross-referencing with /proc/x/maps, it looks like the elf loader is involved. Perhaps best to punt on this pending shadow/shadow#738.
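
For readers who want to repeat the cross-referencing step, here is a minimal sketch in C of looking up which mapping contains a raw address. It assumes the Linux maps line format (start-end, perms, offset, dev, inode, optional pathname); `find_mapping` is a hypothetical helper written for illustration and is not part of Shadow, elf-loader, or gdb.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: look up which mapping in a /proc/<pid>/maps-style
 * file contains addr. Returns 1 and copies the mapping's pathname (empty
 * for anonymous mappings) into path_out, 0 if no mapping contains addr,
 * and -1 if the file cannot be opened. */
int find_mapping(const char *maps_path, unsigned long addr,
                 char *path_out, size_t path_len)
{
    assert(maps_path != NULL && path_out != NULL && path_len > 0);

    FILE *f = fopen(maps_path, "r");
    if (f == NULL)
        return -1;

    char line[4352];
    int found = 0;
    while (!found && fgets(line, sizeof line, f) != NULL) {
        unsigned long start, end;
        char name[4096] = "";
        /* Line format: start-end perms offset dev inode [pathname] */
        int n = sscanf(line, "%lx-%lx %*s %*s %*s %*s %4095[^\n]",
                       &start, &end, name);
        /* The end address is exclusive. */
        if (n >= 2 && addr >= start && addr < end) {
            snprintf(path_out, path_len, "%s", name);
            found = 1;
        }
    }
    fclose(f);
    return found;
}
```

Calling it with the maps file of the process being debugged and an address taken from the gdb stack trace should return the path of the object that owns that address, which is one way to see that a frame belongs to the elf loader rather than to a plugin.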

Btw I also tried commenting out the crypto overrides - in that case the simulation didn't segfault but seemed to hang for a while and then fail.

I also fixed some type errors in those overrides; that didn't seem to make a difference but I'll send a PR to incorporate them.

@jtracey
Contributor

jtracey commented Apr 7, 2020

> gdb hangs trying to load the core.

It's likely trying to load each plugin with a scan of the entire linkmap, resulting in quadratic time. It's a known issue with upstream gdb we never got around to fixing.

> Running shadow under gdb I was able to get a stack trace. gdb just gives addresses without symbols, but cross-referencing with /proc/x/maps, it looks like the elf loader is involved.

You can find instructions for working with gdb in the Shadow documentation. Basically, you have to use elf-loader functions to only load the symbols you need, otherwise the quadratic operation I mentioned makes it unusable. Let me know if you have any questions.

> Perhaps best to punt on this pending shadow/shadow#738.

As a heads up, debugging is actually likely to get more difficult after moving to multi-process. That's actually the primary reason why NS3's DCE opted for creating elf-loader instead of going multi-process. You can read more about that in this paper from them.

@sporksmith
Contributor

> It's likely trying to load each plugin with a scan of the entire linkmap, resulting in quadratic time. It's a known issue with upstream gdb we never got around to fixing.

Ah, good to know.

> > Running shadow under gdb I was able to get a stack trace. gdb just gives addresses without symbols, but cross-referencing with /proc/x/maps, it looks like the elf loader is involved.
>
> You can find instructions for working with gdb in the Shadow documentation. Basically, you have to use elf-loader functions to only load the symbols you need, otherwise the quadratic operation I mentioned makes it unusable. Let me know if you have any questions.

Oh cool, I'll take another look using the bt_load helper.

> > Perhaps best to punt on this pending shadow/shadow#738.
>
> As a heads up, debugging is actually likely to get more difficult after moving to multi-process. That's actually the primary reason why NS3's DCE opted for creating elf-loader instead of going multi-process. You can read more about that in this paper from them.

Fair enough; the linked workarounds should make debugging in the current mode better than I thought, and yes going to multiprocess certainly adds new complexities :).

Thanks!

@sporksmith
Contributor

I need to stop for today, but it looks like RSA_new is returning NULL.

@sporksmith
Contributor

sporksmith commented Apr 8, 2020

@jtracey any idea why setting breakpoints inside plugin code wouldn't work as expected?

I tried first setting a breakpoint at process_emu_read, since the plugin should have loaded by the time we hit that. Once there I turn on locking, run "p vdl_linkmap_abi_update()", set a breakpoint at RSA_new, turn off locking, and continue. The breakpoint doesn't seem to trigger though, and I get stopped again at an abort, with crypto_pk_new in the call stack just after returning from RSA_new.

As a workaround going to try adding a raise(SIGSTOP) in the source...
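
A minimal sketch of that kind of workaround, for anyone following along: the helper below stops the process with `raise(SIGSTOP)` so a debugger can be attached at a known point. The function name `debug_stop_if_requested` and the environment-variable gate are assumptions added for illustration (so the stop can be left in the source without triggering in normal runs); neither comes from Shadow or Tor.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: stop the calling process so a debugger can attach,
 * but only when the environment variable named by env_name is set to "1".
 * Returns 1 if the process was stopped and later resumed, 0 if skipped. */
int debug_stop_if_requested(const char *env_name)
{
    assert(env_name != NULL);

    const char *value = getenv(env_name);
    if (value == NULL || strcmp(value, "1") != 0)
        return 0;

    /* SIGSTOP cannot be caught or ignored. Execution picks up here once
     * the process is continued, e.g. by SIGCONT or by an attached
     * debugger resuming it. */
    raise(SIGSTOP);
    return 1;
}
```

Placing a call such as `debug_stop_if_requested("PLUGIN_DEBUG_STOP")` (the variable name is made up) just before the code of interest sidesteps the need to resolve a breakpoint address before the plugin is loaded. Note that the signal stops the whole process, not only the calling plugin instance.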

@jtracey
Contributor

jtracey commented Apr 8, 2020

Breakpoints inside plugins get tricky because under the hood of gdb, breakpoints don't actually apply to symbol names or symbols, they apply to memory addresses (specifically, they modify the code stored at the location of that breakpoint). Under normal compiles, each instance of the same plugin shares code pages, to conserve memory. But with debug builds, we give each plugin its own address space, so you can debug multiple instances of a plugin independently. If that's not the behavior you want, and you'd rather not modify the source like you said, you can try removing these #ifndefs:

https://github.com/shadow/shadow/blob/08035fcfe30375ff53d43dd809305c897f127d16/src/external/elf-loader/vdl-map.c#L604
and
https://github.com/shadow/shadow/blob/08035fcfe30375ff53d43dd809305c897f127d16/src/external/elf-loader/vdl-map.c#L722

If you do that, then modifying the code page (e.g., adding a breakpoint) in one instance of a plugin will apply the change to all instances. However, you still can't add a breakpoint until after the plugin has been loaded in this execution (the specific plugin under the current behavior, or the first plugin if you make those changes); otherwise gdb doesn't know where the address is.

@sporksmith
Contributor

sporksmith commented Apr 9, 2020

@jtracey thanks, makes sense.

I pivoted a bit and binary-searched OpenSSL to find the change that broke us. The last release that works is 1.1.0h, which I moved us to in #98 (I accidentally left the wrong version # in the squashed commit message).

I then did a git-bisect between 1.1.0h and 1.1.0i and found the exact commit that broke us: openssl/openssl@bf21fe9

Based on that diff it seems the most likely issues are global initialization order or thread local storage.
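
To make those two hypotheses concrete, here is a small probe sketch in C. It does not reproduce the OpenSSL change; it only shows the two loader features in question: a constructor that should run before any other code in the object, and a thread-local variable that needs TLS to be set up for each thread. The names `probe_ctor_ran` and `probe_tls_bump` are hypothetical, and `__attribute__((constructor))` is a GCC/Clang extension rather than standard C.

```c
#include <assert.h>

/* Set by the constructor below. If code in this object runs while this
 * is still 0, constructors were skipped or ran later than expected. */
static int ctor_ran = 0;

/* One counter per thread (C11 _Thread_local). It starts at 0 in every
 * thread only if the loader sets up thread-local storage correctly. */
static _Thread_local int tls_calls = 0;

__attribute__((constructor))
static void probe_ctor(void)
{
    ctor_ran = 1;
}

/* Returns 1 if the constructor has already run, 0 otherwise. */
int probe_ctor_ran(void)
{
    return ctor_ran;
}

/* Increments and returns the calling thread's private counter. */
int probe_tls_bump(void)
{
    tls_calls += 1;
    assert(tls_calls > 0); /* sanity check against overflow */
    return tls_calls;
}
```

Built into a plugin and called early, a probe like this would show whether constructors and thread-local variables behave under the custom loader the way they do under the system loader, which might help narrow down which of the two suspects is responsible.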

@sporksmith sporksmith changed the title Migrate to latest version of OpenSSL OpenSSL 1.1.0i and later segfaults Apr 10, 2020