OpenSSL 1.1.0i and later segfaults #92
I provided a patch to allow users to continue working/building. But maybe we should think about migrating to a newer version of OpenSSL?
Yeah, it would be nice to use the new version of OpenSSL. I pushed a PR here: https://github.com/shadow/shadow-plugin-tor/pull/94/files @florian-vuillemot could you test this new version of OpenSSL on your local machine to see if it works?
Hmm, OK, it looks like that new version of OpenSSL failed to pass the CI tests. So maybe we should first accept the PR for the old version, and then figure out the issues with the newer version.
I merged the URL change in #93, which continues to use openssl v1.0.1e. Let's use this issue to track updating to something newer, e.g., openssl v1.1.1f.
Bumping OpenSSL alone, as in #94, causes libevent to fail to compile. Bumping libevent as well (to 2.1.11) gets it to compile, but the GitHub CI run segfaults. Locally I don't get a segfault, but the simulated processes abort. Continuing to debug; I'm going to try changing Shadow's emulated abort to abort for real so that I can get a core dump. Alternatively, it'd be nice if we could get a debuggable core dump from the GitHub run. We'd need to grab the core dump itself, the compiled binaries, and any relevant compiled libraries.
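For a real abort to leave a core dump, the process also needs a nonzero core-file size limit. A minimal sketch, assuming a Linux/POSIX host; enable_core_dumps is a hypothetical helper, not part of Shadow or the plugin:

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit to the hard
 * limit so that a real abort() (instead of Shadow's emulated one) can
 * produce a core file. Whether a core is actually written still depends
 * on system settings such as /proc/sys/kernel/core_pattern.
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("getrlimit");
        return -1;
    }
    /* An unprivileged process may raise its soft limit up to the hard limit. */
    lim.rlim_cur = lim.rlim_max;
    if (setrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

Calling this once at startup, and then calling abort() where the emulated abort would have fired, should yield a core file, subject to how the host's core_pattern is configured.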
gdb hangs trying to load the core. Running shadow under gdb, I was able to get a stack trace. gdb just gives addresses without symbols, but cross-referencing them with /proc/x/maps, it looks like the elf-loader is involved. Perhaps best to punt on this pending shadow/shadow#738. Btw, I also tried commenting out the crypto overrides; in that case the simulation didn't segfault, but it seemed to hang for a while and then fail. I also fixed some type errors in those overrides. That didn't seem to make a difference, but I'll send a PR to incorporate the fixes.
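Cross-referencing a raw address with /proc/<pid>/maps amounts to finding the mapping whose range contains it. A rough sketch, assuming the Linux maps line format (start-end perms offset dev inode [pathname]); find_mapping is a hypothetical helper:

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan a /proc/<pid>/maps file for the mapping that
 * contains `addr`. On a match, copies the 4-character permissions and the
 * pathname (empty for anonymous mappings) and returns 1. Returns 0 if no
 * mapping matches, -1 if the file can't be opened. path_len must be >= 1. */
int find_mapping(const char *maps_path, uintptr_t addr,
                 char perms[5], char *path, size_t path_len) {
    FILE *f = fopen(maps_path, "r");
    if (f == NULL) return -1;

    char line[4096];
    int found = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        uintptr_t start, end;
        char p[5];
        int consumed = 0;
        /* Parse start-end and perms; skip offset, dev and inode; %n records
         * where the (optional) pathname begins. */
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR " %4s %*s %*s %*s %n",
                   &start, &end, p, &consumed) < 3)
            continue;
        if (addr >= start && addr < end) {
            memcpy(perms, p, 5);
            const char *name = consumed > 0 ? line + consumed : "";
            size_t n = strcspn(name, "\n");
            if (n >= path_len) n = path_len - 1;
            memcpy(path, name, n);
            path[n] = '\0';
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

The returned pathname tells you which object (for example, the elf-loader or a particular plugin) a bare address from the stack trace falls in.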
gdb is likely trying to load each plugin with a scan of the entire linkmap, which results in quadratic time. It's a known issue with upstream gdb that we never got around to fixing.
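To see why one full linkmap scan per plugin adds up to quadratic time, here is a toy model (not gdb's or elf-loader's actual code) in which loading each object walks every entry already in a linked-list linkmap. The struct and function names are made up for illustration:

```c
#include <stddef.h>

/* Toy linkmap entry (hypothetical, for illustration only). */
struct link_entry {
    int id;
    struct link_entry *next;
};

/* Appends `entry` after walking the whole list, and returns how many
 * existing entries were visited during the walk. */
size_t load_with_full_scan(struct link_entry **head, struct link_entry *entry) {
    size_t visited = 0;
    entry->next = NULL;
    if (*head == NULL) {
        *head = entry;
        return 0;
    }
    struct link_entry *cur = *head;
    for (;;) {
        visited++;                 /* "scan" this existing entry */
        if (cur->next == NULL) break;
        cur = cur->next;
    }
    cur->next = entry;
    return visited;
}

/* Total work to load n objects one at a time: 0 + 1 + ... + (n - 1),
 * i.e. n(n - 1)/2 visits, which grows quadratically in n. */
size_t total_visits(struct link_entry *entries, size_t n) {
    struct link_entry *head = NULL;
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        entries[i].id = (int)i;
        total += load_with_full_scan(&head, &entries[i]);
    }
    return total;
}
```

With 1,000 loaded objects the model performs 499,500 visits, versus 1,000 if each load were constant time, which illustrates why loading every plugin instance's symbols becomes unusable at scale.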
You can find instructions for working with gdb in the Shadow documentation. Basically, you have to use elf-loader functions to only load the symbols you need, otherwise the quadratic operation I mentioned makes it unusable. Let me know if you have any questions.
As a heads-up, debugging is likely to get more difficult after moving to multi-process. That's actually the primary reason why NS3's DCE opted to create elf-loader instead of going multi-process. You can read more about that in this paper from them.
Ah, good to know.
Oh cool, I'll take another look using the
Fair enough; the linked workarounds should make debugging in the current mode better than I thought, and yes, going multi-process certainly adds new complexities :). Thanks!
I need to stop for today, but it looks like RSA_new is returning NULL.
@jtracey any idea why setting breakpoints inside plugin code doesn't work as expected? I first set a breakpoint at process_emu_read, since the plugin should have been loaded by the time we hit it. Once there, I turn on locking, run "p vdl_linkmap_abi_update()", set a breakpoint at RSA_new, turn off locking, and continue. The breakpoint doesn't seem to trigger, though, and I get stopped again at an abort. As a workaround, I'm going to try adding a raise(SIGSTOP) in the source...
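The raise(SIGSTOP) workaround can be gated so it only fires when wanted. A sketch, assuming a hypothetical PLUGIN_DEBUG_STOP environment variable and a hypothetical helper name; neither exists in Tor or Shadow:

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: if PLUGIN_DEBUG_STOP is set, stop the process right
 * before the code of interest (e.g. a call to RSA_new). When running under
 * gdb, gdb reports the SIGSTOP and returns to its prompt at this point, with
 * the plugin already loaded, so breakpoints can be set from there.
 * Returns 1 if it stopped, 0 otherwise. */
int debug_stop_if_requested(const char *where) {
    if (getenv("PLUGIN_DEBUG_STOP") == NULL)
        return 0;
    fprintf(stderr, "stopping before %s\n", where);
    raise(SIGSTOP);   /* execution resumes here once the process is continued */
    return 1;
}
```

A call such as debug_stop_if_requested("RSA_new") placed just before the suspect call keeps normal runs unaffected while still giving a reliable stopping point in debug runs.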
Breakpoints inside plugins get tricky because, under the hood, gdb breakpoints don't apply to symbol names; they apply to memory addresses (specifically, gdb modifies the code stored at the breakpoint's location).
In normal builds, all instances of the same plugin share code pages to conserve memory. With debug builds, however, we give each plugin its own address space, so you can debug multiple instances of a plugin independently.
If that's not the behavior you want, and you'd rather not modify the source as you suggested, you can try removing the code linked here: https://github.com/shadow/shadow/blob/08035fcfe30375ff53d43dd809305c897f127d16/src/external/elf-loader/vdl-map.c#L604
If you do that, then modifying the code page of one instance of a plugin (e.g., adding a breakpoint) will apply the change to all instances. However, you still can't add a breakpoint until the plugin has been loaded in this execution (the specific plugin with the current behavior, or the first plugin if you make those changes); otherwise gdb doesn't know where the address is.
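The address-level behavior described above can be modeled with byte arrays standing in for code pages. This is a toy sketch, not gdb's implementation: it only mirrors the x86 approach of saving the original byte at the breakpoint address and writing int3 (0xCC) in its place. map_instance and the breakpoint struct are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INT3 0xCC   /* x86 breakpoint instruction byte */

/* A software breakpoint: where it was written and what it overwrote. */
struct breakpoint {
    uint8_t *addr;
    uint8_t saved;
};

void bp_insert(struct breakpoint *bp, uint8_t *addr) {
    bp->addr = addr;
    bp->saved = *addr;   /* remember the original byte */
    *addr = INT3;        /* patch the "code" in place */
}

void bp_remove(struct breakpoint *bp) {
    *bp->addr = bp->saved;
}

/* Returns the code an instance will run: either the shared image itself
 * (instances share code pages) or a private copy of it (one copy per
 * instance, as with the debug builds). */
uint8_t *map_instance(uint8_t *image, uint8_t *private_copy, size_t len, int share) {
    if (share)
        return image;
    memcpy(private_copy, image, len);
    return private_copy;
}
```

If two instances map the shared image, a breakpoint inserted through one is visible through the other; with private copies, patching one copy leaves the other instance (and the original image) untouched, so the breakpoint only triggers in the instance whose code was patched.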
@jtracey thanks, makes sense. I pivoted a bit and binary-searched OpenSSL releases to find the change that broke us. The last release that works is 1.1.0h, which I moved us to in #98 (I accidentally left the wrong version number in the squashed commit message). I then did a git bisect between 1.1.0h and 1.1.0i and found the exact commit that broke us: openssl/openssl@bf21fe9. Based on that diff, the most likely culprits seem to be global initialization order or thread-local storage.
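The search git bisect performs between a known-good and a known-bad commit is a binary search for the first bad commit. A minimal sketch, assuming commits are indexed oldest to newest with index 0 known good and index n - 1 known bad, and that breakage persists once introduced; first_bad and bad_from_threshold are hypothetical names, and is_bad stands in for "build against this OpenSSL commit and run the simulation":

```c
#include <stddef.h>

/* Returns the index of the first bad commit. Requires n >= 2, commit 0 good,
 * commit n - 1 bad, and every commit after a bad one also bad. */
size_t first_bad(size_t n, int (*is_bad)(size_t idx, void *ctx), void *ctx) {
    size_t good = 0, bad = n - 1;   /* invariant: good is good, bad is bad */
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;              /* first bad commit is at or before mid */
        else
            good = mid;             /* first bad commit is after mid */
    }
    return bad;
}

/* Example predicate for trying it out: commits at or after *(size_t *)ctx
 * are "bad". */
int bad_from_threshold(size_t idx, void *ctx) {
    return idx >= *(size_t *)ctx;
}
```

Each is_bad call corresponds to one build-and-run step, so narrowing down n commits takes roughly log2(n) runs instead of testing every commit in order.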
The OpenSSL dependency seems to have moved from https://www.openssl.org/source/openssl-1.0.1e.tar.gz to https://openssl.org/source/old/1.0.1/openssl-1.0.1.tar.gz.