Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RuntimeError: unreachable with std::thread in a library #15892

Closed
bakkot opened this issue Jan 6, 2022 · 16 comments
Closed

RuntimeError: unreachable with std::thread in a library #15892

bakkot opened this issue Jan 6, 2022 · 16 comments

Comments

@bakkot
Copy link
Contributor

bakkot commented Jan 6, 2022

This is with emcc version

emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.0.2-git
clang version 14.0.0 (https://github.com/llvm/llvm-project.git 1a929525e86a20d0b3455a400d0dbed40b325a13)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /usr/local/opt/emscripten/libexec/llvm/bin

(I encountered this while worrying about #15868, but it's apparently unrelated.)

Here's a C++ file:

// lib.cc
#include <thread>

extern "C" void async_do_work() {
  std::thread t([&] {
    // do nothing
  });
}

Build with

emcc -std=c++17 lib.cc -s USE_PTHREADS=1 -s EXPORTED_FUNCTIONS=_async_do_work -s MODULARIZE=1 -s EXPORT_NAME=Init -o lib.js

and trying to use it from Javascript:

'use strict';

let init = typeof require === 'function' ? require('./lib.js') : Init;

(async () => {
  let mod = await init();
  mod._async_do_work();
})().catch(e => {
  console.error(e);
  process.exit(1);
});

results in

Tried to spawn a new thread, but the thread pool is exhausted.
This might result in a deadlock unless some threads eventually exit or the code explicitly breaks out to the event loop.
If you want to increase the pool size, use setting `-s PTHREAD_POOL_SIZE=...`.
If you want to throw an explicit error instead of the risk of deadlocking in those cases, use setting `-s PTHREAD_POOL_SIZE_STRICT=2`.
RuntimeError: unreachable
    at wasm://wasm/0001fff2:wasm-function[308]:0x4afb
    at wasm://wasm/0001fff2:wasm-function[331]:0x4e06
    at wasm://wasm/0001fff2:wasm-function[334]:0x4e21
    at wasm://wasm/0001fff2:wasm-function[335]:0x4e33
    at wasm://wasm/0001fff2:wasm-function[223]:0x4493
    at wasm://wasm/0001fff2:wasm-function[27]:0xa13
    at Object._async_do_work (/Users/kevin/Code/emscripten-call-on-thread/lib.js:1803:22)
    at /Users/kevin/Code/emscripten-call-on-thread/main.js:7:7

It evidently is not unreachable.

Note that this doesn't happen if t is leaked - presumably there's something in the std::thread destructor which is problematic.

@gregbuchholz
Copy link

Interestingly enough, if I compile as a native program with g++, using this "driver" program:

#include<iostream>

extern "C" void async_do_work();

int main(int argc, char *argv[]) {

    async_do_work();

    return 0;
}

...that also dies with an error message: "terminate called without an active exception", followed by a core dump. But using join or detach and things are OK for both native and emscripten:

// lib.cc
#include <iostream>
#include <thread>
#include <unistd.h>

extern "C" void async_do_work() {
  std::cerr << "do work" << std::endl;

  std::thread t([&] {
    std::cerr << "in thread" << std::endl;
    // do nothing
  });

  //t.join();
  t.detach();
  
  sleep(1);
  std::cerr << "after" << std::endl;
  return;
}
$ emcc -std=c++17 lib.cc -pthread -s USE_PTHREADS=1 -s PTHREAD_POOL_SIZE=2 -s EXPORTED_FUNCTIONS=_async_do_work -s MODULARIZE=1 -s EXPORT_NAME=Init -o lib.js

$ node --experimental-wasm-threads --experimental-wasm-bulk-memory driver.js
do work
in thread
after

@sbc100
Copy link
Collaborator

sbc100 commented Jan 6, 2022

Is it undefined behaviour perhaps to return from the function while the thread is still alive? i.e. won't it result in a use after free if you return from async_do_work without joining? (I think even detaching might be enough?)

Do you still think this is a bug or can it be closed?

@sbc100
Copy link
Collaborator

sbc100 commented Jan 6, 2022

Actually yes, it looks like either join or detach should be fine, but simply returning can lead to use-after-free.

From https://en.cppreference.com/w/cpp/thread/thread:

(destructor)
 
destructs the thread object, underlying thread must be joined or detached
(public member function)

@bakkot
Copy link
Contributor Author

bakkot commented Jan 6, 2022

It's a call to std::terminate, not UB. And either way I wouldn't expect that to mean it's unreachable.

@sbc100
Copy link
Collaborator

sbc100 commented Jan 6, 2022

I seem like the issue is some kind of use-after-free (the running thread thinks the std::thread object is still around but in fact it has been destroyed due to return from the function), which is UB, and so I don't think there is necessarily anything we can or should do differently here, is there?

We could investigate how control flow finally led to that unreachable, but I'm not that would lead us anywhere useful, since the code is likely accessing invalid memory/free'd memory locations.

@bakkot
Copy link
Contributor Author

bakkot commented Jan 6, 2022

I don't think it's actually a use-after-free? It's fine to return from a thread before joining or detaching it.

int main() {
  std::thread t([&] {
    printf("thread reached\n");
  });
  sleep(1);
  t.join();
}

is definitely well-defined C++.

There is a bug in the program, which is that it's calling the destructor before calling detach or join. But that's not UB; the documented behavior of the destructor when called before the thread has been detached or joined is to call std::terminate, which just calls std::abort unless std::terminate_handler has been set. But that should abort, as in "native code called abort".

Anyway, I agree it's probably not very important, since the program is definitely bugged. But I don't think it's UB.

@sbc100
Copy link
Collaborator

sbc100 commented Jan 6, 2022

Ah.. thank for pointing that out. I didn't see that documentation you liked to.

Assuming this is a bug in emscripten then I agree we should fix it. If the bug exists in upstream libc++ then we should open a bug there.

@sbc100
Copy link
Collaborator

sbc100 commented Jan 6, 2022

(BTW I didn't mean to ever say that returning from a thread entry point was the issue.. I was only ever referring to the calling of the destructor for the thread object which was caused by returning from the function in which it was defined)

@sbc100
Copy link
Collaborator

sbc100 commented Jan 6, 2022

It looks like std::terminate() is being called as expected. Building with -g (or --profiling-funcs) shows this:

./emcc -g -std=c++17 -s PTHREAD_POOL_SIZE=1 -s EXIT_RUNTIME -s USE_PTHREADS test.cc && node --experimental-wasm-threads --experimental-wasm-bulk-memory ./a.out.js
exiting due to exception: RuntimeError: unreachable,RuntimeError: unreachable
    at abort_message (<anonymous>:wasm-function[310]:0x4f8f)
    at demangling_terminate_handler() (<anonymous>:wasm-function[333]:0x52a8)
    at std::__terminate(void (*)()) (<anonymous>:wasm-function[336]:0x52c3)
    at std::terminate() (<anonymous>:wasm-function[337]:0x52d5)
    at std::__2::thread::~thread() (<anonymous>:wasm-function[225]:0x4925)
    at __original_main (<anonymous>:wasm-function[28]:0xa64)
    at main (<anonymous>:wasm-function[90]:0x2039)
    at /usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:1709:22
    at callMain (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:3553:15)
    at doRun (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:3616:23)

Without -g I see the same thing you do:

$ node --experimental-wasm-threads --experimental-wasm-bulk-memory ./a.out.js  
exiting due to exception: RuntimeError: unreachable,RuntimeError: unreachable
    at <anonymous>:wasm-function[310]:0x4b63
    at <anonymous>:wasm-function[333]:0x4e6e
    at <anonymous>:wasm-function[336]:0x4e89
    at <anonymous>:wasm-function[337]:0x4e9b
    at <anonymous>:wasm-function[225]:0x44fb
    at <anonymous>:wasm-function[28]:0xa47
    at main (<anonymous>:wasm-function[90]:0x1ca5)
    at /usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:1709:22
    at callMain (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:3553:15)
    at doRun (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:3616:23)

I will take a look at why std::terminate doesn't seem to report anything to stderr.

@sbc100
Copy link
Collaborator

sbc100 commented Jan 6, 2022

Looks like this is expected behaviour right now:

#if defined(__EMSCRIPTEN__) && defined(NDEBUG)
// Just trap in a non-debug build. These internal libcxxabi assertions are
// very rare, and it's not worth linking in vfprintf stdio support or
// even minimal logging for them, as we'll have a proper call stack, which
// will show a call into "abort_message", and can help debugging. (In a
// debug build that won't be inlined.)
__builtin_trap();
#else
fprintf(stderr, "libc++abi: ");
va_list list;
va_start(list, format);
vfprintf(stderr, format, list);
va_end(list);
fprintf(stderr, "\n");
#endif

We don't currently have a debug flavor of libc++ so this always ends up calling __builtin_trap which compiles to just unreachable.

@sbc100
Copy link
Collaborator

sbc100 commented Jan 6, 2022

@kripken do you think it worth building a -debug flavor of libc++?

sbc100 added a commit that referenced this issue Jan 6, 2022
Use `abort()` rather than `__builtin_trap()` here so that the use will
see `Aborted(native code called abort())` rather than just
`RuntimeError: unreachable`.

We could also consider building a `-debug` flavor of libc++ but I think
this is still a worthwhile change.

See #15892
@kripken
Copy link
Member

kripken commented Jan 6, 2022

Hmm, a debug variation of a such a large library has downsides. But if it would help debugging issues like this, how about splitting out a tiny library that does have DebugLibrary applied and that only contains the terminate-calling code, or something like that?

@sbc100
Copy link
Collaborator

sbc100 commented Jan 6, 2022

libc++ is actually a very tiny library... its almost all defined in headers.

@kripken
Copy link
Member

kripken commented Jan 6, 2022

Hmm, it is one of the slowest to build though... Takes 9 seconds on my laptop. Only thing slower is libc.

@sbc100
Copy link
Collaborator

sbc100 commented Jan 6, 2022

Interesting, you are right.. there must be some sources there that are pathologically slow to compile. For now I think #15902 is enough.

sbc100 added a commit that referenced this issue Jan 6, 2022
Use `abort()` rather than `__builtin_trap()` here so that the use will
see `Aborted(native code called abort())` rather than just
`RuntimeError: unreachable`.

We could also consider building a `-debug` flavor of libc++ but I think
this is still a worthwhile change.

See #15892
@bakkot
Copy link
Contributor Author

bakkot commented Jan 7, 2022

I'm going to close this on the assumption that #15902 has changed the error from "unreachable" to "abort", which is good enough for me. (I can't test myself, unfortunately.)

@bakkot bakkot closed this as completed Jan 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants