thread can't spawn synchronously even with PTHREAD_POOL_SIZE=4 #15868
`std::this_thread::yield` does not allow a thread to spawn, even with `PTHREAD_POOL_SIZE=4`.
I think the issue here is that for a pthread to run wasm code we do a […]
It's not trivial to fix this, but I wonder if we can use the new […]

@bakkot Meanwhile, you can work around this using one of two methods: […]
My understanding is that doing […]
Ah, hm.

```cpp
// spawn-thread.cc
#include <stdio.h>
#include <atomic>
#include <thread>

extern "C" int spawn_thread() {
  printf("hi\n");
  std::atomic<bool> done(false);
  std::thread t([&] {
    printf("other thread\n");
    done = true;
  });
  while (!done) {
    std::this_thread::yield();
  }
  t.join();
  return 0;
}
```

```js
// run-spawn-thread.js
'use strict';
let init = require('./spawn-thread.js');
(async () => {
  let mod = await init();
  mod._spawn_thread();
})().catch(e => { console.error(e); process.exit(1); });
```

Building with […] and running with […] hangs as above. And here it's not possible to use […]
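(The exact build command did not survive in this thread. A plausible invocation, assuming a modularized Node-targeted build so that `require('./spawn-thread.js')` returns an init factory, might be:)

```sh
# Assumed flags, not the reporter's exact command: pthreads with a pre-spawned
# pool, MODULARIZE so the output can be require()'d, and the export used by
# run-spawn-thread.js.
emcc spawn-thread.cc -o spawn-thread.js \
  -pthread -s PTHREAD_POOL_SIZE=4 \
  -s MODULARIZE=1 \
  -s EXPORTED_FUNCTIONS=_spawn_thread
```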
Z3 is a large library, not just this one loop. If it's just overhead in this particular loop, I might be able to get the maintainers to add an ifdef, but I'm hoping this can be made to work without requiring modifications to the library.
Can you say more about this?
That should be the case in theory, yeah, but in practice I've seen the opposite. Reading the docs now, it seems to be documented as well: […]
(@bakkot , this answers one of your questions as well)
Ah, yes, for a library it won't just work. But you can define a main, even if it does almost nothing. And just make sure to leave the main thread always free, that is, don't run code there (just run enough code to trigger execution on a pthread). Then you'd be getting the same behavior as proxy-to-pthread.
Perhaps we could look into […]

As for overhead, hard to say... even a lot of loops might be OK. The question is usually whether anything speed-critical is on the call stack when a yield happens. Measuring is best.
Ah, interesting point... that might work. I don't think we need anything else at that time. But one question is what happens if we never wake it up, and try to shut it down during a blocking […]
cc @RReverser as well, who has had ideas here - now that I think of it, maybe we've discussed waitAsync in this context before?
Oh wow, today I learned. And that seems to be per spec, too. (Actually, maybe not per spec; I'm following up in #whatwg) I wonder why that is...
That doesn't appear to actually work, if I've understood you correctly. That is, if I modify the example from my previous comment to include […]
The thing I'd do is: have the bit you're waiting on signal just "something is happening, pay attention", and have other bits in the SAB which indicate whether that thing is "you should start running" or "you should shut down".
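For illustration, a minimal JS sketch of that protocol (names and layout are my own; this assumes the waiter is a worker/pthread, since `Atomics.wait` is not allowed on the browser main thread):

```js
// Two slots in the SAB: slot 0 is the "pay attention" flag the waiter blocks
// on; slot 1 says what the wake-up actually means.
const CMD_RUN = 1, CMD_SHUTDOWN = 2;
const arr = new Int32Array(new SharedArrayBuffer(8));

// Signaling side: record the command, then flip and notify the attention flag.
function signal(cmd) {
  Atomics.store(arr, 1, cmd);
  Atomics.store(arr, 0, 1);
  Atomics.notify(arr, 0);
}

// Waiting side (on a worker): block until something happens, re-arm the flag,
// then dispatch on the command. Simplified: assumes one signal at a time.
function waitForCommand() {
  Atomics.wait(arr, 0, 0);      // returns once arr[0] is no longer 0
  Atomics.store(arr, 0, 0);     // re-arm for the next signal
  return Atomics.load(arr, 1);  // CMD_RUN or CMD_SHUTDOWN
}
```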
Oh, interesting, please update us here on that! I wonder if that's a rule that made sense in the single-threaded world (no new events before the current one stops), but with workers it's unnecessarily limiting...
Sorry, I should have been clearer. Yes, just calling an export like that calls it on the main thread - so you are back where you were before. Instead, […]
MDN appears to be wrong both per spec and in practice. See conversation starting here. And indeed I can confirm with an actual test case:

```js
let workerSrc = `
  let arr;
  self.addEventListener('message', e => {
    switch (e.data.message) {
      case 'init': {
        console.log('worker: got init');
        arr = new Int32Array(e.data.buf);
        Atomics.notify(arr, 0);
        break;
      }
      case 'next': {
        console.log('worker: got next');
        arr[1] = 1;
        break;
      }
    }
  });
`;
let blob = new Blob([workerSrc], {type: 'application/javascript'});
let worker = new Worker(URL.createObjectURL(blob));

(async () => {
  let buf = new SharedArrayBuffer(16);
  let arr = new Int32Array(buf);
  worker.postMessage({ message: 'init', buf });
  if (typeof Atomics.waitAsync === 'function') {
    await Atomics.waitAsync(arr, 0, 0).value;
  } else {
    // give the worker time to start
    await new Promise(res => setTimeout(res, 1000));
  }
  console.log('init');
  worker.postMessage({ message: 'next' });
  console.log('now we busy-wait, with arr[1] initially =', arr[1]);
  for (let i = 0; i < 1e8; ++i) {
    if (arr[1] === 1) {
      console.log('done! took until i =', i);
      return;
    }
  }
  console.log('never initialized');
})().catch(e => console.error('error', e));
```

The busy-wait does finish, in both Chrome and FF, even though it happens immediately after the call to […]
By "the original export", do you mean "the JS code which is calling the library functions"? I was hoping I could put those in the main thread, but it's not the end of the world if not. I'm not sure what you mean by " |
Yes, it's certainly been our experience that using a thread pool should allow new threads to be started (and joined) without returning to the event loop. I believe we have many tests that depend on this. For some reason it's not working in your example... I'll take a look. Do you happen to know if the problem is the same on the web and under node?
I found the issue. The problem is that a pure compute loop is not enough. The main thread also needs to process events queued via shared memory. Adding a call to […]

We automatically do this for blocking operations such as waiting on a mutex or joining a thread, but we don't (and probably should) call this from […]
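For concreteness, a hedged sketch of what that workaround could look like in the original example. The specific call was elided above; `emscripten_main_thread_process_queued_calls()` from `emscripten/threading.h` is my assumption of what was meant:

```cpp
// Sketch (assumption, not the maintainer's verbatim fix): while busy-waiting
// on the main thread, explicitly drain the calls other threads have proxied
// to it, so the freshly spawned pthread can finish starting up.
#include <stdio.h>
#include <atomic>
#include <thread>
#include <emscripten/threading.h>

extern "C" int spawn_thread() {
  printf("hi\n");
  std::atomic<bool> done(false);
  std::thread t([&] {
    printf("other thread\n");
    done = true;
  });
  while (!done) {
    if (emscripten_is_main_runtime_thread())
      emscripten_main_thread_process_queued_calls();
    std::this_thread::yield();
  }
  t.join();
  return 0;
}
```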
Interestingly, if I port this example to use the lower-level pthread API, the issue doesn't show up. It only seems to apply to C++ `std::thread`. Not sure why yet. My modified example: […]
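(The modified example was not preserved in the scrape; the following is my reconstruction of what a straightforward pthread-API port would look like, keeping the same busy-wait structure:)

```cpp
// Hypothetical reconstruction: same test as spawn-thread.cc, but using
// pthread_create/sched_yield directly instead of std::thread.
#include <stdio.h>
#include <pthread.h>
#include <sched.h>
#include <atomic>

static std::atomic<bool> done(false);

static void* thread_main(void*) {
  printf("other thread\n");
  done = true;
  return nullptr;
}

extern "C" int spawn_thread() {
  printf("hi\n");
  pthread_t t;
  pthread_create(&t, nullptr, thread_main, nullptr);
  while (!done) {
    sched_yield();
  }
  pthread_join(t, nullptr);
  return 0;
}
```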
Hi, I'm on vacation so can't dig in, but re: the postMessage part - per spec, it's supposed to send messages immediately, so it should work even if the main loop is blocked. Unfortunately, I found and reported a bug in Chrome a while back (as well as created whatwg tests upstream) that shows it's violating the spec and not sending messages until the event loop is unblocked. That bug has not been prioritised or updated since. Because of that bug, the only way to synchronously send messages to the main thread from a Worker is via shared memory - for everything else, you need to keep the event loop unblocked.
Very interesting bunch of comments just now, thanks everyone! I was not aware of most of that, good to know...
I mean something like this (pseudocode):

```cpp
pthread* pthread;

void main() {
  pthread = make_pthread(pthread_main);
}

void pthread_main() {
  while (1) {
    wait();
    doComputation();
  }
}

void doComputation() {
  if (is_main_thread()) {
    pthread->wake_up();
    return;
  }
  // .. do the work ..
}
```

That is, calling […]
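A runnable rendering of that pattern, as I understand it (my reconstruction: the sketch's `make_pthread`/`wait`/`wake_up` helpers are realized here with pthreads primitives, and `is_main_thread` with `emscripten_is_main_runtime_thread`):

```cpp
// The main thread never computes; it only hands work to a long-lived worker
// pthread and returns to the event loop.
#include <pthread.h>
#include <emscripten/threading.h>

static pthread_t worker;
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static bool pending = false;

static void doComputation();

static void* pthread_main(void*) {
  while (true) {
    pthread_mutex_lock(&mu);
    while (!pending) pthread_cond_wait(&cv, &mu);  // the sketch's wait()
    pending = false;
    pthread_mutex_unlock(&mu);
    doComputation();  // now running on the worker, not the main thread
  }
  return nullptr;
}

static void doComputation() {
  if (emscripten_is_main_runtime_thread()) {
    pthread_mutex_lock(&mu);  // the sketch's wake_up()
    pending = true;
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&mu);
    return;
  }
  // .. do the work ..
}

int main() {
  pthread_create(&worker, nullptr, pthread_main, nullptr);  // make_pthread()
  return 0;  // with the default EXIT_RUNTIME=0, the runtime stays alive
}
```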
For reference, that's this bug, and the tests are here. Oddly enough, I found that Chrome does correctly send messages synchronously except in message handlers. That is:

```js
let w = new Worker(`data:text/javascript,
  postMessage(null);
  onmessage = e => { e.data[0] = 1; console.log('got message'); };
`);

function wait() {
  let a = new Int32Array(new SharedArrayBuffer(4));
  w.postMessage(a);
  for (let i = 0; i < 1e9; ++i) {
    if (a[0] !== 0) {
      console.log('success');
      return;
    }
  }
  console.log('fail');
}

w.onmessage = () => {
  console.log('Spawned');
  // works:
  setTimeout(wait, 100);
  // does not work:
  // wait();
};
```

Firefox does the right thing in this case, though, and the issue with […]
Never mind, I think I've figured out how to do it with EM_ASM.
I tracked it down to `std::thread` wanting to register something with `atexit`.
So this is not really anything to do with postMessage or the event loop. Two possible solutions (I think we should do both): […]
Actually I already have a PR in the works to do (1): #14479
@bakkot Thanks for linking - couldn't find those easily from my phone :)
This change adds a test that demonstrates the issues with std::thread and busy-waiting for them to start (on the main thread). Fixing the issue is left as a followup. See #15868
To achieve this we manage the `atexit` functions in native code using existing musl code. This is a small step towards a larger change to just use musl for all `atexit` handling: #14479. The code size implications of this change are a mixed bag: in some places we see savings, but in other cases the extra export causes a small regression (only when EXIT_RUNTIME=1). In the long run, once we land #14479, there should be more code size savings to be had by doing everything on the native side. Fixes #15868
This is with emcc version […]

Here's a program: […]

If built with […] and run with `node main.js`, it will print `hi`, then peg a core forever and never print `other thread`. Built with `g++` it will reach `other thread` and then exit.

The docs imply that setting `-s PTHREAD_POOL_SIZE=4` should be sufficient to ensure at least one thread can be created on demand, but that does not seem to be the case. Is there any way to get this working?

(This case is reduced from a real example in Z3.)