thread can't spawn synchronously even with PTHREAD_POOL_SIZE=4 #15868
`std::this_thread::yield` does not allow a thread to spawn, even with `PTHREAD_POOL_SIZE=4`.
I think the issue here is that for a pthread to run wasm code we do a […]
It's not trivial to fix this, but I wonder if we can use the new […]

@bakkot Meanwhile, you can work around this using one of two methods: […]
My understanding is that doing […]
Ah, hm.

```cpp
// spawn-thread.cc
#include <stdio.h>
#include <atomic>
#include <thread>

extern "C" int spawn_thread() {
  printf("hi\n");
  std::atomic<bool> done(false);
  std::thread t([&] {
    printf("other thread\n");
    done = true;
  });
  while (!done) {
    std::this_thread::yield();
  }
  t.join();
  return 0;
}
```

```js
// run-spawn-thread.js
'use strict';
let init = require('./spawn-thread.js');
(async () => {
  let mod = await init();
  mod._spawn_thread();
})().catch(e => { console.error(e); process.exit(1); });
```

Building with […] and running with […] hangs as above. And here it's not possible to use […]
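(The exact build command did not survive in this thread. A plausible invocation, assuming a modularized Node-targeted build so that `require('./spawn-thread.js')` returns an init factory, might be:)

```sh
# Assumed flags, not the reporter's exact command: pthreads with a pre-spawned
# pool, MODULARIZE so the output can be require()'d, and the export used by
# run-spawn-thread.js.
emcc spawn-thread.cc -o spawn-thread.js \
  -pthread -s PTHREAD_POOL_SIZE=4 \
  -s MODULARIZE=1 \
  -s EXPORTED_FUNCTIONS=_spawn_thread
```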
Z3 is a large library, not just this one loop. If it's just overhead in this particular loop, I might be able to get the maintainers to add an ifdef, but I'm hoping this can be made to work without requiring modifications to the library.
Can you say more about this?
That should be the case in theory, yeah, but in practice I've seen the opposite. Reading the docs now, it seems to be documented as well: […]
(@bakkot , this answers one of your questions as well)
Ah, yes, for a library it won't just work. But you can define a main, even if it does almost nothing. And just make sure to leave the main thread always free, that is, don't run code there (just run enough code to trigger execution on a pthread). Then you'd be getting the same behavior as proxy-to-pthread.
Perhaps we could look into […]

As for overhead, hard to say... even a lot of loops might be OK. The question is usually whether anything speed-critical is on the call stack when a yield happens. Measuring is best.
Ah, interesting point... that might work. I don't think we need anything else at that time. But one question is what happens if we never wake it up, and try to shut it down during a blocking […]
cc @RReverser as well, who has had ideas here - now that I think of it, maybe we've discussed waitAsync in this context before?
Oh wow, today I learned. And that seems to be per spec, too. (Actually, maybe not per spec; I'm following up in #whatwg) I wonder why that is...
That doesn't appear to actually work, if I've understood you correctly. That is, if I modify the example from my previous comment to include […]
The thing I'd do is: have the bit you're waiting on signal just "something is happening, pay attention", and have other bits in the SAB which indicate whether that thing is "you should start running" or "you should shut down".
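For illustration, a minimal JS sketch of that protocol (names and layout are my own; this assumes the waiter is a worker/pthread, since `Atomics.wait` is not allowed on the browser main thread):

```js
// Two slots in the SAB: slot 0 is the "pay attention" flag the waiter blocks
// on; slot 1 says what the wake-up actually means.
const CMD_RUN = 1, CMD_SHUTDOWN = 2;
const arr = new Int32Array(new SharedArrayBuffer(8));

// Signaling side: record the command, then flip and notify the attention flag.
function signal(cmd) {
  Atomics.store(arr, 1, cmd);
  Atomics.store(arr, 0, 1);
  Atomics.notify(arr, 0);
}

// Waiting side (on a worker): block until something happens, re-arm the flag,
// then dispatch on the command. Simplified: assumes one signal at a time.
function waitForCommand() {
  Atomics.wait(arr, 0, 0);      // returns once arr[0] is no longer 0
  Atomics.store(arr, 0, 0);     // re-arm for the next signal
  return Atomics.load(arr, 1);  // CMD_RUN or CMD_SHUTDOWN
}
```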
Oh, interesting, please update us here on that! I wonder if that's a rule that made sense in the single-threaded world (no new events before the current one stops), but with workers it's unnecessarily limiting...
Sorry, I should have been clearer. Yes, just calling an export like that calls it on the main thread - so you are back where you were before. Instead, […]
MDN appears to be wrong both per spec and in practice. See conversation starting here. And indeed I can confirm with an actual test case:

```js
let workerSrc = `
  let arr;
  self.addEventListener('message', e => {
    switch (e.data.message) {
      case 'init': {
        console.log('worker: got init');
        arr = new Int32Array(e.data.buf);
        Atomics.notify(arr, 0);
        break;
      }
      case 'next': {
        console.log('worker: got next');
        arr[1] = 1;
        break;
      }
    }
  });
`;
let blob = new Blob([workerSrc], {type: 'application/javascript'});
let worker = new Worker(URL.createObjectURL(blob));

(async () => {
  let buf = new SharedArrayBuffer(16);
  let arr = new Int32Array(buf);
  worker.postMessage({ message: 'init', buf });
  if (typeof Atomics.waitAsync === 'function') {
    await Atomics.waitAsync(arr, 0, 0).value;
  } else {
    // give the worker time to start
    await new Promise(res => setTimeout(res, 1000));
  }
  console.log('init');
  worker.postMessage({ message: 'next' });
  console.log('now we busy-wait, with arr[1] initially =', arr[1]);
  for (let i = 0; i < 1e8; ++i) {
    if (arr[1] === 1) {
      console.log('done! took until i =', i);
      return;
    }
  }
  console.log('never initialized');
})().catch(e => console.error('error', e));
```

The busy-wait does finish, in both Chrome and FF, even though it happens immediately after the call to […]
By "the original export", do you mean "the JS code which is calling the library functions"? I was hoping I could put those in the main thread, but it's not the end of the world if not. I'm not sure what you mean by " |
Yes, it's certainly been our experience that using a thread pool should allow new threads to be started (and joined) without returning to the event loop. I believe we have many tests that depend on this. For some reason it's not working in your example... I'll take a look. Do you happen to know if the problem is the same on the web and under node?
I found the issue. The problem is that a pure compute loop is not enough. The main thread also needs to process events queued via shared memory. Adding a call to […]

We automatically do this for blocking operations such as waiting on a mutex or joining a thread, but we don't (and probably should) call this from […]
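For concreteness, a hedged sketch of what that workaround could look like in the original example. The specific call was elided above; `emscripten_main_thread_process_queued_calls()` from `emscripten/threading.h` is my assumption of what was meant:

```cpp
// Sketch (assumption, not the maintainer's verbatim fix): while busy-waiting
// on the main thread, explicitly drain the calls other threads have proxied
// to it, so the freshly spawned pthread can finish starting up.
#include <stdio.h>
#include <atomic>
#include <thread>
#include <emscripten/threading.h>

extern "C" int spawn_thread() {
  printf("hi\n");
  std::atomic<bool> done(false);
  std::thread t([&] {
    printf("other thread\n");
    done = true;
  });
  while (!done) {
    if (emscripten_is_main_runtime_thread())
      emscripten_main_thread_process_queued_calls();
    std::this_thread::yield();
  }
  t.join();
  return 0;
}
```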
Interestingly, if I port this example to use the lower-level pthread API, the issue doesn't show up. It only seems to apply to C++ `std::thread`. Not sure why yet. My modified example: […]
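(The modified example was not preserved in the scrape; the following is my reconstruction of what a straightforward pthread-API port would look like, keeping the same busy-wait structure:)

```cpp
// Hypothetical reconstruction: same test as spawn-thread.cc, but using
// pthread_create/sched_yield directly instead of std::thread.
#include <stdio.h>
#include <pthread.h>
#include <sched.h>
#include <atomic>

static std::atomic<bool> done(false);

static void* thread_main(void*) {
  printf("other thread\n");
  done = true;
  return nullptr;
}

extern "C" int spawn_thread() {
  printf("hi\n");
  pthread_t t;
  pthread_create(&t, nullptr, thread_main, nullptr);
  while (!done) {
    sched_yield();
  }
  pthread_join(t, nullptr);
  return 0;
}
```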
Hi, I'm on vacation so can't dig in, but re: the postMessage part - per spec, it's supposed to send messages immediately, so it should work even if the main loop is blocked. Unfortunately, I found and reported a bug in Chrome a while back (as well as created whatwg tests upstream) that shows it's violating the spec and not sending messages until the event loop is unblocked. That bug has not been prioritised or updated since. Because of that bug, the only way to synchronously send messages to the main thread from a Worker is via shared memory - for everything else, you need to keep the event loop unblocked.
Very interesting bunch of comments just now, thanks everyone! I was not aware of most of that, good to know...
I mean something like this (pseudocode):

```cpp
pthread* pthread;

void main() {
  pthread = make_pthread(pthread_main);
}

void pthread_main() {
  while (1) {
    wait();
    doComputation();
  }
}

void doComputation() {
  if (is_main_thread()) {
    pthread->wake_up();
    return;
  }
  // .. do the work ..
}
```

That is, calling […]
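A runnable rendering of that pattern, as I understand it (my reconstruction: the sketch's `make_pthread`/`wait`/`wake_up` helpers are realized here with pthreads primitives, and `is_main_thread` with `emscripten_is_main_runtime_thread`):

```cpp
// The main thread never computes; it only hands work to a long-lived worker
// pthread and returns to the event loop.
#include <pthread.h>
#include <emscripten/threading.h>

static pthread_t worker;
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static bool pending = false;

static void doComputation();

static void* pthread_main(void*) {
  while (true) {
    pthread_mutex_lock(&mu);
    while (!pending) pthread_cond_wait(&cv, &mu);  // the sketch's wait()
    pending = false;
    pthread_mutex_unlock(&mu);
    doComputation();  // now running on the worker, not the main thread
  }
  return nullptr;
}

static void doComputation() {
  if (emscripten_is_main_runtime_thread()) {
    pthread_mutex_lock(&mu);  // the sketch's wake_up()
    pending = true;
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&mu);
    return;
  }
  // .. do the work ..
}

int main() {
  pthread_create(&worker, nullptr, pthread_main, nullptr);  // make_pthread()
  return 0;  // with the default EXIT_RUNTIME=0, the runtime stays alive
}
```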
For reference, that's this bug, and the tests are here. Oddly enough, I found that Chrome does correctly send messages synchronously except in message handlers. That is:

```js
let w = new Worker(`data:text/javascript,
  postMessage(null);
  onmessage = e => { e.data[0] = 1; console.log('got message'); };
`);

function wait() {
  let a = new Int32Array(new SharedArrayBuffer(4));
  w.postMessage(a);
  for (let i = 0; i < 1e9; ++i) {
    if (a[0] !== 0) {
      console.log('success');
      return;
    }
  }
  console.log('fail');
}

w.onmessage = () => {
  console.log('Spawned');
  // works:
  setTimeout(wait, 100);
  // does not work:
  // wait();
};
```

Firefox does the right thing in this case, though, and the issue with […]
Never mind, I think I've figured out how to do it with EM_ASM.
I tracked it down to `std::thread` wanting to register something with `atexit`.
So this is not really anything to do with postMessage or the event loop. Two possible solutions (I think we should do both): […]
Actually I already have a PR in the works to do (1): #14479
@bakkot Thanks for linking - couldn't find those easily from my phone :)
This change adds a test that demonstrates the issues with std::thread and busy-waiting for them to start (on the main thread). Fixing the issue is left as a followup. See #15868
To achieve this we manage the `atexit` functions in native code using existing musl code. This is a small step towards a larger change to just use musl for all `atexit` handling: #14479. The code size implications of this change are a mixed bag: in some places we see savings, but in other cases the extra export causes a small regression (only when EXIT_RUNTIME=1). In the long run, once we land #14479, there should be more code size savings to be had by doing everything on the native side. Fixes #15868
This is with emcc version […]

Here's a program: […]

If built with […] and run with `node main.js`, it will print `hi`, then peg a core forever and never print `other thread`. Built with `g++` it will reach `other thread` and then exit.

The docs imply that setting `-s PTHREAD_POOL_SIZE=4` should be sufficient to ensure at least one thread can be created on demand, but that does not seem to be the case. Is there any way to get this working?

(This case is reduced from a real example in Z3.)