-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"memory access out of bounds"/"operation does not support unaligned accesses" when using mutexes/condition_variables #12971
Comments
Is this a regression? (i.e. does the code in question work with older versions of emscripten error-free?) |
I don't know. Basically, I'm migrating codebase that used to work under Native Client; this code was never ran under Emscripten before. |
@emaxx-google did you ever resolve this issue? I'm having basically the same problem. One thing I don't understand is that the docs explicitly say that webasm can handle unaligned access
and I'm pretty sure I am not in fact compiling to asm.js |
Nope, I'm still struggling with this and other random crashes (some of them are more understood, like #13194, others are harder to even localize), and trying to debug it causes even more issues (debugger is useless due to crbug.com/1179045, ASan due to #13205 and https://groups.google.com/g/emscripten-discuss/c/QERnFWUT0_A/m/jMbw2VQ1BwAJ, etc.). So we're pretty much stuck with this, despite that NaCl deprecation requires us to find the solution ~soon. I've kept writing small test programs of increasing compelxity, hoping to catch the bug in a small repro some day... |
Wasm supports unaligned accesses in almost all operations, except for but SIMD and some atomics. To debug that, one option is to print out the address in the crashing function right before the crash. But almost certainly it is a proper crash, and indeed unaligned. Then tracing back to find where it was allocated in the source would be the next step. |
Thanks, but this approach is problematic due to #13194. But, yeah, I tried this a few months ago, and in "lucky" attempts in which the printf output didn't get garbled I was able to get some logs - however the root cause is still unclear (the original source code seems innocent, unless of course I'm still overlooking the bug in it). |
If #13194 is blocking you, then until that is fixed I think you can work around that issue by calling a custom function that wraps around fprintf, and that function always locks as it runs. Conceptually int fprintf_really_locked(FILE, fmt, ...) {
static mutex_t mutex;
mutex.lock();
int ret = fprintf(FILE, fmt, ...);
mutex.unlock();
return ret;
} If you have very many calls to |
Thank you for the suggestion, @kripken. Unfortunately, I already tried that in the past, and it results in hanging DevTools, so that I can't see any logs at all. (I just quickly tested it again, using newer Emscripten 2.0.24 and Chrome 91, and the hanging issue is still there.) |
Sorry to hear that @emaxx-google . Has a devtools issue been filed? (seems odd that devtools would hang when the app works otherwise) cc @pfaffe Separately, is devtools necessary to debug this? If you can run your app in node, that might be easier anyhow. |
I'll try to make a small repro for the devtools issue (IIRC, simple test programs I tried didn't hang, which means the issue is probably related to the memory corruption). Thanks for reminding me of that. The option with running in node was pretty challenging (since my application is using Extensions APIs that need to be stubbed out under Node, and since I'm using the Closure Library which apparently doesn't play with Node nicely), and unfortunately after making all needed workarounds I got the exact same random crashes as I was observing in DevTools. But if you have some tips regarding extracting additional debugging information from Node, them I'm happy to repeat this exercise. |
Hrm, after commenting out Can there be a bug in the |
Another thing that looks suspicious - Can it be that |
That seems to be expected. 81920 is the default stack size in musl. As far as I can see, the DEFAULT_PTHREAD_STACK_SIZE is only applied when you pass a NULL pthread_attr_t* argument, and only on pthread_create. |
I see, thanks. Then I'm out of ideas - how increasing the stack size can lead to memory corruption. I'd appreciate any hints from the Emscripten team for how to proceed with debugging this issue. A side note: DevTools still "hang" when I'm wrapping each printf/std::cerr usage with a mutex, even now that I don't experience crashes. (But this doesn't seem to be easy to repro in a small test program, so still trying to understand what's the essential ingredient for this bug.) |
If increasing the stack size leads to memory corruption, it might be that the corruption exists before but the stack size adjustment happens to lead to it actually being noticeable - bad luck, basically. So ASan is likely our best bet. I spent some time on #13205 now and opened #14611 which I believe fixes it. |
@kripken: Thanks! Looking forward to the fix to #13205. I'm hoping it'll resolve the "AddressSanitizer: nested bug in the same thread, aborting" error that I'm experiencing right now, which renders ASan useless. (Unfortunately, when I tried to cherry-pick your fix, that error was still present, but I'm still hoping that I just did the cherry-picking wrong.) P.S. @pfaffe also helped me today to debug my application, significantly improving the debugging ergonomics (viewable source, locals, even in the stdlib, global memory inspection, etc). Sadly, we didn't manage to find the culprit, so either fixing ASan or using some kind of experimental breakpoint-on-write would probably be needed... |
Note that one can also use the Details
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
int __wrap_printf(const char *format, ...) {
int rc;
va_list args;
va_start(args, format);
pthread_mutex_lock(&mutex);
rc = vprintf(format, args);
pthread_mutex_unlock(&mutex);
va_end(args);
return rc;
}
int __wrap_fprintf(FILE *stream, const char *format, ...) {
int rc;
va_list args;
va_start(args, format);
pthread_mutex_lock(&mutex);
rc = vfprintf(stream, format, args);
pthread_mutex_unlock(&mutex);
va_end(args);
return rc;
}
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
pthread_t thread[2];
char *char_repeat(int n, char c) {
char *dest = malloc(n + 1);
memset(dest, c, n);
dest[n] = '\0';
return dest;
}
void *thread_main(void *arg) {
char *msg = char_repeat(100, 'a');
for (int i = 0; i < 10; ++i)
printf("%s\n", msg);
free(msg);
return 0;
}
int main() {
int rc = pthread_create(&thread[0], NULL, thread_main, NULL);
assert(rc == 0);
rc = pthread_create(&thread[1], NULL, thread_main, NULL);
assert(rc == 0);
void *thread_rtn = 0;
rc = pthread_join(thread[0], &thread_rtn);
assert(rc == 0);
assert(thread_rtn == 0);
rc = pthread_join(thread[1], &thread_rtn);
assert(rc == 0);
assert(thread_rtn == 0);
return 0;
} $ emcc testcase.c printf-thread-safe.c -o test.js -pthread -sEXIT_RUNTIME -sPTHREAD_POOL_SIZE=2 -Wl,--wrap=printf -Wl,--wrap=fprintf
$ node --experimental-wasm-threads --experimental-wasm-bulk-memory test.js
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa |
Thanks everyone for the ideas and tips! It seems that the problem was, apparently, the too small stack size on one of my application's threads. I'm still trying to understand how much stack exactly do I need and why it's bigger than when running as native code or under NaCl, and why leaving the default (when not calling Given that the original issue reported here in #12971 is resolved (and appears to have nothing to do with mutexes/condition_variables specifically), I'm closing the bug. If anyone else is affected by similar issues they should probably file separate reports to avoid the confusion. |
In a program that uses std::thread, std::mutex and std::condition_variable, I'm frequently observing crashes with different messages but in a roughly the same piece of code. The exceptions observed so far are:
(1):
(2):
(3):
I'm using Emscripten 2.0.6. I've tried various debugging flags (ASSERTIONS=2, SAFE_HEAP=1, PTHREADS_DEBUG=1, RUNTIME_LOGGING=1, O0, g4), but without luck.
Unfortunately, I don't have a minimal repro yet.
The text was updated successfully, but these errors were encountered: