Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fprintf is non-thread-safe #13194

Closed
emaxx-google opened this issue Jan 6, 2021 · 9 comments · Fixed by #15065
Closed

fprintf is non-thread-safe #13194

emaxx-google opened this issue Jan 6, 2021 · 9 comments · Fixed by #15065

Comments

@emaxx-google
Copy link

emaxx-google commented Jan 6, 2021

Emscripten's implementation of fprintf isn't thread-safe in the sense that lines printed from different threads may interleave with each other, producing merged/corrupted messages in the result.

For example, when running the following program built under Emscripten:

#include <cstdio>
#include <string>
#include <thread>
void thread_main() {
  std::string msg(100, 'a');
  for (int i = 0; i < 10000; ++i)
    fprintf(stderr, "%s\n", msg.c_str());
}
int main() {
  std::thread t1(thread_main);
  std::thread t2(thread_main);
  t1.join();
  t2.join();
}

with piping its result through sort -u, it can be seen that besides the expected 100-character lines there are often 200-/300-character ones - obviously, the lines printed from different threads got collapsed and merged together.

This behavior doesn't seem to conform to the POSIX specification:

All functions that reference (FILE *) objects, except those with names ending in _unlocked, shall behave as if they use flockfile() and funlockfile() internally to obtain ownership of these (FILE *) objects.
(https://pubs.opengroup.org/onlinepubs/9699919799/functions/flockfile.html)

The same program compiled using the normal g++ with the standard libc always produces only the expected 100-line characters on my Linux machine.

@emaxx-google emaxx-google changed the title printf is non-thread-safe fprintf is non-thread-safe Jan 6, 2021
@emaxx-google
Copy link
Author

BTW, it's not just a "cosmetic" corruption of strings, since in case the printed messages are UTF-8, this issue causes them to sometimes become incorrect UTF strings, resulting in making the messages undecipherable and hitting some internal error logs and assertions in Emscripten code.

@emaxx-google
Copy link
Author

emaxx-google commented Jan 6, 2021

Additionally, after running this program several hundreds of times it failed with the following memory corruption error:

Pthread aborting at Error                                                                                                                                                        
    at abort (eval at globalEval (a.out.worker.js:211:10), <anonymous>:1490:70)                                     
    at Object.checkStackCookie (eval at globalEval (a.out.worker.js:211:10), <anonymous>:1269:46)                   
    at onmessage (a.out.worker.js:120:35)
    at MessagePort.<anonymous> (a.out.worker.js:199:5)
    at MessagePort.emit (events.js:315:20)
    at MessagePort.onmessage (internal/worker/io.js:78:8)
    at MessagePort.exports.emitMessage (internal/per_context/messageport.js:11:10)
pthread sent an error! undefined:undefined: abort(Runtime error: The application has corrupted its heap memory area (address zero)!) at Error
    at jsStackTrace (eval at globalEval (a.out.worker.js:211:10), <anonymous>:2125:19)
    at stackTrace (eval at globalEval (a.out.worker.js:211:10), <anonymous>:2142:16)
    at abort (eval at globalEval (a.out.worker.js:211:10), <anonymous>:1497:44)
    at Object.checkStackCookie (eval at globalEval (a.out.worker.js:211:10), <anonymous>:1269:46)
    at onmessage (a.out.worker.js:120:35)
    at MessagePort.<anonymous> (a.out.worker.js:199:5)
    at MessagePort.emit (events.js:315:20)
    at MessagePort.onmessage (internal/worker/io.js:78:8)
    at MessagePort.exports.emitMessage (internal/per_context/messageport.js:11:10)
Proxied main thread 0xa045c0 finished with return code 0. EXIT_RUNTIME=1 set, quitting process.
exitProcess requested by worker
main thead called exit: noExitRuntime=undefined
Runtime error: The application has corrupted its heap memory area (address zero)!
a.out.js:145
      throw ex;

(I don't know if it's related to fprintf or is a separate pthreads issue.)

@emaxx-google
Copy link
Author

Hello, is there any update on this issue? It sounds unfortunate that the print debugging, even recommended as the first debugging means by the official doc (https://emscripten.org/docs/porting/Debugging.html#manual-print-debugging) is actually causing additional issues by itself.

BTW, I ran the repro snippet quoted above under 2.0.13, and after a few runs I got an error about invalid UTF, despite that all strings printed by the program are ASCII:

Invalid UTF-8 leading byte 0x-1 encountered when deserializing a UTF-8 string on the asm.js/wasm heap to a JS string!

So the memory corruption errors quoted above might be true, as this would at least explain how printed ASCII strings could turn into invalid UTF-8.

@kleisauke
Copy link
Collaborator

Thanks for the reproducer! It seems that an initialization was missing for the stdin-, stdout-, stderr lock. Commit d5d5f69 (see work-in-progress PR #13007) fixes this for me.

I'll look into splitting that patch into a separate PR.

@emaxx-google
Copy link
Author

emaxx-google commented Feb 18, 2021

That's great news; thanks for looking into this!

@emaxx-google
Copy link
Author

Hello, is there any timeline for when those fixes (PR #13007 or it follow-ups) can be landed?

For the context, we're looking into rolling out a Wasm version of an existing Chrome extension that previously used multi-threaded NaCl modules. This issue is one of blockers, since it implies that normal debug logs to stdout/stderr can cause random crashes in the field (see the memory corruption log quoted above).

@kleisauke
Copy link
Collaborator

Sorry for the delay. I'm a bit out of bandwidth lately, it would be great if one of the Emscripten maintainers could land that separately, otherwise I'll take a look at it this week.

As a possible workaround, you could use emscripten_log, which does not exhibit this behavior, afaik.

@kripken
Copy link
Member

kripken commented Apr 6, 2021

I had some time today and opened a PR for your fixes for this issue @kleisauke , #13837 . Verified as working perfectly, thanks!

@emaxx-google
Copy link
Author

emaxx-google commented Jul 12, 2021

Hello, can the same issue happen to other string streams (like std::ostringstream)?
I'm wondering because I'm investigating some memory corruption issues, and ASan points me to internals of std::basic_ostream::operator<<(unsigned long long) (with the stream being stack-allocated in one of upper frames, so that it's guaranteed to still exist).

sbc100 added a commit that referenced this issue Sep 17, 2021
This is a slightly modified version of
d5d5f69

The change to __lockfile.c means we always compile in the locking code
but atomics will get lowered away in single threaded builds and the
wait/wake functions are no-ops in this mode too, so it should not effect
code size in release builds.

Because __lockfile.c is not compiled into a multi-thread-aware library
(its not part of an MTLibrary) it was never actually being built with
`__EMSCRIPTEN_PTHREADS__`.

Fixes: #13194
sbc100 added a commit that referenced this issue Sep 17, 2021
This is second attempt at landing a version of
d5d5f69

The first time we tried it #13837 we ran into issues with
test_pthread_exit_process deadlocking which I tracked down to an issue
with `system/lib/libc/musl/src/thread/__wait.c` where it was blocking
the main thread forever rather then looping and calling
`emscripten_main_thread_process_queued_calls`.

Fixes: #13194
sbc100 added a commit that referenced this issue Sep 17, 2021
This is second attempt at landing a version of
d5d5f69

The first time we tried it #13837 we ran into issues with
test_pthread_exit_process deadlocking which I tracked down to an issue
with `system/lib/libc/musl/src/thread/__wait.c` where it was blocking
the main thread forever rather then looping and calling
`emscripten_main_thread_process_queued_calls`.

Includes a version of #14481 so that should land before this does.

Fixes: #13194
sbc100 added a commit that referenced this issue Sep 20, 2021
This is second attempt at landing a version of
d5d5f69

The first time we tried it #13837 we ran into issues with
test_pthread_exit_process deadlocking which I tracked down to an issue
with `system/lib/libc/musl/src/thread/__wait.c` where it was blocking
the main thread forever rather then looping and calling
`emscripten_main_thread_process_queued_calls`.

Includes a version of #14481 so that should land before this does.

Fixes: #13194
sbc100 added a commit that referenced this issue Sep 20, 2021
This is second attempt at landing a version of
d5d5f69

The first time we tried it #13837 we ran into issues with
test_pthread_exit_process deadlocking which I tracked down to an issue
with `system/lib/libc/musl/src/thread/__wait.c` where it was blocking
the main thread forever rather then looping and calling
`emscripten_main_thread_process_queued_calls`.

Includes a version of #14481 so that should land before this does.

Fixes: #13194
sbc100 added a commit that referenced this issue Sep 20, 2021
This is second attempt at landing a version of
d5d5f69

The first time we tried it #13837 we ran into issues with
test_pthread_exit_process deadlocking which I tracked down to an issue
with `system/lib/libc/musl/src/thread/__wait.c` where it was blocking
the main thread forever rather then looping and calling
`emscripten_main_thread_process_queued_calls`.

Includes a version of #14481 so that should land before this does.

Fixes: #13194
kleisauke added a commit to kleisauke/emscripten that referenced this issue Nov 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants