Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_pthread_abort is failing on macOS & flaky on linux #15014

Open
sbc100 opened this issue Sep 9, 2021 · 3 comments
Open

test_pthread_abort is failing on macOS & flaky on linux #15014

sbc100 opened this issue Sep 9, 2021 · 3 comments

Comments

@sbc100
Copy link
Collaborator

sbc100 commented Sep 9, 2021

Its not clear what changed but this test recently started failing more consistently:

https://chromium-review.googlesource.com/c/emscripten-releases/+/3151159
https://ci.chromium.org/ui/p/emscripten-releases/builders/try/mac/b8836587739777488289/overview

======================================================================
FAIL: test_pthread_abort (test_core.wasm0)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/s/w/ir/cipd_bin_packages/cpython3/lib/python3.8/unittest/case.py", line 60, in testPartExecutor
    yield
  File "/opt/s/w/ir/cipd_bin_packages/cpython3/lib/python3.8/unittest/case.py", line 676, in run
    self._callTestMethod(testMethod)
  File "/opt/s/w/ir/cipd_bin_packages/cpython3/lib/python3.8/unittest/case.py", line 633, in _callTestMethod
    method()
  File "/opt/s/w/ir/k/install/emscripten/tests/common.py", line 188, in decorated
    f(self, *args, **kwargs)
  File "/opt/s/w/ir/k/install/emscripten/tests/test_core.py", line 2361, in test_pthread_abort
    self.do_run_in_out_file_test('pthread/test_pthread_abort.c', assert_returncode=NON_ZERO)
  File "/opt/s/w/ir/k/install/emscripten/tests/common.py", line 1018, in do_run_in_out_file_test
    self._build_and_run(srcfile, expected, **kwargs)
  File "/opt/s/w/ir/k/install/emscripten/tests/common.py", line 1074, in _build_and_run
    self.assertContained(expected_output, js_output)
  File "/opt/s/w/ir/k/install/emscripten/tests/common.py", line 744, in assertContained
    self.fail("Expected to find '%s' in '%s', diff:\n\n%s\n%s" % (
  File "/opt/s/w/ir/cipd_bin_packages/cpython3/lib/python3.8/unittest/case.py", line 753, in fail
    raise self.failureException(msg)
AssertionError: Expected to find 'onAbort called
' in 'undefined
pthread sent an error! undefined:undefined: abort(undefined) at Error
    at jsStackTrace (eval at importScripts (/opt/s/w/ir/tmp/t/tmpw4m97k_f/emscripten_test_wasm0_hsj_lnsx/test_pthread_abort.worker.js:38:16), <anonymous>:2232:19)
    at stackTrace (eval at importScripts (/opt/s/w/ir/tmp/t/tmpw4m97k_f/emscripten_test_wasm0_hsj_lnsx/test_pthread_abort.worker.js:38:16), <anonymous>:2249:16)
    at abort (eval at importScripts (/opt/s/w/ir/tmp/t/tmpw4m97k_f/emscripten_test_wasm0_hsj_lnsx/test_pthread_abort.worker.js:38:16), <anonymous>:1630:44)
    at _abort (eval at importScripts (/opt/s/w/ir/tmp/t/tmpw4m97k_f/emscripten_test_wasm0_hsj_lnsx/test_pthread_abort.worker.js:38:16), <anonymous>:2492:7)
    at <anonymous>:wasm-function[28]:0x909
    at main (<anonymous>:wasm-function[29]:0x911)
    at eval (eval at importScripts (/opt/s/w/ir/tmp/t/tmpw4m97k_f/emscripten_test_wasm0_hsj_lnsx/test_pthread_abort.worker.js:38:16), <anonymous>:1700:22)
    at ___call_main (eval at importScripts (/opt/s/w/ir/tmp/t/tmpw4m97k_f/emscripten_test_wasm0_hsj_lnsx/test_pthread_abort.worker.js:38:16), <anonymous>:2259:24)
    at <anonymous>:wasm-function[117]:0x4d4b
    at Object.invokeEntryPoint (eval at importScripts (/opt/s/w/ir/tmp/t/tmpw4m97k_f/emscripten_test_wasm0_hsj_lnsx/test_pthread_abort.worker.js:38:16), <anonymous>:2227:32)
    at self.onmessage (/opt/s/w/ir/tmp/t/tmpw4m97k_f/emscripten_test_wasm0_hsj_lnsx/test_pthread_abort.worker.js:144:48)
    at MessagePort.<anonymous> (/opt/s/w/ir/tmp/t/tmpw4m97k_f/emscripten_test_wasm0_hsj_lnsx/test_pthread_abort.worker.js:24:5)
    at MessagePort.[nodejs.internal.kHybridDispatch] (internal/event_target.js:354:41)
    at MessagePort.exports.emitMessage (internal/per_context/messageport.js:18:26)
@sbc100
Copy link
Collaborator Author

sbc100 commented Dec 2, 2021

Seeing this flake on linux too: https://logs.chromium.org/logs/emscripten-releases/buildbucket/cr-buildbucket/8828967471954354369/+/steps/Emscripten_testsuite__AddressSanitizer_/0/stdout

======================================================================
FAIL: test_pthread_abort (test_core.asan)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/b/s/w/ir/cipd_bin_packages/cpython3/lib/python3.8/unittest/case.py", line 60, in testPartExecutor
    yield
  File "/b/s/w/ir/cipd_bin_packages/cpython3/lib/python3.8/unittest/case.py", line 676, in run
    self._callTestMethod(testMethod)
  File "/b/s/w/ir/cipd_bin_packages/cpython3/lib/python3.8/unittest/case.py", line 633, in _callTestMethod
    method()
  File "/b/s/w/ir/k/install/emscripten/tests/common.py", line 197, in decorated
    f(self, *args, **kwargs)
  File "/b/s/w/ir/k/install/emscripten/tests/test_core.py", line 2476, in test_pthread_abort
    self.do_run_in_out_file_test('pthread/test_pthread_abort.c', assert_returncode=NON_ZERO)
  File "/b/s/w/ir/k/install/emscripten/tests/common.py", line 1051, in do_run_in_out_file_test
    self._build_and_run(srcfile, expected, **kwargs)
  File "/b/s/w/ir/k/install/emscripten/tests/common.py", line 1107, in _build_and_run
    self.assertContained(expected_output, js_output)
  File "/b/s/w/ir/k/install/emscripten/tests/common.py", line 780, in assertContained
    self.fail("Expected to find '%s' in '%s', diff:\n\n%s\n%s" % (
  File "/b/s/w/ir/cipd_bin_packages/cpython3/lib/python3.8/unittest/case.py", line 753, in fail
    raise self.failureException(msg)
AssertionError: Expected to find 'onAbort called
' in 'Aborted(native code called abort())
worker.js onmessage() captured an uncaught exception: RuntimeError: Aborted(native code called abort())
pthread sent an error! undefined:undefined: Aborted(native code called abort())

/b/s/w/ir/tmp/t/tmpgev2f0j7/emscripten_test_asan_o9_3f94p/test_pthread_abort.js:173
   throw ex;
   ^
Error [RuntimeError]: Aborted(native code called abort())
    at abort (eval at importScripts (/b/s/w/ir/tmp/t/tmpgev2f0j7/emscripten_test_asan_o9_3f94p/test_pthread_abort.worker.js:38:16), <anonymous>:1424:10)
    at _abort (eval at importScripts (/b/s/w/ir/tmp/t/tmpgev2f0j7/emscripten_test_asan_o9_3f94p/test_pthread_abort.worker.js:38:16), <anonymous>:4809:2)
    at __original_main (<anonymous>:wasm-function[54]:0x13e0)
    at main (<anonymous>:wasm-function[56]:0x13f2)
    at eval (eval at importScripts (/b/s/w/ir/tmp/t/tmpgev2f0j7/emscripten_test_asan_o9_3f94p/test_pthread_abort.worker.js:38:16), <anonymous>:1450:20)
    at ___call_main (eval at importScripts (/b/s/w/ir/tmp/t/tmpgev2f0j7/emscripten_test_asan_o9_3f94p/test_pthread_abort.worker.js:38:16), <anonymous>:2081:19)
    at _main_thread (<anonymous>:wasm-function[918]:0x2b066)
    at __asan::AsanThread::ThreadStart(unsigned long long) (<anonymous>:wasm-function[315]:0xfbb4)
    at __asan::asan_thread_start(void*) (<anonymous>:wasm-function[358]:0x11f6d)
    at Object.invokeEntryPoint (eval at importScripts (/b/s/w/ir/tmp/t/tmpgev2f0j7/emscripten_test_asan_o9_3f94p/test_pthread_abort.worker.js:38:16), <anonymous>:2041:31)
    at self.onmessage (/b/s/w/ir/tmp/t/tmpgev2f0j7/emscripten_test_asan_o9_3f94p/test_pthread_abort.worker.js:139:48)
    at MessagePort.<anonymous> (/b/s/w/ir/tmp/t/tmpgev2f0j7/emscripten_test_asan_o9_3f94p/test_pthread_abort.worker.js:24:5)
    at MessagePort.[nodejs.internal.kHybridDispatch] (internal/event_target.js:354:41)
    at MessagePort.exports.emitMessage (internal/per_context/messageport.js:18:26)
Emitted 'error' event on process instance at:
    at emitUnhandledRejectionOrErr (internal/event_target.js:545:11)
    at MessagePort.[nodejs.internal.kHybridDispatch] (internal/event_target.js:358:9)
    at MessagePort.exports.emitMessage (internal/per_context/messageport.js:18:26)
Thrown at:
    at /b/s/w/ir/tmp/t/tmpgev2f0j7/emscripten_test_asan_o9_3f94p/test_pthread_abort.js:173:4
    at emit (events.js:315:20)
    at internal/process/execution.js:163:25
', diff:

--- expected
+++ actual
@@ -1,2 +1,31 @@
-onAbort called
+Aborted(native code called abort())
+worker.js onmessage() captured an uncaught exception: RuntimeError: Aborted(native code called abort())
+pthread sent an error! undefined:undefined: Aborted(native code called abort())

+/b/s/w/ir/tmp/t/tmpgev2f0j7/emscripten_test_asan_o9_3f94p/test_pthread_abort.js:173
+   throw ex;
+   ^
+Error [RuntimeError]: Aborted(native code called abort())
+    at abort (eval at importScripts (/b/s/w/ir/tmp/t/tmpgev2f0j7/emscripten_test_asan_o9_3f94p/test_pthread_abort.worker.js:38:16), <anonymous>:1424:10)
+    at _abort (eval at importScripts (/b/s/w/ir/tmp/t/tmpgev2f0j7/emscripten_test_asan_o9_3f94p/test_pthread_abort.worker.js:38:16), <anonymous>:4809:2)
+    at __original_main (<anonymous>:wasm-function[54]:0x13e0)
+    at main (<anonymous>:wasm-function[56]:0x13f2)
+    at eval (eval at importScripts (/b/s/w/ir/tmp/t/tmpgev2f0j7/emscripten_test_asan_o9_3f94p/test_pthread_abort.worker.js:38:16), <anonymous>:1450:20)
+    at ___call_main (eval at importScripts (/b/s/w/ir/tmp/t/tmpgev2f0j7/emscripten_test_asan_o9_3f94p/test_pthread_abort.worker.js:38:16), <anonymous>:2081:19)
+    at _main_thread (<anonymous>:wasm-function[918]:0x2b066)
+    at __asan::AsanThread::ThreadStart(unsigned long long) (<anonymous>:wasm-function[315]:0xfbb4)
+    at __asan::asan_thread_start(void*) (<anonymous>:wasm-function[358]:0x11f6d)
+    at Object.invokeEntryPoint (eval at importScripts (/b/s/w/ir/tmp/t/tmpgev2f0j7/emscripten_test_asan_o9_3f94p/test_pthread_abort.worker.js:38:16), <anonymous>:2041:31)
+    at self.onmessage (/b/s/w/ir/tmp/t/tmpgev2f0j7/emscripten_test_asan_o9_3f94p/test_pthread_abort.worker.js:139:48)
+    at MessagePort.<anonymous> (/b/s/w/ir/tmp/t/tmpgev2f0j7/emscripten_test_asan_o9_3f94p/test_pthread_abort.worker.js:24:5)
+    at MessagePort.[nodejs.internal.kHybridDispatch] (internal/event_target.js:354:41)
+    at MessagePort.exports.emitMessage (internal/per_context/messageport.js:18:26)
+Emitted 'error' event on process instance at:
+    at emitUnhandledRejectionOrErr (internal/event_target.js:545:11)
+    at MessagePort.[nodejs.internal.kHybridDispatch] (internal/event_target.js:358:9)
+    at MessagePort.exports.emitMessage (internal/per_context/messageport.js:18:26)
+Thrown at:
+    at /b/s/w/ir/tmp/t/tmpgev2f0j7/emscripten_test_asan_o9_3f94p/test_pthread_abort.js:173:4
+    at emit (events.js:315:20)
+    at internal/process/execution.js:163:25
+

@sbc100 sbc100 changed the title test_pthread_abort is failing on macOS test_pthread_abort is failing on macOS & flaky on linux May 25, 2022
@kleisauke
Copy link
Collaborator

Still flaky: https://circleci.com/gh/emscripten-core/emscripten/626723.

This might be a race condition between the onAbort handler call:

emscripten/src/preamble.js

Lines 435 to 439 in aef5d39

#if expectToReceiveOnModule('onAbort')
if (Module['onAbort']) {
Module['onAbort'](what);
}
#endif

And when it throws this exception:

emscripten/src/preamble.js

Lines 491 to 494 in aef5d39

// Throw the error whether or not MODULARIZE is set because abort is used
// in code paths apart from instantiation where an exception is expected
// to be thrown when abort is called.
throw e;

This would be fine on the main runtime thread, but when ENVIRONMENT_IS_PTHREAD is true, the onAbort handler is asynchronous proxied back to the main thread using a postMessage mechanism:

// When running on a pthread, none of the incoming parameters on the module
// object are present. Proxy known handlers back to the main thread if specified.
var handlers = [];
var knownHandlers = [
#if expectToReceiveOnModule('onExit')
'onExit',
#endif
#if expectToReceiveOnModule('onAbort')
'onAbort',
#endif

emscripten/src/worker.js

Lines 160 to 162 in aef5d39

Module[handler] = function() {
postMessage({ cmd: 'callHandler', handler, args: [...arguments] });
}

So, it might be possible that the pthread propagated the uncaught exception back to the main thread before the callHandler command has arrived.

sbc100 added a commit that referenced this issue Jun 28, 2023
This change is designed fix address pthread thread flakiness around
process termination.  For example see #19683 and #15014.

In some cases I believe that workers are being terminated while they
have postMessage messages in their queue, so messages are getting lost.

In other cases we are seeing message arrive after the call to
`terminate()`.

This change attempts to avoid using `terminate()` at all and attempts
to cooperatively shut down the worker.  We only use `terminate()` if the
thread doesn't respond in a certain amount of time.
sbc100 added a commit that referenced this issue Jun 28, 2023
This change is designed fix address pthread thread flakiness around
process termination.  For example see #19683 and #15014.

In some cases I believe that workers are being terminated while they
have postMessage messages in their queue, so messages are getting lost.

In other cases we are seeing message arrive after the call to
`terminate()`.

This change attempts to avoid using `terminate()` at all and attempts
to cooperatively shut down the worker.  We only use `terminate()` if the
thread doesn't respond in a certain amount of time.
sbc100 added a commit that referenced this issue Jun 29, 2023
This change is designed fix address pthread thread flakiness around
process termination.  For example see #19683 and #15014.

In some cases I believe that workers are being terminated while they
have postMessage messages in their queue, so messages are getting lost.

In other cases we are seeing message arrive after the call to
`terminate()`.

This change attempts to avoid using `terminate()` at all and attempts
to cooperatively shut down the worker.  We only use `terminate()` if the
thread doesn't respond in a certain amount of time.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants