Fix a race condition in pthread_mutex_timedlock.c #12245

Closed · wants to merge 3 commits

Conversation

kripken (Member) commented Sep 17, 2020

pthread_mutex_lock passes no timeout when it calls this function. In that case
we don't loop; we just do a single wait, forever. In a rare race, the condition
we care about may be set right before the wait begins, and the wait has no way
to notice it (it only waits for a wake event; it does not re-read the memory to
check the value).

Instead, just busy-wait. Since this is for pthread_mutex_lock, the normal use
case is probably something that needs to be fast anyhow.
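
To make the race concrete, here is a minimal sketch of the two patterns. This is not the actual musl/Emscripten code; `lock_word` and `wait_for_wake_event` are hypothetical stand-ins for the real lock word and the futex-style wait:

```c
#include <stdatomic.h>

/* Hypothetical lock word: 0 = unlocked, 1 = locked. */
extern _Atomic int lock_word;

/* Hypothetical wake-based wait: sleeps until some other thread sends a
 * wake event; it does NOT re-read lock_word before going to sleep. */
extern void wait_for_wake_event(void);

/* Racy pattern: if the holder unlocks (and sends its wake) between the
 * failed CAS and wait_for_wake_event(), the wake is missed and this
 * thread sleeps forever, since there is no timeout to bail it out. */
static void lock_racy(void) {
    int expected = 0;
    while (!atomic_compare_exchange_strong(&lock_word, &expected, 1)) {
        wait_for_wake_event();
        expected = 0; /* reset for the next CAS attempt */
    }
}

/* Busy-wait pattern from this PR: keep re-reading the lock word, so a
 * change can never be missed (at the cost of spinning while contended). */
static void lock_busy_wait(void) {
    int expected = 0;
    while (!atomic_compare_exchange_strong(&lock_word, &expected, 1)) {
        expected = 0; /* reset and retry immediately */
    }
}
```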

kripken requested a review from juj on Sep 17, 2020
sbc100 (Collaborator) commented Sep 17, 2020

I'm not sure turning all pthread_mutex_lock calls into busy loops is an acceptable solution. There could be threads that wait for long periods on locks, no?

sbc100 (Collaborator) commented Sep 17, 2020

Is this a bug that affects musl in general, or just Emscripten?

sbc100 (Collaborator) commented Sep 17, 2020

Also, does the same race condition apply to timedlock calls that do have a timeout?

What if I call timedlock with a 30-minute timeout? I could hit the race condition and end up waiting the full 30 minutes, right?

kripken added a commit that referenced this pull request Sep 17, 2020
kleisauke (Collaborator) commented

I wonder if PR #10524 could also help resolve these race conditions, since it ran the entire Open POSIX Test Suite in WebAssembly.

sbc100 (Collaborator) commented Sep 17, 2020

I also wonder, if this is a real bug in musl, whether it might have been fixed upstream already.

kripken (Member, Author) commented Sep 17, 2020

@kleisauke Good idea, but it doesn't look like #10524 can help here - it fixes other issues like thread cancellation, but not core mutex operations or our proxying logic.

@sbc100 I did look at upstream musl, and the code has not changed significantly, so it's not fixed upstream AFAICT.

I don't know if this only affects us or musl in general (it would be incredibly hard to test a native build of musl in a reliable enough way on a variant of #12258!).

Overall, I think the diagnoses in these three PRs are incorrect, as has been pointed out. However, they are all necessary to fix #12258, and each of them definitely fixes a specific deadlock I encountered while debugging that testcase. So I guess we need to debug those three deadlocks more. I am a little unsure how best to do that, though: how can I debug whether Atomics.wait is actually as atomic as it claims to be?
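
For what it's worth, the documented contract of a futex-style wait (and of Atomics.wait) is that it atomically compares the memory location against the expected value and only sleeps if they still match; if the value has already changed, it returns immediately. A minimal sketch of probing that from C, assuming Emscripten's `emscripten_futex_wait` helper (built on Atomics.wait on worker threads) and not relying on the exact return code, might look like:

```c
#include <emscripten/threading.h>
#include <stdio.h>

static volatile int word = 1;

int main(void) {
    /* Ask to wait only while word == 0. It is 1, so a correct wait must
     * notice the mismatch and return right away instead of blocking for
     * the full one-second timeout. */
    int rc = emscripten_futex_wait(&word, 0, 1000.0 /* ms */);
    printf("futex wait returned %d (should not have blocked)\n", rc);
    return 0;
}
```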

The only good news here is that this is likely not urgent, as these corner cases are very hard to hit. They are also all quite old, so I don't think we have any recent regression here.

kripken (Member, Author) commented Sep 22, 2020

I have found the actual cause here, and will open a refactoring PR and then a fix PR shortly.

kripken closed this on Sep 22, 2020
kripken deleted the pthread1 branch on Sep 22, 2020