Fallback to a read lock on fork preparation #356

simo5 · 2024-03-07T21:45:27Z

The reason to acquire a write lock is that there is a corner case where
the forking application is going to actually create additional threads
in an atfork() handler that runs before ours instead of doing something
more sensible like an immediate exec().
If such an application were to create threads then there is a chance
one of those threads could cause a call into pkcs11-provider that would
end up acquiring a read lock, which we will destroy as part of the post
fork atfork() handler.
Note that pthread creation or even pthread lock initialization functions
are not considered async-sginal-safe fucntions, so this is all a shot in
the dark anyway ...

Given this premise, falling back to a read lock, which will prevent
modification of the slots is sufficient in most cases.

Although there is a slight chance again that resetting the locks can
cause a thread to obtain a write lock while another thread holds a now
defunct read lock, such that there is modification of the slots while
a separate thread is reading this information, this is so rare it is
not worth dealing with (and there is nothing we can do anyway).
At least two threads won't be writing at the same time, that could be
really problematic.

Use of atfork() is also so rare that the chance of this happening in any
reasonable application can be considered infinitesimally small.

test fork deadlock

Fixes #355

simo5 · 2024-03-07T21:50:33Z

Actually this patch may still not be ok, because the read lock could be nested in the same thread as the code calling the callback and already holding a read lock ...

ifranzki · 2024-03-08T07:18:32Z

I have tested this change with Apache and yes, it does avoid the deadlock.

Jakuje

looks good

ifranzki · 2024-03-08T09:23:50Z

Small typo in commit title:
"Just acquire a read lock on for preparation"
should probably read "... on fork preparation"

simo5 · 2024-03-08T13:56:37Z

@ifranzki does apache fork from the thread where the callback is running, or does it fork from another thread?

simo5 · 2024-03-08T16:13:42Z

Ok so I verified that holding a read lock multiple times with pthread_rwlock_rdlock() is just fine, therefore the code will work as is.

However I think I may still change this patch to attempt to acquire a write lock, and fallback to a mere read lock only if that fails ...

simo5 · 2024-03-08T17:26:51Z

I added a trywrlock() and also added a much deeper explanation in the commit message of my reasoning why I do what I do now in the code.
Will modify the PR description accordingly as well.

ifranzki · 2024-03-11T07:47:44Z

does apache fork from the thread where the callback is running, or does it fork from another thread?

It does fork from the thread where the callback is running. You can see this in the backtrace in the issue:
#8 0x000003fff7d31a76 in apr_proc_create () from /lib/s390x-linux-gnu/libapr-1.so.0
This is doing the fork, it is a function of the Apache portable runtime (APR), here is the code (look for apr_proc_create):
https://svn.apache.org/viewvc/apr/apr/trunk/threadproc/unix/proc.c?view=markup

Also, the backtrace shows that its executing in the process's main thread, i.e. the backtrack shows main() as very first function in the backtrace.

simo5 · 2024-03-11T21:40:59Z

Thanks for confirming, I think I am ok with the current code.
I just nee dot find time to write a test so we do not regress.

simo5 · 2024-03-15T14:56:45Z

Finally found time to add the test.
The test fails without the commit that fixes the deadlock.
Tested failure locally by reverting the (now-last) commit.

This test will ensure we do not inadvertently reintroduce a deadlock in the future.

Jakuje · 2024-03-18T11:16:15Z

Address sanitizer / CI with Address Sanitizer (fedora) (pull_request) Failing after 360m

I ran diff through the logs for fedora in this run and last working in main https://github.com/latchset/pkcs11-provider/actions/runs/8266635712/job/22615078760 and the only difference in the dependencies is unbound from 1.19.1-2 to 1.19.1-4.

Running the address sanitizer checks locally on master shows no issues, while running it in this branch shows two memory leaks:

Direct leak of 88 byte(s) in 1 object(s) allocated from:
    #0 0x7f742b2d92ef in malloc (/usr/lib64/libasan.so.8.0.0+0xd92ef) (BuildId: 7fcb7759bc17ef47f9682414b6d99732d6a6ab0c)
    #1 0x7f742adb5334 in CRYPTO_zalloc (/lib64/libcrypto.so.3+0x1b5334) (BuildId: 5d012dcc6f62f35dabb8a129d641e3df7b731f11)
    #2 0x7f742ae172dc in UI_create_method (/lib64/libcrypto.so.3+0x2172dc) (BuildId: 5d012dcc6f62f35dabb8a129d641e3df7b731f11)
    #3 0x4023f7 in main /home/jjelen/devel/pkcs11-provider/tests/tfork_deadlock.c:127
    #4 0x7f742aa46149 in __libc_start_call_main (/lib64/libc.so.6+0x28149) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
    #5 0x7f742aa4620a in __libc_start_main_impl (/lib64/libc.so.6+0x2820a) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
    #6 0x402cd4 in _start (/home/jjelen/devel/pkcs11-provider/tests/tfork_deadlock+0x402cd4) (BuildId: 4227a9452263f96357af137df2d8392b07b00ea6)

Indirect leak of 14 byte(s) in 1 object(s) allocated from:
    #0 0x7f742b2d92ef in malloc (/usr/lib64/libasan.so.8.0.0+0xd92ef) (BuildId: 7fcb7759bc17ef47f9682414b6d99732d6a6ab0c)
    #1 0x7f742adb6672 in CRYPTO_strdup (/lib64/libcrypto.so.3+0x1b6672) (BuildId: 5d012dcc6f62f35dabb8a129d641e3df7b731f11)
    #2 0x7f742ae172f4 in UI_create_method (/lib64/libcrypto.so.3+0x2172f4) (BuildId: 5d012dcc6f62f35dabb8a129d641e3df7b731f11)
    #3 0x4023f7 in main /home/jjelen/devel/pkcs11-provider/tests/tfork_deadlock.c:127
    #4 0x7f742aa46149 in __libc_start_call_main (/lib64/libc.so.6+0x28149) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
    #5 0x7f742aa4620a in __libc_start_main_impl (/lib64/libc.so.6+0x2820a) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
    #6 0x402cd4 in _start (/home/jjelen/devel/pkcs11-provider/tests/tfork_deadlock+0x402cd4) (BuildId: 4227a9452263f96357af137df2d8392b07b00ea6)

Can you try to fix them and rebuild it?

A free() was used indtead of the proper OPENSSL_free() While there reformatted the code to better conform to our style. Signed-off-by: Simo Sorce <[email protected]>

Signed-off-by: Simo Sorce <[email protected]>

The reason to acquire a write lock is that there is a corner case where the forking application is going to actually create additional threads in an atfork() handler that runs before ours instead of doing something more sensible like an immediate exec(). If such an application were to create threads then there is a chance one of those threads could cause a call into pkcs11-provider that would end up acquiring a read lock, which we will destroy as part of the post fork atfork() handler. Note that pthread creation or even pthread lock initialization functions are not considered async-signal-safe fucntions, so this is all a shot in the dark anyway ... Given this premise, falling back to a read lock, which will prevent modification of the slots is sufficient in most cases. Although there is a slight chance again that resetting the locks can cause a thread to obtain a write lock while another thread holds a now defunct read lock, such that there is modification of the slots while a separate thread is reading this information, this is so rare it is not worth dealing with (and there is nothing we can do anyway). At least two threads won't be writing at the same time, that could be really problematic. Use of atfork() is also so rare that the chance of this happening in any reasonable application can be considered infinitesimally small. Signed-off-by: Simo Sorce <[email protected]>

simo5 · 2024-03-19T16:52:11Z

@Jakuje I sneaked in a commit to fix a Coverity regression in another PR I merged today.
However the Fedora Sanitizer seem randomly failing again.
Are you ok to push regardless ?

simo5 · 2024-03-19T17:34:53Z

Merging as the address sanitizer issue is a problem in the Fedora build not in the code.

simo5 requested a review from Jakuje March 7, 2024 21:49

Jakuje approved these changes Mar 8, 2024

View reviewed changes

simo5 force-pushed the fork_deadlock branch from ef4b55e to 71f0fee Compare March 8, 2024 17:26

simo5 changed the title ~~Just acquire a read lock on for preparation.~~ Fallback to a read lock on fork preparation Mar 8, 2024

simo5 force-pushed the fork_deadlock branch from 71f0fee to 219e335 Compare March 8, 2024 17:41

simo5 force-pushed the fork_deadlock branch from 219e335 to cb4f798 Compare March 15, 2024 14:55

simo5 force-pushed the fork_deadlock branch 3 times, most recently from 3f707ed to e21e6dd Compare March 15, 2024 15:34

simo5 force-pushed the fork_deadlock branch 2 times, most recently from 0b3c4f7 to a410b57 Compare March 19, 2024 16:33

simo5 added 3 commits March 19, 2024 12:41

Fix Coverity issue CID 488133

7645f18

A free() was used indtead of the proper OPENSSL_free() While there reformatted the code to better conform to our style. Signed-off-by: Simo Sorce <[email protected]>

Add forking deadlock testing

a8adcb8

Signed-off-by: Simo Sorce <[email protected]>

simo5 force-pushed the fork_deadlock branch from a410b57 to f1e36ce Compare March 19, 2024 16:46

Jakuje approved these changes Mar 19, 2024

View reviewed changes

simo5 merged commit 673e4fd into latchset:main Mar 19, 2024
21 of 22 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fallback to a read lock on fork preparation #356

Fallback to a read lock on fork preparation #356

simo5 commented Mar 7, 2024 •

edited

Loading

simo5 commented Mar 7, 2024

ifranzki commented Mar 8, 2024

Jakuje left a comment

ifranzki commented Mar 8, 2024

simo5 commented Mar 8, 2024

simo5 commented Mar 8, 2024

simo5 commented Mar 8, 2024

ifranzki commented Mar 11, 2024

simo5 commented Mar 11, 2024

simo5 commented Mar 15, 2024

Jakuje commented Mar 18, 2024

simo5 commented Mar 19, 2024

simo5 commented Mar 19, 2024

Fallback to a read lock on fork preparation #356

Fallback to a read lock on fork preparation #356

Conversation

simo5 commented Mar 7, 2024 • edited Loading

simo5 commented Mar 7, 2024

ifranzki commented Mar 8, 2024

Jakuje left a comment

Choose a reason for hiding this comment

ifranzki commented Mar 8, 2024

simo5 commented Mar 8, 2024

simo5 commented Mar 8, 2024

simo5 commented Mar 8, 2024

ifranzki commented Mar 11, 2024

simo5 commented Mar 11, 2024

simo5 commented Mar 15, 2024

Jakuje commented Mar 18, 2024

simo5 commented Mar 19, 2024

simo5 commented Mar 19, 2024

simo5 commented Mar 7, 2024 •

edited

Loading