Investigate named mutexes on FreeBSD #10519
Comments
There should be more output indicating the test that failed, but I suspect any parent/child test would fail given the location of the failure. A child process acquires |
Thanks @kouvel for the note. I can get more info from the test, and I think there is also a debug option for the threading library. |
This is still a problem (with dotnet/coreclr#18480 reverted).
It's not immediately obvious what's going on. I'll need to find out how to run the tests in isolation and go from there. |
The child is calling runtime/src/coreclr/pal/src/synchobj/mutex.cpp Lines 1551 to 1556 in a842e7a
The above is with a Release build. With a Debug build I get a different issue, again the child doesn't own a lock but this time the parent owns it. We hit the assert at the end: runtime/src/coreclr/pal/src/synchmgr/synchmanager.cpp Lines 4358 to 4366 in 78593b9
There is already logging/tracing in these files but I don't know how to turn it on/view it so I've added |
I guess the test that failed above is Try adding the following asserts at the beginning of this function:
_ASSERTE(m_lockOwnerProcessId == SharedMemoryHelpers::InvalidProcessId);
_ASSERTE(m_lockOwnerThreadId == SharedMemoryHelpers::InvalidSharedThreadId); If those fail in the parent, that may indicate that the pthread mutex successfully acquired the lock for the parent while the child still holds it. |
Thanks @kouvel the asserts you suggested did fail. I noticed the failure only occurred when the parent and child were separate processes. There wasn't any issue if they were two threads in a single process. https://lists.freebsd.org/pipermail/freebsd-announce/2016-May/001716.html
That matches my experience. @wfurt So without a fix in FreeBSD coming anytime soon you can probably close this issue. I'll just link to the line here in case someone else searches for this in the future.
|
Can you provide a minimal C example that demonstrates the problem? FreeBSD has had process-shared mutexes since 11.x at least. |
The method in use here is pthread mutexes where the storage for the What appears to be the problem in FreeBSD's case is the Amazingly I've not had any memory access violations. But the second process is able to acquire the mutex while the first still has it acquired (because they each are referring to different mutexes). So I have to assume the process-shared mutexes you're referring to that do work are of a different variety - not the shared |
Let me try to debug the code by description. POSIX requires that process-shared mutexes be initialized with the pshared attribute set to PTHREAD_PROCESS_SHARED. Using the same memory for the pthread_mutex_t is not enough (and also not required) to get shared semantics. On FreeBSD, there is a shadow part of the mutex that is handled differently for the private vs. shared case. Did you set PTHREAD_PROCESS_SHARED for your mutex using pthread_mutexattr_setpshared()? |
BTW, you should not use pthread_mutex* functions after fork(2) in a multi-threaded process. The functions are not async-signal-safe. I think pthread_mutex_lock/unlock for normal (non-PI/PP) mutexes would work on FreeBSD. |
Yes, runtime/src/coreclr/pal/src/synchobj/mutex.cpp Lines 785 to 792 in 127a498
From a quick search, I think this is the FreeBSD library implementation, you can see the So that |
Yes, I have reverted that |
malloc()ed memory is only used to store the shadow mutex data when the mutex is process-private. Otherwise, the shadow is allocated by umtx_op(UMTX_OP_SHM) and then mapped with MAP_SHARED (see __thr_pshared_offpage()). Does UMTX_OP_SHM appear in the ktrace output? I think we cannot move forward without a minimal example demonstrating the issue. |
@kostikbel Sorry I just realized you are the author of the post I linked to. I couldn't find anything more recent.
So has libthr2 been incorporated into FreeBSD? I am testing this on 12.2. If it should be working then I will try to create a minimal test program this weekend. Thank you both for your input. |
I tried the following silly test, and it correctly hangs (incorrectly it would print 'locked'):
|
There is no libthr2. libthr supports process-shared locking objects starting with 11.x AFAIR (I do not remember if I backported that to 10.x) |
No, that part is not implemented, it is still future work. However, process shared mutexes are expected to be functional, since the time of that post. It was implemented here: freebsd/freebsd-src@1bdbd70 (look for It was done this way to avoid breaking the ABI for backwards compatibility -- libthr2 is a proposed project to change to inlined locks, which brings a number of benefits but is not required to have a functional process-shared mutex implementation. |
@kostikbel, I'm going to have to think about your example. It looks like the child locks the mutex and then immediately exits. |
Thanks @emaste lots to keep reading. |
You can simplify it by adding an infinite loop to the child after the lock. It gives the same result.
|
In our case the mutex is init'ed after fork. That might be where we're diverging from design. |
Yes, this cannot work. |
OK, the alternative implementation is working - I think it uses a file lock which should be fine (unless the filesystem doesn't support locks?). Thanks again. |
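For readers unfamiliar with the file-lock approach mentioned above, the following is a rough sketch of a flock()-based named lock. It is an illustration under assumptions, not the runtime's implementation: the `named_lock_*` helpers and the demo path are hypothetical names.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper names; this is not the runtime's API. */

/* Open (creating if needed) the lock file that backs the named lock. */
int named_lock_open(const char *path)
{
    return open(path, O_RDWR | O_CREAT, 0600);
}

/* Try to take the lock without blocking.
 * Returns 0 on success, otherwise the errno value (EWOULDBLOCK when the
 * lock is held through another open file description). */
int named_lock_try_acquire(int fd)
{
    return flock(fd, LOCK_EX | LOCK_NB) == 0 ? 0 : errno;
}

/* Release the lock. Returns 0 on success, otherwise the errno value. */
int named_lock_release(int fd)
{
    return flock(fd, LOCK_UN) == 0 ? 0 : errno;
}

/* Self-check: two independent opens of the same path contend for the
 * lock the same way two processes would. Returns 1 if the observed
 * behaviour matches, 0 otherwise. */
int named_lock_demo(const char *path)
{
    int a = named_lock_open(path);
    int b = named_lock_open(path);
    if (a < 0 || b < 0) {
        if (a >= 0)
            close(a);
        if (b >= 0)
            close(b);
        return 0;
    }

    int ok = named_lock_try_acquire(a) == 0            /* first holder wins */
          && named_lock_try_acquire(b) == EWOULDBLOCK  /* second is refused */
          && named_lock_release(a) == 0
          && named_lock_try_acquire(b) == 0;           /* free again */

    close(a);
    close(b);
    unlink(path);
    return ok;
}
```

A flock() lock is dropped when its descriptor is closed, including when the owning process exits, so a crashed holder does not leave the lock stuck. Whether it works at all still depends on the filesystem supporting locks, as noted above.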
dotnet/coreclr#18480 forced usage of the flock()-based implementation of named mutexes on FreeBSD.
According to the detection, pthread should work, but the following test was failing:
This should be further investigated and understood.
Note that there is libthr and libpthread on FreeBSD.
related to #10355