Investigate named mutexes on FreeBSD #10519

wfurt · 2018-06-15T07:29:42Z

dotnet/coreclr#18480 forced usage of flock() based implementation of named mutexes on FreeBSD.
According to the detection, phtread should work but following test was failing:

threading/NamedMutex/test1/paltest_namedmutex_test1 test:

Child process: 'paltest_namedmutex_test1' failed at line 357. Expression: childRunningEvent.Release()

this should be further investigated and understood.
Note that there is libthr and libpthread on FreeBSD.

related to #10355

kouvel · 2018-06-16T17:43:00Z

There should be more output indicating the test that failed, but I suspect any parent/child test would fail given the location of the failure. A child process acquires childRunningEvent for the duration of the test and based on the above it looks like releasing it at the end of the test is failing. At that point, the parent process would also be trying to acquire and release childRunningEvent to wait for the child process to exit (in UninitializeParent). Probably would have to debug it to see what's happening.

wfurt · 2018-06-18T05:28:37Z

thanks @kouvel for the note. I can get more info from the test as well as I think there is some debug option for the threading library.

jasonpugsley · 2021-09-17T05:24:51Z

This is still a problem (with dotnet/coreclr#18480 reverted).

'paltest_namedmutex_test1' child process failed at line 367. Expression: childRunningEvent.Release()
'paltest_namedmutex_test1' child process failed at line 666. Expression: UninitializeChild(childRunningEvent, parentEvents, childEvents)

It's not immediately obvious what's going on. I'll need to find out how to run the tests in isolation and go from there.

jasonpugsley · 2021-09-19T04:59:29Z

The child is calling NamedMutexProcessData::ReleaseLock() but when it checks to see if the current thread owns the lock we find m_lockOwnerProcessId = -1, m_lockOwnerThreadId = -1
The child doesn't own the lock so we hit the throw clause:

runtime/src/coreclr/pal/src/synchobj/mutex.cpp

Lines 1551 to 1556 in a842e7a

    
           void NamedMutexProcessData::ReleaseLock() 
        
           { 
        
               if (!IsLockOwnedByCurrentThread()) 
        
               { 
        
                   throw SharedMemoryException(static_cast<DWORD>(NamedMutexError::ThreadHasNotAcquiredMutex)); 
        
               }

The above is with a Release build. With a Debug build I get a different issue, again the child doesn't own a lock but this time the parent owns it. We hit the assert at the end:

runtime/src/coreclr/pal/src/synchmgr/synchmanager.cpp

Lines 4358 to 4366 in 78593b9

    
           bool CThreadSynchronizationInfo::OwnsNamedMutex(NamedMutexProcessData *processData) 
        
           { 
        
               _ASSERTE(this == &GetCurrentPalThread()->synchronizationInfo); 
        
               for (NamedMutexProcessData *current = m_ownedNamedMutexListHead; 
        
                   current != nullptr; 
        
                   current = current->GetNextInThreadOwnedNamedMutexList()) 
        
               { 
        
                   _ASSERTE(current->IsLockOwnedByCurrentThread());

There is already logging/tracing in these files but I don't know how to turn it on/view it so I've added printfs but that's not scaling. Is there any information on how to trace one of these test runs?

kouvel · 2021-09-23T17:23:35Z

I guess the test that failed above is LifetimeTests_Child. All of the child thread/process tests acquire the childRunningEvent mutex at the beginning as part of InitializeChild and release the mutex at the end in UninitializeChild. The parent thread/process waits to acquire the same mutex at the end of its test, to ensure that the child has completed before the parent completes, so that it can safely move on to the next test. I wonder if the parent is able to acquire the lock while the child still holds it, as then the parent would overwrite m_lockOwnerProcessId and m_lockOwnerThreadId in the shared data and make it look like the child doesn't own the lock anymore. And since the parent soon releases the lock, they could also be -1 after that.

Try adding the following asserts at the beginning of this function:

runtime/src/coreclr/pal/src/synchobj/mutex.cpp

Line 965 in 44b4450

m_lockOwnerProcessId = GetCurrentProcessId();

    _ASSERTE(m_lockOwnerProcessId == SharedMemoryHelpers::InvalidProcessId);
    _ASSERTE(m_lockOwnerThreadId == SharedMemoryHelpers::InvalidSharedThreadId);

If those fail in the parent, that may indicate that the pthread mutex successfully acquired the lock for the parent while the child still holds it.

jasonpugsley · 2021-09-30T05:49:44Z

Thanks @kouvel the asserts you suggested did fail. I noticed the failure only occurred when the parent and child were separate processes. There wasn't any issue if they were two threads in a single process.
I've examined the logging very carefully and then scoured the web and it appears FreeBSD doesn't have a valid/complete implementation of process-shared mutexes. The last detailed information I could find was from 2016 and I don't think it has been fixed since.

https://lists.freebsd.org/pipermail/freebsd-announce/2016-May/001716.html

Our libthr, the library implementing the POSIX threads and locking
operations, uses a pointer as the internal representation behind a
lock. The pointer contains the address of the actual structure carrying
the lock. This has unfortunate consequences for implementing the
PTHREAD_PROCESS_SHARED attribute for locks, since really only the
pointer is shared when a lock is mapped into distinct address spaces.

That matches my experience.

@wfurt So without a fix in FreeBSD coming anytime soon you can probably close this issue.

I'll just link to the line here in case someone else searches for this in the future.

runtime/src/coreclr/pal/src/include/pal/mutex.hpp

Line 125 in 80f015d

    
           #if HAVE_FULLY_FEATURED_PTHREAD_MUTEXES && HAVE_FUNCTIONAL_PTHREAD_ROBUST_MUTEXES && !(defined(HOST_ARM) || defined(HOST_ARM64) || defined(__FreeBSD__))

kostikbel · 2021-09-30T12:58:41Z

Can you provide minimal C example that demonstrates the problem? FreeBSD has process-shared mutexes since 11.x at least.

jasonpugsley · 2021-09-30T14:04:19Z

The method in use here is pthread mutexes where the storage for the pthread_mutex_t object that is referenced by both processes exists in shared memory.
The shared memory is mmap'ed from a file known to both processes.

What appears to be the problem in FreeBSD's case is the pthread_mutex_t object in the shared memory is not the actual data (struct) describing the mutex. Rather it is a pointer that points to memory outside of the shared space. So the process that creates the mutex has visibility of the mutex because the pointer is valid. For the second process it's an invalid pointer.

Amazingly I've not had any memory access violations. But the second process is able to acquire the mutex while the first still has it acquired (because they each are referring to different mutexes).

So I have to assume the process-shared mutexes you're referring to that do work are of a different variety - not the shared pthread_mutex_t type.

kostikbel · 2021-09-30T14:16:58Z

Let me try to debug code by description.

POSIX requires that process-shared mutexes were initialized with the attribute pshared set to PTHREAD_PROCESS_SHARED. It is not enough (but also not required) to use the same memory for pthread_mutex_t to get the shared semantic. On FreeBSD, there is a shadow part of the mutex that is handled differently for private vs shared case.

Did you set PTHREAD_PROCESS_SHARED for your mutex using pthread_mutexattr_setpshared()?

kostikbel · 2021-09-30T14:27:09Z

BTW you should not use pthread_mutex* functions after fork(2) in multi-threaded process. The functions are not async-signal safe. I think pthread_mutex_lock/unlock for normal AKA non-pi/pp mutexes would work on FreeBSD.

jasonpugsley · 2021-09-30T14:27:23Z

Yes,

runtime/src/coreclr/pal/src/synchobj/mutex.cpp

Lines 785 to 792 in 127a498

    
           error = pthread_mutexattr_setpshared(&mutexAttributes, PTHREAD_PROCESS_SHARED); 
        
           _ASSERTE(error == 0); 
        
           error = pthread_mutexattr_setrobust(&mutexAttributes, PTHREAD_MUTEX_ROBUST); 
        
           _ASSERTE(error == 0); 
        
           error = pthread_mutexattr_settype(&mutexAttributes, PTHREAD_MUTEX_RECURSIVE); 
        
           _ASSERTE(error == 0);

From a quick search, I think this is the FreeBSD library implementation, you can see the calloc() to hold the internal struct pthread_mutex data.
https://github.com/freebsd/freebsd-src/blob/a20c10893eb17e281f119d1b9b39c175dbf4d7bd/lib/libthr/thread/thr_mutex.c#L277-L300

So that calloc() stores the mutex data in process-private address space.

jasonpugsley · 2021-09-30T14:32:08Z

Yes, I have reverted that defined(__FreeBSD__) check during my investigation.

kostikbel · 2021-09-30T14:34:52Z

malloc()ed memory is only used to store shadow mutex data when mutex is process-private. Otherwise, the shadow is allocated by umtx_op(UMTX_OP_SHM) and then mapped with MAP_SHARED (see __thr_pshared_offpage()). Does UMTX_OP_SHM appears in the ktrace output?

I think we cannot move forward without a minimal example demonstrating the issue.

jasonpugsley · 2021-09-30T14:41:06Z

@kostikbel Sorry I just realized you are the author of the post I linked to. I couldn't find anything more recent.

libthr with inlined locks became informally known as the libthr2
project, since it is better to change the library name than just
bumping the library version. rtld should ensure that libthr and libthr2
are not simultaneously loaded into a single address space.

So has libthr2 been incorporated into FreeBSD? I am testing this on 12.2.

If it should be working then I will try to create a minimal test program this weekend. Thankyou both for your input.

kostikbel · 2021-09-30T14:47:39Z

I tried the following silly test, and it correctly hangs (incorrectly it would print 'locked'):

/* */
#include <sys/types.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t mtx;

int
main(void)
{
	pthread_mutexattr_t attr;
	pid_t child;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
	pthread_mutex_init(&mtx, &attr);

	child = fork();
	if (child != 0) {
		sleep(2);
		pthread_mutex_lock(&mtx);
		printf("locked\n");
	} else {
		pthread_mutex_lock(&mtx);
	}
}

kostikbel · 2021-09-30T14:48:35Z

There is no libthr2. libthr supports process-shared locking objects starting with 11.x AFAIR (I do not remember if I backported that to 10.x)

emaste · 2021-09-30T14:53:10Z

@kostikbel Sorry I just realized you are the author of the post I linked to. I couldn't find anything more recent.

libthr with inlined locks became informally known as the libthr2
project, since it is better to change the library name than just
bumping the library version. rtld should ensure that libthr and libthr2
are not simultaneously loaded into a single address space.

So has libthr2 been incorporated into FreeBSD? I am testing this on 12.2.

No, that part is not implemented, it is still future work. However, process shared mutexes are expected to be functional, since the time of that post. It was implemented here: freebsd/freebsd-src@1bdbd70 (look for THR_PSHARED_PTR to see the details).

It was done this way to avoid breaking the ABI for backwards compatibility -- libthr2 is a proposed project to change to inlined locks, which brings a number of benefits but is not required to have a functional process-shared mutex implementation.

jasonpugsley · 2021-09-30T14:57:19Z

@kostikbel, I'm going to have to think about your example. It looks like the child locks the mutex and then immediately exits.
It's not a robust mutex so the parent doesn't know the child is "dead". I need to research this more to understand it. 😀

jasonpugsley · 2021-09-30T14:58:26Z

Thanks @emaste lots to keep reading.

kostikbel · 2021-09-30T15:02:02Z

You can simplify it by adding infinite loop to child after the lock. It gives the same result

/* */
#include <sys/types.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t mtx;

int
main(void)
{
	pthread_mutexattr_t attr;
	pid_t child;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
	pthread_mutex_init(&mtx, &attr);

	child = fork();
	if (child != 0) {
		sleep(2);
		pthread_mutex_lock(&mtx);
		printf("locked\n");
	} else {
		pthread_mutex_lock(&mtx);
		while (1)
			sleep(1000);
	}
}

jasonpugsley · 2021-09-30T15:10:06Z

In our case the mutex is init'ed after fork. That might be where we're diverging from design.
One process init's the mutex in shared memory and the other process is trying to access it from it's reference to the shared memory. I need a proper simplified repro of our process.

kostikbel · 2021-09-30T15:12:14Z

Yes, this cannot work.

jasonpugsley · 2021-09-30T15:22:28Z

OK, the alternative implementation is working - I think it uses a file lock which should be fine (unless the filesystem doesn't support locks?). Thanks again.

jkotas changed the title ~~investigare named mutexes on FreeBSD~~ Investigate named mutexes on FreeBSD Jun 15, 2018

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

msftgits added this to the Future milestone Jan 31, 2020

Thefrank mentioned this issue Jun 15, 2024

Support for FreeBSD #14537

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Investigate named mutexes on FreeBSD #10519

Investigate named mutexes on FreeBSD #10519

wfurt commented Jun 15, 2018

kouvel commented Jun 16, 2018

wfurt commented Jun 18, 2018

jasonpugsley commented Sep 17, 2021 •

edited

Loading

jasonpugsley commented Sep 19, 2021 •

edited

Loading

kouvel commented Sep 23, 2021

jasonpugsley commented Sep 30, 2021

kostikbel commented Sep 30, 2021

jasonpugsley commented Sep 30, 2021

kostikbel commented Sep 30, 2021

kostikbel commented Sep 30, 2021

jasonpugsley commented Sep 30, 2021 •

edited

Loading

jasonpugsley commented Sep 30, 2021

kostikbel commented Sep 30, 2021

jasonpugsley commented Sep 30, 2021

kostikbel commented Sep 30, 2021

kostikbel commented Sep 30, 2021

emaste commented Sep 30, 2021

jasonpugsley commented Sep 30, 2021

jasonpugsley commented Sep 30, 2021

kostikbel commented Sep 30, 2021

jasonpugsley commented Sep 30, 2021 •

edited

Loading

kostikbel commented Sep 30, 2021

jasonpugsley commented Sep 30, 2021

Investigate named mutexes on FreeBSD #10519

Investigate named mutexes on FreeBSD #10519

Comments

wfurt commented Jun 15, 2018

kouvel commented Jun 16, 2018

wfurt commented Jun 18, 2018

jasonpugsley commented Sep 17, 2021 • edited Loading

jasonpugsley commented Sep 19, 2021 • edited Loading

kouvel commented Sep 23, 2021

jasonpugsley commented Sep 30, 2021

kostikbel commented Sep 30, 2021

jasonpugsley commented Sep 30, 2021

kostikbel commented Sep 30, 2021

kostikbel commented Sep 30, 2021

jasonpugsley commented Sep 30, 2021 • edited Loading

jasonpugsley commented Sep 30, 2021

kostikbel commented Sep 30, 2021

jasonpugsley commented Sep 30, 2021

kostikbel commented Sep 30, 2021

kostikbel commented Sep 30, 2021

emaste commented Sep 30, 2021

jasonpugsley commented Sep 30, 2021

jasonpugsley commented Sep 30, 2021

kostikbel commented Sep 30, 2021

jasonpugsley commented Sep 30, 2021 • edited Loading

kostikbel commented Sep 30, 2021

jasonpugsley commented Sep 30, 2021

jasonpugsley commented Sep 17, 2021 •

edited

Loading

jasonpugsley commented Sep 19, 2021 •

edited

Loading

jasonpugsley commented Sep 30, 2021 •

edited

Loading

jasonpugsley commented Sep 30, 2021 •

edited

Loading