Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate named mutexes on FreeBSD #10519

Open
wfurt opened this issue Jun 15, 2018 · 23 comments
Open

Investigate named mutexes on FreeBSD #10519

wfurt opened this issue Jun 15, 2018 · 23 comments
Labels
area-PAL-coreclr help wanted [up-for-grabs] Good issue for external contributors os-freebsd FreeBSD OS
Milestone

Comments

@wfurt
Copy link
Member

wfurt commented Jun 15, 2018

dotnet/coreclr#18480 forced usage of flock() based implementation of named mutexes on FreeBSD.
According to the detection, phtread should work but following test was failing:

threading/NamedMutex/test1/paltest_namedmutex_test1 test:

Child process: 'paltest_namedmutex_test1' failed at line 357. Expression: childRunningEvent.Release()

this should be further investigated and understood.
Note that there is libthr and libpthread on FreeBSD.

related to #10355

@jkotas jkotas changed the title investigare named mutexes on FreeBSD Investigate named mutexes on FreeBSD Jun 15, 2018
@kouvel
Copy link
Member

kouvel commented Jun 16, 2018

There should be more output indicating the test that failed, but I suspect any parent/child test would fail given the location of the failure. A child process acquires childRunningEvent for the duration of the test and based on the above it looks like releasing it at the end of the test is failing. At that point, the parent process would also be trying to acquire and release childRunningEvent to wait for the child process to exit (in UninitializeParent). Probably would have to debug it to see what's happening.

@wfurt
Copy link
Member Author

wfurt commented Jun 18, 2018

thanks @kouvel for the note. I can get more info from the test as well as I think there is some debug option for the threading library.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@jasonpugsley
Copy link
Contributor

jasonpugsley commented Sep 17, 2021

This is still a problem (with dotnet/coreclr#18480 reverted).

'paltest_namedmutex_test1' child process failed at line 367. Expression: childRunningEvent.Release()
'paltest_namedmutex_test1' child process failed at line 666. Expression: UninitializeChild(childRunningEvent, parentEvents, childEvents)

It's not immediately obvious what's going on. I'll need to find out how to run the tests in isolation and go from there.

@jasonpugsley
Copy link
Contributor

jasonpugsley commented Sep 19, 2021

The child is calling NamedMutexProcessData::ReleaseLock() but when it checks to see if the current thread owns the lock we find m_lockOwnerProcessId = -1, m_lockOwnerThreadId = -1
The child doesn't own the lock so we hit the throw clause:

void NamedMutexProcessData::ReleaseLock()
{
if (!IsLockOwnedByCurrentThread())
{
throw SharedMemoryException(static_cast<DWORD>(NamedMutexError::ThreadHasNotAcquiredMutex));
}

The above is with a Release build. With a Debug build I get a different issue, again the child doesn't own a lock but this time the parent owns it. We hit the assert at the end:

bool CThreadSynchronizationInfo::OwnsNamedMutex(NamedMutexProcessData *processData)
{
_ASSERTE(this == &GetCurrentPalThread()->synchronizationInfo);
for (NamedMutexProcessData *current = m_ownedNamedMutexListHead;
current != nullptr;
current = current->GetNextInThreadOwnedNamedMutexList())
{
_ASSERTE(current->IsLockOwnedByCurrentThread());

There is already logging/tracing in these files but I don't know how to turn it on/view it so I've added printfs but that's not scaling. Is there any information on how to trace one of these test runs?

@kouvel
Copy link
Member

kouvel commented Sep 23, 2021

I guess the test that failed above is LifetimeTests_Child. All of the child thread/process tests acquire the childRunningEvent mutex at the beginning as part of InitializeChild and release the mutex at the end in UninitializeChild. The parent thread/process waits to acquire the same mutex at the end of its test, to ensure that the child has completed before the parent completes, so that it can safely move on to the next test. I wonder if the parent is able to acquire the lock while the child still holds it, as then the parent would overwrite m_lockOwnerProcessId and m_lockOwnerThreadId in the shared data and make it look like the child doesn't own the lock anymore. And since the parent soon releases the lock, they could also be -1 after that.

Try adding the following asserts at the beginning of this function:

m_lockOwnerProcessId = GetCurrentProcessId();

    _ASSERTE(m_lockOwnerProcessId == SharedMemoryHelpers::InvalidProcessId);
    _ASSERTE(m_lockOwnerThreadId == SharedMemoryHelpers::InvalidSharedThreadId);

If those fail in the parent, that may indicate that the pthread mutex successfully acquired the lock for the parent while the child still holds it.

@jasonpugsley
Copy link
Contributor

Thanks @kouvel the asserts you suggested did fail. I noticed the failure only occurred when the parent and child were separate processes. There wasn't any issue if they were two threads in a single process.
I've examined the logging very carefully and then scoured the web and it appears FreeBSD doesn't have a valid/complete implementation of process-shared mutexes. The last detailed information I could find was from 2016 and I don't think it has been fixed since.

https://lists.freebsd.org/pipermail/freebsd-announce/2016-May/001716.html

Our libthr, the library implementing the POSIX threads and locking
operations, uses a pointer as the internal representation behind a
lock. The pointer contains the address of the actual structure carrying
the lock. This has unfortunate consequences for implementing the
PTHREAD_PROCESS_SHARED attribute for locks, since really only the
pointer is shared when a lock is mapped into distinct address spaces.

That matches my experience.

@wfurt So without a fix in FreeBSD coming anytime soon you can probably close this issue.

I'll just link to the line here in case someone else searches for this in the future.

#if HAVE_FULLY_FEATURED_PTHREAD_MUTEXES && HAVE_FUNCTIONAL_PTHREAD_ROBUST_MUTEXES && !(defined(HOST_ARM) || defined(HOST_ARM64) || defined(__FreeBSD__))

@kostikbel
Copy link

Can you provide minimal C example that demonstrates the problem? FreeBSD has process-shared mutexes since 11.x at least.

@jasonpugsley
Copy link
Contributor

The method in use here is pthread mutexes where the storage for the pthread_mutex_t object that is referenced by both processes exists in shared memory.
The shared memory is mmap'ed from a file known to both processes.

What appears to be the problem in FreeBSD's case is the pthread_mutex_t object in the shared memory is not the actual data (struct) describing the mutex. Rather it is a pointer that points to memory outside of the shared space. So the process that creates the mutex has visibility of the mutex because the pointer is valid. For the second process it's an invalid pointer.

Amazingly I've not had any memory access violations. But the second process is able to acquire the mutex while the first still has it acquired (because they each are referring to different mutexes).

So I have to assume the process-shared mutexes you're referring to that do work are of a different variety - not the shared pthread_mutex_t type.

@kostikbel
Copy link

Let me try to debug code by description.

POSIX requires that process-shared mutexes were initialized with the attribute pshared set to PTHREAD_PROCESS_SHARED. It is not enough (but also not required) to use the same memory for pthread_mutex_t to get the shared semantic. On FreeBSD, there is a shadow part of the mutex that is handled differently for private vs shared case.

Did you set PTHREAD_PROCESS_SHARED for your mutex using pthread_mutexattr_setpshared()?

@kostikbel
Copy link

BTW you should not use pthread_mutex* functions after fork(2) in multi-threaded process. The functions are not async-signal safe. I think pthread_mutex_lock/unlock for normal AKA non-pi/pp mutexes would work on FreeBSD.

@jasonpugsley
Copy link
Contributor

jasonpugsley commented Sep 30, 2021

Yes,

error = pthread_mutexattr_setpshared(&mutexAttributes, PTHREAD_PROCESS_SHARED);
_ASSERTE(error == 0);
error = pthread_mutexattr_setrobust(&mutexAttributes, PTHREAD_MUTEX_ROBUST);
_ASSERTE(error == 0);
error = pthread_mutexattr_settype(&mutexAttributes, PTHREAD_MUTEX_RECURSIVE);
_ASSERTE(error == 0);

From a quick search, I think this is the FreeBSD library implementation, you can see the calloc() to hold the internal struct pthread_mutex data.
https://github.com/freebsd/freebsd-src/blob/a20c10893eb17e281f119d1b9b39c175dbf4d7bd/lib/libthr/thread/thr_mutex.c#L277-L300

So that calloc() stores the mutex data in process-private address space.

@jasonpugsley
Copy link
Contributor

Yes, I have reverted that defined(__FreeBSD__) check during my investigation.

@kostikbel
Copy link

malloc()ed memory is only used to store shadow mutex data when mutex is process-private. Otherwise, the shadow is allocated by umtx_op(UMTX_OP_SHM) and then mapped with MAP_SHARED (see __thr_pshared_offpage()). Does UMTX_OP_SHM appears in the ktrace output?

I think we cannot move forward without a minimal example demonstrating the issue.

@jasonpugsley
Copy link
Contributor

@kostikbel Sorry I just realized you are the author of the post I linked to. I couldn't find anything more recent.

libthr with inlined locks became informally known as the libthr2
project, since it is better to change the library name than just
bumping the library version. rtld should ensure that libthr and libthr2
are not simultaneously loaded into a single address space.

So has libthr2 been incorporated into FreeBSD? I am testing this on 12.2.

If it should be working then I will try to create a minimal test program this weekend. Thankyou both for your input.

@kostikbel
Copy link

I tried the following silly test, and it correctly hangs (incorrectly it would print 'locked'):

/* */
#include <sys/types.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t mtx;

int
main(void)
{
	pthread_mutexattr_t attr;
	pid_t child;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
	pthread_mutex_init(&mtx, &attr);

	child = fork();
	if (child != 0) {
		sleep(2);
		pthread_mutex_lock(&mtx);
		printf("locked\n");
	} else {
		pthread_mutex_lock(&mtx);
	}
}

@kostikbel
Copy link

There is no libthr2. libthr supports process-shared locking objects starting with 11.x AFAIR (I do not remember if I backported that to 10.x)

@emaste
Copy link

emaste commented Sep 30, 2021

@kostikbel Sorry I just realized you are the author of the post I linked to. I couldn't find anything more recent.

libthr with inlined locks became informally known as the libthr2
project, since it is better to change the library name than just
bumping the library version. rtld should ensure that libthr and libthr2
are not simultaneously loaded into a single address space.

So has libthr2 been incorporated into FreeBSD? I am testing this on 12.2.

No, that part is not implemented, it is still future work. However, process shared mutexes are expected to be functional, since the time of that post. It was implemented here: freebsd/freebsd-src@1bdbd70 (look for THR_PSHARED_PTR to see the details).

It was done this way to avoid breaking the ABI for backwards compatibility -- libthr2 is a proposed project to change to inlined locks, which brings a number of benefits but is not required to have a functional process-shared mutex implementation.

@jasonpugsley
Copy link
Contributor

@kostikbel, I'm going to have to think about your example. It looks like the child locks the mutex and then immediately exits.
It's not a robust mutex so the parent doesn't know the child is "dead". I need to research this more to understand it. 😀

@jasonpugsley
Copy link
Contributor

Thanks @emaste lots to keep reading.

@kostikbel
Copy link

You can simplify it by adding infinite loop to child after the lock. It gives the same result

/* */
#include <sys/types.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t mtx;

int
main(void)
{
	pthread_mutexattr_t attr;
	pid_t child;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
	pthread_mutex_init(&mtx, &attr);

	child = fork();
	if (child != 0) {
		sleep(2);
		pthread_mutex_lock(&mtx);
		printf("locked\n");
	} else {
		pthread_mutex_lock(&mtx);
		while (1)
			sleep(1000);
	}
}

@jasonpugsley
Copy link
Contributor

jasonpugsley commented Sep 30, 2021

In our case the mutex is init'ed after fork. That might be where we're diverging from design.
One process init's the mutex in shared memory and the other process is trying to access it from it's reference to the shared memory. I need a proper simplified repro of our process.

@kostikbel
Copy link

Yes, this cannot work.

@jasonpugsley
Copy link
Contributor

OK, the alternative implementation is working - I think it uses a file lock which should be fine (unless the filesystem doesn't support locks?). Thanks again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-PAL-coreclr help wanted [up-for-grabs] Good issue for external contributors os-freebsd FreeBSD OS
Projects
None yet
Development

No branches or pull requests

6 participants