Multi-threading failure (HANG) on AArch64. ASSERT in utils.c: !lock->owner #3956

Closed
AssadHashmi opened this issue Nov 20, 2019 · 13 comments

@AssadHashmi
Contributor

We're seeing intermittent hangs with an AArch64 guest binary compiled with OpenMP. All the indications are that it is a multi-threading bug in DR. The hang happens with a DR release build. With a DEBUG build, the following assert fires and the process exits, so it never gets as far as hanging:

<Application /path/to/test_case.exe (44407).  Internal Error: DynamoRIO debug check failure: /path/to/dynamorio/core/utils.c:576 !lock->owner

The assert fires in:

static void
deadlock_avoidance_lock(mutex_t *lock, bool acquired, bool ownable)
{
    if (acquired) {
        /* ... */
        if (ownable) {
            ASSERT(!lock->owner);
            lock->owner = d_r_get_thread_id();
            lock->owning_dcontext = get_thread_private_dcontext();
        }
        /* ... */

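For context, a rough sketch of the lock fields involved (field names as above and in the discussion below; the types are stand-ins for illustration, not DR's real definitions):

#include <stdint.h>

/* Stand-in types for illustration only; DR's real definitions differ. */
typedef uint64_t thread_id_t;
typedef struct _dcontext_t dcontext_t;

typedef struct _mutex_t {
    volatile int lock_requests;  /* acquisition/contention state */
    thread_id_t owner;           /* deadlock avoidance: owning thread id */
    dcontext_t *owning_dcontext; /* deadlock avoidance: owner's dcontext */
    /* ... further deadlock-avoidance bookkeeping ... */
} mutex_t;

The assert fires when a thread has just acquired the lock yet observes a stale, non-zero owner left behind by a previous holder.
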
The guest binary is built with armclang and linked to the Arm Performance Libraries on RHEL7.5:
armclang -fopenmp -armpl=lp64,parallel test_case.c -o test_case.exe

It fails without clients:
drrun ./test_case.exe

These also appear during the -debug run:

<get_memory_info mismatch! (can happen if os combines entries in /proc/pid/maps) 
        os says: 0x0000fffde4000000-0x0000fffe0c000000 prot=0x00000000           
        cache says: 0x0000fffde4000000-0x0000fffe08000000 prot=0x00000000

<ran out of stolen fd space>

It takes between 3 and 60 runs for the assert to fire, and it only seems to fail on ThunderX2 machines.

Running with -loglevel 3 gives the following thread statistics:

(Begin) Thread statistics @6735 global, 0 thread fragments (0:05.953):
                      BB fragments targeted by IBL (thread):                 3
                               Fcache exits, total (thread):                 4
                            Fcache exits, from BBs (thread):                 4
             Fcache exits, total indirect branches (thread):                 3
         Fcache exits, non-trace indirect branches (thread):                 3
   Fcache exits, ind target in cache but not table (thread):                 3
             Fcache exits, from BB, ind target ... (thread):                 3
              Fcache exits, BB->BB, ind target ... (thread):                 3
             Fcache exits, dir target not in cache (thread):                 1
                                Special heap units (thread):                 1
                           Peak special heap units (thread):                 1
             Current special heap capacity (bytes) (thread):             65536
                Peak special heap capacity (bytes) (thread):             65536
                              Heap headers (bytes) (thread):                56
                          Heap align space (bytes) (thread):                12
                     Peak heap align space (bytes) (thread):                12
                     Heap bucket pad space (bytes) (thread):              1136
                Peak heap bucket pad space (bytes) (thread):              1136
                            Heap allocs in buckets (thread):                15
                        Heap allocs variable-sized (thread):                 7
                             Total reserved memory (thread):            393216
                        Peak total reserved memory (thread):            393216
               Guard pages, reserved virtual pages (thread):                 4
          Peak guard pages, reserved virtual pages (thread):                 4
                    Current stack capacity (bytes) (thread):             65536
                       Peak stack capacity (bytes) (thread):             65536
                              Heap claimed (bytes) (thread):             17192
                         Peak heap claimed (bytes) (thread):             17192
                     Current heap capacity (bytes) (thread):             65536
                        Peak heap capacity (bytes) (thread):             65536
              Current total memory from OS (bytes) (thread):            393216
                 Peak total memory from OS (bytes) (thread):            393216
                      Current vmm blocks for stack (thread):                 3
                         Peak vmm blocks for stack (thread):                 3
               Current vmm blocks for special heap (thread):                 3
                  Peak vmm blocks for special heap (thread):                 3
                  Our virtual memory blocks in use (thread):                 6
             Peak our virtual memory blocks in use (thread):                 6
             Allocations using multiple vmm blocks (thread):                 2
                Blocks used for multi-block allocs (thread):                 6
         Current vmm virtual memory in use (bytes) (thread):            393216
            Peak vmm virtual memory in use (bytes) (thread):            393216
                              Number of safe reads (thread):                17
(End) Thread statistics

Does anything look unusual?

There's a lot of other thread-related tracing in the logs, but I don't know what to look for.

Clearly ownable and lock->owner are contradicting each other.
Where and when could that be happening?

Thanks

@derekbruening
Contributor

So a thread acquires a lock and its owner field is pointing at another thread that failed to properly clear it, yet did clear the lock_requests field? Which lock is it? What is the callstack at the assert point? What are the values of the lock fields, especially of owner? Does owner point at an existing thread, or at a now-exited thread?

#2502 is a known race issue which affects both AArch{32,64} AFAIK but would likely result in different symptoms than this.

@derekbruening
Contributor

Clearly ownable and lock->owner are contradicting each other.

IIRC ownable is going to always be true for DR locks (it's there for certain locks used "as the app"): that is not relevant here. The issue is that owner is not empty.

@derekbruening
Contributor

@AssadHashmi were you able to determine any further clues? We are also seeing this same assert in debug, and in release crashes and hangs that look like they may come from races/bugs in DR's lock implementation, on Thunder X2.

@AssadHashmi
Contributor Author

@AssadHashmi were you able to determine any further clues? We are also seeing this same assert in debug, and in release crashes and hangs that look like they may come from races/bugs in DR's lock implementation, on Thunder X2.

No further clues but I will re-visit this and post an update.

@derekbruening
Contributor

A first-pass review shows that the atomic macros in arch_exports.h for AArch64 need to be changed to meet the release-acquire memory model that DR is assuming there (xref my recent changes to non-mutex atomics which also want release-acquire, such as 35cbfc4#diff-e74a10e858f5f2dcd45c8afe825e8b26R642). Basically, s/ldxr/ldaxr/; s/stxr/stlxr/. Unfortunately I already tried that with our app and it didn't seem to make the symptoms disappear, but I will try again and double-check everything.
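
As a minimal sketch of what that substitution means (a hypothetical helper, not DR's actual arch_exports.h macros), an exchange loop written with GCC-style inline assembly:

static inline int
atomic_exchange_acq_rel(volatile int *addr, int newval)
{
    int oldval, status;
    __asm__ __volatile__(
        "1: ldaxr %w0, [%2]\n"      /* load-acquire exclusive of the old value */
        "   stlxr %w1, %w3, [%2]\n" /* store-release exclusive of the new value */
        "   cbnz  %w1, 1b\n"        /* retry if the exclusive store lost the reservation */
        : "=&r"(oldval), "=&r"(status)
        : "r"(addr), "r"(newval)
        : "memory");
    return oldval;
}

Plain ldxr/stxr provide atomicity but no ordering, so stores made by the previous lock holder (such as clearing owner) can be observed late by the next acquirer, which is consistent with the stale owner seen in the assert above; ldaxr/stlxr add the acquire/release ordering the mutex code assumes.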

@AssadHashmi
Contributor Author

@derekbruening I have a small test case (linked to the Arm Performance Libraries and built with the Arm compiler), which crashes after around 200 runs. If you have a patch, I can try it on my build. I'm on the latest head.

derekbruening added a commit that referenced this issue Apr 17, 2020
Replaces ldxr..stxr with ldaxr..stlxr in the atomic sequences used to
implement DR's mutexes.  DR's mutexes, and its other atomic
operations, assume release-acquire memory ordering.

Issue: #3956
@derekbruening
Contributor

derekbruening commented Apr 17, 2020

I just pushed my simple s/ldxr/ldaxr/; s/stxr/stlxr/ change to the branch i3956-arm-locks. If you could try that on your test case, that would be great.

Adding a link to the branch: https://github.com/DynamoRIO/dynamorio/tree/i3956-arm-locks

@derekbruening
Contributor

I went ahead and made it into a PR for simple diff browsing and commenting on the changes: #4254.

@derekbruening
Contributor

A partial update on #4254's effect on our app: it does seem to be better. Running with a small-ish thread pool on HEAD, it succeeded only 2 out of 10 times, with the rest being various crashes ("Cannot correctly handle received signal 11", internal DR crashes, occasionally app crashes). With #4254, 10 out of 10 runs succeed. However, when upping the thread pool to a normal level, it seems to hang. Let me evaluate further in a debug build to ensure the assert in the title is gone. There are probably multiple underlying issues (certainly there are several debug asserts/warnings I will file soon).

@AssadHashmi
Contributor Author

Without the #4254 patch, my test case crashes after 161, 529, 663, 195 and 230 runs. With the patch, it gets to 1000 runs without crashing. I will do some more, longer runs over the weekend, maybe with different OpenMP and build flags.

@AssadHashmi
Contributor Author

For extra info, without the patch, the test case crashed in different ways:

Received SIGSEGV at pc 0x0000ffffb1277194 in thread 58687
Cannot correctly handle received signal 11 in thread 35786
Segmentation fault

@derekbruening
Contributor

So that matches what I'm seeing on the small version: those same types of crashes w/o the fix, and success with the fix. I'm seeing hangs when I scale up the app, and with a drcachesim debug build I see some other asserts, warnings, and app crashes, but they are presumably separate issues. It is looking like #4254 is worth putting in and that it does indeed fix the DR lock bugs this issue covers. I'll file separate issues for the other problems I'm seeing: unfortunately there are quite a few, and some are blocking running this app reliably, but fixing this is a good step.

derekbruening added a commit that referenced this issue Apr 19, 2020
Adds a "dmb ish" barrier prior to the "ldrex..strex" loops in the
atomics used to implement mutexes and other operations on 32-bit ARM
where we need release-acquire semantics.

Issue: #3956
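
As a rough illustration of that change (again a hypothetical helper, not DR's actual macros): ARMv7 ldrex/strex impose no ordering, and there is no ldaex/stlex equivalent before ARMv8, so an explicit barrier ahead of the loop is used:

static inline int
atomic_exchange_armv7(volatile int *addr, int newval)
{
    int oldval, status;
    __asm__ __volatile__(
        "   dmb   ish\n"          /* order earlier accesses before the exchange */
        "1: ldrex %0, [%2]\n"
        "   strex %1, %3, [%2]\n"
        "   cmp   %1, #0\n"
        "   bne   1b\n"           /* retry if the exclusive store lost the reservation */
        : "=&r"(oldval), "=&r"(status)
        : "r"(addr), "r"(newval)
        : "memory", "cc");
    return oldval;
}

This sketch only shows the leading barrier the commit message describes; any trailing ordering depends on barriers the surrounding sequences already emit.
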
@AssadHashmi
Contributor Author

I'm seeing hangs when I scale up the app

Weekend test runs show that my small test case always passes. I will apply #4254 and run the full applications test suite to see what happens. Results may take a couple of days to come through.
