HANG burst_threads test failing (non-det) #2092

Closed
derekbruening opened this issue Dec 5, 2016 · 8 comments

Comments

@derekbruening
Contributor

Seen on CDash: http://dynamorio.org/CDash/buildSummary.php?buildid=15334

I managed to get a local repro into gdb:

(gdb) info threads
  Id   Target Id         Frame 
  2    Thread 0x7f270624f700 (LWP 32604) "tool.drcacheoff" syscall_ready () at /work/dr/git/src/core/arch/x86/x86_shared.asm:180
  1    Thread 0x7f27094c8740 (LWP 32599) "tool.drcacheoff" (Exiting) 0x00007f27090c259d in pthread_join () from /lib64/libpthread.so.0
(gdb) thread apply all bt

Thread 2 (Thread 0x7f270624f700 (LWP 32604)):
#0  syscall_ready () at /work/dr/git/src/core/arch/x86/x86_shared.asm:180
#1  0x0000000000007f5f in ?? ()
#2  0x00000000006f3674 in ksynch_wait (futex=0x4b4bf3c8, mustbe=0) at /work/dr/git/src/core/unix/ksynch_linux.c:120
#3  0x00000000006cf7ce in wait_for_event (e=0x4b4bf3c8) at /work/dr/git/src/core/unix/os.c:9188
#4  0x00000000006d00fd in os_take_over_all_unknown_threads (dcontext=0x4b4b5080) at /work/dr/git/src/core/unix/os.c:9447
#5  0x0000000000479358 in dynamorio_take_over_threads (dcontext=0x4b4b5080) at /work/dr/git/src/core/dynamo.c:2741
#6  0x00000000006a158b in dynamo_start (mc=0x7f270624ec40) at /work/dr/git/src/core/arch/x86_code.c:103
#7  0x00000000004790e2 in dr_app_start_helper (mc=0x7f270624ec40) at /work/dr/git/src/core/dynamo.c:2636
#8  0x00000000006a211b in dr_app_start () at /work/dr/git/src/core/arch/x86/x86.asm:516
#9  0x00007f2708610540 in ?? () from /lib64/libc.so.6
#10 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f27094c8740 (LWP 32599)):
#0  0x00007f27090c259d in pthread_join () from /lib64/libpthread.so.0
#1  0x0000000000411847 in main ()

I suspect that it's a race where an app thread exits after DR constructs a
list of app threads but before DR synchs with each one.
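To make the suspected race concrete, here is a minimal sketch in Python rather than DR's C code. It assumes a simplified model: the taker snapshots a list of threads, then waits on a per-thread event, much as `wait_for_event` blocks in the backtrace above. `AppThread`, `take_over_all`, and `demo` are hypothetical names for illustration, not DR APIs, and the bounded wait with a liveness re-check is just one possible way to avoid blocking forever on a thread that has already exited. It is not necessarily how DR handles this.

```python
import threading


class AppThread:
    """Hypothetical stand-in for an app thread the taker wants to synch with."""

    def __init__(self, name):
        self.name = name
        self.taken_over = threading.Event()  # set when the thread acknowledges takeover
        self.exited = threading.Event()      # set once the thread has exited


def take_over_all(snapshot, poll_interval=0.01):
    """Wait for each thread in the snapshot, skipping threads that have exited.

    An unbounded wait on taken_over would hang forever for a thread that
    exited after the snapshot was built, which is the suspected race.
    """
    taken, gone = [], []
    for t in snapshot:
        while True:
            # Bounded wait: returns True as soon as the event is set.
            if t.taken_over.wait(timeout=poll_interval):
                taken.append(t.name)
                break
            # Timed out: re-check liveness instead of blocking again blindly.
            if t.exited.is_set():
                gone.append(t.name)
                break
    return taken, gone


def demo():
    a, b = AppThread("a"), AppThread("b")
    snapshot = [a, b]   # the list of app threads is built first
    b.exited.set()      # b exits after the snapshot, before the synch
    a.taken_over.set()  # a acknowledges the takeover normally
    return take_over_all(snapshot)
```

Calling `demo()` reports `a` as taken over and `b` as gone, instead of hanging on `b`.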

@derekbruening
Contributor Author

Actually the gdb instance above was instead a bug in #2089. The CDash hang was in TOT though and the mentioned race seems like an actual possibility.

@derekbruening
Contributor Author

This test has also failed on Travis. Here's one, but there have been others: https://travis-ci.org/DynamoRIO/dynamorio/jobs/350115271

@derekbruening
Contributor Author

OK that Travis failure is an output mismatch: one core has 0 data refs and doesn't print the string "Miss rate".

derekbruening added a commit that referenced this issue Mar 7, 2018
Allows for zero data or instruction refs on a core and thus no printed miss
rate, to reduce flakiness in the burst_threads test.

Issue: #2092
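
As a rough illustration of what "allowing for no printed miss rate" can look like in an output check, the sketch below treats the "Miss rate" line as optional. The exact line layout (`Hits:`, `Misses:`, `Miss rate:` with a trailing `%`) is an assumption made for this example, and `STATS_BLOCK` and `stats_block_ok` are hypothetical names. This is not the test's actual expected-output template.

```python
import re

# Assumed layout of one stats block; the "Miss rate" line is optional because
# a core with zero refs has no rate to print.
STATS_BLOCK = re.compile(
    r"[ \t]*Hits:[ \t]+[\d,]+\n"
    r"[ \t]*Misses:[ \t]+[\d,]+\n"
    r"(?:[ \t]*Miss rate:[ \t]+[\d.]+%\n)?"
)


def stats_block_ok(text):
    """Return True if text is a full stats block, with or without a miss rate."""
    return STATS_BLOCK.fullmatch(text) is not None


with_refs = "    Hits:    1,234\n    Misses:    56\n    Miss rate:    4.34%\n"
zero_refs = "    Hits:    0\n    Misses:    0\n"
truncated = "    Hits:    0\n"
```

With these sample strings, `with_refs` and `zero_refs` are both accepted, while `truncated` is rejected because the `Misses:` line is still required.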
@fhahn
Contributor

fhahn commented Jun 20, 2018

It also just failed on Travis on #3058 with:

  *** cmd failed (40): pre-DR init
  pre-DR start
  pre-DR detach
  <ERROR: master_signal_handler with no siginfo (i#26?): tid=22369, sig=8>
  <Application
  /home/travis/build/DynamoRIO/dynamorio/build_debug-internal-64/clients/bin64/tool.drcacheoff.burst_threads
  (22361).  Cannot correctly handle a received signal.>

https://travis-ci.org/DynamoRIO/dynamorio/jobs/394648678

@derekbruening
Contributor Author

Xref #2941

@hgreving2304

Xref #2694

@hgreving2304

hgreving2304 commented Jan 16, 2019

I ran into this on glinux (rare!)
<ERROR: master_signal_handler with no siginfo (i#26?): tid=143542, sig=11>

<Application
260: /usr/local/google/home/hgreving/dynamorio/build/clients/bin64/tool.drcacheoff.burst_threads
260: (143534). Cannot correctly handle a received signal.>

@hgreving2304

I've tested more than 1000 times across all 32-bit/64-bit debug and release builds and could not reproduce any failure. There were a few fixes lately around the error line above, so I'm closing this.
