Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trace ending mid-rseq is truncated by raw2trace #6444

Closed
derekbruening opened this issue Nov 10, 2023 · 2 comments · Fixed by #6446
Closed

Trace ending mid-rseq is truncated by raw2trace #6444

derekbruening opened this issue Nov 10, 2023 · 2 comments · Fixed by #6446
Assignees

Comments

@derekbruening
Copy link
Contributor

I encountered a trace with a truncated file: no thread exit at the end.
It ends with a write after PC 0x1141e187:

000000 | 000a 0008 000000001140fc66
00000c | 000a 0003 000000001140fc6e
000018 | 000a 0003 000000001140fc71
000024 | 000a 0003 000000001140fc74
000030 | 000a 0003 000000001140fc77
00003c | 000e 0005 000000001140fc7a
000048 | 0001 0008 00007f1e1bca6958
000054 | 000a 0007 000000001141e180
000060 | 000a 0009 000000001141e187
00006c | 0001 0008 00007f1e1bca8d68

Re-running raw2trace: that PC is a ways up in the log, followed by a long rseq sequence (it has a loop) which doesn't reach its end: it has a timestap and thread exit appearing as the thread exited mid-rseq-region it seems. So apparently we do not output the rseq buffer in this case.

   8294 [drmemtrace]: Chunk instr count is now 1313701
   8295   0x000000001141e187  64 48 89 04 25 68 66 mov    %rax -> %fs:0xffff6668[8byte]
   8296                       ff ff
   8297 [drmemtrace]: Appended instr fetch for original 0x1141e187
   8298 [drmemtrace]: Appended memref type write (1) size 8 to 0x7f1e1bca8d68
   8299 [drmemtrace]: Chunk instr count is now 1313702
   8300 [drmemtrace]: --- Reached rseq entry (end=0x1141e1d8): buffering all output ---
   8301 [drmemtrace]: Appended marker type 24 value 0x1141e1d8
<...long sequence because of a loop in the rseq region...>
   9966   0x000000001141e1ce  49 39 c3             cmp    %r11 %rax
   9967 [drmemtrace]: Appended instr fetch for original 0x1141e1ce
   9968   0x000000001141e1d1  75 ed                jnz    $0x000000001141e1c0
   9969 [drmemtrace]: Remembering rseq branch 0x1141e1d1 -> 0x1141e1c0
   9970 [drmemtrace]: Appended instr fetch for original 0x1141e1d1
   9971 [drmemtrace]: Delaying 1 entries for decode=0x1141e1d1
   9972 [drmemtrace]: Remembered delayed branch decode=0x1141e1d1 target=0x1141e1c0
   9973 [drmemtrace]: Thread 775342 timestamp 0x002f684c2ceca958
   9974 [drmemtrace]: Appended marker type 3 value 0x1024
   9975 [drmemtrace]: Thread 775342 timestamp 0x002f684c2cecb08b
   9976 [drmemtrace]: Appended marker type 3 value 0x1029
   9977 [drmemtrace]: Discarded 1 entries for final branch without subsequent instr
   9978 [drmemtrace]: Thread 775342 exit
   9979 I1109 22:24:49.979226 2274534 decomposer.cc:574] raw2trace complete!

Two action items:

  1. Output rseq buffer if we hit thread exit mid-rseq
  2. Add invariant check that a thread exit footer exists for every thread
@derekbruening derekbruening self-assigned this Nov 10, 2023
derekbruening added a commit that referenced this issue Nov 10, 2023
Adds an invariant check that each thread has an exit record in a
drmemtrace.

Adds a unit test.

The new check failed on nearly every existing unit test, requiring
adding thread exits.  That in turn triggered exit checks on missing
headers and filetypes.  I ended up having to edit every test case.  As
part of this I standardized the tid constants used.

Issue: #6444
derekbruening added a commit that referenced this issue Nov 10, 2023
If the thread trace ends mid-rseq-region, dump the buffer instead of
losing not only the rseq sequence so far but also the thread exit
record.

Adds a unit test.  Also tested on a real trace with this issue.

Fixes #6444
@derekbruening
Copy link
Contributor Author

For the thread exit invariant check, it is failing for two tests on win64 as below. Having trouble figure this out but it may be related to the exit-all-threads on NtTerminateProcess: other-thread exits. For now I'm going to disable the check for Windows online with a comment.

https://github.com/DynamoRIO/dynamorio/actions/runs/6830155161/job/18577608422?pr=6445


234/305 Test #235: code_api|tool.drcachesim.scattergather-x86 .......................***Failed  Required regular expression not found. Regex=[^AVX2/AVX-512 scatter/gather checks ok
?
---- <application exited with code 0> ----
?
Trace invariant checks passed
?
$
] 25.53 sec
AVX2/AVX-512 scatter/gather checks ok
---- <application exited with code 0> ----
Trace invariant failure in T6280 at ref # 4502857 (0 instrs since timestamp 13344129548981663): Thread is missing exit

234/305 Test #234: code_api|tool.drcachesim.invariants ..............................***Failed  Required regular expression not found. Regex=[^.*
?
Trace invariant checks passed
?
$
] 47.13 sec
About to create thread
in wnd_callback 0x0000000000000024 0
in wnd_callback 0x0000000000000081 0
in wnd_callback 0x0000000000000083 0
in wnd_callback 0x0000000000000001 0
in wnd_callback 0x0000000000008001 3 0
About to crash
Inside handler
in wnd_callback 0x0000000000008001 0 2
Got message 0x0000000000008001 1 3
Before _beginthreadex
QueueUserAPC returned 1
SleepEx returned 192
Apc arg = 37
Result = 100
After _beginthreadex
All done
---- <application exited with code 0> ----
Cache simulation results:
Core #0 (1 thread(s))
  L1I0 (size=32768, assoc=8, block=64, LRU) stats:
    Hits:                        1,824,722
    Misses:                          5,545
    Compulsory misses:               3,279
    Invalidations:                       0
    Miss rate:                        0.30%
  L1D0 (size=32768, assoc=8, block=64, LRU) stats:
    Hits:                        1,042,935
    Misses:                          6,386
    Compulsory misses:               4,379
    Invalidations:                       0
    Prefetch hits:                   2,625
    Prefetch misses:                 4,723
    Miss rate:                        0.61%
Core #1 (2 thread(s))
  L1I1 (size=32768, assoc=8, block=64, LRU) stats:
    Hits:                           10,225
    Misses:                            214
    Compulsory misses:                 214
    Invalidations:                       0
    Miss rate:                        2.05%
  L1D1 (size=32768, assoc=8, block=64, LRU) stats:
    Hits:                            4,506
    Misses:                            135
    Compulsory misses:                 235
    Invalidations:                       0
    Prefetch hits:                      60
    Prefetch misses:                   100
    Miss rate:                        2.91%
Core #2 (2 thread(s))
  L1I2 (size=32768, assoc=8, block=64, LRU) stats:
    Hits:                        5,764,947
    Misses:                         19,528
    Compulsory misses:               6,122
    Invalidations:                       0
    Miss rate:                        0.34%
  L1D2 (size=32768, assoc=8, block=64, LRU) stats:
    Hits:                        1,886,571
    Misses:                         38,719
    Compulsory misses:              14,914
    Invalidations:                       0
    Prefetch hits:                   9,950
    Prefetch misses:                29,238
    Miss rate:                        2.01%
Core #3 (0 thread(s))
LL (size=8388608, assoc=16, block=64, LRU) stats:
    Hits:                           53,103
    Misses:                         17,424
    Compulsory misses:              24,257
    Invalidations:                       0
    Prefetch hits:                  27,228
    Prefetch misses:                 6,833
    Local miss rate:                 24.71%
    Child hits:                 10,546,541
    Total miss rate:                  0.16%

===========================================================================
Trace invariant failure in T7040 at ref # 10490989 (0 instrs since timestamp 13344129569495308): Thread is missing exit
abort() has been called

derekbruening added a commit that referenced this issue Nov 13, 2023
If the thread trace ends mid-rseq-region, dump the buffer instead of
losing not only the rseq sequence so far but also the thread exit
record.

Adds a unit test.  Also tested on a real trace with this issue.

Fixes #6444
derekbruening added a commit that referenced this issue Nov 13, 2023
Adds an invariant check that each thread has an exit record in a
drmemtrace.

Adds a path for the analyzer to call process_shard_exit() when there is
no exit record.
Manually tested this on a real truncated trace.

Adds a unit test.

The new check failed on nearly every existing unit test, requiring
adding thread exits. That in turn triggered exit checks on missing
headers and filetypes. I ended up having to edit every test case. As
part of this I standardized the tid constants used.

Issue: #6444
derekbruening added a commit that referenced this issue Nov 15, 2023
Augments the mid-rseq-trace-exit case to handle the rseq final instr
being reached but not passed.  Adds a test case.

Issue: #6444
derekbruening added a commit that referenced this issue Nov 15, 2023
Augments the mid-rseq-trace-exit case to handle the rseq final instr
being reached but not passed. Adds a test case.

Issue: #6444
brettcoon pushed a commit that referenced this issue Nov 16, 2023
Augments the mid-rseq-trace-exit case to handle the rseq final instr
being reached but not passed. Adds a test case.

Issue: #6444
abhinav92003 added a commit that referenced this issue May 1, 2024
Skips generating the final interval snapshot for thread-shards that are
missing the thread exit entry, during trace interval analysis in the
drmemtrace analyzer framework. Such final intervals currently have
incorrect data (the cumulative instr count, instr count delta, and end
timestamp). Instead of generating incorrect data we skip such a final
interval and print a non-fatal error message.

This issue is seen on interval analysis of only the traces affected by
i#6444. That issue has already been fixed many months ago, and does not
affect newer traces. This fix is to allow trace interval analysis of
those older traces.

Due to i#6444, some threads were missing thread exit entries because
tracing ended in an rseq region. For such traces, the i#6444 fix already
added an analyzer workaround that ensures that process_shard_exit is
called despite the missing thread exit event. Here we modify that
workaround to skip processing the final interval when the workaround had
to be invoked; this is required because the worker's trace stream would
have moved on to other shards (if there were others assigned to the
worker) and therefore cannot anymore provide the state (instr count,
timestamp when the shard ended) required for generation of the final
interval snapshot for that shard.

Note that for the shard that was the last shard in the worker's stream,
we can still continue to generate the final interval snapshot, because
the worker's trace stream is still good to generate state for that
shard.

Adds modified version of existing interval analysis unit tests to verify
the workaround.

Also fixes a log which may not print the correct stream name if the
error was in a different shard than the current one (because of the same
reason we're seeing the primary issue).

Issue: #6444
Fixes: #6793
@derekbruening
Copy link
Contributor Author

The missing thread exit in #6444 (comment) may be the same as #6733 which we see on Linux every so often and have never been able to figure out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant