i#4403: Fix invalid drmemtrace address elision #4404

Merged
merged 7 commits into from
Aug 17, 2020

Conversation

derekbruening
Contributor

Fixes a bug where a non-memref clobbering a base register failed to
invalidate eliding that address from a drcachesim offline trace.

Adds asm test cases to burst_traceopts.

Adds further C++ test code to burst_traceopts which also triggers this
bug (it took some experimentation to get some code to do that), and
adds handling of different trace header points for opt vs noopt to
support this additional trace size.

Fixes #4403
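
To illustrate the class of bug described above, here is a minimal sketch of base-register elision tracking. It is not the drmemtrace implementation: `toy_instr_t`, `find_elidable`, `kNoReg`, and `kNumRegs` are hypothetical names, and the model tracks only a base register, with no displacement or index. The point it shows is that register writes must invalidate a remembered base for every instruction, not only for memory references.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction model; this is not DynamoRIO's instr_t.
constexpr int kNoReg = -1;
constexpr std::size_t kNumRegs = 16;

struct toy_instr_t {
    int mem_base;            // Base register of the memory operand, or kNoReg.
    std::vector<int> writes; // Registers this instruction writes.
};

// Returns, per instruction, whether its address could be elided because an
// earlier memref in the block used the same, still-unmodified base register.
std::vector<bool>
find_elidable(const std::vector<toy_instr_t> &block)
{
    std::array<bool, kNumRegs> base_seen{}; // All false initially.
    std::vector<bool> elide(block.size(), false);
    for (std::size_t i = 0; i < block.size(); ++i) {
        const toy_instr_t &ins = block[i];
        if (ins.mem_base != kNoReg) {
            const std::size_t base = static_cast<std::size_t>(ins.mem_base);
            elide[i] = base_seen[base];
            base_seen[base] = true;
        }
        // Apply the writes of every instruction, memref or not. Skipping this
        // step for non-memrefs is the kind of mistake the PR describes.
        for (int reg : ins.writes)
            base_seen[static_cast<std::size_t>(reg)] = false;
    }
    return elide;
}
```

In this model, a load through a register, then an arithmetic write to that register, then another load through it must not elide the second load's address, because the base value has changed in between.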

@derekbruening
Contributor Author

Looks like we need to update the Appveyor config:

appveyor DownloadFile http://doxygen.nl/files/doxygen-1.8.17.windows.x64.bin.zip
Error downloading remote file: One or more errors occurred.
Inner Exception: Remote server returned 404: Not Found

I created PR #4405.

@derekbruening
Contributor Author

Wow the test crashed on Appveyor:

[00:23:24] 219: Test command: "C:\Program Files (x86)\CMake\bin\cmake.exe" "-D" "precmd=foreach@C:/Program Files (x86)/CMake/bin/cmake.exe@-E@[email protected]_traceopts.*.dir" "-D" "cmd=C:/projects/dynamorio/build/build_debug-internal-64/clients/bin64/tool.drcacheoff.burst_traceopts.exe" "-D" "postcmd=" "-D" "postcmd2=" "-D" "postcmd3=" "-D" "cmp=C:/projects/dynamorio/build/build_debug-internal-64/suite/tests/offline-burst_traceopts.expect" "-D" "code=" "-P" "C:/projects/dynamorio/suite/tests/runmulti.cmake"
[00:23:24] 219: Test timeout computed to be: 600
[00:23:24] 219: Running cmd |C:/projects/dynamorio/build/build_debug-internal-64/clients/bin64/tool.drcacheoff.burst_traceopts.exe|
[00:23:24] 219: CMake Error at C:/projects/dynamorio/suite/tests/runmulti.cmake:106 (message):
[00:23:24] 219:   *** cmd failed (-1): <Application
[00:23:24] 219:   C:\projects\dynamorio\build\build_debug-internal-64\clients\bin64\tool.drcacheoff.burst_traceopts.exe
[00:23:24] 219:   (4608).  DynamoRIO Cache Simulator Tracer internal crash at PC
[00:23:24] 219:   0x00007ff9503e4ce7.  Please report this at http://dynamorio.org/issues.
[00:23:24] 219:   Program aborted.
[00:23:24] 219: 
[00:23:24] 219:   0xc0000005 0x00000000 0x00007ff9503e4ce7 0x00007ff9503e4ce7
[00:23:29] 219:   0x0000000000000000 0xffffffffffffffff
[00:23:29] 219: 
[00:23:29] 219:   Base: 0x00007ff6f2970000
[00:23:29] 219: 
[00:23:29] 219:   Registers: eax=0x17ad8a9b17ad8a99 ebx=0x0000000000000246
[00:23:29] 219:   ecx=0x00007ff6f306e118 edx=0x0000000000000003
[00:23:29] 219: 
[00:23:29] 219:         esi=0x0000000000000000 edi=0x00007ff6f306e118 esp=0x000002050beecb90
[00:23:29] 219:   ebp=0x0000000000000000
[00:23:29] 219: 
[00:23:29] 219:         r8 =0x0000000000000000 r9 =0x0000000000000010 r10=0x00007ff673101a60
[00:23:29] 219:   r11=0x00007ff673106290
[00:23:29] 219: 
[00:23:29] 219:         r12=0x0000000000000000 r13=0x0000000000000000 r14=0x0000000000000000
[00:23:29] 219:   r15=0x0000000000000000
[00:23:29] 219: 
[00:23:29] 219:         eflags=0x00000
[00:23:29] 219: 
[00:23:29] 219:   version 8.0.18487, custom build
[00:23:29] 219: 
[00:23:29] 219:   -client_lib ';;-offline ' -stderr_mask 12 -stack_size 56K -max_elide_jmp 0
[00:23:29] 219:   -max_elide_call 0 -no_inline_ignored_syscalls -native_exec_default_list ''
[00:23:29] 219:   -no_native_exec_managed_code -no_indcall2direct >
[00:23:29] 219: 
[00:23:29] 219:   ***
[00:23:29] 219: 
[00:23:29] 219: Call Stack (most recent call first):
[00:23:29] 219:   C:/projects/dynamorio/suite/tests/runmulti.cmake:115 (process_cmdline)
[00:23:29] 219: 
[00:23:29] 219: 
[00:23:29] 218/251 Test #219: code_api|tool.drcacheoff.burst_traceopts .....................***Failed    0.20 sec

I just ran it on a local Windows machine and it passes there though. Hmm.

@derekbruening
Contributor Author

Passes 20x in a row locally, and passes in 32-bit too. Grrr: having the crash PC symbolized would at least help, but there are no artifacts in our Appveyor CI builds. There are known Windows instabilities in static DR; I'm trying to find the issue numbers.

@johnfxgalea
Contributor

Were there any merges to master prior to your appveyor fix which could be the actual cause?

@derekbruening
Contributor Author

> Were there any merges to master prior to your appveyor fix which could be the actual cause?

Looking at the Appveyor history, there appears to be only one: PR #4395. It was green in its PR, though. It looks like the Doxygen failure hit on its merge but nobody noticed.

The crashing test is changed by this PR, so it is probably something in static DR combining managed mode and standalone DR that my changes make more likely to show up. I recall problems on Windows with a single app using DR in multiple modes when DR is statically linked.

@derekbruening
Contributor Author

There is RDP access to the Appveyor machines. I will try that to at least get a callstack and understand if it's related to the change here (suspecting not) or under some existing issue.

@derekbruening
Contributor Author

Hmm:

Exception thrown at 0x00007FFAF0AC4CE7 (ntdll.dll) in tool.drcacheoff.burst_traceopts.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.

 	ntdll.dll!00007ffaf0ac4ce7()	Unknown
 	ntdll.dll!00007ffaf0ac4b80()	Unknown
>	tool.drcacheoff.burst_traceopts.exe!_Mtxlock(_RTL_CRITICAL_SECTION * _Mtx) Line 24	C
 	tool.drcacheoff.burst_traceopts.exe!std::_Lockit::_Lockit(int kind) Line 69	C++
 	tool.drcacheoff.burst_traceopts.exe!std::_Container_base12::_Orphan_all(void)	C++
 	tool.drcacheoff.burst_traceopts.exe!std::_Vector_alloc<std::_Vec_base_types<droption_parser_t *,std::allocator<droption_parser_t *> > >::_Orphan_all() Line 536	C++
 	tool.drcacheoff.burst_traceopts.exe!std::vector<std::_List_unchecked_const_iterator<std::_List_val<std::_List_simple_types<unsigned short> >,std::_Iterator_base0>,dr_allocator_t<std::_List_unchecked_const_iterator<std::_List_val<std::_List_simple_types<unsigned short> >,std::_Iterator_base0> > >::_Change_array(std::_List_unchecked_const_iterator<std::_List_val<std::_List_simple_types<unsigned short> >,std::_Iterator_base0> * const _Newvec, const unsigned __int64 _Newsize, const unsigned __int64 _Newcapacity) Line 1898	C++
 	tool.drcacheoff.burst_traceopts.exe!std::vector<std::_List_unchecked_const_iterator<std::_List_val<std::_List_simple_types<unsigned short> >,std::_Iterator_base0>,dr_allocator_t<std::_List_unchecked_const_iterator<std::_List_val<std::_List_simple_types<unsigned short> >,std::_Iterator_base0> > >::_Reallocate_exactly(const unsigned __int64 _Newcapacity) Line 1510	C++
 	tool.drcacheoff.burst_traceopts.exe!std::vector<std::_List_unchecked_const_iterator<std::_List_val<std::_List_simple_types<unsigned short> >,std::_Iterator_base0>,dr_allocator_t<std::_List_unchecked_const_iterator<std::_List_val<std::_List_simple_types<unsigned short> >,std::_Iterator_base0> > >::reserve(const unsigned __int64 _Newcapacity) Line 1524	C++
 	tool.drcacheoff.burst_traceopts.exe!std::_Hash<std::_Uset_traits<unsigned short,std::_Uhash_compare<unsigned short,std::hash<unsigned short>,std::equal_to<unsigned short> >,dr_allocator_t<unsigned short>,0> >::_Init(unsigned __int64 _Buckets) Line 1131	C++
 	tool.drcacheoff.burst_traceopts.exe!std::_Hash<std::_Uset_traits<unsigned short,std::_Uhash_compare<unsigned short,std::hash<unsigned short>,std::equal_to<unsigned short> >,dr_allocator_t<unsigned short>,0> >::_Hash<std::_Uset_traits<unsigned short,std::_Uhash_compare<unsigned short,std::hash<unsigned short>,std::equal_to<unsigned short> >,dr_allocator_t<unsigned short>,0> >(const std::_Uhash_compare<unsigned short,std::hash<unsigned short>,std::equal_to<unsigned short> > & _Parg, const dr_allocator_t<unsigned short> & _Al) Line 204	C++
 	tool.drcacheoff.burst_traceopts.exe!std::unordered_set<unsigned short,std::hash<unsigned short>,std::equal_to<unsigned short>,dr_allocator_t<unsigned short> >::unordered_set<unsigned short,std::hash<unsigned short>,std::equal_to<unsigned short>,dr_allocator_t<unsigned short> >() Line 107	C++
 	tool.drcacheoff.burst_traceopts.exe!offline_instru_t::identify_elidable_addresses(void * drcontext, _instr_list_t * ilist, int version) Line 732	C++
 	tool.drcacheoff.burst_traceopts.exe!offline_instru_t::bb_analysis(void * drcontext, void * tag, void * * bb_field, _instr_list_t * ilist, bool repstr_expanded) Line 641	C++
 	tool.drcacheoff.burst_traceopts.exe!event_bb_analysis(void * drcontext, void * tag, _instr_list_t * bb, bool for_trace, bool translating, void * user_data) Line 1202	C++
 	tool.drcacheoff.burst_traceopts.exe!drmgr_bb_event_do_instrum_phases(void * drcontext, void * tag, _instr_list_t * bb, char for_trace, char translating, _per_thread_t * pt, _local_ctx_t * local_info, void * * pair_data, void * * quartet_data) Line 928	C
 	tool.drcacheoff.burst_traceopts.exe!drmgr_bb_event(void * drcontext, void * tag, _instr_list_t * bb, char for_trace, char translating) Line 1141	C
 	tool.drcacheoff.burst_traceopts.exe!instrument_basic_block(_dcontext_t * dcontext, unsigned char * tag, _instr_list_t * bb, char for_trace, char translating, dr_emit_flags_t * emitflags) Line 1676	C
 	tool.drcacheoff.burst_traceopts.exe!client_process_bb(_dcontext_t * dcontext, build_bb_t * bb) Line 2756	C
 	tool.drcacheoff.burst_traceopts.exe!build_bb_ilist(_dcontext_t * dcontext, build_bb_t * bb) Line 4126	C
 	tool.drcacheoff.burst_traceopts.exe!build_basic_block_fragment(_dcontext_t * dcontext, unsigned char * start, unsigned int initial_flags, char link, char visible, char for_trace, _instr_list_t * * unmangled_ilist) Line 5130	C
 	tool.drcacheoff.burst_traceopts.exe!d_r_dispatch(_dcontext_t * dcontext) Line 214	C
 	[External Code]	

Xref #4155: func_view test fails on Appveyor with access violation.
But that was in raw2trace mixing standalone and the DR allocator.

@derekbruening
Contributor Author

It's a transparency issue: the client uses the same libraries as the app, and when DR is statically linked into the app there is none of the built-in isolation we'd normally have. It's not clear why the changes here cause this to show up as a crash: that std::unordered_set was there and used in similar ways before.

One solution is to create our own set class. These are GPR register enums so a simple array of bools would do the trick.
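
A minimal sketch of such a set, assuming the register ids are small non-negative integers. The class name `reg_id_set_t` and the capacity `kMaxGprs` are hypothetical and are not taken from the DynamoRIO sources. It uses only a plain array of bools, so it involves no library container and no heap allocation.

```cpp
#include <cstddef>

// Hypothetical upper bound on general-purpose register ids; a real client
// would derive this from DynamoRIO's register enum instead.
constexpr std::size_t kMaxGprs = 32;

// Fixed-size set of small integer ids backed by a plain array of bools.
// Out-of-range ids are ignored on insert/erase and reported as absent.
class reg_id_set_t {
public:
    void
    insert(std::size_t reg)
    {
        if (reg < kMaxGprs)
            present_[reg] = true;
    }
    void
    erase(std::size_t reg)
    {
        if (reg < kMaxGprs)
            present_[reg] = false;
    }
    bool
    contains(std::size_t reg) const
    {
        return reg < kMaxGprs && present_[reg];
    }
    void
    clear()
    {
        for (std::size_t i = 0; i < kMaxGprs; ++i)
            present_[i] = false;
    }

private:
    bool present_[kMaxGprs] = {}; // All false initially.
};
```

Membership tests and updates are constant-time array accesses, which is sufficient when the key space is a small, dense enum range.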

@derekbruening
Contributor Author

So I made an isolated set class, and testing it on my local Windows machine is suddenly hitting a crash that looks somewhat similar to the one on Appveyor, since it involves an STL class using std::_Lockit, but this one is post-detach:

(19ac.3674): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
ntdll!RtlWaitOnAddress+0xc6:
00007ff8`92f15e16 ff4024          inc     dword ptr [rax+24h] ds:2242cb63`2242cb85=????????
0:000> kn
 # Child-SP          RetAddr           Call Site
00 0000005b`fe73eed0 00007ff8`92ef15b4 ntdll!RtlpWaitOnCriticalSection+0xa6
01 0000005b`fe73efb0 00007ff8`92ef13e2 ntdll!RtlpEnterCriticalSectionContended+0x1c4
02 0000005b`fe73f010 00007ff7`977b6084 ntdll!RtlEnterCriticalSection+0x42
03 0000005b`fe73f040 00007ff7`977b1174 tool_drcacheoff_burst_traceopts!_Mtxlock+0x14 [f:\dd\vctools\crt\crtw32\stdcpp\xmtx.c @ 24] 
04 0000005b`fe73f070 00007ff7`977ade7c tool_drcacheoff_burst_traceopts!std::_Lockit::_Lockit+0x54 [f:\dd\vctools\crt\crtw32\stdcpp\xlock.cpp @ 69] 
05 0000005b`fe73f0a0 00007ff7`977a85f2 tool_drcacheoff_burst_traceopts!std::_Container_base12::_Orphan_all+0x2c [c:\program files (x86)\microsoft visual studio\2017\professional\vc\tools\msvc\14.12.25827\include\xutility @ 252] 
06 0000005b`fe73f0f0 00007ff7`977ae898 tool_drcacheoff_burst_traceopts!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Reallocate_for<<lambda_66f57f934f28d61049862f64df852ff0>,char const * __ptr64>+0x92 [c:\program files (x86)\microsoft visual studio\2017\professional\vc\tools\msvc\14.12.25827\include\xstring @ 3607] 
07 0000005b`fe73f150 00007ff7`977ae7d2 tool_drcacheoff_burst_traceopts!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::assign+0xb8 [c:\program files (x86)\microsoft visual studio\2017\professional\vc\tools\msvc\14.12.25827\include\xstring @ 2433] 
08 0000005b`fe73f1a0 00007ff7`977a972b tool_drcacheoff_burst_traceopts!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::assign+0x32 [c:\program files (x86)\microsoft visual studio\2017\professional\vc\tools\msvc\14.12.25827\include\xstring @ 2438] 
09 0000005b`fe73f1d0 00007ff7`977a6256 tool_drcacheoff_burst_traceopts!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::basic_string<char,std::char_traits<char>,std::allocator<char> >+0x3b [c:\program files (x86)\microsoft visual studio\2017\professional\vc\tools\msvc\14.12.25827\include\xstring @ 1964] 
0a 0000005b`fe73f210 00007ff7`977a6673 tool_drcacheoff_burst_traceopts!post_process+0x76 [d:\derek\dr\git\src\clients\drcachesim\tests\burst_traceopts.cpp @ 159] 
0b 0000005b`fe73f5c0 00007ff7`977a6cea tool_drcacheoff_burst_traceopts!gather_trace+0x153 [d:\derek\dr\git\src\clients\drcachesim\tests\burst_traceopts.cpp @ 210] 
0c 0000005b`fe73f670 00007ff7`977b90a4 tool_drcacheoff_burst_traceopts!main+0x7a [d:\derek\dr\git\src\clients\drcachesim\tests\burst_traceopts.cpp @ 260] 

Why is this happening now when it didn't happen before with the std::unordered_set?

How is detach affecting this? I don't see any relation to the PEB or TEB or other things that DR could potentially mess up on detach.

@derekbruening
Contributor Author

I'm not really sure why all these containers, which shouldn't need internal locks, go through std::_Container_base12::_Orphan_all and std::_Lockit. I'm trying to examine the RTL_CRITICAL_SECTION here but not making much progress. Grr, this is turning into a rabbit hole and making less sense all the time.

@derekbruening
Contributor Author

OK, I think this local error and the Appveyor crash were actually caused by memory corruption from the test loop I put in. To make the pattern trigger the elision bug, the code went through contortions, including using a two-dimensional array, and the nested loop was left there, resulting in memory clobbering! Running Appveyor now to confirm. The set class may still be a good idea to isolate all this Windows lock stuff, but it's separated out and not in this PR.
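
The test loop itself is not shown in this thread, so the following is only a general illustration with hypothetical names and dimensions (`grid_t`, `fill_grid`, `kRows`, `kCols`). It sketches one way to keep a nested loop over a two-dimensional array from writing out of bounds: take both loop limits from the containers themselves rather than from separately maintained constants.

```cpp
#include <array>
#include <cstddef>

// Hypothetical dimensions; the real test's array shape is not shown here.
constexpr std::size_t kRows = 4;
constexpr std::size_t kCols = 8;
using grid_t = std::array<std::array<int, kCols>, kRows>;

// Fills every element, taking both loop bounds from the containers
// themselves so that neither loop can run past its dimension.
void
fill_grid(grid_t &grid, int value)
{
    for (std::size_t i = 0; i < grid.size(); ++i) {
        for (std::size_t j = 0; j < grid[i].size(); ++j)
            grid[i][j] = value;
    }
}
```

If the array shape later changes, the loop limits follow automatically, which avoids the stale-bound kind of clobbering that is hard to spot from a crash in unrelated code.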

@derekbruening
Contributor Author

Travis failure is fib-conflict #3406.

@derekbruening derekbruening merged commit 6ef1757 into master Aug 17, 2020
@derekbruening derekbruening deleted the i4403-trace-elide-bug branch August 17, 2020 20:07
Development

Successfully merging this pull request may close these issues.

drmemtrace address elision ignoring non-memref writes, leading to incorrect addresses
3 participants