-
Notifications
You must be signed in to change notification settings - Fork 561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
i#4403: Fix invalid drmemtrace address elision #4404
Conversation
Fixes a bug where a non-memref clobbering a base register failed to invalidate eliding that address from a drcachesim offline trace. Adds asm test cases to burst_traceopts. Adds further C++ test code to burst_traceopts which also triggers this bug (it took some experimentation to get some code to do that), and adds handling of different trace header points for opt vs noopt to support this additional trace size. Fixes #4403
Looks like we need to update the Appveyor config:
I created PR #4405. |
Wow the test crashed on Appveyor:
I just ran it on a local Windows machine and it passes there though. Hmm. |
Passes 20x in a row locally. Passes 32-bit too. Grrr: at least having the crash PC symbolized would help but there are no artifacts in our Appveyor CI builds. There are known Windows instabilities in static DR -- trying to find the issue numbers. |
Were there any merges to master prior to your appveyor fix which could be the actual cause? |
Looking at the Appveyor history: only one, PR #4395? Though it was green in its PR. Looks like the Doxygen failure hit on its merge but nobody noticed. This crashing test is changed by my PR here, so probably it is something in static DR combining managed mode and standalone DR that is made more likely to show up by my changes. I recall problems on Windows with a single app using DR in multiple modes when DR is statically linked. |
There is RDP access to the Appveyor machines. I will try that to at least get a callstack and understand if it's related to the change here (suspecting not) or under some existing issue. |
Hmm:
Xref #4155: func_view test fails on Appveyor with access violation. |
It's transparency: using the same libraries as the app, which when linking DR statically into the app has no built-in isolation we'd normally have. It's not clear why the changes here cause this to show up as a crash: that std::unordered_set was there and used in similar ways before. One solution is to create our own set class. These are GPR register enums so a simple array of bools would do the trick. |
So I made an isolated set class, and testing it on my local Windows suddenly is hitting a crash that looks sort of similar to the one on Appveyor since it involves an STL class using std::_Lockit, but this is post-detach:
Why is this happening now when it didn't this happen before with the std::unordered_set?? How is detach affecting this: I don't see any relation to the PEB or TEB or other things that DR could potentially mess up on detach. |
Not really sure why all these containers which shouldn't need internal locks all go through this |
OK I think this local error and the Appveyor were actually caused by memory corruption from the test loop I put in: to make the pattern trigger the elision bug the code went through contortions, including being a two-dimensional array, and the nested loop was left there, resulting in memory clobbering! Running appveyor now to confirm. The set class may still be a good idea to isolate all this Windows lock stuff but it's separated out and not in this PR. |
Travis failure is fib-conflict #3406. |
Fixes a bug where a non-memref clobbering a base register failed to
invalidate eliding that address from a drcachesim offline trace.
Adds asm test cases to burst_traceopts.
Adds further C++ test code to burst_traceopts which also triggers this
bug (it took some experimentation to get some code to do that), and
adds handling of different trace header points for opt vs noopt to
support this additional trace size.
Fixes #4403