-
Notifications
You must be signed in to change notification settings - Fork 561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support detach on 32-bit ARM (AArch64 finished) #1578
Comments
Adds an alternative scheme for achieving a post-call control point that does not require flushing or shared data structure examination per-call: replacing the return address with a sentinel. When the new flag DRWRAP_REPLACE_RETADDR is set, the return address is replaced with the address of a single return instruction in the client library, with the real address saved. When a block is seen consisting of that sentinel instruction, post-call callbacks are called, and then control is sent to the saved real address using dr_redirect_native_target(). Adds wrapping tests to drwrap-test. This new scheme requires restoring return addresses on the stack on detach or other state translation. Adds functionality to do so, along with a new test client.drwrap-test-detach. This requires the client's state restoration event be called for addresses not in the code cache. Adds such a call. Adds comments about translation problems with clean call mangling which is filed as i#4219. The issues seen here are all limited to traces, so the test works around the problems with -disable_traces. Tested the core drwrap behavior on ARM and AArch64 but missing general detach support there (#1578) prevents enabling the detach test there. Issue: #4219 Fixes #4197
Adds an alternative scheme for achieving a post-call control point that does not require flushing or shared data structure examination per-call: replacing the return address with a sentinel. When the new flag DRWRAP_REPLACE_RETADDR is set, the return address is replaced with the address of a single return instruction in the client library, with the real address saved. When a block is seen consisting of that sentinel instruction, post-call callbacks are called, and then control is sent to the saved real address using dr_redirect_native_target(). Adds wrapping tests to drwrap-test. This new scheme requires restoring return addresses on the stack on detach or other state translation. Adds functionality to do so, along with a new test client.drwrap-test-detach. This requires the client's state restoration event be called for addresses not in the code cache. Adds such a call. Adds comments about translation problems with clean call mangling which is filed as i#4219. The issues seen here are all limited to traces, so the test works around the problems with -disable_traces. Tested the core drwrap behavior on ARM and AArch64 but missing general detach support there (#1578) prevents enabling the detach test there. Issue: #4219 Fixes #4197
If we only support detach for -private_loader we could put a flag in the |
The other detach issue I'm seeing is a crash due to not setting xsp in notify_and_jmp_without_stack. There's even a comment there that "xsp is only set for X86" but no reason is given. We have to set it to get the frame for sigreturn at the right spot: not really understanding that comment when it doesn't have a TODO or FIXME. |
After fixing the setting of xsp, I'm seeing something strange: immediately past the sigreturn everything is good, but as soon as we execute the SYS_futex we returned to for __pthread_cond_wait, the thread state is messed up. This causes a SIGSEGV natively, and under gdb it messes up gdb such that it's unusable:
If we messed up the tpidr* register wouldn't that show up more immediately? Why does running the SYS_futex ruin gdb? Why is gdb so fragile: it should IMHO handle anything user mode does. |
Fixes a crash on detach on aarchxx by setting xsp to the proper value for SYS_rt_sigreturn to find the signal frame for resuming app state. Tested manually on aarch64 the api.detach test in gdb. Unfortunately there are further bugs, so the test cannot be enabled yet. Issue: #1578
Fixes a crash on detach on aarchxx by setting xsp to the proper value for SYS_rt_sigreturn to find the signal frame for resuming app state. Tested manually on aarch64 on the api.detach test in gdb. Unfortunately there are further bugs, so the test cannot be enabled yet. Issue: #1578
Sets os_should_swap_state() to true. Adds support for os_swap_context() twice in a row with the same state. Removes os_swap_context_go_native(). The api.detach test now passes on AArch64. Enables all of the api.* and api.static_* tests for AArchXX, except api.startstop( crash) and api.static_prepop (hang) which will be fixed separately. Fixes a build error in the api.static_crash test: a missing header. Issue: #1578, #1582
Another issue: for an app not using libc (like a pure-asm app: suite/tests/bin/allasm_aarch64_flush) the app TLS pointer is NULL, which causes problems in DR's exit sequence where for non-x86 we need a base value b/c the DR pointer is a secondary pointer at an offset inside the base. My comment in os_switch_seg_to_context() for my fix for now: if (os_tls->app_lib_tls_base == NULL) {
/* XXX i#1578: For pure-asm apps that do not use libc, the app may have no
* thread register value. For detach we would like to write a 0 back into
* the thread register, but it complicates our exit code, which wants access
* to DR's TLS between dynamo_thread_exit_common()'s call to
* dynamo_thread_not_under_dynamo() and its call to
* set_thread_private_dcontext(NULL). For now we just leave our privlib
* segment in there. It seems rather unlikely to cause a problem: app code
* is unlikely to read the thread register; it's going to assume it owns it
* and will just blindly write to it.
*/
return true;
} |
Sets os_should_swap_state() to true. Adds support for os_swap_context() twice in a row with the same state. Removes os_swap_context_go_native(). For a no-libc app with NULL TLS (like our pure-asm tests), we leave our TLS pointer in the register on exit to simplify the exit process for now. This is a minor transparency issue on detach: going to live with it for now. The api.detach test now passes on AArch64. Enables all of the api.* and api.static_* tests for AArchXX, except api.startstop( crash), api.static_noclient (assert), api.thread_churn (crash), and api.static_prepop (hang) which will be fixed separately. Fixes a build error in the api.static_crash test: a missing header. Issue: #1578, #1582 Co-authored-by: Abhinav Anil Sharma <[email protected]>
So we now have api.detach and several other tests passing. api.static_signal passes most of the time on Jenkins but once it got an extra "Got SIGSEGV" (2 in a row): http://139.178.83.194:8080/job/DynamoRIO-AArch64-Precommit/1711/consoleFull |
PR #4470 for #4468 enabled client.drwrap-test-detach but it seems to still have problems:#4467 (comment) |
The status of trying to enable all the api.* and api.static_* start/stop/detach tests as of when I last looked at them, back on October 2020:
Once on Jenkins it got an extra "Got SIGSEGV" (2 in a row)
|
Since the base detach test and api.startstop and api.detach_state (GPR step) now work, base detach support is considered complete. The remaining issues have been filed as separate issues. I updated the checklist above to show which issues cover which outstanding tests. |
Fixes a bug in the api.static_prepop assembly where the link register is clobbered, leading to an infinite loop on aarchxx. Enables the test for aarchxx. Fixes a bug on identifying stopping points on ARM where LSB Thumb decoration on the two sides of the comparison was not consistent. Fixes a corresponding bug on going native where the LSB was not set. Tested manually on ARM. Issue: #1578, #4717, #4720 Fixes #4717
I closed this since detach is working for AArch64, but it still seems to have problems on ARM. #4720 is one of those. Maybe it should be re-opened for 32-bit ARM work. Some tests we've ensured work on AArch64 we have not had resources or machines to get working on ARM. |
Fixes a bug in the api.static_prepop assembly where the link register is clobbered, leading to an infinite loop on aarchxx. Enables the test for aarchxx. Fixes a bug on identifying stopping points on ARM where LSB Thumb decoration on the two sides of the comparison was not consistent. Fixes a corresponding bug on going native where the LSB was not set. Tested manually on ARM. Issue: #1578, #4717, #4720 Fixes #4717
To determine whether a thread has exited I'm storing a value in the app TLS slot we're using. To properly detach we'll need to instead restore that value. This issue covers coming up with some other solution to the problem of figuring out whether a thread has exited, or else delaying the detach value restore.
The text was updated successfully, but these errors were encountered: