-
Notifications
You must be signed in to change notification settings - Fork 561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add cross-arch support to drdecodelib #1684
Comments
Alternatively, supporting building one or the other on either host architecture would also go a long way and be a nice mostly-there solution. |
For #4318 we'd want cross-arch support in full DR and not just drdecodelib. I think it's not much extra work to do so as the bulk of the work would have to be done for drdecodelib anyway. |
The goal is to compile the decodelib sources for all architectures into The main challenge is duplicate symbols:
|
@johnfxgalea , @AssadHashmi : looking for some feedback on the proposal in the prior comment re: interface changes and names. |
Pretty nice idea. It will also help tools like drdisas to disassemble cross-arch instructions. I just had a quick look at your comment, and best I sleep on it before giving further feedback. However, my initial concern with the change is that it is going to break quite a lot of code including Dr Memory. I think it is best we break it down into smaller steps and do the change incrementally. Maybe start off with one arch first, particularly with regards to the opcode name changes? Also: Alternatively, supporting building one or the other on either host architecture would also go a long way and be a nice mostly-there solution. I take it that the above is no longer under consideration? That will avoid such a major change but I understand it would be better to have support for both archs in the same build. |
Yes, I think that is best for consistency. |
What if there were a compatibility layer in place which kept all the old names for the primary architecture as aliases?
This might work for our use case. We're driven by #4318 and could conceivably swap to a different binary, though it complicates particular deployment patterns. Maybe we should look further into that to start. |
Hmm, what do you mean here exactly? |
For the way we run our automated jobs it is easier to have a single binary with everything statically linked together. A job is typically one single binary. Now we'll need two separate deployed binaries plus some kind of selector to figure out which one to launch which may end up requiring a separate job stage in the automation pipeline. Still easier than adding all arches in the same DR build though, even if not as appealing. |
It seems like the first step for either the cross-target or multi-arch is to isolate the decoding files from the rest of the core, so we can set a different target arch for them without messing up header includes, inline asm, and other execution-oriented code and build attributes where we want the host arch. This would be a continuation of the cleanup of building drdecode separately, but I would want to eventually get the core to use the drdecode lib itself, instead of building those same files separately. Because of the include file issue, I think we have to move the files to physically separate the headers so our include dirs using the host arch can be separate from those for the target arch. This will be a version history pain point but long-term should be cleanest in other respects as well. I'm proposing to make a new directory I.e., we'd have Unless there are competing ideas either for the approach or the directory name and location for |
Okay, but moving source out of core/arch is going to "mess" some (okay, a few) header includes mainly in the Perl script.
I think the separation is good. Not sure if "decode" is a better name but "ir" also sounds good to me. |
Right, all references, including genapi.pl, would be updated. I picked "ir" over "decode" b/c it's instruction generation, instruction list manipulation, encoding, etc., not just decoding, and "decode" makes it sound too limited IMHO. |
Moves all of the IR-related files (instruction generation, encoding, decoding, disassembly, instructions, operands, instruction lists) from core/arch to core/ir, mirroring the arch-specific subdirectories under core/ir. This is a code cleanup step toward properly isolating the drdecode library, as well as moving us toward the ability to build for a separate target architecture from the host architecture and eventually perhaps building in multiple target architectures in the same binary for decoding and IR manipulation. Also renames mangle_utils.c to ir_utils.c to better fit its purpose and location. Issue: #1684
Moves all of the IR-related files (instruction generation, encoding, decoding, disassembly, instructions, operands, instruction lists) from core/arch to core/ir, mirroring the arch-specific subdirectories under core/ir. This is a code cleanup step toward properly isolating the drdecode library, as well as moving us toward the ability to build for a separate target architecture from the host architecture and eventually perhaps building in multiple target architectures in the same binary for decoding and IR manipulation. Also renames mangle_utils.c to ir_utils.c to better fit its purpose and location. Issue: #1684
Moves the handful of functions from mangle.c used by drdecode (things that are difficult to untangle from the IR functions) into ir_utils.c Renames the top-level ir_utils.c to ir_utils_shared.c. Adds ir_utils.h. Issue: #1684
Moves the handful of functions from mangle.c used by drdecode (things that are difficult to untangle from the IR functions) into ir_utils.c Renames the top-level ir_utils.c to ir_utils_shared.c. Adds ir_utils.h. Issue: #1684
Adds drlibc_unix.h to better separate drlibc and prevent it including os_private.h and thus pulling in tls.h and other headers related to DR execution, which incorrectly mixes target and host architectures for drdecode and drdisas. Issue: #1684
Adds drlibc_unix.h to better separate drlibc and prevent it including os_private.h and thus pulling in tls.h and other headers related to DR execution, which incorrectly mixes target and host architectures for drdecode and drdisas. Issue: #1684
So I've been struggling with the right approach here to split the host from the target for the multi-binary solution. I have two different branches and I keep waffling between them, thinking the other must be better. Approach A: Add an IR_ARCH define for the target, change all the ifdefs (and IF_XX_ELSE, etc.) in core/ir/ to use the new IR_X86 instead of X86, and then go fix up enough IR-using ifdefs in the rest of the code to get what we want to build. This is nice because it's a stepping-stone to multi-arch-in-one-build: we then replace all the IR_ARCH defines with runtime conditionals and leave all the other ifdefs alone as host ifdefs. But there are many cases outside of core/ir/ that require changing and some that are complex and mix host and target-IR, like TLS using SEG_FS: so if host is x86 and target is aarch64, it fails b/c SEG_FS is x86-only. Approach B: Add a HOST_ARCH define for the host and change only the asm code to use it, leaving everything else as the target. This produces a crippled core that is never going to work for anything but standalone mode: but A essentially does that as well, it's just not quite as crippled. Both require a lot of work going through and trying to separate host from system in places like TLS handling where they are intertwined. Both would benefit from avoiding the core altogether and only supporting building drdecode, drdisas, rawtrace, and trace analyzers: the things we want to run vs other arches. However, there are several problems there:
Approach B is less work, it turns out. I pushed it through w/ a relatively small diff (much smaller than in-progress A), and now I have a fully-building x86-host aarch64-target where I can run drdisas, rawtrace, and opcode_mix on an x86_64 machine targeting aarch64 traces and ISA. But I feel bad b/c it does not leverage the refactoring into core/ir. Still, that refactoring is a good cleanup and a step toward future more-modular code, even if we don't push on it now. |
The interface and name changes are quite intrusive and the later suggestion is desirable:
Regarding the different approaches:
Agreed. Based on my limited knowledge, B seems like a safer option ATM and prepares the ground for future work. Just to be clear, is this work primarily driven by #4318 ? |
Yes, that is our immediate goal, postprocessing and analyzing traces gathered on another platform, such as aarch64, on an x86 machine. Ideally we wouldn't need a different build to do it but we'll leave that for future work and go with the separate build solution. The separate build has a libdynamorio that works in standalone mode, but fails of course if used to run anything. I have most of the tests disabled; trying to figure out the least disruptive way to do that though w/o a big if around everything. Ideally there I could get everything to build like on Mac and use a -L tag, but the building is the hard part with all the asm code and low-level things in the tests. Worst-case I'll do the big if(). |
Building the aarch64 code on Windows with VS2017 is resulting in a bunch of warnings. While we won't easily get aarch64 drcachesim post-processing on Windows x86 due to module mapping complexity, we can at least get drdecode and drdisas. Here is a sampling of some of the warnings. I'm wondering if I should split some of these out instead of putting into the same PR #4325 -- although maybe it won't be all that many LOC to fix these.
|
Adds support for building with different host and target architectures. The goal is to support running drdisas, drdecode, drraw2trace, and drcachesim trace analyzers on one host machine while targeting recorded bytes or traces from a different type of machine: e.g., processing aarch64 traces on an x86_64 machine. The goal is *not* to turn DR into a cross-ISA binary translator: we only support standalone mode here, not full DR managed mode. Long-term it would be nicer to split out the DR standalone mode interfaces into separate libraries but that is beyond the scope of the current work. Adds a top-level CMake option variable "TARGET_ARCH" which defaults to CMAKE_SYSTEM_PROCESSOR but can be set to a different value. The regular arch variables/defines X86, AARCH64, and ARM are set from TARGET_ARCH, while new values DR_HOST_{X86,AARCH64,ARM} are set for the host. Uses a target-centric approach, where the bulk of the code is built targeting the TARGET_ARCH, with only assembly code and other limited code using the DR_HOST_* arch. This limits the scope of the code changes as compared to the opposite host-centric approach. A new define DR_HOST_NOT_TARGET is used to simply disable cases where the host and target are intertwined and difficult to easily separate, such as TLS handling and assembly code used for application execution (such as icache_op_isb_asm). The key code changed to use DR_HOST_* includes: + *.asm files + arch_exports.h inline asm for atomics and other key utilities + system call defines (we don't want to have our raw syscall wrappers running incorrect syscall numbers: we'll end up with fork bombs or other madness) + injection code + data section setup + drsyms libraries Code using built-in __*__ arch defines is changed to use our defines: + sigcontext.h, which is also augmented for some missing AArch64 types + x86_code_test.c Code reading ELF headers is relaxed to allow *any* arch of the same bitwdith (i#1345 prevents different bitwidth), since we need the host for standalone init code but the target for raw2trace module mapping. Nearly all tests are disabled not just from running but also from building for simplicity, to reduce code changes here. Some api.* decoder tests, drdisas, and the altbin test for drcachesim post-processing and analysis are enbabled in host!=target builds. Tests are disabled by using a macro to set up the tests and shifting all the setup code prior to the tests themselves, allowing a return() to skip over the bulk of the tests to avoid a giant if(). Only Linux host-x86_64-target-aarch64 is explicitly tested and enabled here. Adding the inverse, and adding host-i386-target-arm32, should not be much work, mostly with compiler flags, but will be done separately. Adding Windows support is also separated out as it requires a number of changes for compiler problems with aarch64 code. Adds a new host-x86_64-target-aarch64 build to runsuite.cmake, allocated to a new Travis job to avoid slowing down the current jobs. Issue: #1684, #4318, #1345
Fixes an undefined symbol in drwrap when host!=target. Only some compilers complain about it. Issue: #1684
Fixes an undefined symbol in drwrap when host!=target. Only some compilers complain about it. Issue: #1684
Adds a long-missing feature: following into a Windows child process of a different bitwidth. Switches injection from DR and from drinjectlib (including drrun and drinject) to use -early_inject_map. This was most easily done by turning on -early_inject by default as well. However, the -early_inject_location default is INJECT_LOCATION_ImageEntry, which is the same late takeover point as with thread injection. Switching all injection over to map-from-the-parent simplifies cross-arch following, as well as making it easier to shift the takeover point to an earlier spot in the future. This is a step toward #607 by switching drinjectlib to use map injection; the takeover point, as mentioned, is still the image entry. Adds an -inject_x64 option to inject a 64-bit DR lib into a 32-bit child from a 64-bit parent, but this option is only sketched out and is not fully supported yet: #49 covers adding tests and official support. Adds library swapping code to find the other-bitwidth library, which assumes a parallel directory structure. Add a new fatal error if the library for a child is not found. To support generating code for all 3 child-parent cases (same-same, 32-64, and 64-32), and in particular for 32-64, switches the small gencode sequence for -early_inject_map from using IR to using raw bytes. A multi-arch encoder (#1684) would help but we would need cross-bitwidth support there, which is not on the horizon. Fixes what look like bugs in the original gencode generation along the way (s/pc/cur_local_pos/ and s/local_code_buf/remote_code_buf/): it's not clear how it worked before. Adds support for several system calls from a 32-bit parent to a 64-bit child where the desired NtWow64* system call does not exist. We use switch_modes_and_call() for NtProtectVirtualMemory and NtQueryVirtualMemory. Changes all types in the injection code to handle 64-bit addresses in 32-bit code. Adds UNICODE_STRING_32 and RTL_USER_PROCESS_PARAMETERS_32 for handling 32-bit structures from 64-bit parents. Similarly, adds RTL_USER_PROCESS_PARAMETERS_64 and PROCESS_BASIC_INFORMATION64. Adds get_process_imgname_cmdline() capability for 64-bit remote from 32-bit. Adds get_remote_proc_address() and uses it to look up dynamorio_earliest_init_takeover() in a child DR. Finds the remote ntdll base via a remote query memory walk plus remote image header parsing. This requires adding a switch_modes_and_call() version of NtQueryVirtualMemory (also mentioned above), which needs 64-bit args: so we refactor switch_modes_and_call() to take in a struct of all 64-bit fields for the args. Fixes a few bugs in other routines to properly get the image name and image entry for 32-bit children of 64-bit parents. Updates environment variable propagation code to handle a 32-bit parent and a 64-bit child. Updates a 64-bit parent and 32-bit child to insert the variables into the 32-bit PEB (64-bit does no good), which requires finding the 32-bit PEB. This is done via the 32-bit TEB, using a hack due to what seems like a kernel bug where it has the TebBaseAddress 0x2000 too low. Makes environment variable propagation failures fatal and visible, unlike previously where errors would just result in silently letting the child run natively. Turns some other prior soft errors into fatal errors on child takeover. Moves environment variable propagation to post-CreateUserProcess instead of waiting for ResumeThread, which avoids having to get the thread context (for which we have no other-bitwidth support) to figure out whether it's the first thread in the process or not. We bail on propagation for pre-Vista where we'd have to wait for ResumeThred. Generalizes the other-bitwidth Visual Studio toolchain environment variable setting for use in a new build-and-test other-bitwidth test which builds dynamorio and the large_options client (to ensure options are propagated to children; and it has convenient init and exit time prints) for the other bitwidth, arranges parallel lib dirs, and runs the other client Issue: #803, #147, #607, #49 Fixes #803
Adds a long-missing feature: following into a Windows child process of a different bitwidth. Switches injection from DR and from drinjectlib (including drrun and drinject) to use -early_inject_map. This was most easily done by turning on -early_inject by default as well. However, the -early_inject_location default is INJECT_LOCATION_ThreadStart, a new "early" injection location which is the same late takeover point as with thread injection (we could also use _ImageEntry, which is only very slightly later, but that fails for .NET and other applications). Switching all injection over to map-from-the-parent simplifies cross-arch following, as well as making it easier to shift the takeover point to an earlier spot in the future. This is a step toward #607 by switching drinjectlib to use map injection; the takeover point, as mentioned, is still the thread start. Placing a hook at the thread start causes some stability issues, so instead of the usual hook for -early_inject_map, for INJECT_LOCATION_ThreadStart we set the thread context, like thread injection does. The gencode still restores the hook as a nop, for simplicity. For parent64 child32, we can't easily locate the thread start, so we assume it's ntdll32!RtlUserThreadStart (which is also a fallback if anything fails in other cases; the final fallback is a hook at the image entry, which works nearly everywhere but not for .NET where the image entry is not reached). Adds an -inject_x64 option to inject a 64-bit DR lib into a 32-bit child from a 64-bit parent, but this option is only sketched out and is not fully supported yet: #49 covers adding tests and official support. Adds library swapping code to find the other-bitwidth library, which assumes a parallel directory structure. Add a new fatal error if the library for a child is not found. To support generating code for all 3 child-parent cases (same-same, 32-64, and 64-32), and in particular for 32-64, switches the small gencode sequence for -early_inject_map from using IR to using raw bytes. A multi-arch encoder (#1684) would help but we would need cross-bitwidth support there, which is not on the horizon. Fixes what look like bugs in the original gencode generation along the way (s/pc/cur_local_pos/ and s/local_code_buf/remote_code_buf/): it's not clear how it worked before. Adds support for several system calls from a 32-bit parent to a 64-bit child where the desired NtWow64* system call does not exist. We use switch_modes_and_call() for NtProtectVirtualMemory and NtQueryVirtualMemory. Changes all types in the injection code to handle 64-bit addresses in 32-bit code. Adds UNICODE_STRING_32 and RTL_USER_PROCESS_PARAMETERS_32 for handling 32-bit structures from 64-bit parents. Similarly, adds RTL_USER_PROCESS_PARAMETERS_64 and PROCESS_BASIC_INFORMATION64. Adds get_process_imgname_cmdline() capability for 64-bit remote from 32-bit. Adds get_remote_proc_address() and uses it to look up dynamorio_earliest_init_takeover() in a child DR. Finds the remote ntdll base via a remote query memory walk plus remote image header parsing. This requires adding a switch_modes_and_call() version of NtQueryVirtualMemory (also mentioned above), which needs 64-bit args: so we refactor switch_modes_and_call() to take in a struct of all 64-bit fields for the args. Fixes a few bugs in other routines to properly get the image name and image entry for 32-bit children of 64-bit parents. Updates environment variable propagation code to handle a 32-bit parent and a 64-bit child. Updates a 64-bit parent and 32-bit child to insert the variables into the 32-bit PEB (64-bit does no good), which requires finding the 32-bit PEB. This is done via the 32-bit TEB, using a hack due to what seems like a kernel bug where it has the TebBaseAddress 0x2000 too low. Makes environment variable propagation failures fatal and visible, unlike previously where errors would just result in silently letting the child run natively. Turns some other prior soft errors into fatal errors on child takeover. Moves environment variable propagation to post-CreateUserProcess instead of waiting for ResumeThread, which avoids having to get the thread context (for which we have no other-bitwidth support) to figure out whether it's the first thread in the process or not. We bail on propagation for pre-Vista where we'd have to wait for ResumeThred. Generalizes the other-bitwidth Visual Studio toolchain environment variable setting for use in a new build-and-test other-bitwidth test which builds dynamorio and the large_options client (to ensure options are propagated to children; and it has convenient init and exit time prints) for the other bitwidth, arranges parallel lib dirs, and runs the other client. Issue: #803, #147, #607, #49 Fixes #803
Pasting my notes for ARM on i386:
And x86-64 on aarch64:
It's not clear whether there's a small or large amount of work needed beyond these initial config issues. |
Today we have ifdefs selecting whether the decoder and encoder support
either ARM or x86. For a nice standalone decoder we should have support
for both in the same build.
The text was updated successfully, but these errors were encountered: