for 32-bit app on 64-bit kernel, switch to 64-bit mode and use extra registers in DR and tool #751

Open
derekbruening opened this issue Nov 28, 2014 · 8 comments

Comments

@derekbruening
Contributor

From [email protected] on April 24, 2012 11:22:34

I've long wanted to work on various projects involving switching modes, whether for the app's benefit or the tool's, but so far have not had time to work on any of them.

the idea here is that, when running a 32-bit application on a 64-bit kernel
under a DynamoRIO tool, we can switch to 64-bit mode and use the extra
registers as scratch space to reduce spills and improve performance.

we should consider using the registers inside core DR for the indirect
branch lookup (ibl), and also make them available to the tool

we'd have to mangle instructions that are not legal in x64 mode. some just
need a re-encoding (e.g., 1-byte inc/dec) while others will be more complex
(pusha, BCD instrs, lds, etc.) and it may be simpler to swap back to x86
mode rather than try to emulate some of them.
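
A minimal sketch of the "just need a re-encoding" case, assuming a hypothetical helper name (`reencode_short_inc_dec` is not a DR function). It relies only on the fact that the 1-byte inc/dec opcodes 0x40-0x4f become REX prefixes in 64-bit mode, so the same operation has to use the two-byte FF /0 or FF /1 form:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration, not part of DR.
 * In 32-bit mode, 0x40-0x47 are "inc r32" and 0x48-0x4f are "dec r32".
 * In 64-bit mode those bytes are REX prefixes, so the operation is
 * re-encoded as FF /0 (inc) or FF /1 (dec) with a register-direct ModRM.
 * Returns the number of bytes written, or 0 if opcode is not a 1-byte
 * inc/dec. */
size_t
reencode_short_inc_dec(uint8_t opcode, uint8_t out[2])
{
    assert(out != NULL);
    if (opcode < 0x40 || opcode > 0x4f)
        return 0;
    uint8_t reg = opcode & 0x7;              /* eax=0 ... edi=7 */
    uint8_t ext = (opcode >= 0x48) ? 1 : 0;  /* /0 = inc, /1 = dec */
    out[0] = 0xff;
    out[1] = (uint8_t)(0xc0 | (ext << 3) | reg); /* mod=11, reg=ext, rm=reg */
    return 2;
}
```

For example, 0x4b (`dec ebx` in 32-bit mode) comes out as the bytes FF CB. The harder cases listed above (pusha, BCD, lds) have no such one-to-one re-encoding.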

also have to be careful of instrs whose default operand size changes based
on mode. most problematic and common will be push/pop which will likely
have to all be converted to store/load (multiple if mem arg).
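
To make the push/pop conversion concrete, here is a small semantic model (not an encoder, and not DR code). The struct and function names are hypothetical. It shows a 32-bit push as an explicit 4-byte stack-pointer adjust plus a 4-byte store, and a push with a memory operand as a load into scratch followed by that store:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of translated app state, for illustration only. */
typedef struct {
    uint64_t xsp;   /* 64-bit register holding the 32-bit app stack pointer */
    uint8_t *mem;   /* host buffer backing app addresses starting at 0 */
} app_state_t;

/* 32-bit "push val": subtract 4 (not the 64-bit default of 8), then store
 * 4 bytes.  The arithmetic is done in 32 bits so the upper half of xsp
 * stays zero. */
void
emulate_push32(app_state_t *s, uint32_t val)
{
    assert(s != NULL && s->mem != NULL);
    uint32_t sp = (uint32_t)s->xsp - 4;
    memcpy(s->mem + sp, &val, sizeof(val));
    s->xsp = sp;
}

/* 32-bit "pop": load 4 bytes, then add 4 to the stack pointer. */
uint32_t
emulate_pop32(app_state_t *s)
{
    assert(s != NULL && s->mem != NULL);
    uint32_t sp = (uint32_t)s->xsp;
    uint32_t val;
    memcpy(&val, s->mem + sp, sizeof(val));
    s->xsp = (uint32_t)(sp + 4);
    return val;
}

/* "push [src_addr]": the memory operand needs an extra load into a
 * scratch value before the store, hence "multiple if mem arg". */
void
emulate_push32_mem(app_state_t *s, uint32_t src_addr)
{
    assert(s != NULL && s->mem != NULL);
    uint32_t scratch;
    memcpy(&scratch, s->mem + src_addr, sizeof(scratch));
    emulate_push32(s, scratch);
}
```

The scratch value in `emulate_push32_mem` is where one of the extra x64 registers would be used in real translated code.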

need to ensure fault handling is done properly regardless of mode

xref issue #49: simultaneous 32-bit and 64-bit app code support

xref my prior proposals about 32-to-64 for app code for supporting 32-bit
legacy plugins in 64-bit apps

xref "Dynamic Register Promotion of Stack Variables" in CGO 2011
32-to-64 to optimize app by getting app stack refs into regs

Original issue: http://code.google.com/p/dynamorio/issues/detail?id=751

@derekbruening
Contributor Author

From [email protected] on June 12, 2012 10:36:48

pasting in some notes which are mostly me talking to myself but hopefully these are readable:

** TODO impl notes
*** TODO 64-bit DR or 64-bit capabilities in 32-bit DR?

simplest and cleanest to use 64-bit DR for decode/encode

*** TODO linux or windows? windows is easier

much easier to have 64-bit lib loaded in windows than to go implement our
own linux loader (ld.so won't load our 64-bit lib into 32-bit app)

*** TODO transitions between modes

on windows can use the already-set-up 64-bit code segment

kernel exposes GDT and LDT slots so should be able to create a descriptor
on linux but more work
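
A sketch of what a 32-to-64 transition stub could look like at the byte level on Windows, assuming the selector values 0x23 (32-bit code) and 0x33 (64-bit code) that WOW64 is generally understood to set up. Those values and the helper name are assumptions, not something stated in these notes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed WOW64 code segment selectors (not taken from the issue). */
#define CS32_SELECTOR 0x23
#define CS64_SELECTOR 0x33

/* Hypothetical helper: emit a direct far jmp "EA ptr16:32", which is
 * opcode 0xEA, a 4-byte little-endian offset, then a 2-byte selector.
 * This form is only encodable in 32-bit mode; going back from 64-bit
 * mode would need an indirect far jmp through memory instead.
 * Returns the number of bytes written. */
size_t
encode_far_jmp32(uint8_t out[7], uint32_t target, uint16_t selector)
{
    assert(out != NULL);
    out[0] = 0xea;
    out[1] = (uint8_t)(target & 0xff);
    out[2] = (uint8_t)((target >> 8) & 0xff);
    out[3] = (uint8_t)((target >> 16) & 0xff);
    out[4] = (uint8_t)((target >> 24) & 0xff);
    out[5] = (uint8_t)(selector & 0xff);
    out[6] = (uint8_t)(selector >> 8);
    return 7;
}
```

On Linux the selector would first have to be created through the kernel's descriptor interfaces, which is the "more work" part.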

*** TODO existing mixed-mode support

xref already-existing support for mixing 32-bit and 64-bit:

  • per-thread mode for decoding: set_x86_mode()
  • per-instr flag for encoding: instr_set_x86_mode()
  • whole fragment must be same mode: FRAG_32_BIT
    (decode_fragment relies on this. it's used for trace building to add
    each block to trace ilist, as well as for state recreation of
    coarse-grain)
  • no 32<->64 links
  • non-module-bitwidth is fine-grained
  • 2 versions of gencode for wow64 today
  • debugging support: xref windbg wow64exts "!sw" to swap modes.

for this we don't need to support app w/ mixed code (that's issue #49 ).

*** TODO x64-incompatible segment operations

push/pop of cs/ds/es/ss segments
segments in general are all flat: most apps don't care. ones that do:
could just bail on this xformation? if hit unsupported, leave existing
frags as x64 and bail on rest? leave some as x86 and try others later that
don't have segment ops in them?
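
One way to picture the "leave some as x86" option, as a sketch with hypothetical names: scan a block's primary opcode bytes (assumed to be pre-decoded by some other component, prefixes already skipped) and keep the whole fragment in x86 mode if any 1-byte segment push/pop is present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 1-byte push/pop of es/cs/ss/ds: valid in 32-bit mode, invalid in
 * 64-bit mode.  (push/pop of fs/gs use 2-byte opcodes and stay valid.) */
bool
is_legacy_seg_push_pop(uint8_t opcode)
{
    switch (opcode) {
    case 0x06: case 0x07: /* push es / pop es */
    case 0x0e:            /* push cs (there is no pop cs) */
    case 0x16: case 0x17: /* push ss / pop ss */
    case 0x1e: case 0x1f: /* push ds / pop ds */
        return true;
    default:
        return false;
    }
}

/* Hypothetical per-fragment decision, for illustration only. */
typedef enum { FRAG_TRANSLATE_TO_X64, FRAG_KEEP_X86 } frag_policy_t;

/* opcodes[] holds the primary opcode byte of each instruction in the
 * block.  Since a fragment must be a single mode, one unsupported
 * instruction keeps the whole fragment in x86 mode. */
frag_policy_t
choose_fragment_policy(const uint8_t *opcodes, size_t count)
{
    assert(opcodes != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (is_legacy_seg_push_pop(opcodes[i]))
            return FRAG_KEEP_X86;
    }
    return FRAG_TRANSLATE_TO_X64;
}
```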

*** TODO injection

don't start from 32-bit DR in process: just have 64-bit DR and add support
for 32-bit app (xref issue #49 )

just use x64 drinject and give it a 32-bit process?
follow children: treat like 64-to-64, and have DR in child do delay hook on
32-bit ntdll.

or, use 32-bit drinject and solve cross-arch inject
=> issue #803 though follow children will be 64-to-64

*** TODO once 64-bit DR loaded

ntdll64 already in process, so DR will initialize normally.

*** TODO mcontext

64-bit data struct but put in 32-bit values

*** TODO wow64 layer

don't try to solve issue #49 : leave wow64 native

*** TODO syscalls: just skip the wow64 far call so may actually be faster!

*** TODO Ki: need to hook 32-bit ntdll Ki

I'm assuming kernel talks to 64-bit wow64 layer, always, regardless of the
current processor mode. one question is: will the wow64 layer, on a fault
while in 64-bit mode imposed by DR, still go to 32-bit ntdll Ki? and if
so, will DR have any situations where it can't recover or handle its own
deliberate faults or something b/c it doesn't have 64-bit state?

*** TODO gencode

generate 32-bit ibl and cxt switch. add far jmp to cxt sw to support
32-bit for incremental work or if want to bail on translating certain app
instructions.

@derekbruening
Contributor Author

From [email protected] on June 12, 2012 10:37:13

Owner: [email protected]

@derekbruening
Contributor Author

From [email protected] on June 22, 2012 13:50:31

pasting from issue #828 comment 1:

If we get the 32to64 translation working well, here's a strawman proposal for how things could work. When doing mixed mode instrumentation:

  • DR is x64
  • Client is x64
  • 32-bit code is presented as a 32-bit instrlist
  • client can insert x64 code into 32-bit ilist! (drmemory needs this anyway)
  • translation to x64 is transparent, similar to how app cti mangling is transparent today
  • client checks the x86 mode of the first instr if it wants to know the mode

This way, if we can translate everything, we can stay in x64 mode the entire time. The client can insert whatever code it likes, x64 or x86, we'll translate and preserve the semantics.

@derekbruening
Contributor Author

From [email protected] on June 22, 2012 15:39:02

one concern w/ (or maybe just addition to) the proposal in comment 3 is that there may be instrs we never support translating and leave 32-bit (e.g., BCD). we may need to communicate to the client then that such a fragment will remain entirely 32-bit and can't accept any 64-bit instrumentation.

so this proposal is to just let the client use r8-r15 as it sees fit, rather than have some spill slot API extension (or just behind the scenes impl) that maps existing slots to registers or something, which is an alternative but is much less flexible for the client.

@derekbruening
Contributor Author

From [email protected] on June 28, 2012 08:23:05

** TODO speed up ibl, in-trace cmp, exit stubs by DR using 64-bit registers

simplest to statically partition registers among DR components and client

translator uses r8
any overlap between ibl and translator? no: translator's uses
of r8 are all local, and ind branch fault should happen before ibl.

mangler uses r9 and r10
s/xcx/r9x/
not worth taking another just for selfmod: though could use r11

ibl uses r8-r10. rcx is in r9, xchg r8 w/ rax for flags, use r10 for other
scratch.

in-trace cmp: ecx + flags => r9 and r10

exit stubs: may as well use r10
(convention coming out as r9==xcx, r10==xax)

client can then have r11-r15

rip-rel far ind call will use 3 regs: but no rip-rel in 32-bit
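
The static partition described in this comment can be written down as a simple lookup. The enum and function names are hypothetical, and the split is just the one proposed here (r8 through r10 for DR, r11 through r15 for the client), not a settled design:

```c
#include <assert.h>

/* Hypothetical ownership classes for the extra x64 registers. */
typedef enum { OWNER_NONE, OWNER_DR, OWNER_CLIENT } reg_owner_t;

/* regnum is the hardware register number 0..15 (r8 == 8, r15 == 15).
 * Per the notes in this comment:
 *   r8      translator-local scratch; ibl swaps it with rax for flags
 *   r9      holds app xcx (mangler, ibl, in-trace cmp)
 *   r10     holds app xax / other scratch (mangler, ibl, exit stubs)
 *   r11-r15 left for the client
 * Registers 0..7 are the app's own and are not part of the partition. */
reg_owner_t
x64_extra_reg_owner(int regnum)
{
    assert(regnum >= 0 && regnum <= 15);
    if (regnum >= 8 && regnum <= 10)
        return OWNER_DR;
    if (regnum >= 11 && regnum <= 15)
        return OWNER_CLIENT;
    return OWNER_NONE;
}
```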

*** TODO have DR's use of x64 regs be optional

probably better for drmem perf to have drmem use r9-r10: bigger win to keep
shadow regs or other key data in real regs than to improve DR's ibl. so
have it under a runtime option.

@derekbruening
Contributor Author

From [email protected] on June 28, 2012 08:25:45

update: in-trace cmp: ecx => r9 (flags only for x64 cmp)

@derekbruening
Contributor Author

From [email protected] on March 10, 2014 08:56:02

Owner: ---

@derekbruening
Contributor Author

From [email protected] on April 22, 2014 10:10:09

xref WOW64 complications pointed out in issue #979: "WOW64 layer assumes r12-r15 are untouched in between syscalls"
