-
Notifications
You must be signed in to change notification settings - Fork 561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
i#1569 AArch64: Implement trace optimisation. #2442
Conversation
The optimisation is not yet enabled by default, though some programs run faster, and correctly, with "-enable_traces". This is based on work done by Kevin Zhou in July 2016. Change-Id: I6ab3bf0a2f48b25002f1586ff1b85e6232b4357d
The Travis failure is i#2003. |
@@ -107,6 +107,8 @@ insert_exit_stub_other_flags(dcontext_t *dcontext, fragment_t *f, | |||
/* FIXME i#1575: coarse-grain NYI on ARM */ | |||
ASSERT_NOT_IMPLEMENTED(!TEST(FRAG_COARSE_GRAIN, f->flags)); | |||
if (LINKSTUB_DIRECT(l_flags)) { | |||
/* We put a NOP here for future linking. */ | |||
*pc++ = 0xd503201f; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a named constant
@@ -119,6 +121,8 @@ insert_exit_stub_other_flags(dcontext_t *dcontext, fragment_t *f, | |||
/* br x1 */ | |||
*pc++ = 0xd61f0000 | 1 << 5; | |||
} else { | |||
/* We put a NOP here for trace building. */ | |||
*pc++ = 0xd503201f; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a named constant
*(uint *)stub_pc = (0xa9000000 | 0 | 1 << 10 | (dr_reg_stolen - DR_REG_X0) << 5 | | ||
TLS_REG0_SLOT >> 3 << 15); | ||
/* Restore the NOP instruction. */ | ||
*(uint *)stub_pc = 0xd503201f; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a named constant
@@ -107,6 +107,8 @@ insert_exit_stub_other_flags(dcontext_t *dcontext, fragment_t *f, | |||
/* FIXME i#1575: coarse-grain NYI on ARM */ | |||
ASSERT_NOT_IMPLEMENTED(!TEST(FRAG_COARSE_GRAIN, f->flags)); | |||
if (LINKSTUB_DIRECT(l_flags)) { | |||
/* We put a NOP here for future linking. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But linking was already working fine, so it is unclear why we'd need a nop now. Please elaborate in the comment -- maybe best in the pre-function comment. Putting an asm listing is also helpful (esp with this stub getting very large: 8 instructions now). This is unusual.
@@ -119,6 +121,8 @@ insert_exit_stub_other_flags(dcontext_t *dcontext, fragment_t *f, | |||
/* br x1 */ | |||
*pc++ = 0xd61f0000 | 1 << 5; | |||
} else { | |||
/* We put a NOP here for trace building. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does trace building need a nop? I think there's some code missing from this diff (e.g., fixup_indirect_trace_exit): I can't get a clear picture of what the trace building strategy is on A64 vs x86 and why a nop would help.
(dr_reg_stolen - DR_REG_X0) << 5 | | ||
ENTRY_PC_SPILL_SLOT >> 3 << 10); | ||
pc += 4; | ||
f->prefix_size = (byte)((cache_pc)pc - f->start_pc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code looks identical to the else code below which looks identical to the original code: so the original was sufficient and there's nothing to change here, right?
#ifdef AARCH64 | ||
/* For AArch64, there is no need to save the flags | ||
* so we always have the same ibt prefix. */ | ||
return fragment_ibt_prefix_size(flags); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this code needs to be here: use_ibt_prefix should be true always for AARCH64, right? So this routine was already returning fragment_ibt_prefix_size(flags). Cleaner to not have an ifdef.
@@ -295,7 +295,7 @@ typedef enum _dr_pred_type_t { | |||
#define PREFIX_PRED_BITS 5 | |||
#define PREFIX_PRED_BITPOS (32 - PREFIX_PRED_BITS) | |||
#define PREFIX_PRED_MASK \ | |||
(((1 << PREFIX_PRED_BITS)-1) << PREFIX_PRED_BITPOS) /*0xf8000000 */ | |||
((((uint)1 << PREFIX_PRED_BITS)-1) << PREFIX_PRED_BITPOS) /*0xf8000000 */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or just 1U
@@ -705,7 +705,8 @@ instr_get_predicate(instr_t *instr) | |||
instr_t * | |||
instr_set_predicate(instr_t *instr, dr_pred_type_t pred) | |||
{ | |||
instr->prefixes |= ((pred << PREFIX_PRED_BITPOS) & PREFIX_PRED_MASK); | |||
instr->prefixes = ((instr->prefixes & ~PREFIX_PRED_MASK ) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks incorrect: I think there are other bits in prefixes besides predicates, which this routine should not clobber
@@ -1261,6 +1261,9 @@ end_and_emit_trace(dcontext_t *dcontext, fragment_t *cur_f) | |||
md->emitted_size -= local_exit_stub_size(dcontext, target, md->trace_flags); | |||
} | |||
|
|||
/* Fixup all indirect trace exits to comform ibl routine */ | |||
IF_AARCH64(md->emitted_size += fixup_indirect_trace_exit(dcontext, trace)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something is missing from this diff: this function does not exist.
Also, while I don't know precisely what it does, are you sure this shouldn't be done during incremental trace building as each exit is added, rather than at the end.
run aarch64 tests |
…part of support This patch incorporated changes from patch #2442, and fixed some corner cases where the assumption was incorrect and caused the program to crash. Trace support is not yet enabled by default, but can be enabled with "-enable_traces". A large part of this work was based on work done by Kevin Zhou. Please see branch i1569-trace and patch #2442 for the original version. Issues: #1569, #2974
…part of support This patch incorporated changes from patch #2442, and fixed some corner cases where the assumption was incorrect and caused the program to crash. Trace support is not yet enabled by default, but can be enabled with "-enable_traces". A large part of this work was based on work done by Kevin Zhou. Please see branch i1569-trace and patch #2442 for the original version. Issues: #1569, #2974
…5045) This patch incorporated changes from PR #2442 that implemented the initial version of trace support for AArch64. This patch also fixed some corner cases not considered in PR #2442 where the assumption was incorrect and caused the program to crash. Trace support is not yet enabled by default, but can be enabled with "-enable_traces". This commit introduces internal control flow by adding a trace_exit_label in fixup_indirect_trace_exit, which might break code that assumes linear control flow (such as translate.c). Either special support is needed for this trace_exit_label or alternative schemes should be used that has a linear control. Some complexities in this commit can be removed once we have #5062 implemented and decode_fragment eliminated. Co-authored-by: Kevin Zhou <[email protected]> Issues: #1569, #2974
An improved version of this has already been merged by #5045. |
The optimisation is not yet enabled by default, though some programs
run faster, and correctly, with "-enable_traces".
This is based on work done by Kevin Zhou in July 2016.
Change-Id: I6ab3bf0a2f48b25002f1586ff1b85e6232b4357d