Skip to content

Commit

Permalink
Replace successive "ldr" and "str" instructions with "ldp" and "stp" (#…
Browse files Browse the repository at this point in the history
…77540)

* Replace successive "ldr" and "str" instructions with "ldp" and "stp"

This change serves to address the following four Github tickets:

    1. ARM64: Optimize pair of "ldr reg, [fp]" to ldp #35130
    2. ARM64: Optimize pair of "ldr reg, [reg]" to ldp #35132
    3. ARM64: Optimize pair of "str reg, [reg]" to stp #35133
    4. ARM64: Optimize pair of "str reg, [fp]" to stp  #35134

A technique was employed that involved detecting an optimisation
opportunity as instruction sequences were being generated.
The optimised instruction was then generated on top of the previous
instruction, with no second instruction generated. Thus, there were no
changes to instruction group size at “emission time” and no changes to
jump instructions.

* No longer use a temporary buffer to build the optimized instruction.

* Addressed assorted review comments.

* Now optimizes ascending locations and decending locations with
consecutive STR and LDR instructions.

* Modification to remove last instructions.

* Ongoing improvements to remove previously-emitted instruction
during ldr / str optimization.

* Stopped optimization of consecutive instructions that straddled an instruction group boundary.

* Addressed code change requests in GitHub.

* Various fixes to ldp/stp optimization

Add code to update IP mappings when an instruction is removed.

* Delete unnecessary and incorrect assert

* Diagnostic change only, to confirm whether a theory is correct or
not when chasing an error.

* Revert "Diagnostic change only, to confirm whether a theory is correct or"

This reverts commit 4b0e51e.

* Do not merge. Temporarily removed calls to
"codeGen->genIPmappingUpdateForRemovedInstruction()".

Also, corrected minor bug in instruction numbering
when removing instructions during optimization.

* Modifications to better update the IP mapping table for a replaced instruction.

* Minor formatting change.

* Check for out of range offsets

* Don't optimise during prolog/epilog

* Fix windows build error

* IGF_HAS_REMOVED_INSTR is ARM64 only

* Add OptimizeLdrStr function

* Fix formatting

* Ensure local variables are tracked

* Don't peephole local variables

Co-authored-by: Bruce Forstall <[email protected]>
Co-authored-by: Alan Hayward <[email protected]>
Co-authored-by: Alan Hayward <[email protected]>
  • Loading branch information
4 people authored Jan 27, 2023
1 parent ff987dc commit acc409e
Show file tree
Hide file tree
Showing 8 changed files with 404 additions and 33 deletions.
2 changes: 1 addition & 1 deletion src/coreclr/jit/codegen.h
Original file line number Diff line number Diff line change
Expand Up @@ -664,7 +664,7 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
*/

#ifdef DEBUG
void genIPmappingDisp(unsigned mappingNum, IPmappingDsc* ipMapping);
void genIPmappingDisp(unsigned mappingNum, const IPmappingDsc* ipMapping);
void genIPmappingListDisp();
#endif // DEBUG

Expand Down
2 changes: 1 addition & 1 deletion src/coreclr/jit/codegencommon.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7042,7 +7042,7 @@ const char* CodeGen::siStackVarName(size_t offs, size_t size, unsigned reg, unsi
* Display a IPmappingDsc. Pass -1 as mappingNum to not display a mapping number.
*/

void CodeGen::genIPmappingDisp(unsigned mappingNum, IPmappingDsc* ipMapping)
void CodeGen::genIPmappingDisp(unsigned mappingNum, const IPmappingDsc* ipMapping)
{
if (mappingNum != unsigned(-1))
{
Expand Down
91 changes: 90 additions & 1 deletion src/coreclr/jit/emit.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,22 @@ void emitLocation::CaptureLocation(emitter* emit)
assert(Valid());
}

void emitLocation::SetLocation(insGroup* _ig, unsigned _codePos)
{
ig = _ig;
codePos = _codePos;

assert(Valid());
}

void emitLocation::SetLocation(emitLocation newLocation)
{
ig = newLocation.ig;
codePos = newLocation.codePos;

assert(Valid());
}

bool emitLocation::IsCurrentLocation(emitter* emit) const
{
assert(Valid());
Expand All @@ -50,6 +66,11 @@ int emitLocation::GetInsNum() const
return emitGetInsNumFromCodePos(codePos);
}

int emitLocation::GetInsOffset() const
{
return emitGetInsOfsFromCodePos(codePos);
}

// Get the instruction offset in the current instruction group, which must be a funclet prolog group.
// This is used to find an instruction offset used in unwind data.
// TODO-AMD64-Bug?: We only support a single main function prolog group, but allow for multiple funclet prolog
Expand Down Expand Up @@ -798,6 +819,7 @@ insGroup* emitter::emitSavIG(bool emitAdd)

assert((ig->igFlags & IGF_PLACEHOLDER) == 0);
ig->igData = id;
INDEBUG(ig->igDataSize = gs;)

memcpy(id, emitCurIGfreeBase, sz);

Expand Down Expand Up @@ -8724,6 +8746,14 @@ UNATIVE_OFFSET emitter::emitCodeOffset(void* blockPtr, unsigned codePos)
{
of = ig->igSize;
}
#ifdef TARGET_ARM64
else if ((ig->igFlags & IGF_HAS_REMOVED_INSTR) != 0 && no == ig->igInsCnt + 1U)
{
// This can happen if a instruction was replaced, but the replacement couldn't fit into
// the same IG and instead was place in a new IG.
return ig->igNext->igOffs + emitFindOffset(ig->igNext, 1);
}
#endif
else if (ig->igFlags & IGF_UPD_ISZ)
{
/*
Expand All @@ -8742,7 +8772,6 @@ UNATIVE_OFFSET emitter::emitCodeOffset(void* blockPtr, unsigned codePos)
// printf("[IG=%02u;ID=%03u;OF=%04X] <= %08X\n", ig->igNum, emitGetInsNumFromCodePos(codePos), of, codePos);

/* Make sure the offset estimate is accurate */

assert(of == emitFindOffset(ig, emitGetInsNumFromCodePos(codePos)));
}

Expand Down Expand Up @@ -9198,6 +9227,66 @@ void emitter::emitNxtIG(bool extend)
#endif
}

//------------------------------------------------------------------------
// emitRemoveLastInstruction: Remove the last instruction emitted; it has been optimized away by the
// next instruction we are generating. `emitLastIns` must be non-null, meaning there is a
// previous instruction. The previous instruction might have already been saved, or it might
// be in the currently accumulating insGroup buffer.
//
// The `emitLastIns` is set to nullptr after this function. It is expected that a new instruction
// will be immediately generated after this, which will set it again.
//
// Removing an instruction can invalidate any captured emitter location
// (using emitLocation::CaptureLocation()) after the instruction was generated. This is because the
// emitLocation stores the current IG instruction number and code size. If the instruction is
// removed and not replaced (e.g., it is at the end of the IG, and any replacement creates a new
// EXTEND IG), then the saved instruction number is incorrect. The IGF_HAS_REMOVED_INSTR flag is
// used to check for this later.
//
// NOTE: It is expected that the GC effect of the removed instruction will be handled by the newly
// generated replacement(s).
//
#ifdef TARGET_ARM64
void emitter::emitRemoveLastInstruction()
{
assert(emitLastIns != nullptr);
assert(emitLastInsIG != nullptr);

JITDUMP("Removing saved instruction in %s:\n> ", emitLabelString(emitLastInsIG));
JITDUMPEXEC(dispIns(emitLastIns))

// We should assert it's not a jmp, as that would require updating the jump lists, e.g. emitCurIGjmpList.

BYTE* lastInsActualStartAddr = (BYTE*)emitLastIns - m_debugInfoSize;
unsigned short lastCodeSize = (unsigned short)emitLastIns->idCodeSize();

// Check that a new buffer hasn't been create since the last instruction was emitted.
assert((emitCurIGfreeBase <= lastInsActualStartAddr) && (lastInsActualStartAddr < emitCurIGfreeEndp));

// Ensure the current IG is non-empty.
assert(emitCurIGnonEmpty());
assert(lastInsActualStartAddr < emitCurIGfreeNext);
assert(emitCurIGinsCnt >= 1);
assert(emitCurIGsize >= emitLastIns->idCodeSize());

size_t insSize = emitCurIGfreeNext - lastInsActualStartAddr;

emitCurIGfreeNext = lastInsActualStartAddr;
emitCurIGinsCnt -= 1;
emitInsCount -= 1;
emitCurIGsize -= lastCodeSize;

// We're going to overwrite the memory; zero it.
memset(emitCurIGfreeNext, 0, insSize);

// Remember this happened.
emitCurIG->igFlags |= IGF_HAS_REMOVED_INSTR;

emitLastIns = nullptr;
emitLastInsIG = nullptr;
}
#endif

/*****************************************************************************
*
* emitGetInsSC: Get the instruction's constant value.
Expand Down
32 changes: 28 additions & 4 deletions src/coreclr/jit/emit.h
Original file line number Diff line number Diff line change
Expand Up @@ -131,6 +131,16 @@ class emitLocation
{
}

emitLocation(insGroup* _ig, unsigned _codePos)
{
SetLocation(_ig, _codePos);
}

emitLocation(emitter* emit)
{
CaptureLocation(emit);
}

emitLocation(void* emitCookie) : ig((insGroup*)emitCookie), codePos(0)
{
}
Expand All @@ -142,6 +152,8 @@ class emitLocation
}

void CaptureLocation(emitter* emit);
void SetLocation(insGroup* _ig, unsigned _codePos);
void SetLocation(emitLocation newLocation);

bool IsCurrentLocation(emitter* emit) const;

Expand All @@ -160,6 +172,7 @@ class emitLocation
}

int GetInsNum() const;
int GetInsOffset() const;

bool operator!=(const emitLocation& other) const
{
Expand Down Expand Up @@ -250,6 +263,7 @@ struct insGroup
#ifdef DEBUG
BasicBlock* lastGeneratedBlock; // The last block that generated code into this insGroup.
jitstd::list<BasicBlock*> igBlocks; // All the blocks that generated code into this insGroup.
size_t igDataSize; // size of instrDesc data pointed to by 'igData'
#endif

UNATIVE_OFFSET igNum; // for ordering (and display) purposes
Expand Down Expand Up @@ -280,6 +294,9 @@ struct insGroup
#define IGF_REMOVED_ALIGN 0x0800 // IG was marked as having an alignment instruction(s), but was later unmarked
// without updating the IG's size/offsets.
#define IGF_HAS_REMOVABLE_JMP 0x1000 // this group ends with an unconditional jump which is a candidate for removal
#ifdef TARGET_ARM64
#define IGF_HAS_REMOVED_INSTR 0x2000 // this group has an instruction that was removed.
#endif

// Mask of IGF_* flags that should be propagated to new blocks when they are created.
// This allows prologs and epilogs to be any number of IGs, but still be
Expand Down Expand Up @@ -2170,6 +2187,10 @@ class emitter
insGroup* emitSavIG(bool emitAdd = false);
void emitNxtIG(bool extend = false);

#ifdef TARGET_ARM64
void emitRemoveLastInstruction();
#endif

bool emitCurIGnonEmpty()
{
return (emitCurIG && emitCurIGfreeNext > emitCurIGfreeBase);
Expand Down Expand Up @@ -2823,12 +2844,15 @@ inline unsigned emitGetInsOfsFromCodePos(unsigned codePos)

inline unsigned emitter::emitCurOffset()
{
unsigned codePos = emitCurIGinsCnt + (emitCurIGsize << 16);
return emitSpecifiedOffset(emitCurIGinsCnt, emitCurIGsize);
}

assert(emitGetInsOfsFromCodePos(codePos) == emitCurIGsize);
assert(emitGetInsNumFromCodePos(codePos) == emitCurIGinsCnt);
inline unsigned emitter::emitSpecifiedOffset(unsigned insCount, unsigned igSize)
{
unsigned codePos = insCount + (igSize << 16);

// printf("[IG=%02u;ID=%03u;OF=%04X] => %08X\n", emitCurIG->igNum, emitCurIGinsCnt, emitCurIGsize, codePos);
assert(emitGetInsOfsFromCodePos(codePos) == igSize);
assert(emitGetInsNumFromCodePos(codePos) == insCount);

return codePos;
}
Expand Down
Loading

0 comments on commit acc409e

Please sign in to comment.