RyuJIT: Remove redundant memory barrier for XAdd and XChg on arm #45970

EgorBo · 2020-12-11T17:30:50Z

As far as I understand, the memory barrier is not needed when we already emit LDADDAL and SWPAL, which have "acquire and release" semantics.

UPD: The same applies to CASAL (emitted for Interlocked.CompareExchange).

For the same reason, I also replaced STADDL (release semantics only) with LDADDAL for the case when the return value of Interlocked.Add is unused, just like LLVM does: https://godbolt.org/z/a9GcT8

Here are example diffs:

static int XAdd_ret(ref int x, int y) => Interlocked.Add(ref x, y);

G_M16300_IG01:
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
G_M16300_IG02:
        B8E10000          ldaddal w1, w0, [x0]
-       D5033BBF          dmb     ish
        0B010000          add     w0, w0, w1
G_M16300_IG03:
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr

static void XChg_noret(ref int x, int y) => Interlocked.Exchange(ref x, y);

G_M24897_IG01:
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
G_M24897_IG02:
        B8E18000          swpal   w1, w0, [x0]
-       D5033BBF          dmb     ish
G_M24897_IG03:
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr

/cc @dotnet/jit-contrib

EgorBo · 2020-12-12T16:30:30Z

src/coreclr/jit/codegenarm64.cpp

-                    GetEmitter()->emitIns_R_R_R(INS_ldaddal, dataSize, dataReg, targetReg, addrReg);
-                }
+                GetEmitter()->emitIns_R_R_R(INS_ldaddal, dataSize, dataReg, (targetReg == REG_NA) ? REG_ZR : targetReg,
+                                            addrReg);


NOTE: if targetReg is REG_NA (meaning the return value of Interlocked.Add is unused), WZR is used as the destination register (writes to it are ignored and it always reads as zero).

VSadov · 2020-12-17T23:03:36Z

> As far as I understand that memory barrier is not needed when we already emit LDADDAL and SWPAL with "acquire and release"

The trailing dmb is needed after the LL/SC loop to make sure the whole sequence acts as a full fence. Otherwise, because stlxr is only half-fenced (release), speculative loads performed while still inside the LL/SC loop, before the store is committed, could become observable.

With InstructionSet_Atomics there is no loop, and full-fence ordering can be requested explicitly, so there is no need for another fence.

VSadov · 2020-12-17T23:11:10Z

LGTM

sandreenko

LGTM

Remove redundant memory barrier for XAdd and XChg on arm

5ffd823

Dotnet-GitSync-Bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Dec 11, 2020

Update codegenarm64.cpp

2ffe566

JulieLeeMSFT added this to the 6.0.0 milestone Dec 11, 2020

JulieLeeMSFT assigned EgorBo Dec 11, 2020

Same for casal

8f797f7

EgorBo commented Dec 12, 2020


EgorBo mentioned this pull request Dec 20, 2020

[RyuJIT] Implement Interlocked.And and Interlocked.Or for arm64-v8.1 #46253

Merged

sandreenko approved these changes Dec 23, 2020


sandreenko merged commit 7b0d89b into dotnet:master Dec 28, 2020

ghost locked as resolved and limited conversation to collaborators Jan 27, 2021
