RyuJIT: Remove redundant memory barrier for XAdd and XChg on arm #45970

Merged 3 commits on Dec 28, 2020

EgorBo (Member) commented Dec 11, 2020

As far as I understand, the trailing memory barrier is not needed when we already emit LDADDAL and SWPAL, which have "acquire and release" semantics.

UPD: same for CASAL (emitted for Interlocked.CompareExchange)

For this reason I also replaced staddl with ldaddal for the case when we don't need the return value of Interlocked.Add.
Just like LLVM does: https://godbolt.org/z/a9GcT8
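
For anyone who wants to try the LLVM comparison locally, here is a minimal C++ sketch. It is not the content of the godbolt link above, which is not reproduced here. It assumes that `std::atomic` operations with the default `memory_order_seq_cst` are a reasonable analogue of `Interlocked.Add`, `Interlocked.Exchange` and `Interlocked.CompareExchange`; the function names are made up for illustration. When built for AArch64 with LSE atomics enabled (for example `-march=armv8.1-a`), compilers are expected to lower these to `ldaddal`, `swpal` and `casal` with no trailing `dmb`, but the exact output depends on the compiler and flags.

```cpp
#include <atomic>

// Hypothetical analogue of Interlocked.Add with the result discarded.
void xadd_noret(std::atomic<int>& x, int y)
{
    x.fetch_add(y); // memory_order_seq_cst by default
}

// Interlocked.Add returns the new value; fetch_add returns the old one,
// so the addend is applied again to the returned value.
int xadd_ret(std::atomic<int>& x, int y)
{
    return x.fetch_add(y) + y;
}

// Analogue of Interlocked.Exchange: stores y and returns the previous value.
int xchg(std::atomic<int>& x, int y)
{
    return x.exchange(y);
}

// Analogue of Interlocked.CompareExchange: returns the original value.
int cmpxchg(std::atomic<int>& x, int value, int comparand)
{
    // On failure, comparand is overwritten with the current value;
    // on success, it already equals the original value.
    x.compare_exchange_strong(comparand, value);
    return comparand;
}
```

Compiling this with and without the LSE flag is a quick way to compare the single-instruction form against the exclusive load/store loop.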

Here are the diff examples:

static int XAdd_ret(ref int x, int y) => Interlocked.Add(ref x, y);
G_M16300_IG01:
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
G_M16300_IG02:
        B8E10000          ldaddal w1, w0, [x0]
-       D5033BBF          dmb     ish
        0B010000          add     w0, w0, w1
G_M16300_IG03:
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr


static void XChg_noret(ref int x, int y) => Interlocked.Exchange(ref x, y);
G_M24897_IG01: 
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
G_M24897_IG02: 
        B8E18000          swpal   w1, w0, [x0]
-       D5033BBF          dmb     ish
G_M24897_IG03:
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr

/cc @dotnet/jit-contrib

Dotnet-GitSync-Bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Dec 11, 2020
JulieLeeMSFT added this to the 6.0.0 milestone Dec 11, 2020
GetEmitter()->emitIns_R_R_R(INS_ldaddal, dataSize, dataReg, targetReg, addrReg);
}
GetEmitter()->emitIns_R_R_R(INS_ldaddal, dataSize, dataReg, (targetReg == REG_NA) ? REG_ZR : targetReg,
addrReg);
EgorBo (Member Author) commented:

NOTE: if targetReg is REG_NA (meaning we don't care about the return value of Interlocked.Add), WZR is used as the target register (it ignores writes and always reads as zero).
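
The selection described in this note can be isolated into a tiny standalone C++ sketch. The enum below is a hypothetical stand-in for the JIT's register type (only the names `REG_NA` and `REG_ZR` come from the diff; the other members and the helper name `pickLdaddalTarget` are assumptions made for illustration).

```cpp
// Hypothetical stand-in for the JIT's register enum; values are arbitrary.
enum regNumber
{
    REG_R0,
    REG_R1,
    REG_ZR, // zero register: writes are discarded, reads yield zero
    REG_NA  // "no register": the caller does not need the result
};

// If the result of the interlocked add is unused, direct it to the zero
// register so the same ldaddal encoding can be emitted in both cases.
regNumber pickLdaddalTarget(regNumber targetReg)
{
    return (targetReg == REG_NA) ? REG_ZR : targetReg;
}
```

This keeps a single emit path: the instruction always has acquire and release ordering, and only the destination operand changes.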

VSadov (Member) commented Dec 17, 2020

As far as I understand that memory barrier is not needed when we already emit LDADDAL and SWPAL with "acquire and release"

The trailing dmb is needed after the LL/SC loop to make sure the whole thing acts as a full fence. Otherwise, the half-fenced stlxr could allow speculative loads, performed while still inside the LL/SC loop and before the store is committed, to be observed.

With InstructionSet_Atomics there is no loop and full-fence ordering can be explicitly requested, so no need for another fence.
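
The distinction above can be summarized in a simplified C++ sketch. It is not the actual JIT emitter code: the function name `xaddSequence`, the `hasAtomics` flag, the register names and the label are placeholders, and the loop shape is only an assumed illustration of a typical load-exclusive/store-exclusive sequence.

```cpp
#include <string>
#include <vector>

// Illustrative only: returns, as text, the instruction sequence for an
// interlocked add depending on whether the Atomics instruction set is usable.
std::vector<std::string> xaddSequence(bool hasAtomics)
{
    if (hasAtomics)
    {
        // One instruction with acquire and release ordering; no loop,
        // so no trailing barrier is emitted.
        return {"ldaddal w1, w0, [x0]"};
    }

    // LL/SC fallback: the store-release alone is only a half fence, so a
    // trailing barrier makes the whole sequence act as a full fence.
    return {
        "loop:",
        "ldaxr w2, [x0]",
        "add w3, w2, w1",
        "stlxr w4, w3, [x0]",
        "cbnz w4, loop",
        "dmb ish",
    };
}
```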

VSadov (Member) commented Dec 17, 2020

LGTM

sandreenko (Contributor) left a comment

LGTM

sandreenko merged commit 7b0d89b into dotnet:master Dec 28, 2020
ghost locked as resolved and limited conversation to collaborators Jan 27, 2021