Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JIT][arm64] recognize Interlocked.Increment in volatile increment #60021

Closed
EgorBo opened this issue Oct 5, 2021 · 3 comments · Fixed by #60219
Closed

[JIT][arm64] recognize Interlocked.Increment in volatile increment #60021

EgorBo opened this issue Oct 5, 2021 · 3 comments · Fixed by #60219
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Milestone

Comments

@EgorBo
Copy link
Member

EgorBo commented Oct 5, 2021

volatile int a;

void Test() => a++;

Currently emits two dmb:

; Method Program:Test():this
G_M27469_IG01:              ;; offset=0000H
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
						;; bbWeight=1    PerfScore 1.50

G_M27469_IG02:              ;; offset=0008H
        B9400801          ldr     w1, [x0,#8]
        D50339BF          dmb     ishld
        52800022          mov     w2, #1
        0B020021          add     w1, w1, w2
        D5033BBF          dmb     ish
        B9000801          str     w1, [x0,#8]
						;; bbWeight=1    PerfScore 25.00

G_M27469_IG03:              ;; offset=0020H
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr
						;; bbWeight=1    PerfScore 2.00
; Total bytes of code: 40

I guess we can use armv8.1 LDADDAL (acquire and release) here? to get:

; Method Program:Test2():this
G_M27469_IG01:              ;; offset=0000H
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
						;; bbWeight=1    PerfScore 1.50

G_M27469_IG02:              ;; offset=0008H
        B940001F          ldr     wzr, [x0]
        D2800101          mov     x1, #8
        8B010000          add     x0, x0, x1
        52800021          mov     w1, #1
        B8E10000          ldaddal w1, w0, [x0]
						;; bbWeight=1    PerfScore 7.50

G_M27469_IG03:              ;; offset=001CH
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr
						;; bbWeight=1    PerfScore 2.00
; Total bytes of code: 36

Basically replace

[000006] -A-XGO------              *  ASG       int   
[000005] V--XGO-N----              +--*  FIELD     int    a
[000000] ------------              |  \--*  LCL_VAR   ref    V00 this         
[000004] ---XGO------              \--*  ADD       int   
[000002] V--XGO-N----                 +--*  FIELD     int    a
[000001] ------------                 |  \--*  LCL_VAR   ref    V00 this         
[000003] ------------                 \--*  CNS_INT   int    1

with GT_XADD(GT_FIELD)

am I correct? @dotnet/jit-contrib

@EgorBo EgorBo added the tenet-performance Performance related issue label Oct 5, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Oct 5, 2021
@ghost
Copy link

ghost commented Oct 5, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details
volatile int a;

void Test() => a++;

Currently emits:

; Method Program:Test():this
G_M27469_IG01:              ;; offset=0000H
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
						;; bbWeight=1    PerfScore 1.50

G_M27469_IG02:              ;; offset=0008H
        B9400801          ldr     w1, [x0,#8]
        D50339BF          dmb     ishld
        52800022          mov     w2, #1
        0B020021          add     w1, w1, w2
        D5033BBF          dmb     ish
        B9000801          str     w1, [x0,#8]
						;; bbWeight=1    PerfScore 25.00

G_M27469_IG03:              ;; offset=0020H
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr
						;; bbWeight=1    PerfScore 2.00
; Total bytes of code: 40

I guess we can use armv8.1 LDADDAL (acquire and release) here? to get:

; Method Program:Test2():this
G_M27469_IG01:              ;; offset=0000H
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
						;; bbWeight=1    PerfScore 1.50

G_M27469_IG02:              ;; offset=0008H
        B940001F          ldr     wzr, [x0]
        D2800101          mov     x1, #8
        8B010000          add     x0, x0, x1
        52800021          mov     w1, #1
        B8E10000          ldaddal w1, w0, [x0]
						;; bbWeight=1    PerfScore 7.50

G_M27469_IG03:              ;; offset=001CH
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr
						;; bbWeight=1    PerfScore 2.00
; Total bytes of code: 36

Basically replace

[000006] -A-XGO------              *  ASG       int   
[000005] V--XGO-N----              +--*  FIELD     int    a
[000000] ------------              |  \--*  LCL_VAR   ref    V00 this         
[000004] ---XGO------              \--*  ADD       int   
[000002] V--XGO-N----                 +--*  FIELD     int    a
[000001] ------------                 |  \--*  LCL_VAR   ref    V00 this         
[000003] ------------                 \--*  CNS_INT   int    1

with GT_XADD(GT_FIELD)

am I correct? @dotnet/jit-contrib

Author: EgorBo
Assignees: -
Labels:

tenet-performance, area-CodeGen-coreclr, untriaged

Milestone: -

@EgorBo EgorBo added this to the 7.0.0 milestone Oct 5, 2021
@EgorBo EgorBo removed the untriaged New issue has not been triaged by the area owner label Oct 5, 2021
@EgorBo
Copy link
Member Author

EgorBo commented Oct 5, 2021

NOTE: we have >1000 of volatile fields across the base class libs, perhaps some of the access patterns also can be optimized with Atomics

@echesakov
Copy link
Contributor

I am wondering whether in the baseline case (Armv8.0) it should be sufficient to have one memory barrier inserted after the load?

ldr     w1, [x0,#8]
dmb     ishld
mov     w2, #1
add     w1, w1, w2
str     w1, [x0,#8]

Also interesting why we are not using add w1, w1, #1 form of the instruction.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Oct 9, 2021
@EgorBo EgorBo self-assigned this Oct 9, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Oct 13, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Nov 12, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants