Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: better addressing mode for floating point on arm64 #65468

Merged
merged 2 commits into from
Feb 17, 2022

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Feb 16, 2022

Closes #64819

float Test(float[] arr, int i) => arr[i];

codegen diff:

G_M60861_IG01:
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
G_M60861_IG02:
            ldr     w0, [x1,#8]
            cmp     w2, w0
            bhs     G_M60861_IG04
-           ubfiz   x0, x2, #2, #32
-           add     x0, x0, #16
-           ldr     s0, [x1, x0]
+           add     x0, x1, #16
+           ldr     s0, [x0, w2, UXTW #2]
G_M60861_IG03:
            ldp     fp, lr, [sp],#16
            ret     lr
G_M60861_IG04:
            bl      CORINFO_HELP_RNGCHKFAIL
            brk_windows #0
-; Total bytes of code: 48
+; Total bytes of code: 44

Diffs

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 16, 2022
@ghost ghost assigned EgorBo Feb 16, 2022
@ghost
Copy link

ghost commented Feb 16, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Closes #64819

float Test(float[] arr, int i) => arr[i];

codegen diff:

G_M60861_IG01:
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
G_M60861_IG02:
            ldr     w0, [x1,#8]
            cmp     w2, w0
            bhs     G_M60861_IG04
-           ubfiz   x0, x2, #2, #32
-           add     x0, x0, #16
-           ldr     s0, [x1, x0]
+           add     x0, x1, #16
+           ldr     s0, [x0, w2, UXTW #2]
G_M60861_IG03:
            ldp     fp, lr, [sp],#16
            ret     lr
G_M60861_IG04:
            bl      CORINFO_HELP_RNGCHKFAIL
            brk_windows #0
-; Total bytes of code: 48
+; Total bytes of code: 44
Author: EgorBo
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@EgorBo EgorBo changed the title Add addressing mode support for floats JIT: better addressing mode for floating point on arm64 Feb 16, 2022
@EgorBo
Copy link
Member Author

EgorBo commented Feb 16, 2022

@kunalspathak PTAL

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kunalspathak
Copy link
Member

Seems there are quite a few regressions. E.g. linux arm64 coreclr_tests. Can you double check?

          52 (3.94 % of base) : 249396.dasm - Benchstone.BenchF.LLoops:Init():this
          48 (6.22 % of base) : 161900.dasm - testout1:Sub_Funclet_303():double
          44 (3.63 % of base) : 237914.dasm - SciMark2.LU:factor(System.Double[][],System.Int32[]):int

@EgorBo
Copy link
Member Author

EgorBo commented Feb 17, 2022

I pushed one more change and it made diffs much better.
Unfortunately, the way addr mode works now on ARM64 will always produce regressions because we have to decide early what we're going to make CSE-friendly:

1) ArrayRef + (ConstDataOffset + Index)
2) (ArrayRef + ConstDataOffset) + Index

(we don't care about it on x64 because the whole thing can be contained)

2nd is usually better for ARM64 but there are cases where it's not, e.g:

static double[] x;
static double[] y;

double foo(int i)
{
    return x[i] + y[i];
}

here we access two different arrays with the same index so we'd better do CSE for (ConstDataOffset + Index)

I propose we merge this as is (because all collections except tests are improved) and then think of a better way to address this problem, most likely it needs a special "Reassociate" phase. or special casing in CSE

non floating-point addressing modes are also affected, I'll file an issue with more details.

@kunalspathak
Copy link
Member

I propose we merge this as is (because all collections except tests are improved)

Sounds good to me. Can you share how does the regression looks like?

@EgorBo
Copy link
Member Author

EgorBo commented Feb 17, 2022

I propose we merge this as is (because all collections except tests are improved)

Sounds good to me. Can you share how does the regression looks like?

Sure, here is both good and bad cases:

static double[] x;
static double[] y;

static double Bad(int i) 
    => x[i] + y[i]; // (ArrayRef + dataConstOffset) can not be CSE'd, only index is

static double Good(int i) 
    => x[i] + x[i + 1]; // (ArrayRef + dataConstOffset) can be CSE'd, index is not

Diff: https://www.diffchecker.com/voeWEctL
We just guess what would be better to give CSE. I don't think we correctly guess in morph at all so this all needs better thinking and a more complex optimization.

Btw, other primitives are also affected

@kunalspathak
Copy link
Member

Btw, other primitives are also affected

You mean with this change other primitives are also affected?

@EgorBo
Copy link
Member Author

EgorBo commented Feb 17, 2022

Btw, other primitives are also affected

You mean with this change other primitives are also affected?

No, I mean other primitives already go the path "prefer CSE for 'ArrayRef + DataConstOffset'" - this PR basically aligns behavior between floats and other primitives

@kunalspathak
Copy link
Member

Makes sense.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Arm64: Better addressing mode for float/double array access
2 participants