Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Optimize redundant sign extensions in indexers #52414

Closed
wants to merge 5 commits into from

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented May 6, 2021

Just a quick fix for the issue noticed by @GrabYourPitchforks - we emit redundant sign-extension movs for indexers, e.g.:

int Test(int[] a, int i) => a[i];

Codegen:

G_M40406_IG01:
       sub      rsp, 40
G_M40406_IG02:
       cmp      r8d, dword ptr [rdx+8]
       jae      SHORT G_M40406_IG04
       movsxd   rax, r8d               ;;  <---- 
       mov      eax, dword ptr [rdx+4*rax+16]
G_M40406_IG03:
       add      rsp, 40
       ret      
G_M40406_IG04:
       call     CORINFO_HELP_RNGCHKFAIL
       int3     
; Total bytes of code: 28

New codegen: https://www.diffchecker.com/FukMpulc (mov is still there 🤔 but at least without the sign extension)

asm.benchmarks.run.windows.x64.checked

Total bytes of base: 2045371
Total bytes of diff: 2037944
Total bytes of delta: -7427 (-0.36% of base)
    diff is an improvement.
asm.libraries.crossgen.windows.x64.checked.1

Total bytes of base: 4987213
Total bytes of diff: 4970149
Total bytes of delta: -17064 (-0.34% of base)
    diff is an improvement.
asm.libraries.crossgen2.windows.x64.checked.1

Total bytes of base: 5527650
Total bytes of diff: 5510132
Total bytes of delta: -17518 (-0.32% of base)
    diff is an improvement.
asm.libraries.pmi.windows.x64.checked

Total bytes of base: 8099472
Total bytes of diff: 8076977
Total bytes of delta: -22495 (-0.28% of base)
    diff is an improvement.
asm.tests.pmi.windows.x64.checked

Total bytes of base: 2401541
Total bytes of diff: 2393379
Total bytes of delta: -8162 (-0.34% of base)
    diff is an improvement.
asm.tests_libraries.pmi.windows.x64.checked

Total bytes of base: 9775982
Total bytes of diff: 9748266
Total bytes of delta: -27716 (-0.28% of base)
    diff is an improvement.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 6, 2021
@GrabYourPitchforks
Copy link
Member

I ran an experiment a while back that involved touching a few more files, since I was trying to get array and span and string indexers all in one shot. GrabYourPitchforks/coreclr@d14b977

Though TBH I didn't have a clue what I was doing and it's entirely possible that prototype code is hot garbage. :)

Copy link
Member

@gfoidl gfoidl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should fix #12218 and contribute to #12402?

Just to be sure: it will work for arrays and spans?

src/coreclr/jit/morph.cpp Outdated Show resolved Hide resolved
@EgorBo
Copy link
Member Author

EgorBo commented May 9, 2021

This should fix #12218 and contribute to #12402?

Partially, I guess a separate PR somewhere in RA/Emitter should also get rid of the mov (see https://www.diffchecker.com/FukMpulc)

Just to be sure: it will work for arrays and spans?

With @GrabYourPitchforks's patch yes!

@GrabYourPitchforks do you mind if I include your patch? it handles Spans 🙂

@AndyAyersMS
Copy link
Member

Typically one wants to avoid introducing casts into expressions like these as it complicates secondary IV analysis (which we don't yet do, but someday soon). I wonder if we need some other kind of cast node here or some extra GTF flag that makes it clear this is known to be a cast that does not alter the value.

@BruceForstall thoughts?

@EgorBo
Copy link
Member Author

EgorBo commented May 10, 2021

I wonder if we need some other kind of cast node

Wonder if we can split GenTreeCast into let's say GenTreeZeroExt, GenTreeSignExt and, perhaps, separate types for FP <-> integers.

@BruceForstall
Copy link
Member

I forget: do all platform ABIs sign-/zero-extend arg registers, so the full register value is readable? If not, in the example here you'd still need a zero extend.

Wonder if we can split GenTreeCast into let's say GenTreeZeroExt, GenTreeSignExt and, perhaps, separate types for FP <-> integers.

Doesn't the current Cast node have the right info to determine this? What would adding new node types improve?

@BruceForstall thoughts?

I haven't thought about how IV widening would work, but it doesn't seem like we need to inhibit this opt. if we don't have a plan there yet.

Seems like this zero extend case could also possibly be handled by some kind of bitwise zero forward propagation.

If the index wasn't from an argument, would it be different?

@EgorBo
Copy link
Member Author

EgorBo commented Jun 14, 2021

Ping myself

@EgorBo
Copy link
Member Author

EgorBo commented Jun 28, 2021

Will revise this week

@EgorBo
Copy link
Member Author

EgorBo commented Sep 6, 2021

Closing for now as it should be done in a more general way like in assertprop or something like that, to e.g. handle this: sharplab

@EgorBo EgorBo closed this Sep 6, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Oct 6, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants