Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RyuJIT's loop cloning optimization has questionable CQ #4929

Closed
RussKeldorph opened this issue Jan 13, 2016 · 5 comments
Closed

RyuJIT's loop cloning optimization has questionable CQ #4929

RussKeldorph opened this issue Jan 13, 2016 · 5 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization tenet-performance Performance related issue
Milestone

Comments

@RussKeldorph
Copy link
Contributor

See CQ comments at the top of #4922.

category:cq
theme:loop-opt
skill-level:expert
cost:large

@mikedn
Copy link
Contributor

mikedn commented Sep 16, 2017

Reposting the original example (with updated assembly) to make this easier to read.

static void Copy(int[] src, int[] dst, int length)
{
    for (int i = 0; i < length; i++)
        dst[i] = src[i];
}

generates

G_M1752_IG01:
       4883EC28             sub      rsp, 40
G_M1752_IG02:
       33C0                 xor      eax, eax

; check if length is > 0, ok
       4585C0               test     r8d, r8d
       7E7A                 jle      SHORT G_M1752_IG06
; check if src and dst are non-null
; not needed because we already know that at least one loop iteration will be executed
; update: of course, this is not need in this particular example. In the general case these
; null checks are needed and it may be difficult to get rid of them.
       4885C9               test     rcx, rcx
       410F95C1             setne    r9b
       450FB6C9             movzx    r9, r9b
       4885D2               test     rdx, rdx
       410F95C2             setne    r10b
       450FB6D2             movzx    r10, r10b
       4585CA               test     r9d, r10d
       7441                 je       SHORT G_M1752_IG05

G_M1752_IG03:
; hoisted bound check, ok
       44394108             cmp      dword ptr [rcx+8], r8d
       410F9DC1             setge    r9b
       450FB6C9             movzx    r9, r9b
; check if length is >= 0, dubious
; the length was already checked at the start of the method
; previously this was `test r8, r8` because loop cloning tends to mix up integer types
; a change done in LowerCompare hides this issue
       4585C0               test     r8d, r8d
       410F9DC2             setge    r10b
       450FB6D2             movzx    r10, r10b
       4523CA               and      r9d, r10d
; hoisted bound check, ok
       44394208             cmp      dword ptr [rdx+8], r8d
       410F9DC2             setge    r10b
       450FB6D2             movzx    r10, r10b
       4585CA               test     r9d, r10d
       7416                 je       SHORT G_M1752_IG05

; finally, the loop without range checks
G_M1752_IG04:
       4C63C8               movsxd   r9, eax
       468B548910           mov      r10d, dword ptr [rcx+4*r9+16]
       4689548A10           mov      dword ptr [rdx+4*r9+16], r10d
       FFC0                 inc      eax
       413BC0               cmp      eax, r8d
       7CEC                 jl       SHORT G_M1752_IG04
; it's unfortunate that the loop that is normally executed is the one
; requiring a jmp, IG04 and IG05 should be reversed and IG05
; should probably be "run rarely"
       EB1E                 jmp      SHORT G_M1752_IG06

; and the loop with range checks
G_M1752_IG05:
       3B4108               cmp      eax, dword ptr [rcx+8]
       731E                 jae      SHORT G_M1752_IG07
       4C63C8               movsxd   r9, eax
       468B548910           mov      r10d, dword ptr [rcx+4*r9+16]
       3B4208               cmp      eax, dword ptr [rdx+8]
       7311                 jae      SHORT G_M1752_IG07
       4689548A10           mov      dword ptr [rdx+4*r9+16], r10d
       FFC0                 inc      eax
       413BC0               cmp      eax, r8d
       7CE2                 jl       SHORT G_M1752_IG05

G_M1752_IG06:
       4883C428             add      rsp, 40
       C3                   ret

G_M1752_IG07:
       E8D1FD395F           call     CORINFO_HELP_RNGCHKFAIL
       CC                   int3

@mikedn
Copy link
Contributor

mikedn commented Sep 16, 2017

A few more things I noticed since creating (or didn't mention in) the original issue:

  • It's not visible in the assembly code now but loop cloning produces compares with mixed operand types (e.g. RELOP(int, long)). This requires special casing in LowerCompare.
  • Loop selection code doesn't use short circuit evaluation - length < src.Length & length < dst.Length instead of length < src.Length && length < dst.Length. I wonder if that's a good thing. If it is then maybe we should check if those movzx instructions can be avoided, they're not needed in this case.
  • Loop cloning doesn't handle byte arrays. Why?!
  • Loop cloning handles cases where range checks are completely unnecessary and removes them without cloning the loop. That's a good thing but this means that if we ever consider disabling loop cloning (or moving it later in the phase list) for whatever reasons we need to be careful that doing so will push more work down to assertion prop and range check phases.

@mikedn
Copy link
Contributor

mikedn commented Sep 16, 2017

cc @AndyAyersMS You asked about the loop cloning issue, I added a few notes.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 30, 2020
@msftgits msftgits added this to the Future milestone Jan 30, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall modified the milestones: Future, 6.0.0 Oct 30, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Oct 30, 2020
@JulieLeeMSFT JulieLeeMSFT added the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Mar 23, 2021
@JulieLeeMSFT JulieLeeMSFT removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Jun 7, 2021
@BruceForstall BruceForstall modified the milestones: 6.0.0, 7.0.0 Jul 6, 2021
@AndyAyersMS
Copy link
Member

@BruceForstall should this be moved to future? Or are the issues above mostly handled now?

@BruceForstall
Copy link
Member

They're mostly handled, so I'll go ahead and close this.

@ghost ghost locked as resolved and limited conversation to collaborators May 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

6 participants