Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ARM64/Linux] Inefficient conditionals branching #12735

Open
Tracked by #64820 ...
TamarChristinaArm opened this issue May 22, 2019 · 6 comments
Open
Tracked by #64820 ...

[ARM64/Linux] Inefficient conditionals branching #12735

TamarChristinaArm opened this issue May 22, 2019 · 6 comments
Assignees
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI JitUntriaged CLR JIT issues needing additional triage
Milestone

Comments

@TamarChristinaArm
Copy link
Contributor

The following

static int[] Test(int[] a, int n)
{
  int y;
  if (n == 3)
    y = n;
  else
    y = n + 1;
  a[2] = y;
  return a;
}

produces the following conditional (with QuickJIT off)

G_M32566_IG02:
        71000C5F          cmp     w2, dotnet/coreclr#3
        54000061          bne     G_M32566_IG03
        52800061          mov     w1, dotnet/coreclr#3
        14000002          b       G_M32566_IG04

so for both the if and else case it branches, it has no fall through case that doesn't
require the branch.

but also it has generated an extra mov here because it knows n is 3. However change the
condition to

static int[] Test(int[] a, int n)
{
  int y;
  if (n > 3)
    y = n;
  else
    y = n + 1;
  a[2] = y;
  return a;
}

and it now just re-uses the register containing n since it's dead after the if anyway

G_M32566_IG02:
        71000C5F          cmp     w2, dotnet/coreclr#3
        5400004D          ble     G_M32566_IG03
        14000002          b       G_M32566_IG04

G_M32566_IG03:
        11000442          add     w2, w2, dotnet/coreclr#1

G_M32566_IG04:
        52800041          mov     w1, dotnet/coreclr#2
        B9800803          ldrsw   x3, [x0,#8]
        6B03003F          cmp     w1, w3
        54000082          bhs     G_M32566_IG06
        B9001802          str     w2, [x0,#24]

So not quite sure why the second case generates smaller code.

But again the conditional can just be re-arranged to

G_M32566_IG02:
        71000C5F          cmp     w2, dotnet/coreclr#3
        14000002          bgt     G_M32566_IG04

and remove a branch.

/CC @CarolEidt @tannergooding

category:cq
theme:basic-cq
skill-level:intermediate
cost:medium

@mikedn
Copy link
Contributor

mikedn commented May 22, 2019

Not ARM64 specific, x64 generates:

G_M55886_IG01:
       4883EC28             sub      rsp, 40
       90                   nop
G_M55886_IG02:
       83FA03               cmp      edx, 3
       7507                 jne      SHORT G_M55886_IG03
       B803000000           mov      eax, 3
       EB03                 jmp      SHORT G_M55886_IG04
G_M55886_IG03:
       8D4201               lea      eax, [rdx+1]
G_M55886_IG04:
       83790802             cmp      dword ptr [rcx+8], 2
       760B                 jbe      SHORT G_M55886_IG06
       894118               mov      dword ptr [rcx+24], eax
       488BC1               mov      rax, rcx
G_M55886_IG05:
       4883C428             add      rsp, 40
       C3                   ret
G_M55886_IG06:
       E8169E385F           call     CORINFO_HELP_RNGCHKFAIL
       CC                   int3

Here assertion propagation isn't very smart and substitutes n with 3 even if it's not beneficial in this case. And I can only guess that it wouldn't be that easy for LSRA to recover from this.

So not quite sure why the second case generates smaller code.

Because of the n > 3 condition that prevents assertion propagation to conclude that n is 3 in that block.

But again the conditional can just be re-arranged to

Yes but it's a bit complicated because this isn't obvious in the IR until after register allocation when you'd need to figure out that the store is useless because both variables have been assigned the same register and that the block is empty otherwise.

@TamarChristinaArm
Copy link
Contributor Author

Yes but it's a bit complicated because this isn't obvious in the IR until after register allocation when you'd need to figure out that the store is useless because both variables have been assigned the same register and that the block is empty otherwise.

But the block doesn't have to be empty to remove the unconditional branch. The unconditional branch is always the last statement in the block, so if the edges were re-arranged such that the block you unconditionally branch to is the next BB after the one with the conditional you could just call through, or alternatively re-order the branch conditions, then you don't have to change the edges.

Unless i'm missing something can't both of these be done before regalloc just purely from the control flow of the IR?

@mikedn
Copy link
Contributor

mikedn commented May 24, 2019

I'm not sure I understand. If neither block is empty then the only flow control transform that you can do here is branch inversion and that doesn't help. Perhaps you have in mind an additional IR transform?

int y;
if (n == 3)
    y = n;
else
    y = n + 1;

could become

int y = n;
if (n != 3)
    y = y + 1;

That would work but yes, the JIT doesn't do anything like this today. Though I happen to have an if conversion experiment that could be extended to handle this case. Maybe one day...

There's a rather similar issue I opened a long time ago - #4326.

@TamarChristinaArm
Copy link
Contributor Author

I'm not sure I understand. If neither block is empty then the only flow control transform that you can do here is branch inversion and that doesn't help.

No that's correct, In my original post I had confused the cases where it generated the mov w1, 3. I mixed the two issues of why it had generated that mov and why in the second case the additional branch was needed.

You're right that if-conversion would certainly give the best code in such cases.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@a74nh
Copy link
Contributor

a74nh commented Dec 6, 2022

With the latest HEAD, the code gets if converted, but the extra mov of 3 still exists:

  if (n == 3)
    y = n;
  else
    y = n + 1;

IN0001: 000008                    add     w2, w1, #1
IN0002: 00000C                    mov     w3, #3
IN0003: 000010                    cmp     w1, #3
IN0004: 000014                    csel    w1, w2, w3, ne
  if (n > 3)
    y = n;
  else
    y = n + 1;

IN0001: 000008                    add     w2, w1, #1
IN0002: 00000C                    cmp     w1, #3
IN0003: 000010                    csel    w1, w2, w1, le

Assertion propagation happens before the if conversion, so it makes sense that this still happens:

Because of the n > 3 condition that prevents assertion propagation to conclude that n is 3 in that block.

@TIHan
Copy link
Contributor

TIHan commented Apr 18, 2023

I can confirm the extra mov w3, #3 still occurs.

I did some hacking and got the IfConversion phase to run earlier and indeed the mov w3, #3 disappears. To get the phase to run earlier I had to disable a lot of asserts just so I could see the results - so it's not viable to do at the moment.

Unfortunately, without introducing a new concept or doing IfConversion before AssertionProp, there isn't a safe way to do this.

Also, this pattern:

  if (n == 3) // including other relops
    y = n;
  else
    y = n + 1;

Can be reduced to:

cmp     w0, #3
cinc    w0, w0, <cond>

Related: #82031

@kunalspathak kunalspathak assigned a74nh and unassigned TIHan Nov 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI JitUntriaged CLR JIT issues needing additional triage
Projects
None yet
Development

No branches or pull requests

6 participants