
[AArch64][CodeGen][CodeSize] Redundant 'and' can be removed with shifts in addr mode #34101

Open · llvmbot opened this issue Sep 27, 2017 · 7 comments
Labels: backend:AArch64, bugzilla (issues migrated from bugzilla), confirmed (verified by a second party), good first issue, missed-optimization

@llvmbot (Collaborator) commented Sep 27, 2017

Bugzilla Link: 34753
Version: trunk
OS: Windows NT
Reporter: LLVM Bugzilla Contributor

Extended Description

Test case:

int test(unsigned long long a, unsigned long long b, int *table) {
  return table[(a * b) >> 58];
}

Current assembly:

test:                                   // @test
// BB#0:                                // %entry
        mul x8, x1, x0
        lsr x8, x8, #56
        and x8, x8, #0xfc
        ldr w0, [x2, x8]
        ret

We should be able to remove the 'and', saving code size, as follows:

test:                                   // @test
// BB#0:                                // %entry
        mul x8, x1, x0
        lsr x8, x8, #58
        ldr w0, [x2, x8, lsl#2]
        ret

I'm not interested in pursuing this optimization, but I figured I'd file the bug, regardless.
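For reference, both sequences compute the same byte offset: shifting right by 58 and letting the addressing mode scale by 4 (`lsl #2`) selects exactly the bits that `lsr #56` + `and #0xfc` produces. A minimal self-contained check of the identity (illustrative C++, not part of the test case):

```
#include <cassert>
#include <cstdint>
#include <initializer_list>

int main() {
  for (uint64_t x : {0x0ULL, 0x0123456789abcdefULL, ~0x0ULL}) {
    uint64_t current  = (x >> 56) & 0xFC; // lsr #56; and #0xfc
    uint64_t proposed = (x >> 58) << 2;   // lsr #58; lsl #2 in the addr mode
    assert(current == proposed);
  }
}
```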

@llvmbot (Collaborator, Author) commented Sep 28, 2017

In DAGCombiner.cpp, around line 5590, we have this target-independent combine, which folds the 'srl' and introduces the 'and'.

// fold (shl (srl x, c1), c2) -> (and (shl x, (sub c2, c1)), MASK) or
//                               (and (srl x, (sub c1, c2)), MASK)
// Only fold this if the inner shift has no other uses -- if it does, folding
// this will increase the total number of instructions.
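With this test case, c1 = 58 and c2 = 2, so the srl branch of the fold fires: the combine emits srl by 56 plus an 'and' with MASK = 0xFC. A sketch of that arithmetic (illustrative C++, not the DAGCombiner code itself):

```
#include <cstdint>

// srl variant of the fold, valid for c1 > c2:
//   (shl (srl x, c1), c2) -> (and (srl x, c1 - c2), MASK)
// MASK keeps the 64 - c1 meaningful bits, shifted up by c2;
// c1 = 58, c2 = 2 gives 0xFC.
uint64_t foldedShlOfSrl(uint64_t x, unsigned c1, unsigned c2) {
  uint64_t mask = (UINT64_MAX >> c1) << c2;
  return (x >> (c1 - c2)) & mask; // equals (x >> c1) << c2
}
```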

@llvmbot (Collaborator, Author) commented Sep 28, 2017

Here's the test in IR:

define i32 @test(i64 %a, i64 %b, i32* nocapture readonly %table) {
entry:
  %mul = mul i64 %b, %a
  %shr = lshr i64 %mul, 58
  %arrayidx = getelementptr inbounds i32, i32* %table, i64 %shr
  %0 = load i32, i32* %arrayidx, align 4
  ret i32 %0
}

And the initial DAG (note that t11 = shl t9, 2 is the getelementptr scale for i32; this shl of the srl t9 is what the combine above rewrites into the 'and'):

Initial selection DAG: BB#0 'test:entry'
SelectionDAG has 19 nodes:
  t0: ch = EntryToken
  t13: i64 = Constant<0>
        t6: i64,ch = CopyFromReg t0, Register:i64 %vreg2
              t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
              t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
            t7: i64 = mul t4, t2
          t9: i64 = srl t7, Constant:i64<58>
        t11: i64 = shl t9, Constant:i64<2>
      t12: i64 = add t6, t11
    t15: i32,ch = load<LD4[%arrayidx]> t0, t12, undef:i64
  t17: ch,glue = CopyToReg t0, Register:i32 %W0, t15
  t18: ch = AArch64ISD::RET_FLAG t17, Register:i32 %W0, t17:1

@llvmbot (Collaborator, Author) commented Nov 13, 2017

I'm not sure how to implement this. When I skip the fold (shl (srl x, c1), c2) -> (and (srl x, (sub c1, c2)), MASK), I get this assembly:
test:                                   // @test
// BB#0:                                // %entry
        mul x8, x1, x0
        lsr x8, x8, #58
        ldr w0, [x2, x8, lsl#2]
        ret

I'm not sure what opt-addr-mode means. As I understand it, I need to add an if statement with canFoldInAddressingMode before the fold to check the addressing mode. Please correct me if I'm wrong.
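One plausible shape for such a guard, as a sketch only (the `canFoldInAddressingMode` call mirrors the helper mentioned above; everything else is hypothetical, and this is not the patch that eventually landed):

```
#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/CodeGen/TargetLowering.h"
using namespace llvm;

// Sketch: before DAGCombiner rewrites (shl (srl x, c1), c2) into
// srl + and, check whether the shl's only user is a memory access
// whose addressing mode could absorb the scaled index; if so, skip
// the fold and let instruction selection fold the shl into the
// address instead.
static bool shlFeedsFoldableAddress(SDNode *Shl, SelectionDAG &DAG,
                                    const TargetLowering &TLI) {
  if (!Shl->hasOneUse())
    return false;
  SDNode *User = *Shl->use_begin();
  if (User->getOpcode() != ISD::LOAD && User->getOpcode() != ISD::STORE)
    return false;
  // Hypothetical use of the addressing-mode check discussed in this
  // thread; the real combine would need the target's actual hook.
  return canFoldInAddressingMode(Shl, User, DAG, TLI);
}
```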

@llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
@llvmbot added the confirmed label Jan 26, 2022
@ilinpv added the good first issue label Mar 30, 2022
@llvmbot (Collaborator, Author) commented Mar 30, 2022

@llvm/issue-subscribers-good-first-issue

@varunkumare99 (Contributor) commented

Is this still open to work on?

@RKSimon (Collaborator) commented May 30, 2023

Still seems to be a problem in trunk: https://simd.godbolt.org/z/595f991fP

@ParkHanbum (Contributor) commented

Can this issue be resolved by disabling the fold (shl (srl x, c1), c2) -> (and (shl x, (sub c2, c1)), MASK) when a code-size option is enabled?

ParkHanbum added a commit to ParkHanbum/llvm-project that referenced this issue Apr 21, 2024
Currently, the process of replacing bitwise operations consisting of
`LSR`/`LSL` with an `AND` is performed by the `DAGCombiner`.

However, in certain cases, the `AND` generated by this process
can be removed.

Consider the following case:
```
        lsr x8, x8, #56
        and x8, x8, #0xfc
        ldr w0, [x2, x8]
        ret
```

In this case, we can remove the `AND` by changing the target of the `LDR`
to `[X2, X8, LSL #2]` and changing the right-shift amount from 56 to 58.

After the change:
```
        lsr x8, x8, #58
        ldr w0, [x2, x8, lsl #2]
        ret
```

This patch checks whether the shift + `AND` operation feeding a load
address can be optimized this way, and rewrites it when it can.
ParkHanbum added a commit to ParkHanbum/llvm-project that referenced this issue Apr 21, 2024
davemgreen pushed a commit that referenced this issue Aug 24, 2024
dmpolukhin pushed a commit to dmpolukhin/llvm-project that referenced this issue Sep 2, 2024
qiaojbao pushed a commit to GPUOpen-Drivers/llvm-project that referenced this issue Sep 30, 2024
Remote branch main 77fccb3 [AArch64] Replace AND with LSL#2 for LDR target (llvm#34101) (llvm#89531)