Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable constant CSE for ARM #100054

Merged
merged 1 commit into from
Mar 22, 2024

Conversation

BruceForstall
Copy link
Member

Also, extract out the code that determines if constant CSE or shared constant CSE are enabled, and rewrite it to be easier to understand.

Also, extract out the code that determines if constant CSE or
shared constant CSE are enabled, and rewrite it to be easier to understand.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 21, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@BruceForstall
Copy link
Member Author

/azp run runtime-coreclr outerloop, runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@BruceForstall
Copy link
Member Author

Diffs

Only diffs on arm32, as expected. Overall PerfScore wins, and large code size improvement, with both regressions and improvements. TP cost (regression) somewhat large on some collections.

@BruceForstall
Copy link
Member Author

Failures are all known issues.

@BruceForstall
Copy link
Member Author

@EgorBo @AndyAyersMS PTAL
cc @dotnet/jit-contrib

This seems like a win. Should we enable it?

I was curious what the result would be because I was confused to see that shared constant CSE was enabled for arm32 (#70580), but not non-shared constant CSE.

Copy link
Member

@EgorBo EgorBo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding that CSE for constants is a bit less useful on ARM32 because every 32bit constant can be done via not more than 2 instructions (and it's up to 4 instructions on ARM64). The diffs seem to be dominated by coreclr_tests.run collection

@tannergooding
Copy link
Member

tannergooding commented Mar 21, 2024

because every 32bit constant can be done via not more than 2 instructions (and it's up to 4 instructions on ARM64)

The same is true for Arm64, any 32-bit constant is no more than 2 instructions since movk can write 16-bits to bits 0..15, 16..31, 32..47, or 48..63. It's only up to 4 for 64-bit constants, which is also the same for Arm32. There are then many patterns that can allow some constants to be fewer instructions.

Notably, however, some of the C/C++ compilers still opt to emit a load for such constants from memory rather than emitting the 2-instruction sequence. For Arm32, clang emits ldr for 32-bit and ldr; ldr for 64-bit constants. While gcc emits movw; movt for 32-bit and ldrd for 64-bit constants. For Arm64, clang and gcc both emit mov; movk for 32-bit and mov; movk; movk; movk for 64-bit constants. While MSVC emits mov; movk for 32-bit and ldr for 64-bit constants. -- In all cases, this is a constant that can't be represented by a shorter alternative sequence.

It seems there's no real consensus for what's best and I imagine it likely depends partially on whether the underlying hardware allows for fusing of the mov; movk; ... movk sequences, whether those instructions are in a loop, whether they straddle a code alignment boundary (which may impact the decode window), etc.

@EgorBo
Copy link
Member

EgorBo commented Mar 21, 2024

The same is true for Arm64, any 32-bit constant is no more than 2 instructions since movk can write 16-bits to bits 0..15, 16..31, 32..47, or 48..63. It's only up to 4 for 64-bit constants, which is also the same for Arm32. There are then many patterns that can allow some constants to be fewer instructions.

I meant that a good chunks of constants are handles, and they're 32bit on arm32 and 64bit on arm64

@BruceForstall
Copy link
Member Author

Failures are all known.

@BruceForstall
Copy link
Member Author

/ba-g Failure is #100047 -- should have been marked ok here

@BruceForstall BruceForstall merged commit b549229 into dotnet:main Mar 22, 2024
181 of 189 checks passed
@BruceForstall BruceForstall deleted the EnableConstantCSEForArm branch March 22, 2024 21:37
@github-actions github-actions bot locked and limited conversation to collaborators Apr 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants