Improving ARM64 Performance in .NET 7.0 #64820

kunalspathak · 2022-02-04T18:13:34Z (edited by JulieLeeMSFT)

In .NET 7.0, we will continue our efforts to improve Arm64 code quality and close the performance gap with x64. As we did for .NET 5 in #35853, we will track all the Arm64 issues in a single top-level issue.

Moved to Future Work

ghost · 2022-02-04T18:13:40Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	kunalspathak
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `untriaged`, `User Story`
Milestone:	-

kunalspathak · 2022-02-04T18:13:46Z

@dotnet/jit-contrib

adamsitnik · 2022-04-01T10:32:29Z

Based on #67339 I think it would be good to add #62302 to this list.

kunalspathak · 2022-04-07T21:52:59Z

Done. Thanks for preparing the report.

kunalspathak · 2022-04-08T22:07:59Z

.NET 7 items:

Issue	Owner	ETA	Doable in .NET 7
Arm64: Use 8.1 atomics #67824	@kpathak	June '22	Yes
Arm64: Align methods containing loops to 32B #59828	@kpathak	Done
Loop Alignment support for Arm64 #60135	@kpathak	Done
Hide 'align' instruction behind jmp #60787	@kpathak	Done
Arm64: Better addressing mode for float/double array access #64819	@EgorBo	Done
Correctly get the last level cache size used by GC #60166	@mangod9	Done
The thread pool's global queue doesn't scale well on machines with a large processor count #67845	@mangod9	Done	Yes
Equivalent thread pool change in Kestrel #67845	@sebastienros	Done
Arm64: Environment.ProcessorCount returns wrong value on higher core machine #67180 ~~(WIP: #68639)~~	@mangod9	June '22	?
Arm64: Revisit the heuristics for IO completion poller threads #67266	@mangod9	June '22	?
Optimize jump stubs on arm64 #62302	@EgorBo	June '22	?
ARM64: Optimize a % b operation #34937	@TIHan	Done
Double constants usage in a loop can be CSEed #35257 (WIP)	@TIHan	June '22	?
x64 vs ARM64 Microbenchmarks Performance Study Report #67339	@EgorBo, @kpathak	June '22	Yes
Arm64: Better addressing mode for array access whose elements are accessed byref #67981	@EgorBo	June '22	Yes
Arm64: Forward memset/memcpy to CRT implementation #67326	@kpathak	Done
Arm64: Have CpBlkUnroll and InitBlkUnroll use SIMD registers #68085	@kpathak	Done
Hoisting the invariant out of multi-level nested loops #61420	@kpathak, @BruceForstall	Done
Arm64: Generate conditional comparison and selection instructions #55364	@a74nh	WIP: #67894	Yes
Optimize System.Text.ASCIIUtility for arm64 using cross-platform intrinsics #41292	@a74nh	Done
Optimize System.Buffers for arm64 using cross-platform intrinsics #35033	@a74nh	Done

Stretch goals:

Issue	Owner	ETA
[Arm64] Peephole optimization opportunities #55365	@a74nh	TBD
[LSRA] Add support for allocating consecutive registers #39457	@kpathak	Future
Enable multi-register intrinsics support for Arm64 #64921	@BruceForstall	Future
API Proposal : Arm TableVectorLookup and TableVectorExtension intrinsics #1277	@a74nh	Future
jitdump output not accepted by ARM streamline #62456	@RobertHenry6bev	Future
Optimize set_brick code in GC	@Maoni0	TBD
[ARM64] Performance regression: Utf8Encoding #41699	@a74nh	TBD
[ARM64/Linux] Inefficient conditionals branching #12735	@a74nh	TBD
JIT: Redundant fmov's on arm64 for a simple function #58954	@a74nh	TBD
Arm64: Consider using "DC ZVA" instruction #67244	@kpathak, @a74nh	TBD
Review the multi-op instruction usage for Arm64 #68028	@TIHan	Future
Arm64: Evaluate if it is possible to combine subsequent field loads in a single load #64815 (lower priority)	TBD	TBD
Arm64: In mod operation happening inside the loop, if divisor is an invariant, hoist the divisor checks #64795	@TIHan	.NET 8

JulieLeeMSFT · 2022-04-18T23:42:45Z

#68028

kunalspathak · 2022-04-19T02:17:38Z

Included in the table above.

a74nh · 2022-07-13T16:42:43Z

Optimize System.Text.ASCIIUtility for arm64 using cross-platform intrinsics
Issue: #41292
PR: #70080 and #71637
Approach taken:
The existing Sse2/Sse41 implementation was moved to use the Vector128 API.
Arm64 was switched to use the Vector128 implementation instead of the generic vectorized version.
Where required for performance, the Sse2/Sse41/AdvSimd APIs were used.
Impact:
Small improvement in the relevant microbenchmarks on Arm64.
These changes were not significant enough to be picked up by the Performanceautofiler post-merge.
The performance gain was small because the existing generic vectorized version was already good and simple.
Use of the Vector128 API helps to reduce code debt.
Follow-on work:
None.
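For readers unfamiliar with the technique, a vectorized ASCII check relies on the fact that a byte is non-ASCII exactly when its high bit is set, so a whole block can be tested at once by OR-ing its bytes together and masking with 0x80 in every lane. The sketch below is a portable C illustration of that general idea under stated assumptions, not the .NET code (which is C# over Vector128<byte> with Sse2/Sse41/AdvSimd where needed). It uses two 64-bit words as a stand-in for one 128-bit register, and the function name `first_non_ascii` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* High bit of every byte set: a byte is non-ASCII iff (byte & 0x80) != 0. */
#define HIGH_BITS 0x8080808080808080ULL

/*
 * Hypothetical helper: returns the index of the first non-ASCII byte in buf,
 * or len if every byte is ASCII.
 */
size_t first_non_ascii(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t i = 0;

    /* Fast path: 16 bytes per iteration (the width of a Vector128<byte>),
       loaded as two 64-bit words. memcpy avoids unaligned-access UB. */
    while (i + 16 <= len) {
        uint64_t lo, hi;
        memcpy(&lo, buf + i, 8);
        memcpy(&hi, buf + i + 8, 8);
        if (((lo | hi) & HIGH_BITS) != 0)
            break; /* some byte in this block is non-ASCII */
        i += 16;
    }

    /* Scalar path: pinpoints the offending byte inside a failed block,
       or finishes the tail shorter than 16 bytes. */
    while (i < len && buf[i] < 0x80)
        i++;
    return i;
}
```

For example, `first_non_ascii((const uint8_t *)"abc\x80", 4)` returns 3. Dropping to the scalar loop only after a block fails the mask test keeps the common all-ASCII path to one OR and one test per 16 bytes, while still reporting the exact index.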

Optimize System.Buffers for arm64 using cross-platform intrinsics
Issue: #35033
PR: #70654 and dotnet/performance#2479
Approach taken:
Fixed an issue where the microbenchmarks used invalid data.
The existing Ssse3 implementation was moved to use the Vector128 API.
Arm64 was switched to use the Vector128 implementation instead of the non-vectorised version.
Where required for performance, the Ssse3/AdvSimd APIs were used.
Impact:
Large improvement in the relevant microbenchmarks on Arm64.
Performance improvements detected by the Performanceautofiler:
dotnet/perf-autofiling-issues#6346
dotnet/perf-autofiling-issues#6334
dotnet/perf-autofiling-issues#6340
dotnet/perf-autofiling-issues#6328
dotnet/perf-autofiling-issues#6327
dotnet/perf-autofiling-issues#6326
dotnet/perf-autofiling-issues#6321
Use of the Vector128 API helps to reduce code debt.
Follow-on work:
Once multi-register instructions (such as LD4) have been implemented in Vector128, further improvements may be possible by switching to an implementation based on the NEON version of the Aklomp base64 algorithm.
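The follow-on item is easier to see with the data layout spelled out. Base64 decoding turns every 4 input characters into 3 output bytes. A de-interleaving multi-register load such as LD4 places character k of each 4-character group into register k, so the shift/OR arithmetic becomes the same operation on all 16 lanes, and an interleaving store (ST3) writes the 3 result registers back in output order. The C sketch below only simulates that layout with plain arrays; it is not the Aklomp implementation or the .NET code, it assumes exactly 64 unpadded input characters, and the names `decode64_block` and `sextet` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 16 /* one 128-bit register holds 16 bytes */

/* Hypothetical helper: maps one Base64 character to its 6-bit value, or -1. */
static int sextet(uint8_t c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/*
 * Decodes exactly 64 Base64 characters (no padding) into 48 bytes.
 * Returns 0 on success, -1 on an invalid character.
 * Structured like an LD4 / lane-wise math / ST3 loop, but in scalar C.
 */
int decode64_block(const uint8_t in[64], uint8_t out[48])
{
    assert(in != NULL && out != NULL);
    uint8_t s[4][LANES]; /* s[k][i] = sextet k of group i */

    /* Step 1 (LD4-like): character 4*i+k goes to "register" k, lane i. */
    for (int i = 0; i < LANES; i++) {
        for (int k = 0; k < 4; k++) {
            int v = sextet(in[4 * i + k]);
            if (v < 0) return -1;
            s[k][i] = (uint8_t)v;
        }
    }

    /* Step 2: identical shifts/ORs on every lane; the uint8_t casts drop
       the bits that would be masked off (one vector op each on NEON). */
    uint8_t b[3][LANES];
    for (int i = 0; i < LANES; i++) {
        b[0][i] = (uint8_t)((s[0][i] << 2) | (s[1][i] >> 4));
        b[1][i] = (uint8_t)((s[1][i] << 4) | (s[2][i] >> 2));
        b[2][i] = (uint8_t)((s[2][i] << 6) | s[3][i]);
    }

    /* Step 3 (ST3-like): interleave the three byte "registers" into output order. */
    for (int i = 0; i < LANES; i++) {
        out[3 * i + 0] = b[0][i];
        out[3 * i + 1] = b[1][i];
        out[3 * i + 2] = b[2][i];
    }
    return 0;
}
```

For example, 16 repetitions of "TWFu" decode to 16 repetitions of "Man". A real NEON version might replace the per-character `sextet` calls with table lookups over whole registers; that step is kept scalar here for clarity.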

kunalspathak · 2022-07-18T16:03:03Z

The only item remaining from the list is #55364, and it is already set for .NET 7. I will move this issue to .NET 8 to track the remaining work.

kunalspathak · 2022-10-13T15:17:37Z

Replaced with #77010

kunalspathak added the User Story label (A single user-facing feature. Can be grouped under an epic.) Feb 4, 2022

dotnet-issue-labeler bot added the area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) and untriaged (New issue has not been triaged by the area owner) labels Feb 4, 2022

JulieLeeMSFT changed the title from "Improving ARM64 Performance in .NET 7.0 – Closing the gap with x64" to "Improving ARM64 Performance in .NET 7.0" Feb 4, 2022

JulieLeeMSFT removed the untriaged label (New issue has not been triaged by the area owner) Feb 5, 2022

JulieLeeMSFT assigned kunalspathak Feb 5, 2022

JulieLeeMSFT added this to the 7.0.0 milestone Feb 5, 2022

JulieLeeMSFT mentioned this issue Jul 1, 2022

Arm64v8 ISA optimization, both in the JIT and where runtime and libraries have significant x64 optimizations #70527 (Open, 5 tasks)

kunalspathak modified the milestones: 7.0.0, 8.0.0 Jul 18, 2022

kunalspathak mentioned this issue Oct 13, 2022

Improving Arm64 Performance in .NET 8.0 #77010 (Closed, 28 tasks)

kunalspathak closed this as completed Oct 13, 2022

ghost locked as resolved and limited conversation to collaborators Nov 12, 2022