Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that Sse3.MoveAndDuplicate correctly tracks supporting SIMD scalar loads #97783

Merged
merged 1 commit into from
Jan 31, 2024

Conversation

tannergooding
Copy link
Member

This resolves #97688 by ensuring that Sse3.MoveAndDuplicate correctly tracks supporting SIMD scalar loads so that the necessary containment happens everywhere its expected.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 31, 2024
@ghost ghost assigned tannergooding Jan 31, 2024
@ghost
Copy link

ghost commented Jan 31, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

This resolves #97688 by ensuring that Sse3.MoveAndDuplicate correctly tracks supporting SIMD scalar loads so that the necessary containment happens everywhere its expected.

Author: tannergooding
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@tannergooding
Copy link
Member Author

We may want to consider backporting the fix to .NET 8 given its LTS. Since we're not marking it contained, LSRA assigns the value to a register and may expect it there for a latter use. Where-as codegen currently treats it as contained anyways and consumes the value directly from memory.

@ryujit-bot
Copy link

Diff results for #97783

Assembly diffs

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,517,908 contexts (991,070 MinOpts, 1,526,838 FullOpts).

MISSED contexts: 1 (0.00%)

Overall (-468 bytes)
Collection Base size (bytes) Diff size (bytes)
coreclr_tests.run.linux.x64.checked.mch 403,725,618 -282
libraries.pmi.linux.x64.checked.mch 60,420,245 -36
libraries_tests.run.linux.x64.Release.mch 337,126,106 -88
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 132,558,776 -62
MinOpts (-326 bytes)
Collection Base size (bytes) Diff size (bytes)
coreclr_tests.run.linux.x64.checked.mch 279,754,702 -242
libraries_tests.run.linux.x64.Release.mch 183,759,693 -84
FullOpts (-142 bytes)
Collection Base size (bytes) Diff size (bytes)
coreclr_tests.run.linux.x64.checked.mch 123,970,916 -40
libraries.pmi.linux.x64.checked.mch 60,307,388 -36
libraries_tests.run.linux.x64.Release.mch 153,366,413 -4
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 121,941,008 -62

Assembly diffs for windows/x64 ran on windows/x64

Diffs are based on 2,512,209 contexts (997,391 MinOpts, 1,514,818 FullOpts).

MISSED contexts: 3 (0.00%)

Overall (-468 bytes)
Collection Base size (bytes) Diff size (bytes)
coreclr_tests.run.windows.x64.checked.mch 393,207,052 -332
libraries.crossgen2.windows.x64.checked.mch 39,486,563 +0
libraries.pmi.windows.x64.checked.mch 61,663,491 -28
libraries_tests.run.windows.x64.Release.mch 282,129,292 -84
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 137,066,325 -24
MinOpts (-358 bytes)
Collection Base size (bytes) Diff size (bytes)
coreclr_tests.run.windows.x64.checked.mch 272,788,854 -274
libraries_tests.run.windows.x64.Release.mch 175,858,318 -84
FullOpts (-110 bytes)
Collection Base size (bytes) Diff size (bytes)
coreclr_tests.run.windows.x64.checked.mch 120,418,198 -58
libraries.crossgen2.windows.x64.checked.mch 39,485,376 +0
libraries.pmi.windows.x64.checked.mch 61,549,970 -28
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 126,447,219 -24

Details here


Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Warning: Different compilers used for base and diff JITs. Results may be misleading.
Base JIT's compiler: MSVC 193933218
Diff JIT's compiler: MSVC 193933321

Overall (-0.02% to -0.01%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch -0.01%
benchmarks.run_pgo.linux.arm64.checked.mch -0.01%
benchmarks.run_tiered.linux.arm64.checked.mch -0.01%
coreclr_tests.run.linux.arm64.checked.mch -0.01%
libraries.crossgen2.linux.arm64.checked.mch -0.02%
libraries.pmi.linux.arm64.checked.mch -0.02%
libraries_tests.run.linux.arm64.Release.mch -0.01%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch -0.02%
realworld.run.linux.arm64.checked.mch -0.02%
smoke_tests.nativeaot.linux.arm64.checked.mch -0.02%
FullOpts (-0.02% to -0.01%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch -0.01%
benchmarks.run_pgo.linux.arm64.checked.mch -0.01%
benchmarks.run_tiered.linux.arm64.checked.mch -0.02%
coreclr_tests.run.linux.arm64.checked.mch -0.02%
libraries.crossgen2.linux.arm64.checked.mch -0.02%
libraries.pmi.linux.arm64.checked.mch -0.02%
libraries_tests.run.linux.arm64.Release.mch -0.01%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch -0.02%
realworld.run.linux.arm64.checked.mch -0.02%
smoke_tests.nativeaot.linux.arm64.checked.mch -0.02%

Throughput diffs for linux/x64 ran on windows/x64

Warning: Different compilers used for base and diff JITs. Results may be misleading.
Base JIT's compiler: MSVC 193933218
Diff JIT's compiler: MSVC 193933321

Overall (-0.01% to -0.00%)
Collection PDIFF
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch -0.01%
realworld.run.linux.x64.checked.mch -0.01%
smoke_tests.nativeaot.linux.x64.checked.mch -0.01%
FullOpts (-0.01% to -0.00%)
Collection PDIFF
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch -0.01%
realworld.run.linux.x64.checked.mch -0.01%
smoke_tests.nativeaot.linux.x64.checked.mch -0.01%

Throughput diffs for osx/arm64 ran on windows/x64

Warning: Different compilers used for base and diff JITs. Results may be misleading.
Base JIT's compiler: MSVC 193933218
Diff JIT's compiler: MSVC 193933321

Overall (-0.02% to -0.01%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch -0.02%
benchmarks.run_pgo.osx.arm64.checked.mch -0.01%
benchmarks.run_tiered.osx.arm64.checked.mch -0.01%
coreclr_tests.run.osx.arm64.checked.mch -0.01%
libraries.crossgen2.osx.arm64.checked.mch -0.02%
libraries.pmi.osx.arm64.checked.mch -0.02%
libraries_tests.run.osx.arm64.Release.mch -0.01%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch -0.02%
realworld.run.osx.arm64.checked.mch -0.02%
FullOpts (-0.02% to -0.01%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch -0.02%
benchmarks.run_pgo.osx.arm64.checked.mch -0.01%
benchmarks.run_tiered.osx.arm64.checked.mch -0.02%
coreclr_tests.run.osx.arm64.checked.mch -0.02%
libraries.crossgen2.osx.arm64.checked.mch -0.02%
libraries.pmi.osx.arm64.checked.mch -0.02%
libraries_tests.run.osx.arm64.Release.mch -0.01%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch -0.02%
realworld.run.osx.arm64.checked.mch -0.02%

Throughput diffs for windows/arm64 ran on windows/x64

Warning: Different compilers used for base and diff JITs. Results may be misleading.
Base JIT's compiler: MSVC 193933218
Diff JIT's compiler: MSVC 193933321

Overall (-0.02% to -0.01%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch -0.02%
benchmarks.run_pgo.windows.arm64.checked.mch -0.01%
benchmarks.run_tiered.windows.arm64.checked.mch -0.01%
coreclr_tests.run.windows.arm64.checked.mch -0.01%
libraries.crossgen2.windows.arm64.checked.mch -0.02%
libraries.pmi.windows.arm64.checked.mch -0.02%
libraries_tests.run.windows.arm64.Release.mch -0.01%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch -0.02%
realworld.run.windows.arm64.checked.mch -0.02%
smoke_tests.nativeaot.windows.arm64.checked.mch -0.02%
MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch +0.01%
FullOpts (-0.02% to -0.01%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch -0.02%
benchmarks.run_pgo.windows.arm64.checked.mch -0.01%
benchmarks.run_tiered.windows.arm64.checked.mch -0.02%
coreclr_tests.run.windows.arm64.checked.mch -0.02%
libraries.crossgen2.windows.arm64.checked.mch -0.02%
libraries.pmi.windows.arm64.checked.mch -0.02%
libraries_tests.run.windows.arm64.Release.mch -0.01%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch -0.02%
realworld.run.windows.arm64.checked.mch -0.02%
smoke_tests.nativeaot.windows.arm64.checked.mch -0.02%

Throughput diffs for windows/x64 ran on windows/x64

Warning: Different compilers used for base and diff JITs. Results may be misleading.
Base JIT's compiler: MSVC 193933218
Diff JIT's compiler: MSVC 193933321

Overall (-0.01% to -0.00%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch -0.01%
benchmarks.run.windows.x64.checked.mch -0.01%
realworld.run.windows.x64.checked.mch -0.01%
smoke_tests.nativeaot.windows.x64.checked.mch -0.01%
FullOpts (-0.01% to -0.00%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch -0.01%
benchmarks.run.windows.x64.checked.mch -0.01%
benchmarks.run_pgo.windows.x64.checked.mch -0.01%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch -0.01%
realworld.run.windows.x64.checked.mch -0.01%
smoke_tests.nativeaot.windows.x64.checked.mch -0.01%

Details here


@ryujit-bot
Copy link

Diff results for #97783

Assembly diffs

Assembly diffs for windows/x86 ran on windows/x86

Diffs are based on 2,293,451 contexts (839,658 MinOpts, 1,453,793 FullOpts).

MISSED contexts: 45 (0.00%)

Overall (-542 bytes)
Collection Base size (bytes) Diff size (bytes)
coreclr_tests.run.windows.x86.checked.mch 309,385,411 -287
libraries.crossgen2.windows.x86.checked.mch 31,716,097 -80
libraries.pmi.windows.x86.checked.mch 49,269,895 -33
libraries_tests.run.windows.x86.Release.mch 186,683,551 -68
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 103,821,073 -74
MinOpts (-304 bytes)
Collection Base size (bytes) Diff size (bytes)
coreclr_tests.run.windows.x86.checked.mch 201,788,134 -236
libraries_tests.run.windows.x86.Release.mch 98,273,700 -68
FullOpts (-238 bytes)
Collection Base size (bytes) Diff size (bytes)
coreclr_tests.run.windows.x86.checked.mch 107,597,277 -51
libraries.crossgen2.windows.x86.checked.mch 31,715,037 -80
libraries.pmi.windows.x86.checked.mch 49,174,662 -33
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 95,141,009 -74

Details here


@tannergooding tannergooding merged commit e3ff66b into dotnet:main Jan 31, 2024
129 checks passed
@tannergooding tannergooding deleted the fix-97688 branch January 31, 2024 23:52
@tannergooding
Copy link
Member Author

@jakobbotsch do you want me to go ahead and backport this to .NET 8?

@github-actions github-actions bot locked and limited conversation to collaborators Mar 12, 2024
@tannergooding
Copy link
Member Author

/backport to release/8.0-staging

@github-actions github-actions bot unlocked this conversation Mar 28, 2024
Copy link
Contributor

Started backporting to release/8.0-staging: https://github.com/dotnet/runtime/actions/runs/8471953595

Copy link
Contributor

@tannergooding backporting to release/8.0-staging failed, the patch most likely resulted in conflicts:

$ git am --3way --ignore-whitespace --keep-non-patch changes.patch

Applying: Ensure that Sse3.MoveAndDuplicate correctly tracks supporting SIMD scalar loads
Using index info to reconstruct a base tree...
M	src/coreclr/jit/lowerxarch.cpp
Falling back to patching base and 3-way merge...
Auto-merging src/coreclr/jit/lowerxarch.cpp
CONFLICT (content): Merge conflict in src/coreclr/jit/lowerxarch.cpp
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Ensure that Sse3.MoveAndDuplicate correctly tracks supporting SIMD scalar loads
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

Copy link
Contributor

@tannergooding an error occurred while backporting to release/8.0-staging, please check the run log for details!

Error: git am failed, most likely due to a merge conflict.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 28, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Assertion failed 'hwintrinsicChild->isContained()' in 'System.Numerics.Tensors.TensorPrimitives+ScaleBOperator
3 participants