Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add benchmarks for InitBlock/CopyBlock operations #2154

Merged
merged 3 commits into from
Dec 3, 2021

Conversation

echesakov
Copy link

@echesakov echesakov commented Nov 24, 2021

This adds benchmark for GT_STORE_BLK node in the JIT that responsibly for two operations in .NET - InitBlock and CopyBlock.

@echesakov echesakov marked this pull request as ready for review November 24, 2021 01:58
@echesakov
Copy link
Author

@DrewScoggins @adamsitnik PTAL
cc @dotnet/jit-contrib

Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmarks overall look good, but I am afraid that they are mostly testing the same code path. We should reduce the numbers of benchmarks. PTAL at my comments and links that I've provided.

@echesakovMSFT thank you!

src/benchmarks/micro/runtime/StoreBlock/StoreBlock.tt Outdated Show resolved Hide resolved
src/benchmarks/micro/runtime/StoreBlock/StoreBlock.tt Outdated Show resolved Hide resolved
src/benchmarks/micro/runtime/StoreBlock/StoreBlock.tt Outdated Show resolved Hide resolved
src/benchmarks/micro/runtime/StoreBlock/StoreBlock.tt Outdated Show resolved Hide resolved
src/benchmarks/micro/runtime/StoreBlock/StoreBlock.tt Outdated Show resolved Hide resolved
src/benchmarks/micro/runtime/StoreBlock/StoreBlock.tt Outdated Show resolved Hide resolved
src/benchmarks/micro/runtime/StoreBlock/StoreBlock.tt Outdated Show resolved Hide resolved
Copy link
Author

@echesakov echesakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamsitnik Thank you for your review! As I mentioned in the replies, at the moment, there is no difference between heap and localAddr code paths for InitBlock and CopyBlock. However, I am trying to implement an optimization to see if we can utilize 16-byte alignment of sp (and known alignment of fp) in the JIT on Arm64 and make a use of appropriate store operations.

For context: according to Arm Cortex-A76 Software Optimization Guide
(and similar guides for other microarchitectures) store operations that cross a 16-byte boundary can incur additional latency on Arm64. Hence, we should use these only when the JIT can proof that the store won't cross such boundary. The JIT won't be able to do this in a general case, but can do this for locals.

If you prefer, I can hold off on the benchmark PR until after dotnet/runtime#61030 is merged.

src/benchmarks/micro/runtime/StoreBlock/StoreBlock.tt Outdated Show resolved Hide resolved
src/benchmarks/micro/runtime/StoreBlock/StoreBlock.tt Outdated Show resolved Hide resolved
src/benchmarks/micro/runtime/StoreBlock/StoreBlock.tt Outdated Show resolved Hide resolved
src/benchmarks/micro/runtime/StoreBlock/StoreBlock.tt Outdated Show resolved Hide resolved
src/benchmarks/micro/runtime/StoreBlock/StoreBlock.tt Outdated Show resolved Hide resolved
@adamsitnik
Copy link
Member

If you prefer, I can hold off on the benchmark PR until after dotnet/runtime#61030 is merged.

@echesakovMSFT thank you for detailed answers! In such case it would be better to merge the benchmarks before dotnet/runtime#61030 so our Reporting System can start gathering data and when the PR is merged we can see how it affects the performance for all configs.

@DrewScoggins
Copy link
Member

I am a little confused by the last line of the .tt file. We don't use it when generating the tests, we use the loop for that, so what is it there for? I assume the plan is to instead of walking by eight bytes in the for loop we instead iterate over the array and use the values there to decide on the different sizes for the copying. If not, we should reduce the number of testcases that we generate across the different sizes and only use sizes we believe will give us meaningful differences between them.

@echesakov
Copy link
Author

I am a little confused by the last line of the .tt file. We don't use it when generating the tests, we use the loop for that, so what is it there for? I assume the plan is to instead of walking by eight bytes in the for loop we instead iterate over the array and use the values there to decide on the different sizes for the copying. If not, we should reduce the number of testcases that we generate across the different sizes and only use sizes we believe will give us meaningful differences between them.

You are right, I was using the array at the last line before - but then I wanted to do some extra testing locally and forgot to revert the change. Updated and reduced the number of tests.

@echesakov
Copy link
Author

@adamsitnik @DrewScoggins PTAL one more time - I believe I addressed all the suggestions.

Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you @echesakovMSFT !

@adamsitnik adamsitnik merged commit 9e42179 into dotnet:main Dec 3, 2021
@echesakov echesakov deleted the InitBlock-CopyBlock-Benchmarks branch December 3, 2021 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants