Arm64: Consider using "DC ZVA" instruction #67244
Tagging subscribers to this area: @JulieLeeMSFT
Moving to future.
Note that the JIT already uses this instruction to zero-initialize the frame; see runtime/src/coreclr/jit/codegenarm64.cpp, line 2034 (commit 3fded0b).
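DC ZVA zeroes a whole block whose size is implementation defined, so any code that issues it needs to know that size. The CPU reports it in the DCZID_EL0 register instead of fixing it architecturally, so portable code should not hard-code a value. A minimal sketch in C, assuming GCC/Clang inline assembly on AArch64; `dczid_block_size`, `read_dczid` and `zva_block_size` are hypothetical helper names, not CoreCLR APIs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a DCZID_EL0 value.
 *   bits [3:0] BS  : log2 of the DC ZVA block size, in 4-byte words
 *   bit  4     DZP : 1 means DC ZVA is prohibited
 * Returns the block size in bytes, or 0 if DC ZVA must not be used. */
static size_t dczid_block_size(uint64_t dczid) {
    if (dczid & (UINT64_C(1) << 4))
        return 0;
    return (size_t)4 << (dczid & 0xF);
}

/* Read DCZID_EL0 on AArch64 (the register is readable from user mode).
 * On other targets, pretend DZP is set so callers use ordinary stores. */
static uint64_t read_dczid(void) {
#if defined(__aarch64__)
    uint64_t v;
    __asm__ volatile("mrs %0, dczid_el0" : "=r"(v));
    return v;
#else
    return UINT64_C(1) << 4;
#endif
}

/* Block size for the current CPU, or 0 if DC ZVA is unavailable. */
static size_t zva_block_size(void) {
    size_t bs = dczid_block_size(read_dczid());
    assert(bs == 0 || (bs & (bs - 1)) == 0); /* always a power of two */
    return bs;
}
```

For example, a BS field of 4 decodes to 4 << 4 = 64 bytes, a common block size on current cores.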
Looking back through the various discussions and related pull requests, ZVA is already used in a number of places, and elsewhere it was dropped due to a lack of performance impact. AIUI, this issue is only concerned with places where coreclr today calls out to … On Linux, … I'm not sure what happens on Windows. If it doesn't use …
We should consider using the "DC ZVA" instruction in the runtime so that it can be used directly by the GC or in other places where memset is needed. "DC ZVA" seems effective: it puts the memory system into write-streaming mode, which avoids a linefill into the L1 cache in scenarios like memset. Last year, we merged the work to use "DC ZVA" to zero-initialize the frame. I was hoping to see a memset implementation similar to the one suggested in the Arm64 optimization guide.
Another reference: ARM-software/tf-issues#408
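The memset approach described above can be sketched as follows. This is a minimal illustration, not CoreCLR or Arm guide code. It assumes GCC/Clang inline assembly, AArch64 user space where DC ZVA is normally permitted, and normal cacheable memory. `zero_memory` and `zva_bytes` are hypothetical names, and the 4-block threshold is an arbitrary placeholder rather than a tuned value. Unaligned head and tail bytes use ordinary stores, and only whole, block-aligned blocks go through DC ZVA. On other targets the function simply falls back to memset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* DC ZVA block size in bytes from DCZID_EL0, or 0 if unavailable
 * (non-AArch64 build, or the DZP "prohibited" bit is set). */
static size_t zva_bytes(void) {
#if defined(__aarch64__)
    uint64_t dczid;
    __asm__ volatile("mrs %0, dczid_el0" : "=r"(dczid));
    if (dczid & (UINT64_C(1) << 4))
        return 0;
    return (size_t)4 << (dczid & 0xF);
#else
    return 0;
#endif
}

/* Hypothetical helper: zero len bytes at dst, using DC ZVA for the
 * block-aligned middle of large buffers and ordinary stores elsewhere. */
void zero_memory(void *dst, size_t len) {
    unsigned char *p = (unsigned char *)dst;
    size_t bs = zva_bytes();

    /* No ZVA, or buffer too small to benefit: plain memset.
     * The 4-block threshold is illustrative only. */
    if (bs == 0 || len < 4 * bs) {
        memset(p, 0, len);
        return;
    }
    assert((bs & (bs - 1)) == 0); /* block size is a power of two */

    /* Head: ordinary stores up to the first block-aligned address. */
    size_t head = (bs - ((uintptr_t)p & (bs - 1))) & (bs - 1);
    memset(p, 0, head);
    p += head;
    len -= head;

#if defined(__aarch64__)
    /* Body: each DC ZVA zeroes one whole aligned block. */
    while (len >= bs) {
        __asm__ volatile("dc zva, %0" : : "r"(p) : "memory");
        p += bs;
        len -= bs;
    }
#endif

    /* Tail: fewer than one block left, ordinary stores again. */
    memset(p, 0, len);
}
```

Whether this beats the C library's existing memset on a given core is an empirical question and would need measuring before adopting it.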
Labels: category:implementation, theme:intrinsics, skill-level:expert, cost:medium, impact:small