Arm64: Consider using "DC ZVA" instruction #67244

Open
Tracked by #64820 ...
kunalspathak opened this issue Mar 28, 2022 · 6 comments

@kunalspathak
Member

kunalspathak commented Mar 28, 2022

We should consider using the “DC ZVA” instruction in the runtime so that it can be used directly by the GC or anywhere else a memset is needed. “DC ZVA” seems effective: it puts the memory system into write-streaming mode, which avoids doing a linefill into the L1 cache in scenarios like memset. Last year, we merged the work to use “DC ZVA” to zero-initialize the frame. I was hoping the memset implementation could be similar to the one suggested in the Arm64 optimization guide.
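
For illustration only, here is a minimal sketch of what a zeroing routine built on “DC ZVA” could look like. It is not the runtime's code and not the routine from the Arm guide. The names `zva_block_size` and `zero_memory_zva` are hypothetical, the inline assembly assumes a GCC- or Clang-compatible compiler, and the buffer is assumed to be Normal (not Device) memory. On any other target the sketch falls back to plain `memset`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
#define HAVE_DC_ZVA_ASM 1
#else
#define HAVE_DC_ZVA_ASM 0
#endif

/* Hypothetical helper: DC ZVA block size in bytes, or 0 if it cannot be used. */
static size_t zva_block_size(void)
{
#if HAVE_DC_ZVA_ASM
    uint64_t dczid;
    __asm__ volatile("mrs %0, dczid_el0" : "=r"(dczid));
    if (dczid & 0x10)                  /* DZP bit set: DC ZVA is prohibited */
        return 0;
    return (size_t)4 << (dczid & 0xF); /* BS field: log2 of block size in 4-byte words */
#else
    return 0;                          /* not Arm64, or no inline asm available */
#endif
}

/* Hypothetical helper: zero [dst, dst + len), using DC ZVA for whole blocks. */
static void zero_memory_zva(void *dst, size_t len)
{
    unsigned char *p = (unsigned char *)dst;
    size_t block = zva_block_size();

    /* Small buffers, or no usable DC ZVA: let memset do the work. */
    if (block == 0 || len < 2 * block) {
        memset(p, 0, len);
        return;
    }
    assert((block & (block - 1)) == 0); /* block size is a power of two */

    /* Zero the unaligned head up to the next block boundary. */
    size_t head = (size_t)((0 - (uintptr_t)p) & (block - 1));
    memset(p, 0, head);
    p += head;
    len -= head;

#if HAVE_DC_ZVA_ASM
    /* Each DC ZVA zeroes one naturally aligned block. */
    while (len >= block) {
        __asm__ volatile("dc zva, %0" : : "r"(p) : "memory");
        p += block;
        len -= block;
    }
#endif

    /* Zero whatever is left after the last whole block. */
    memset(p, 0, len);
}
```

A caller would use `zero_memory_zva(buffer, size)` wherever it would otherwise call `memset(buffer, 0, size)`. Whether this beats the platform `memset` would need to be measured on the target hardware.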

Another reference: ARM-software/tf-issues#408

category:implementation
theme:intrinsics
skill-level:expert
cost:medium
impact:small

@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Mar 28, 2022
@kunalspathak
Member Author

@a74nh

@jeffschwMSFT jeffschwMSFT added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 29, 2022
@ghost

ghost commented Mar 29, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Apr 4, 2022
@JulieLeeMSFT JulieLeeMSFT added this to the 7.0.0 milestone Apr 4, 2022
@kunalspathak kunalspathak modified the milestones: 7.0.0, Future May 18, 2022
@kunalspathak
Member Author

Moving to future.

@kunalspathak
Member Author

Note that in the JIT, we already use this instruction to zero-initialize the frame.

GetEmitter()->emitIns_R(INS_dczva, EA_PTRSIZE, addrReg);

@a74nh
Contributor

a74nh commented Jul 14, 2023

Looking back through the various discussions and related pull requests, ZVA is already used in a number of places, and elsewhere it was dropped because it had no measurable performance impact.

AIUI, this issue is only concerned with where coreclr today calls out to memset().

On Linux, memset should already be fairly optimal. See https://github.com/bminor/glibc/blob/master/sysdeps/aarch64/memset.S
If the size is big enough, the value being set is 0, and zva exists on the hardware (checked by reading a system register using mrs), then it uses zva.
A quick check on Ubuntu 22.04.2 confirmed it ended up using zva.
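
The three conditions described above can be sketched as a small predicate. This is an illustration and not glibc's code: the names `dczid_block_bytes` and `should_use_zva` and the `ZVA_MIN_LEN` threshold are hypothetical. The function takes the raw value of the `DCZID_EL0` system register as a parameter (on real hardware it would be read with `mrs`), so the logic can be exercised on any machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size threshold for this sketch; not the value glibc uses. */
#define ZVA_MIN_LEN 256u

/* Decode a raw DCZID_EL0 value: block size in bytes, or 0 if DC ZVA is prohibited. */
static size_t dczid_block_bytes(uint64_t dczid)
{
    if (dczid & (1u << 4))             /* DZP bit: DC ZVA must not be used */
        return 0;
    assert((dczid & 0xF) <= 9);        /* architectural maximum is 2 KB blocks */
    return (size_t)4 << (dczid & 0xF); /* BS field: log2 of block size in 4-byte words */
}

/* Decide whether a memset(dst, value, len) call should take the ZVA path. */
static bool should_use_zva(int value, size_t len, uint64_t dczid)
{
    size_t block = dczid_block_bytes(dczid);

    return (unsigned char)value == 0   /* only zero fills qualify */
        && block != 0                  /* hardware permits DC ZVA */
        && len >= ZVA_MIN_LEN          /* large enough to be worth it */
        && len >= block;               /* at least one whole block fits */
}
```

For example, a `DCZID_EL0` value of 0x4 decodes to a 64-byte block, so a 1024-byte zero fill would take the ZVA path, while a 16-byte fill or a fill with a nonzero value would not.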

I'm not sure what happens on Windows. If it doesn't use zva and there are no plans for it to do so, then maybe it makes sense to add an implementation inside coreclr.
