Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT ARM64-SVE: Allow LCL_VARs to store as mask #99608
JIT ARM64-SVE: Allow LCL_VARs to store as mask #99608
Changes from 1 commit
6628904
ed574f9
02fa227
2e2e174
fcdb18a
687af37
1fc8d5b
9dbfe63
85f09bf
7945d51
bd5d951
ce61a40
b5502a6
39c02d0
d8dea0e
8ec8e38
3ec441c
f569512
0110170
ec05e34
71bcb48
24cd68b
5b995ae
3a82d5d
8baee38
b22755a
bd8db6e
e359c93
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this pass down
UNPREDICATED
rather than doing the inverse?That is, I imagine most instructions we encounter will end up unpredicated (or effectively unpredicated by using a
TrueMask
), so I'd expect we end up with overall less checks if we simply saidif (varTypeUsesMaskReg(targetType)) { insOpts |= INS_SCALABLE_OPTS_PREDICATED; }
Otherwise, we end up having to special case every single instruction that has a predicated and unpredicated form and additionally check if they use a mask register or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's because the emit function is inconveniently the wrong way around. I was going to fix the emit function up in this PR, but, once register allocation is working we can get rid the enum and get rid of all these checks.
Given the register allocation work is going to take some time, then maybe I should fix up the emit code in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me. Just wanted to get clarification as it did seem backwards
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched this around. It no longer matches some of the other emit_R_R_etc functions, but that's ok because it'll all vanish eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see we're doing this retyping here. But I don't see where we're doing any fixups to ensure that something which reads the
TYP_MASK
local but expects aTYP_SIMD
will get theConvertMaskToVector
inserted back.Imagine for example, something like:
Where
Vector.GreaterThan
will produce aTYP_MASK
, but the latter+
consumes it as a vector. -- Noting that this is an abnormal pattern, but something we still need to account for and ensure works correctly.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might expect such logic to insert a general
ConvertMaskToVector
helper to exist as part ofimpSimdPopStack
and/or as part of the general import helpers we have inhwintrinsic.cpp
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do they need to be converted before being stored to memory?
Shouldn't this be entirely dependent on the target type, just with a specialization for locals since we want to allow efficient consumption in the typical case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just a suggestion to change the comments, or a code change too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a change to the comment.
I think we only need to clarify that we're optimizing masks stored to locals to allow better consumption in the expected typical case and that we have the handling in place to ensure that consumers which actually need a vector get the
ConvertMaskToVector
inserted back in.Noting, however, that this could be an incorrect optimization in some cases. For example, if it's a user-defined local where actual vectors are also stored then it would require a
ConvertVectorToMask
to be inserted but it would also be a lossy conversion and therefore unsafe.So I'm not entirely certain this is the "right" place to be doing this either. We might need to actually do this in a later phase where we can observe all uses of a local from the perspective of user defined code, so that we only do this optimization when all values being stored are masks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is the right place either. Originally I wanted to avoid creating the conversion in the first place, but realised it has to be created and then removed after/during creation of the local. That's how I ended up putting it in importer.
I can't see an obvious later phase this would be done in. Morph? Or maybe during lowering? Anything that already parses local vars would be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the correct location for this would be
lclmorph
, which is the same place we do sequencing, mark address exposed locals, and even combine field stores to SIMD.However, @EgorBo or @jakobbotsch may have a better idea on where the code should go.
The consideration in general is really just that we need to find non-address exposed
TYP_SIMD
locals where all stores areConvertMaskToVector
so we can replace it with aTYP_MASK
local instead. There are of course other optimizations that could be done, including splitting a local into two if some stores are masks and some are vectors, but those will be less common than the first.This work needs to happen after import since that's the only code that would be using pre-existing locals. Any locals introduced by CSE or other phases involving a mask will already be
TYP_MASK
themselves.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed this kind of brute force changing of a local's type is not safe. It will break all sorts of IR invariants we want to be able to rely on if you don't also do a full pass to fix up other uses of the local.
It seems like this kind of optimization should be its own separate pass after local morph when we know whether or not it is address exposed. We have linked locals at that point, so finding the occurrences is relatively efficient. You can leave some breadcrumb around to only run the pass when there are opportunities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a new pass then. For now I've removed the importer changes.
Technically, this PR could be merged as is. It won't make any jit difference by itself, but it's quite a lot of code that is a blocker for other hw intrinsic work I'm doing (the embedded masks).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you expand a little on what you mean here? I want to make sure I'm parsing the right data in the pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing it in a separate PR sounds good to me. Presumably it needs some heuristics to figure out if it's profitable to make the replacements as well.
See
Statement::LocalsTreeList
. It allows to quickly check whether a statement contains a local you are interested in.