-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't retype struct as primitive types in import. #33225
Conversation
96ff2ec
to
6cb6b6c
Compare
PTAL @CarolEidt @dotnet/jit-contrib, I think that is ready for the first round of review. |
f7875df
to
2660e14
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good.
It's awesome to remove the retyping from the front-end, but I think that in Lowering when we have something that's returned in a single register, we should go ahead and retype. I think it would simplify things and would not lose information that the backend would need.
BlockRange().InsertAfter(call, bitcast); | ||
callUse.ReplaceWith(comp, bitcast); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to see us, perhaps in future, use BITCAST
only where we actually require that the bits get moved to a different register file. We should be able to handle those cases much like the way multireg returns are handled. In fact, it's not clear why we need separate handling for that.
25f7779
to
2e00f95
Compare
I think this is ready for the second round, maybe expect No diffs if
but the biggest part of it due to I have checked SPMI/crossgen/pmi locally. I will kick appropriate CI testing this night. |
src/coreclr/src/jit/gentree.cpp
Outdated
CORINFO_CLASS_HANDLE valStructHnd = gtGetStructHandleIfPresent(val); | ||
if (varTypeIsStruct(varDsc) && (valStructHnd == NO_CLASS_HANDLE) && !varTypeIsSIMD(valTyp)) | ||
{ | ||
// That is a very special case when we have lost classHandle from a LCL_FIELD node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not happy with that solution, but the cost of having fieldSeq
with information about overlapping fields was too high, both in terms of TP (checking the new flag in each access to it), in terms of memory consumption (make it 24 bytes instead of 16) and in implementation cost(there are dozens of places where we check for NotAField
, many of them are very old and it was unclear how they should work with the new type.
Also, we don't often have overlapping fields, so it felt like doing I was spending too much on such a rare case. The frameworks showed -12 bytes of improvements from my fix that was an awful regression in TP and memory.
|
||
static int Main(string[] args) | ||
{ | ||
TestClass(2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test uses a class and a struct to do the same logic, guess what case we optimize better. I should probably create an issue about that, but again, overlapping fields are rare.
Hm, I think we are thinking about two different possible representations, with these changes we have the following new nodes in lowering: I want to keep this struct representation visible even after lower, so for Another approach would be to produce the old LIR after lowering, meaning replace all such |
ping @dotnet/jit-contrib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still just a bit apprehensive about using GT_BITCAST
for structs, but I think the idea is growing on me.
I'd like to better understand why the elimination of retyping is tied to FEATURE_MULTIREG_RET
but otherwise it loosk good.
@@ -430,6 +430,14 @@ CONFIG_INTEGER(JitSaveFpLrWithCalleeSavedRegisters, W("JitSaveFpLrWithCalleeSave | |||
#endif // defined(TARGET_ARM64) | |||
#endif // DEBUG | |||
|
|||
#if !FEATURE_MULTIREG_RET | |||
CONFIG_INTEGER(JitAllowStructRetyping, W("JitAllowStructRetyping"), 0) // Allow Jit to retype structs as primitive types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will disable this for x64/windows and not for any other targets - is that due to the regressions you described in your PR comments? I'm not sure why this would be tied to FEATURE_MULTIREG_RET
.
Also, sorry for the focus on naming - but I think it might be good to give this a name that more clearly reflects that it's the "old" or "bad" way. "Allow" sounds too nice. I always described this as "lying about the types" but maybe that's just a bit too negative. Maybe "JitDoOldStructRetyping"? Or is that too verbose and/or negative?
2e00f95
to
c215a49
Compare
c215a49
to
e131051
Compare
How will this impact |
I'll let @sandreenko reply to the specific question of how this work impacts those types (I think it's largely agnostic to those types). In future, however, now that we have a |
The change is agnostic to |
e131051
to
6cc7061
Compare
Ok, the tests are green now, there were a few new failures due to tail call changes. The PR is ready for review. @AndyAyersMS of course, I will repeat analysis for the diffs and post results here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly comments and some non-blocking suggestions or questions.
Looks good!
@@ -437,6 +437,14 @@ CONFIG_INTEGER(JitSaveFpLrWithCalleeSavedRegisters, W("JitSaveFpLrWithCalleeSave | |||
#endif // defined(TARGET_ARM64) | |||
#endif // DEBUG | |||
|
|||
#if !FEATURE_MULTIREG_RET | |||
CONFIG_INTEGER(JitDoOldStructRetyping, W("JitDoOldStructRetyping"), 0) // Allow Jit to retype structs as primitive types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the new name - thanks :-)
@@ -17277,6 +17309,9 @@ CORINFO_CLASS_HANDLE Compiler::gtGetStructHandleIfPresent(GenTree* tree) | |||
#endif | |||
break; | |||
} | |||
// TODO-1stClassStructs: add a check that `structHnd != NO_CLASS_HANDLE`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this TODO
// Arguments: | ||
// call - The call node to lower. | ||
// | ||
// Note: it transforms the call's user. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit unfortunate that we have to transform the call's user. I wonder whether it would be feasible and reasonable to handle these when lowering the user. I think this is fine for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, we have agreed to do that in the user lowering but when I did so I saw that:
- we have to process call in
LowerCall
when it is unused, so compilation time is spent to find a user in any case; - the processing goes into 2 functions (in addition to
LowerCall
)LowerBlockStore, LowerStoreIndir
and they are platform-specific, so the logic to find these cases should be duplicated for different platforms; - it takes more code and operations to find these cases from the user, because each struct call has its user that should be modified, but not all struct stores have such calls as they src.
so after a while, I changed it to process it in one place, I think it is more visible.
6cc7061
to
342b12d
Compare
@AndyAyersMS the first type, -580 (-40.93% of base) : Microsoft.CodeAnalysis.CSharp.dasm - BinopEasyOut:TypeToIndex(TypeSymbol):Nullable`1This is the first type of improvements.
where optimizations of both
Then we apply CSE and propagation optimizations that benefit from
asm after:
the second type, -203 (-20.86% of base) : Newtonsoft.Json.dasm - DefaultContractResolver:CreatePropertyFromConstructorParameter(JsonProperty,ParameterInfo):JsonProperty:thisThis is the second type of improvements, that was the initial motivation for that work (from SIMD bench). We are inlining a method that returns a small struct:
and before we were transforming it to:
that was blocking
after:
the third type, -13 (-81.25% of base) : System.Private.CoreLib.dasm - ValueTuple:Create():ValueTupleThat is a funny one, for IL like:
we were generating
now it is
(copy propagation for structs was unblocked in one of the preparation PRs).
we now have
when we create a After #34105 and #11413 are fixed we will probably see some other patterns as well instead of the current regressions. |
There are 2 issues that prevent it from being enbaled by default. They are causing significant asm regressions.
e06016f
to
169ce1a
Compare
@sandreenko looks good. I would expect |
yes, all returns and calls that return structs in a register should benefit. I have added |
This change adds
COMPlus_JitNoStructRetyping
that prevents struct retyping forcall struct
andreturn struct
cases. Currently, this retyping is happening in importer, we want to move it to lower.The current retyping forbids later phases to do optimization with these values, for example, it affects code for inlined methods
if we inline
foo()
we won't be able to promote or enregister fields ofnativeSizeStruct
because importer would access it asLCL_FIELD long
and setdoNotEnreg
inimpFixupCallStructReturn
.Why can't we do that right after inlining? Because other optimizations also need to know real types, for example, CSE and VN can't propagate a sequence like that:
but can do it if we don't do retyping.
Notes:
Note1: right after inlining is the right time to do retyping for methods that use return buffer, in that PR this is left in importer.Note2: that change doesn't fix retyping in
impNormStructVal
, for cases likemethodWithStructArgs(foo(), foo())
we currently always create a local var for each struct local argument and retype it as a primitive type. That would be fixed in a separate PR.Note3: that change doesn't fix retyping for limited struct promotion, when for a struct like:
struct PromotedStruct
{
StructWithOneField a; <- we will change the type of this field from struct to int, because we don't have recursive struct promotion.
int b;
}
struct StructWithOneField
{
int a;
}
but that change will get us closer to it.
So the phase in which we should retype struct types into ABI specific types is lowering. See details in https://github.com/dotnet/runtime/blob/master/docs/design/features/first-class-structs.md and #1231.
The PR is done to have no diffs if
COMPlus_JitNoStructRetyping=0
, it helps to catch unwanted side-effects and makes merging safer. I have checked that there are no diffs on all windows platforms using altjit and framework assemblies. The change with the flag enabled was tested on SPMI collections, crossgen of framework libraries, and in a pri1 test run.The overall design is simple: keep structs as struct until lowering, then do retyping for returns and calls, but insert
BITCAST
back to struct types to keep IR correct. Then teach the next phases (lsra, codegen) to work with the new struct nodes.New cases of struct nodes force us to have struct handle on all trees (except the right side of ASG), so
gtGetStructHandleIfPresent
becomes more important.Some initial diffs:
for the benchmark that motivated this change:
some other benchmarks are also winning, but not that significant:
code size changes for my small
StructABI\structreturn.cs
test, improvements happen when we inline the constructor method:Overall, right now, it is a regression, I will start fixing them in the next change. Maybe I will push them to this PR or merge this PR with the flag disabled and fix the regressions in the next.