Enregister multireg lclVars #36862

Merged 10 commits on Jun 11, 2020

Conversation

CarolEidt
Contributor

Allow struct lclVars that are returned in multiple registers to be
enregistered, as long as the fields are a match for the registers.

Fix #34105

@Dotnet-GitSync-Bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), May 22, 2020
Contributor

@sandreenko left a comment

// Can't use field by field assignment if the src is a call.
if (src->OperGet() == GT_CALL)
{
JITDUMP(" src is a call");
// C++ style CopyBlock with holes
requiresCopyBlock = true;
}

I was expecting changes in this part that is currently blocking independent struct promotion for ASG(LCL_VAR struct, call struct). Could you please explain how your change avoids that block?

src/coreclr/src/jit/codegenarmarch.cpp (outdated, resolved)
src/coreclr/src/jit/codegenxarch.cpp (outdated, resolved)
src/coreclr/src/jit/codegencommon.cpp (outdated, resolved)
{
// This should only be called for multireg lclVars.
assert(compiler->lvaEnregMultiRegVars);
assert(tree->IsMultiRegLclVar() || (tree->gtOper == GT_COPY));
Contributor

I am confused about the copy case. Wouldn't it be better to have a tree->IsMultiReg that recognizes both IsMultiRegLclVar and copy->GetRegCount() > 1?

src/coreclr/src/jit/codegenlinear.cpp (resolved)
src/coreclr/src/jit/importer.cpp (outdated, resolved)
src/coreclr/src/jit/importer.cpp (outdated, resolved)
src/coreclr/src/jit/lclvars.cpp (resolved)
src/coreclr/src/jit/morph.cpp (outdated, resolved)
if (!dest->IsMultiRegLclVar() || (blockWidth != destLclVar->lvExactSize) ||
(destLclVar->lvCustomLayout && destLclVar->lvContainsHoles))
{
// Mark it as DoNotEnregister.
Contributor

Will it mark struct A { bool a; int b; } as DoNotEnregister?
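
For reference, a small sketch of why this struct is a natural test case for the holes check: on common ABIs the compiler inserts padding between `a` and `b`. This only shows the native C++ layout. Whether the JIT sets lvContainsHoles or lvCustomLayout for the equivalent managed struct is not answered in this thread. `paddingBeforeB` and `hasHoles` are hypothetical helper names introduced for this example.

```cpp
#include <cassert>
#include <cstddef>

// The struct from the review question.
struct A
{
    bool a;
    int  b;
};

// Bytes of padding between the end of field `a` and the start of field `b`.
constexpr std::size_t paddingBeforeB()
{
    return offsetof(A, b) - (offsetof(A, a) + sizeof(bool));
}

// True when the struct's size exceeds the sum of its field sizes,
// i.e. the native layout contains at least one hole.
constexpr bool hasHoles()
{
    return sizeof(A) > sizeof(bool) + sizeof(int);
}
```

On a target where `int` is aligned more strictly than `bool`, both helpers report padding, so a field-by-field view of this struct does not cover every byte of the block.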

@CarolEidt
Contributor Author

The liveness model for multi-reg lclVar stores is challenging, since they both use and define multiple registers. Multireg calls (i.e. those that define multiple registers) and multireg returns (those that consume multiple registers) don't suffer from the same issues, as all the defs (in the former case) or the uses (in the latter case) can be modeled as occurring simultaneously. This will also be the case (eventually) for intrinsics that define multiple registers.

For multi-reg lclVar stores, we don't want to model them as occurring simultaneously, as then we must guarantee that all uses and defs have no conflicts. Without adding non-trivial complexity to the register allocator, that would mean that we couldn't support an assignment that reuses the source registers without a spill. Take the following example with a 2-register return:

t1 = CALL
STORE V1<V10,V11> = t1
STORE V2<V12,V13> = V1<V10,V11>
RETURN V2

What we want this to generate is a simple call and return with no register spills or copies. While we want to ensure that the second field doesn't occupy the first return register at the point of the RETURN, we want the first field to do so.

So, the model for a STORE_LCL_VAR is as follows:
- First, use reg#0 of the source, including any reload or copy and updating liveness and GC info
- Then define reg#0 of the lclVar, including spilling if necessary and updating liveness and GC info
- Repeat for all registers

Getting this right is a bit tricky and requires factoring out some of the liveness, spill and GC updates.
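
As an illustration of the use-then-define ordering described above, here is a minimal sketch using a toy model in which registers are plain integers. `Move` and `interleavedStore` are hypothetical names invented for this example, not JIT functions, and the sketch ignores spills, reloads, liveness and GC updates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: registers are small integers, not the JIT's regNumber.
struct Move
{
    int from;
    int to;
};

// Walk the registers in order: "use" src[i], then "define" dst[i].
// A move is recorded only when the two differ. Returns false if defining
// dst[i] would overwrite a source register that a later step still needs.
bool interleavedStore(const std::vector<int>& src, const std::vector<int>& dst, std::vector<Move>& moves)
{
    assert(src.size() == dst.size());
    moves.clear();
    for (std::size_t i = 0; i < src.size(); i++)
    {
        // src[i] is consumed at this step, so dst[i] may legally reuse it.
        for (std::size_t j = i + 1; j < src.size(); j++)
        {
            if (dst[i] == src[j])
            {
                return false; // would clobber a not-yet-consumed source
            }
        }
        if (src[i] != dst[i])
        {
            moves.push_back({src[i], dst[i]});
        }
    }
    return true;
}
```

With sources {0, 1} and destinations {0, 1} the sketch records no moves, which corresponds to the call-and-return example above. With sources {0, 1, 2} and destinations {16, 17, 18} it records three moves, one per register.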

@CarolEidt force-pushed the EnregMultiRegVars branch 2 times, most recently from 9396207 to 8dfb4fc, June 2, 2020 20:35
@CarolEidt
Contributor Author

Regarding fgMorphCopyBlock():

I was expecting changes in this part that is currently blocking independent struct promotion for ASG(LCL_VAR struct, call struct). Could you please explain how your change avoids that block?

With these changes, that will remain a full struct assignment, with the destination lclVar being marked with GTF_VAR_MULTIREG. Here's an example:
Before and after morph (unchanged):

               [000060] -AC-G-------              *  ASG       simd12 (copy)
               [000058] M------N----              +--*  LCL_VAR   simd12<System.Numerics.Vector3>(P) V08 tmp2         
                                                  +--*    float  V08.X (offs=0x00) -> V72 tmp66        
                                                  +--*    float  V08.Y (offs=0x04) -> V73 tmp67        
                                                  +--*    float  V08.Z (offs=0x08) -> V74 tmp68        
               [000057] --C-G-------              \--*  CALL      struct VectorTest.F2_v3,NA,NA
               [000056] ------------ arg0            \--*  CNS_DBL   float  0.60000002384185791

After register allocation:

N299 ( 17,  7) [000057] --CXG-------        t57 = *  CALL      struct VectorTest.F2_v3 REG d0,d1,d2 $201
                                                  /--*  t57    struct 
N301 ( 21, 10) [000060] MA-XG-------              *  STORE_LCL_VAR simd16<System.Numerics.Vector3>(P) V08 tmp2          d16
                                                  *    float  V08.X (offs=0x00) -> V72 tmp66         d16
                                                  *    float  V08.Y (offs=0x04) -> V73 tmp67         d17
                                                  *    float  V08.Z (offs=0x08) -> V74 tmp68         d18 REG d16

Code Generated:

IN0063:                           bl      VectorTest:F2_v3(float):System.Numerics.Vector3
IN0064:                           fmov    s16, s0
IN0065:                           fmov    s17, s1
IN0066:                           fmov    s18, s2

@CarolEidt
Contributor Author

@sandreenko - I think this is ready for another round of review. I'm not sure why all the test builds (not the test runs) failed for the jitstressregs leg, and similarly the perf test failures didn't seem related. I'm attempting to re-run them.

@CarolEidt
Contributor Author

@dnceng @dotnet/jit-contrib - Can someone help me figure out how to determine what's going wrong with the perf runs? The log shows:

[2020/06/03 15:37:29][INFO] Razor build server (process 7389) failed to shut down: The shutdown command failed: The application to execute does not exist: '/home/helixbot/work/BCB909DD/w/B34D09AF/e/.dotnet5273/sdk/5.0.100-preview.6.20266.3/Sdks/Microsoft.NET.Sdk.Razor/tools/netcoreapp3.0/rzc.dll'
[2020/06/03 15:37:29][INFO] 
[2020/06/03 15:37:30][INFO] MSBuild server shut down successfully.
[2020/06/03 15:37:30][INFO] $ popd
[2020/06/03 15:37:30][ERROR] Process exited with status 1
Traceback (most recent call last):
  File "/home/helixbot/work/A7A00953/p/performance/scripts/benchmarks_ci.py", line 264, in <module>
    __main(sys.argv[1:])
  File "/home/helixbot/work/A7A00953/p/performance/scripts/benchmarks_ci.py", line 250, in __main
    dotnet.shutdown_server(verbose)
  File "/home/helixbot/work/A7A00953/p/performance/scripts/dotnet.py", line 656, in shutdown_server
    get_repo_root_path())
  File "/home/helixbot/work/A7A00953/p/performance/scripts/performance/common.py", line 200, in run
    returncode, quoted_cmdline)
subprocess.CalledProcessError: Command '$ dotnet build-server shutdown' returned non-zero exit status 1.
+ export _commandExitCode=1

It's the same error for both the "Linux x64 release coreclr net5.0" and "Linux x64 release mono net5.0". It seems unlikely that this is an issue introduced with my PR. In the past I've found it possible to miss actual failures in these perf runs, but it reports 578 benchmarks run, and there are 578 instances of "Process xxx exited with code 0", so it doesn't appear to be an execution failure.

@BruceForstall
Member

@dotnet/runtime-infrastructure @DrewScoggins Can you answer @CarolEidt 's question about problems with the dotnet-runtime-perf runs?

@CarolEidt
Contributor Author

@dotnet/dnceng @dotnet/jit-contrib - I'm also having failures in the managed test build for the jitstressregs pipeline. The only error I can find in the log is here:

2020-06-03T17:47:39.3180740Z /Users/runner/runners/2.169.1/work/1/s/.dotnet/sdk/5.0.100-preview.6.20266.3/Sdks/Microsoft.NET.Sdk/targets/Microsoft.PackageDependencyResolution.targets(234,5): error NETSDK1004: Assets file '/Users/runner/runners/2.169.1/work/1/s/artifacts/obj/coreclr/ILVerification/project.assets.json' not found. Run a NuGet package restore to generate this file. [/Users/runner/runners/2.169.1/work/1/s/src/coreclr/src/tools/ILVerification/ILVerification.csproj]
2020-06-03T17:47:39.3313690Z ##[error].dotnet/sdk/5.0.100-preview.6.20266.3/Sdks/Microsoft.NET.Sdk/targets/Microsoft.PackageDependencyResolution.targets(234,5): error NETSDK1004: Assets file '/Users/runner/runners/2.169.1/work/1/s/artifacts/obj/coreclr/ILVerification/project.assets.json' not found. Run a NuGet package restore to generate this file.

@BruceForstall
Member

@agocke @jkotas added the ILVerification stuff just recently, and maybe can help with that error.

@MattGal
Member

MattGal commented Jun 3, 2020

@dotnet/dnceng @dotnet/jit-contrib - I'm also having failures in the managed test build for the jitstressregs pipeline. The only error I can find in the log is here:

2020-06-03T17:47:39.3180740Z /Users/runner/runners/2.169.1/work/1/s/.dotnet/sdk/5.0.100-preview.6.20266.3/Sdks/Microsoft.NET.Sdk/targets/Microsoft.PackageDependencyResolution.targets(234,5): error NETSDK1004: Assets file '/Users/runner/runners/2.169.1/work/1/s/artifacts/obj/coreclr/ILVerification/project.assets.json' not found. Run a NuGet package restore to generate this file. [/Users/runner/runners/2.169.1/work/1/s/src/coreclr/src/tools/ILVerification/ILVerification.csproj]
2020-06-03T17:47:39.3313690Z ##[error].dotnet/sdk/5.0.100-preview.6.20266.3/Sdks/Microsoft.NET.Sdk/targets/Microsoft.PackageDependencyResolution.targets(234,5): error NETSDK1004: Assets file '/Users/runner/runners/2.169.1/work/1/s/artifacts/obj/coreclr/ILVerification/project.assets.json' not found. Run a NuGet package restore to generate this file.

Sorry I missed this, thought it was your previous dnceng tag :) Taking a peek.

@MattGal
Member

MattGal commented Jun 3, 2020

@sandreenko
Contributor

I am currently trying your changes with JitDoOldStructRetyping == false and there are some failures, I need more time to understand if they should be fixed here or in a separate change. I hope to finish that and review the PR today.

@DrewScoggins
Member

The perf run issue is caused by a bug in dotnet.exe. We tracked it down yesterday and checked in a workaround: dotnet/performance#1346. You should not see that behavior on performance runs any longer.

CheckMultiRegLclVar(op1->AsLclVar(), &retTypeDesc);
}
}
#else // !FEATURE_MULTIREG_RET
Contributor

why was this block put under !FEATURE_MULTIREG_RET?
It breaks compDoOldStructRetyping == false logic.

Contributor Author

I've changed it; I'm not sure why I made that change.

//
// Arguments:
// tree - the GT_COPY node
// multiRegIndex - The index of the register to be copied
Contributor

Nit: formatting in this header is not consistent: "asingle register", "The index", "-when the source", "to the register allocated to the register".

regNumber CodeGen::genRegCopy(GenTree* treeNode, unsigned multiRegIndex)
{
assert(treeNode->OperGet() == GT_COPY);
GenTree* op1 = treeNode->AsOp()->gtOp1;
Contributor

Suggested change
- GenTree* op1 = treeNode->AsOp()->gtOp1;
+ GenTree* op1 = copyNode->gtGetOp1();

Contributor Author

Changed to treeNode->gtGetOp1()

assert(op1->IsMultiRegNode());

GenTreeCopyOrReload* copyNode = treeNode->AsCopyOrReload();
// GenTreeCopyOrReload only reports the highest index that has a valid register.
Contributor

Nit: this comment repeats a few lines below and regCount is used only in an assert, maybe delete this block?

- // var = call, where call returns a multi-reg return value
- // case is handled separately.
- if (data->gtSkipReloadOrCopy()->IsMultiRegCall())
+ // Multi-reg nodes are handled separately.
Contributor

IsMultiReg is only for GenTreeLclVarCommon, IsMultiRegNode and IsMultiRegLclVar are for all tree, is it correct?
Maybe rename so:
GenTree has IsMultiRegNode, IsMultiRegLclVar(virtual)
GenTreeLclVarCommon has IsMultiRegLclVar.
or rename tree to lclVar in this function.

Also, I think this comment could be confusing, maybe Stores from a multi-reg source are handled separately?
What does tree->IsMultiReg() return when data->gtSkipReloadOrCopy()->IsMultiRegNode() == true?

Contributor Author

IsMultiReg is only for GenTreeLclVarCommon, IsMultiRegNode and IsMultiRegLclVar are for all tree, is it correct?

Actually, IsMultiReg is only for GenTreeLclVar (it must be GT_LCL_VAR, not other variants).

Maybe rename so:
GenTree has IsMultiRegNode, IsMultiRegLclVar(virtual)
GenTreeLclVarCommon has IsMultiRegLclVar.

I don't think we really have much of a "standard" for this, but since we don't really support virtual methods on GenTree, I believe we generally keep the names distinct.

I'll rename tree to lclNode (I think that lclVar is confusing because one might expect it to be a LclVarDsc*). I'll make the same change to the version in codegenxarch.cpp.

// mov dst[i], reg[0]
// This effectively moves from `reg[0]` to `dst[i]`, leaving other dst bits unchanged till further
// iterations
// For the case where reg == dst, if we iterate so that we write dst[0] last, we eliminate the need for
Contributor

Could we save a mov when reg == dst?

Contributor Author

Possibly, but this is pre-existing code (factored out of genMultiRegStoreToLocal) so I'd prefer not to make that change here.

// use reg #1 from src, including any reload or copy
// define reg #1
// If we defined it as using all the source registers, there would be more
// conflicts and higher register pressure. In addition, it complicates the
Contributor

Could you please explain why we will have higher register pressure?

Contributor Author

I've added a comment; perhaps it's overkill there or should be moved somewhere else.

Contributor

Thanks, I forgot that we have not had an algorithm to find an optimal move sequence for such chains.
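
To make the "move sequence for such chains" point concrete, here is a sketch of a generic textbook-style ordering of register-to-register moves. It is not code from the JIT (the comment above notes the JIT did not have such an algorithm), and `orderMoves` is a hypothetical name. Each pair is (from, to), and registers are plain integers.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Orders moves so that no move overwrites a register that a later move
// still reads. Returns false when the remaining moves form a cycle
// (for example a swap), which would need a temporary register.
bool orderMoves(std::vector<std::pair<int, int>> pending, std::vector<std::pair<int, int>>& ordered)
{
    ordered.clear();
    while (!pending.empty())
    {
        bool progressed = false;
        for (std::size_t i = 0; i < pending.size(); i++)
        {
            // pending[i] is safe if no other pending move reads its destination.
            bool blocked = false;
            for (std::size_t j = 0; j < pending.size(); j++)
            {
                if ((j != i) && (pending[j].first == pending[i].second))
                {
                    blocked = true;
                    break;
                }
            }
            if (!blocked)
            {
                ordered.push_back(pending[i]);
                pending.erase(pending.begin() + static_cast<std::ptrdiff_t>(i));
                progressed = true;
                break;
            }
        }
        if (!progressed)
        {
            return false; // every remaining move is blocked: a cycle
        }
    }
    return true;
}
```

For the chain (0 to 1, 1 to 2) the sketch emits the 1 to 2 move first, whereas a fixed in-order walk would overwrite register 1 before it is read. A swap (0 to 1, 1 to 0) is reported as a cycle.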

//
regNumber CodeGen::genConsumeReg(GenTree* tree, unsigned multiRegIndex)
{
if (tree->OperGet() == GT_COPY)
Contributor

off-topic: I prefer OperIs(GT_COPY) because it is usually shorter and doesn't need additional brackets in && conditions.
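
A toy sketch of the two styles being compared. `Oper` and `Node` are hypothetical miniatures written for this example and only loosely modelled on GenTree; the variadic overload is an assumption about how such a helper can be written, not a copy of the JIT's definition.

```cpp
#include <cassert>

// Hypothetical miniature of a JIT node; only the operator kind is modelled.
enum Oper
{
    OP_COPY,
    OP_RELOAD,
    OP_LCL_VAR,
    OP_CALL
};

struct Node
{
    Oper oper;

    Oper OperGet() const
    {
        return oper;
    }

    // Single-operator test.
    bool OperIs(Oper o) const
    {
        return oper == o;
    }

    // Variadic form: true if the node matches any of the listed operators.
    template <typename... T>
    bool OperIs(Oper o, T... rest) const
    {
        return (oper == o) || OperIs(rest...);
    }
};
```

With this shape, `(n.OperGet() == OP_COPY) && other` can be written as `n.OperIs(OP_COPY) && other`, and a test against several operators collapses into one call.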

void Compiler::fgComputeLifeUntrackedLocal(VARSET_TP& life,
//
// Returns:
// `true` if the node is a dead store (i.e. all fields are dead); `false` otherwise.
Contributor

does it currently return true somewhere?

Contributor Author

Sorry - I meant to respond to this. Yes, it returns true under the if (isDef) condition:

            // None of the fields were live, so this is a dead store.
            if (!opts.MinOpts())
            {
                // keepAliveVars always stay alive
                VARSET_TP keepAliveFields(VarSetOps::Intersection(this, fieldSet, keepAliveVars));
                noway_assert(VarSetOps::IsEmpty(this, keepAliveFields));

                // Do not consider this store dead if the parent local variable is an address exposed local.
                return !varDsc.lvAddrExposed;
            }
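
The rule in the quoted snippet can be restated as a small stand-alone sketch. `VarSet` and `isDeadStore` are hypothetical names, a `std::bitset` stands in for VARSET_TP, and the MinOpts and keepAliveVars handling from the real code is deliberately left out.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Toy stand-in for the JIT's VARSET_TP: one bit per tracked local.
using VarSet = std::bitset<64>;

// A store to a promoted struct is treated as dead only when none of its
// field locals are live afterwards. Mirroring the quoted snippet, an
// address-exposed parent is never reported as a dead store.
bool isDeadStore(const VarSet& liveAfter, const VarSet& fieldSet, bool parentAddrExposed)
{
    const bool anyFieldLive = (liveAfter & fieldSet).any();
    if (anyFieldLive)
    {
        return false;
    }
    return !parentAddrExposed;
}
```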

Contributor

@sandreenko left a comment

LGTM in general, one issue in Lower with !compDoOldStructRetyping support and a few questions/nits.

(comp->lvaGetPromotionType(varDsc) != Compiler::PROMOTION_TYPE_INDEPENDENT) ||
(varDsc->lvFieldCnt > MAX_MULTIREG_COUNT))
{
lclNode->ClearMultiReg();
Contributor

Could we check (varDsc->lvFieldCnt > MAX_MULTIREG_COUNT) during importation and avoid setting MultiReg for such nodes?

Contributor Author

I'd like to see us make these decisions in Lowering, since eventually we'd like to be able to promote register-passed structs with more than MAX_MULTIREG_COUNT fields (i.e. multiple fields packed into a single register).
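
A sketch of the shape of the Lowering-time check under discussion. The quoted condition starts mid-expression, so the real code may test more than is shown here. `kMaxMultiRegCount`, `PromotionType` and `keepMultiReg` are hypothetical names, and the value 4 is an assumption chosen for illustration because the real limit is target-specific.

```cpp
#include <cassert>

// Assumed value for illustration only; the real limit is target-specific.
constexpr unsigned kMaxMultiRegCount = 4;

enum class PromotionType
{
    None,
    Independent,
    Dependent
};

// Mirrors the visible part of the quoted condition: the multi-reg flag is kept
// only for independently promoted locals with at most kMaxMultiRegCount fields.
bool keepMultiReg(PromotionType promotion, unsigned fieldCount)
{
    if ((promotion != PromotionType::Independent) || (fieldCount > kMaxMultiRegCount))
    {
        return false; // corresponds to lclNode->ClearMultiReg()
    }
    return true;
}
```

Keeping the field-count test in one late function like this is what would let a later change relax it (for example, to allow several fields packed into one register) without touching the importer.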

//
// Arguments:
// lclNode - the GT_LCL_VAR node
// retTypeDesc - a return type descriptor for the consuming node
Contributor

retTypeDesc can come from either a user of this LCL_VAR (RET(LCL_VAR)) or the source (STORE_LCL_VAR(call)), is that correct?

Contributor Author

Yes, it can come from a GT_CALL source or a GT_RETURN user.

Contributor Author

I've updated the comments.

- if (fieldVarDsc->lvTracked && fgLocalVarLivenessDone && // Includes local variable liveness
-     ((tree->gtFlags & GTF_VAR_DEATH) != 0))
+ if (fieldVarDsc->lvTracked && fgLocalVarLivenessDone &&
+     tree->AsLclVar()->IsLastUse(i - varDsc->lvFieldLclStart))
Contributor

@sandreenko, Jun 8, 2020

It should probably be AsLclVarCommon or it will fail for GT_LCL_FLD.

Edit: or check that it is a LCL_VAR, as I see Common does not have IsLastUse.

Contributor Author

I added a check for tree->IsMultiRegLclVar(), as that's a more precise condition for whether it has the last-use bits.

@@ -1440,6 +1440,19 @@ GenTree* Compiler::impAssignStructPtr(GenTree* destAddr,
}
else if (compDoOldStructRetyping())
{
if (dest->OperIs(GT_LCL_VAR) &&
Contributor

Git missed this merge conflict. I changed the condition from:

else
{
  dest->gtType = asgType;
}

to

else if (compDoOldStructRetyping())
{
  dest->gtType = asgType;
}

and you added code under the original unconditional else that should not be under the changed condition.

Could you please move the if (compDoOldStructRetyping()) check so that it guards only dest->gtType = asgType;?
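
The requested restructuring can be sketched with a toy model. The function names and the action strings are hypothetical; `oldRetyping` plays the role of compDoOldStructRetyping(), and "new-code" stands for whatever was added under the original else, which is only partially visible in the diff above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// What the silent merge produced: everything sits behind the condition.
std::vector<std::string> elseBranchAsMerged(bool oldRetyping)
{
    std::vector<std::string> actions;
    if (oldRetyping)
    {
        actions.push_back("new-code");
        actions.push_back("retype-dest"); // dest->gtType = asgType
    }
    return actions;
}

// What the review asks for: only the retyping is guarded.
std::vector<std::string> elseBranchAsRequested(bool oldRetyping)
{
    std::vector<std::string> actions;
    actions.push_back("new-code"); // runs unconditionally
    if (oldRetyping)
    {
        actions.push_back("retype-dest"); // dest->gtType = asgType
    }
    return actions;
}
```

The two versions agree when old struct retyping is enabled and differ only when it is disabled, which is the configuration the reviewer was testing.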

Allow struct lclVars that are returned in multiple registers to be
enregistered, as long as the fields are a match for the registers.

Fix dotnet#34105
Undo change to `fgMorphBlkNode()`
Extract common code for `genMultiRegStoreToLocal`
Fix last use for multireg when extending lifetimes
Fix call dump
@CarolEidt
Contributor Author

The jitstressregs leg has no new failures.
@sandreenko - there are 4 new commits since you last reviewed. The first was to fix the merge issue you pointed out, the second is just formatting, the third fixes an issue found by jitStressRegs, and the last fixes an issue that was exposed by #37280. Could you have a look?

@sandreenko
Contributor

@sandreenko - there are 4 new commits since you last reviewed. The first was to fix the merge issue you pointed out, the second is just formatting, the third fixes an issue found by jitStressRegs, and the last fixes an issue that was exposed by #37280. Could you have a look?

Looks good, do you have diffs for your changes?

@CarolEidt
Contributor Author

Here are the diffs:

| Arch | OS | What | Delta | Methods Improved | Methods Regressed |
|---|---|---|---|---|---|
| Arm | Windows | Crossgen fx+tests | -11420 (-0.01%) | 282 | 36 |
| Arm64 | Windows | Crossgen fx+tests | -477648 (-0.19%) | 265 | 11 |
| x64 | Linux | Crossgen fx+tests | -1086482 (-0.55%) | 765 | 9 |

No diffs for x64/windows or x86.

The regressions are cases where we promote where we didn't previously and which weren't mitigated by my earlier struct improvements. I expect we'll recover many/most of those when we enable enregistering of incoming arguments.

Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)
Linked issue: JIT: Support assignments of multi-reg values to enregistered promoted structs
6 participants