-
Notifications
You must be signed in to change notification settings - Fork 12k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[VPlan] Model branch cond to enter scalar epilogue in VPlan. #92651
Conversation
Use DTU to queue DominatorTree updates directly when connecting basic blocks during VPlan execution. This simplifies DT updates and also automatically allows updating the DT for the VPlan-native path as additional benefit.
This patch moves branch condition creation to enter the scalar epilogue loop to VPlan. Modeling the branch in the middle block also requires modeling the successor blocks. To do so, this patch introduces a new VPBlockBase sub-type that simply wraps an existing IR block: VPIRWrapperBlock (name subject to better suggestions). This allows allows connecting blocks naturally to leave nodes. It can also be used to transition modeling of more bits of the skeleton to VPlan gradually. Note that the middle.block is still created as part of the skeleton and then patched in during VPlan execution. Unfortunately the skeleton needs to create the middle.block early on, as it is also used for induction resume value creation and is also needed to properly update the dominator tree during skeleton creation. After this patch lands, I plan to move induction resume value and phi node creation in the scalar preheader to VPlan. Once that is done, we should be able to create the middle.block in VPlan directly. At the moment, VPIRWrapperBlocks only wrap an original IR block and don't allow any additions. We could allow IR wrapper blocks to also contain recipes that get added at the beginning of the block. Then the middle.block could also be wrapped. Something like that will also be needed to place the induction/reduction resume phi nodes in the scalar preheader. This is a re-worked version based on the earlier https://reviews.llvm.org/D150398 and the main change is the introduction and use of VPIRWrapperBlock. Note that this patch adds and uses a new helper reorderIncomingBlocks to preserve the original order of incoming blocks in created phi nodes; as this patch changes the order branches are created, the order of the predecessors changes, which would cause different order of incoming blocks in some phis we create. The helper ensures the created IR doesn't change (modulo some minor difference in name numbering). After this change landed, the tests should be updated and the helper removed separately. Depends on llvm#92525
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-risc-v Author: Florian Hahn (fhahn) ChangesThis patch moves branch condition creation to enter the scalar epilogue This allows allows connecting blocks naturally to leave nodes. It can Note that the middle.block is still created as part of the skeleton and After this patch lands, I plan to move induction resume value and phi At the moment, VPIRWrapperBlocks only wrap an original IR block and This is a re-worked version based on the earlier Note that this patch adds and uses a new helper reorderIncomingBlocks to Depends on #92525 Patch is 47.84 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/92651.diff 13 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index fd652cb789549..0f9d68753a3d5 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -59,6 +59,7 @@
#include "VPlan.h"
#include "VPlanAnalysis.h"
#include "VPlanHCFGBuilder.h"
+#include "VPlanPatternMatch.h"
#include "VPlanTransforms.h"
#include "VPlanVerifier.h"
#include "llvm/ADT/APInt.h"
@@ -2972,22 +2973,7 @@ void InnerLoopVectorizer::createVectorLoopSkeleton(StringRef Prefix) {
SplitBlock(LoopMiddleBlock, LoopMiddleBlock->getTerminator(), DT, LI,
nullptr, Twine(Prefix) + "scalar.ph");
- auto *ScalarLatchTerm = OrigLoop->getLoopLatch()->getTerminator();
-
- // Set up the middle block terminator. Two cases:
- // 1) If we know that we must execute the scalar epilogue, emit an
- // unconditional branch.
- // 2) Otherwise, we must have a single unique exit block (due to how we
- // implement the multiple exit case). In this case, set up a conditional
- // branch from the middle block to the loop scalar preheader, and the
- // exit block. completeLoopSkeleton will update the condition to use an
- // iteration check, if required to decide whether to execute the remainder.
- BranchInst *BrInst =
- Cost->requiresScalarEpilogue(VF.isVector())
- ? BranchInst::Create(LoopScalarPreHeader)
- : BranchInst::Create(LoopExitBlock, LoopScalarPreHeader,
- Builder.getTrue());
- BrInst->setDebugLoc(ScalarLatchTerm->getDebugLoc());
+ auto *BrInst = new UnreachableInst(LoopMiddleBlock->getContext());
ReplaceInstWithInst(LoopMiddleBlock->getTerminator(), BrInst);
// Update dominator for loop exit. During skeleton creation, only the vector
@@ -3094,50 +3080,7 @@ void InnerLoopVectorizer::createInductionResumeValues(
}
}
-BasicBlock *InnerLoopVectorizer::completeLoopSkeleton() {
- // The trip counts should be cached by now.
- Value *Count = getTripCount();
- Value *VectorTripCount = getOrCreateVectorTripCount(LoopVectorPreHeader);
-
- auto *ScalarLatchTerm = OrigLoop->getLoopLatch()->getTerminator();
-
- // Add a check in the middle block to see if we have completed
- // all of the iterations in the first vector loop. Three cases:
- // 1) If we require a scalar epilogue, there is no conditional branch as
- // we unconditionally branch to the scalar preheader. Do nothing.
- // 2) If (N - N%VF) == N, then we *don't* need to run the remainder.
- // Thus if tail is to be folded, we know we don't need to run the
- // remainder and we can use the previous value for the condition (true).
- // 3) Otherwise, construct a runtime check.
- if (!Cost->requiresScalarEpilogue(VF.isVector()) &&
- !Cost->foldTailByMasking()) {
- // Here we use the same DebugLoc as the scalar loop latch terminator instead
- // of the corresponding compare because they may have ended up with
- // different line numbers and we want to avoid awkward line stepping while
- // debugging. Eg. if the compare has got a line number inside the loop.
- // TODO: At the moment, CreateICmpEQ will simplify conditions with constant
- // operands. Perform simplification directly on VPlan once the branch is
- // modeled there.
- IRBuilder<> B(LoopMiddleBlock->getTerminator());
- B.SetCurrentDebugLocation(ScalarLatchTerm->getDebugLoc());
- Value *CmpN = B.CreateICmpEQ(Count, VectorTripCount, "cmp.n");
- BranchInst &BI = *cast<BranchInst>(LoopMiddleBlock->getTerminator());
- BI.setCondition(CmpN);
- if (hasBranchWeightMD(*ScalarLatchTerm)) {
- // Assume that `Count % VectorTripCount` is equally distributed.
- unsigned TripCount = UF * VF.getKnownMinValue();
- assert(TripCount > 0 && "trip count should not be zero");
- const uint32_t Weights[] = {1, TripCount - 1};
- setBranchWeights(BI, Weights);
- }
- }
-
-#ifdef EXPENSIVE_CHECKS
- assert(DT->verify(DominatorTree::VerificationLevel::Fast));
-#endif
- return LoopVectorPreHeader;
-}
std::pair<BasicBlock *, Value *>
InnerLoopVectorizer::createVectorizedLoopSkeleton(
@@ -3198,7 +3141,7 @@ InnerLoopVectorizer::createVectorizedLoopSkeleton(
// Emit phis for the new starting index of the scalar loop.
createInductionResumeValues(ExpandedSCEVs);
- return {completeLoopSkeleton(), nullptr};
+ return {LoopVectorPreHeader, nullptr};
}
// Fix up external users of the induction variable. At this point, we are
@@ -3481,6 +3424,18 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State,
VF.getKnownMinValue() * UF);
}
+// Helper to reorder blocks so they match the original order even after the
+// order of the predecessors changes. This is only used to avoid a number of
+// test changes due to reordering of incoming blocks in phi nodes and should be
+// removed soon, with the tests being updated.
+static void reorderIncomingBlocks(SmallVectorImpl<BasicBlock *> &Blocks,
+ BasicBlock *LoopMiddleBlock) {
+ if (Blocks.front() == LoopMiddleBlock)
+ std::swap(Blocks.front(), Blocks.back());
+ if (Blocks.size() == 3)
+ std::swap(Blocks[0], Blocks[1]);
+}
+
void InnerLoopVectorizer::fixFixedOrderRecurrence(
VPFirstOrderRecurrencePHIRecipe *PhiR, VPTransformState &State) {
// This is the second phase of vectorizing first-order recurrences. An
@@ -3591,7 +3546,9 @@ void InnerLoopVectorizer::fixFixedOrderRecurrence(
PHINode *Phi = cast<PHINode>(PhiR->getUnderlyingValue());
auto *Start = Builder.CreatePHI(Phi->getType(), 2, "scalar.recur.init");
auto *ScalarInit = PhiR->getStartValue()->getLiveInIRValue();
- for (auto *BB : predecessors(LoopScalarPreHeader)) {
+ SmallVector<BasicBlock *> Blocks(predecessors(LoopScalarPreHeader));
+ reorderIncomingBlocks(Blocks, LoopMiddleBlock);
+ for (auto *BB : Blocks) {
auto *Incoming = BB == LoopMiddleBlock ? ExtractForScalar : ScalarInit;
Start->addIncoming(Incoming, BB);
}
@@ -7480,7 +7437,9 @@ static void createAndCollectMergePhiForReduction(
// If we are fixing reductions in the epilogue loop then we should already
// have created a bc.merge.rdx Phi after the main vector body. Ensure that
// we carry over the incoming values correctly.
- for (auto *Incoming : predecessors(LoopScalarPreHeader)) {
+ SmallVector<BasicBlock *> Blocks(predecessors(LoopScalarPreHeader));
+ reorderIncomingBlocks(Blocks, LoopMiddleBlock);
+ for (auto *Incoming : Blocks) {
if (Incoming == LoopMiddleBlock)
BCBlockPhi->addIncoming(FinalValue, Incoming);
else if (ResumePhi && is_contained(ResumePhi->blocks(), Incoming))
@@ -7551,6 +7510,21 @@ LoopVectorizationPlanner::executePlan(
std::tie(State.CFG.PrevBB, CanonicalIVStartValue) =
ILV.createVectorizedLoopSkeleton(ExpandedSCEVs ? *ExpandedSCEVs
: State.ExpandedSCEVs);
+#ifdef EXPENSIVE_CHECKS
+ assert(DT->verify(DominatorTree::VerificationLevel::Fast));
+#endif
+
+ VPBasicBlock *MiddleVPBB =
+ cast<VPBasicBlock>(BestVPlan.getVectorLoopRegion()->getSingleSuccessor());
+
+ using namespace llvm::VPlanPatternMatch;
+ if (MiddleVPBB->begin() != MiddleVPBB->end() &&
+ match(&MiddleVPBB->back(), m_BranchOnCond(m_VPValue()))) {
+ cast<VPIRWrapperBlock>(MiddleVPBB->getSuccessors()[1])
+ ->resetBlock(OrigLoop->getLoopPreheader());
+ } else
+ cast<VPIRWrapperBlock>(MiddleVPBB->getSuccessors()[0])
+ ->resetBlock(OrigLoop->getLoopPreheader());
// Only use noalias metadata when using memory checks guaranteeing no overlap
// across all iterations.
@@ -7687,7 +7661,7 @@ EpilogueVectorizerMainLoop::createEpilogueVectorizedLoopSkeleton(
// inductions in the epilogue loop are created before executing the plan for
// the epilogue loop.
- return {completeLoopSkeleton(), nullptr};
+ return {LoopVectorPreHeader, nullptr};
}
void EpilogueVectorizerMainLoop::printDebugTracesAtStart() {
@@ -7811,8 +7785,11 @@ EpilogueVectorizerEpilogueLoop::createEpilogueVectorizedLoopSkeleton(
VecEpilogueIterationCountCheck,
VecEpilogueIterationCountCheck->getSinglePredecessor());
- DT->changeImmediateDominator(LoopScalarPreHeader,
- EPI.EpilogueIterationCountCheck);
+ if (auto *N = DT->getNode(LoopScalarPreHeader))
+ DT->changeImmediateDominator(LoopScalarPreHeader,
+ EPI.EpilogueIterationCountCheck);
+ else
+ DT->addNewBlock(LoopScalarPreHeader, EPI.EpilogueIterationCountCheck);
if (!Cost->requiresScalarEpilogue(EPI.EpilogueVF.isVector()))
// If there is an epilogue which must run, there's no edge from the
// middle block to exit blocks and thus no need to update the immediate
@@ -7876,7 +7853,7 @@ EpilogueVectorizerEpilogueLoop::createEpilogueVectorizedLoopSkeleton(
{VecEpilogueIterationCountCheck,
EPI.VectorTripCount} /* AdditionalBypass */);
- return {completeLoopSkeleton(), EPResumeVal};
+ return {LoopVectorPreHeader, EPResumeVal};
}
BasicBlock *
@@ -8625,9 +8602,25 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
// modified; a basic block for the vector pre-header, followed by a region for
// the vector loop, followed by the middle basic block. The skeleton vector
// loop region contains a header and latch basic blocks.
+
+ // Add a check in the middle block to see if we have completed
+ // all of the iterations in the first vector loop. Three cases:
+ // 1) If we require a scalar epilogue, there is no conditional branch as
+ // we unconditionally branch to the scalar preheader. Do nothing.
+ // 2) If (N - N%VF) == N, then we *don't* need to run the remainder.
+ // Thus if tail is to be folded, we know we don't need to run the
+ // remainder and we can use the previous value for the condition (true).
+ // 3) Otherwise, construct a runtime check.
+ bool RequiresScalarEpilogueCheck =
+ LoopVectorizationPlanner::getDecisionAndClampRange(
+ [this](ElementCount VF) {
+ return !CM.requiresScalarEpilogue(VF.isVector());
+ },
+ Range);
VPlanPtr Plan = VPlan::createInitialVPlan(
createTripCountSCEV(Legal->getWidestInductionType(), PSE, OrigLoop),
- *PSE.getSE());
+ *PSE.getSE(), RequiresScalarEpilogueCheck, CM.foldTailByMasking(),
+ OrigLoop);
VPBasicBlock *HeaderVPBB = new VPBasicBlock("vector.body");
VPBasicBlock *LatchVPBB = new VPBasicBlock("vector.latch");
VPBlockUtils::insertBlockAfter(LatchVPBB, HeaderVPBB);
@@ -8875,7 +8868,7 @@ VPlanPtr LoopVectorizationPlanner::buildVPlan(VFRange &Range) {
// Create new empty VPlan
auto Plan = VPlan::createInitialVPlan(
createTripCountSCEV(Legal->getWidestInductionType(), PSE, OrigLoop),
- *PSE.getSE());
+ *PSE.getSE(), true, false, OrigLoop);
// Build hierarchical CFG
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
@@ -9084,6 +9077,9 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
}
}
Builder.setInsertPoint(&*LatchVPBB->begin());
+ VPBasicBlock *MiddleVPBB =
+ cast<VPBasicBlock>(VectorLoopRegion->getSingleSuccessor());
+ VPBasicBlock::iterator IP = MiddleVPBB->begin();
for (VPRecipeBase &R :
Plan->getVectorLoopRegion()->getEntryBasicBlock()->phis()) {
VPReductionPHIRecipe *PhiR = dyn_cast<VPReductionPHIRecipe>(&R);
@@ -9192,8 +9188,8 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
// also modeled in VPlan.
auto *FinalReductionResult = new VPInstruction(
VPInstruction::ComputeReductionResult, {PhiR, NewExitingVPV}, ExitDL);
- cast<VPBasicBlock>(VectorLoopRegion->getSingleSuccessor())
- ->appendRecipe(FinalReductionResult);
+ FinalReductionResult->insertBefore(*MiddleVPBB, IP);
+ IP = std::next(FinalReductionResult->getIterator());
OrigExitingVPV->replaceUsesWithIf(
FinalReductionResult,
[](VPUser &User, unsigned) { return isa<VPLiveOut>(&User); });
@@ -10390,15 +10386,9 @@ PreservedAnalyses LoopVectorizePass::run(Function &F,
RemoveRedundantDbgInstrs(&BB);
}
- // We currently do not preserve dominator analyses with outer loop
- // vectorization. Until this is addressed, mark these analyses as preserved
- // only for non-VPlan-native path.
- // TODO: Preserve Dominator analysis for VPlan-native path.
- if (!EnableVPlanNativePath) {
- PA.preserve<DominatorTreeAnalysis>();
- PA.preserve<ScalarEvolutionAnalysis>();
- }
PA.preserve<LoopAnalysis>();
+ PA.preserve<DominatorTreeAnalysis>();
+ PA.preserve<ScalarEvolutionAnalysis>();
if (Result.MadeCFGChange) {
// Making CFG changes likely means a loop got vectorized. Indicate that
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 27f8e239b1c09..19cff481c4fbb 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -25,6 +25,7 @@
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/Twine.h"
+#include "llvm/Analysis/DomTreeUpdater.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"
@@ -218,9 +219,9 @@ VPTransformState::VPTransformState(ElementCount VF, unsigned UF, LoopInfo *LI,
DominatorTree *DT, IRBuilderBase &Builder,
InnerLoopVectorizer *ILV, VPlan *Plan,
LLVMContext &Ctx)
- : VF(VF), UF(UF), LI(LI), DT(DT), Builder(Builder), ILV(ILV), Plan(Plan),
- LVer(nullptr),
- TypeAnalysis(Plan->getCanonicalIV()->getScalarType(), Ctx) {}
+ : VF(VF), UF(UF), LI(LI), Builder(Builder), ILV(ILV), Plan(Plan),
+ LVer(nullptr), TypeAnalysis(Plan->getCanonicalIV()->getScalarType(), Ctx),
+ DTU(DT, DomTreeUpdater::UpdateStrategy::Lazy) {}
Value *VPTransformState::get(VPValue *Def, const VPIteration &Instance) {
if (Def->isLiveIn())
@@ -399,8 +400,8 @@ void VPTransformState::packScalarIntoVectorValue(VPValue *Def,
set(Def, VectorValue, Instance.Part);
}
-BasicBlock *
-VPBasicBlock::createEmptyBasicBlock(VPTransformState::CFGState &CFG) {
+BasicBlock *VPBasicBlock::createEmptyBasicBlock(VPTransformState::CFGState &CFG,
+ DomTreeUpdater &DTU) {
// BB stands for IR BasicBlocks. VPBB stands for VPlan VPBasicBlocks.
// Pred stands for Predessor. Prev stands for Previous - last visited/created.
BasicBlock *PrevBB = CFG.PrevBB;
@@ -436,10 +437,33 @@ VPBasicBlock::createEmptyBasicBlock(VPTransformState::CFGState &CFG) {
"Trying to reset an existing successor block.");
TermBr->setSuccessor(idx, NewBB);
}
+ DTU.applyUpdates({{DominatorTree::Insert, PredBB, NewBB}});
}
return NewBB;
}
+void VPIRWrapperBlock::execute(VPTransformState *State) {
+ for (VPBlockBase *PredVPBlock : getHierarchicalPredecessors()) {
+ VPBasicBlock *PredVPBB = PredVPBlock->getExitingBasicBlock();
+ auto &PredVPSuccessors = PredVPBB->getHierarchicalSuccessors();
+ BasicBlock *PredBB = State->CFG.VPBB2IRBB[PredVPBB];
+
+ assert(PredBB && "Predecessor basic-block not found building successor.");
+ auto *PredBBTerminator = PredBB->getTerminator();
+ LLVM_DEBUG(dbgs() << "LV: draw edge from" << PredBB->getName() << '\n');
+
+ auto *TermBr = dyn_cast<BranchInst>(PredBBTerminator);
+ if (TermBr) {
+ // Set each forward successor here when it is created, excluding
+ // backedges. A backward successor is set when the branch is created.
+ unsigned idx = PredVPSuccessors.front() == this ? 0 : 1;
+ assert(!TermBr->getSuccessor(idx) &&
+ "Trying to reset an existing successor block.");
+ TermBr->setSuccessor(idx, WrappedBlock);
+ }
+ }
+}
+
void VPBasicBlock::execute(VPTransformState *State) {
bool Replica = State->Instance && !State->Instance->isFirstIteration();
VPBasicBlock *PrevVPBB = State->CFG.PrevVPBB;
@@ -467,6 +491,15 @@ void VPBasicBlock::execute(VPTransformState *State) {
// The Exit block of a loop is always set to be successor 0 of the Exiting
// block.
cast<BranchInst>(ExitingBB->getTerminator())->setSuccessor(0, NewBB);
+ // Set the insert point for recipe execution in the block.
+ State->Builder.SetInsertPoint(NewBB->getTerminator());
+ if (getSuccessors().size() == 1) {
+ BranchInst *Br = State->Builder.CreateBr(NewBB);
+ Br->setSuccessor(0, nullptr);
+ NewBB->getTerminator()->eraseFromParent();
+ State->Builder.SetInsertPoint(NewBB->getTerminator());
+ }
+ State->DTU.applyUpdates({{DominatorTree::Insert, ExitingBB, NewBB}});
} else if (PrevVPBB && /* A */
!((SingleHPred = getSingleHierarchicalPredecessor()) &&
SingleHPred->getExitingBasicBlock() == PrevVPBB &&
@@ -483,7 +516,7 @@ void VPBasicBlock::execute(VPTransformState *State) {
// is the exiting VPBB of this region from a previous instance, or the
// predecessor of this region.
- NewBB = createEmptyBasicBlock(State->CFG);
+ NewBB = createEmptyBasicBlock(State->CFG, State->DTU);
State->Builder.SetInsertPoint(NewBB);
// Temporarily terminate with unreachable until CFG is rewired.
UnreachableInst *Terminator = State->Builder.CreateUnreachable();
@@ -622,6 +655,13 @@ void VPBasicBlock::print(raw_ostream &O, const Twine &Indent,
printSuccessors(O, Indent);
}
+
+void VPIRWrapperBlock::print(raw_ostream &O, const Twine &Indent,
+ VPSlotTracker &SlotTracker) const {
+ O << Indent << "ir-bb<" << getName() << ">\n";
+ printSuccessors(O, Indent);
+}
+
#endif
static std::pair<VPBlockBase *, VPBlockBase *> cloneSESE(VPBlockBase *Entry);
@@ -637,12 +677,23 @@ static std::pair<VPBlockBase *, VPBlockBase *> cloneSESE(VPBlockBase *Entry) {
Entry);
for (VPBlockBase *BB : RPOT) {
VPBlockBase *NewBB = BB->clone();
- for (VPBlockBase *Pred : BB->getPredecessors())
- VPBlockUtils::connectBlocks(Old2NewVPBlocks[Pred], NewBB);
-
Old2NewVPBlocks[BB] = NewBB;
}
+ for (VPBlockBase *BB : RPOT) {
+ VPBlockBase *NewBB = Old2NewVPBlocks[BB];
+ SmallVector<VPBlockBase *> NewPreds;
+ for (VPBlockBase *Pred : BB->getPredecessors()) {
+ NewPreds.push_back(Old2NewVPBlocks[Pred]);
+ }
+ NewBB->setPredecessors(NewPreds);
+ SmallVector<VPBlockBase *> NewSuccs;
+ for (VPBlockBase *Succ : BB->successors()) {
+ NewSuccs.push_back(Old2NewVPBlocks[Succ]);
+ }
+ NewBB->setSuccessors(NewSuccs);
+ }
+
#if !defined(NDEBUG)
// Verify that the order of predecessors and successors matches in the cloned
// version.
@@ -766,7 +817,9 @@ VPlan::~VPlan() {
delete BackedgeTakenCount;
}
-VPlanPtr VPlan::createInitialVPlan(const SCEV *TripCount, ScalarEvolution &SE) {
+VPlanPtr VPlan::createInitialVPlan(const SCEV *TripCount, ScalarEvolution &SE,
+ bool RequiresScalarEpilogueCheck,
+ bool TailFolded, Loop *TheLoop) {
VPBasicBlock *Preheader = new VPBasicBlock("ph");
VPBasicBlock *VecPreheader = new VPBasicBlock("vector.ph");
auto Plan = std::make_unique<VPlan>(Preheader, VecPreheader);
@@ -777,6 +830,35 @@ VPlanPtr VPlan::createInitialVPlan(const SCEV *TripCount, ScalarEvolution &SE) {
VPBlockUtils::insertBlockAfter(TopRegion, VecPreheader);
VPBasicBlock *MiddleVPBB = new VPBasicBlock("middle.block");
VPBlockUtils::insertBlockAfter(MiddleVPBB, TopRegion);
+
+ BasicBlock *EB = TheLoop->getUniqueExitBlock();
+ if (RequiresScalarEpilogueCheck) {
+ VPIRWrapperBlock *EBWrapper = new VPIRWrapperBlock(EB);
+ VPBlockUtils::insertBlockAfter(EBWrapper, MiddleVPBB);
+
+ auto *ScalarLatchTerm = TheLoop->getLoopLatch()->getTerminator();
+ // Here we use the same DebugLoc as the scalar loop latch terminator instead
+ // of the corresponding compare because they may have ended up with
+ // different line numbers and we want to avoid awkward line stepping while
+ // debugging. Eg. if the compare has got a line number inside the loop.
+ VPValue *Cmp =
+ ...
[truncated]
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
…anch Conflicts: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/lib/Transforms/Vectorize/VPlan.cpp
Split off the VPIRWrapperBlock changes to #93398 to use it for the pre-preheader, which required a bit of generalization. |
This patch adds a new special type of VPBasicBlock that wraps an existing IR basic block. Recipes of the block get added before the terminator of the wrapped IR basic block. Making it a subclass of VPBasicBlock avoids duplicating various APIs to manage recipes in a block, as well as makes sure the traversals filtering VPBasicBlocks automatically apply as well. Initially VPIRBasicBlock are only used for the pre-preheader (wrapping the original preheader of the scalar loop). As follow-up, this will be used to move more parts of the skeleton inside VPlan, starting with the branch and condition in the middle block. Separated out of #92651 PR: #93398
1773da7
to
2372421
Compare
Updated after landing VPIRBasicBlock separately in #93398 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated after landing split off changes.
BranchInst *CondBr = | ||
Builder.CreateCondBr(Cond, Builder.GetInsertBlock(), nullptr); | ||
|
||
if (getParent()->isExiting()) | ||
if (getParent()->isExiting()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in ab9c2b1
VPIRBasicBlock *Preheader = new VPIRBasicBlock(PH); | ||
bool RequiresScalarEpilogueCheck, | ||
bool TailFolded, Loop *TheLoop) { | ||
VPIRBasicBlock *Preheader = new VPIRBasicBlock(TheLoop->getLoopPreheader()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 31a94bd
@@ -654,12 +680,23 @@ static std::pair<VPBlockBase *, VPBlockBase *> cloneSESE(VPBlockBase *Entry) { | |||
Entry); | |||
for (VPBlockBase *BB : RPOT) { | |||
VPBlockBase *NewBB = BB->clone(); | |||
for (VPBlockBase *Pred : BB->getPredecessors()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split off in ef1773a
ping :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, thanks.
Would be good to confirm the changes to two tests in llvm/test/Transforms/PhaseOrdering/AArch64/hoist-runtime-checks.ll are as desired.
Many test changes are due to sinking middle-block's compare down to its branch, occasionally beyond the coverage of CHECKs.
Leaving various minor comments.
auto *MiddleTerm = | ||
cast<BranchInst>(State.CFG.VPBB2IRBB[ExitVPBB]->getTerminator()); | ||
if (MiddleTerm->isConditional() && | ||
hasBranchWeightMD(*OrigLoop->getLoopLatch()->getTerminator())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assigning weights deserves further discussion; the indicator could be recorded in VPlan itself - whether weights should be assigned to all new branches (possibly adjusting weights of existing branches).
// Thus if tail is to be folded, we know we don't need to run the | ||
// remainder and we can use the previous value for the condition (true). | ||
// 3) Otherwise, construct a runtime check. | ||
bool RequiresScalarEpilogueCheck = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree to have a follow up patch deal with regressions, but does retaining the original condition alone leads to any? I.e., RequiresScalarEpilogue = ... return CM.requiresScalarEpilogue(VF.isVector()) ...
? The range should be clamped the same.
cast<VPBasicBlock>(VectorLoopRegion->getSingleSuccessor()) | ||
->appendRecipe(FinalReductionResult); | ||
FinalReductionResult->insertBefore(*MiddleVPBB, IP); | ||
IP = std::next(FinalReductionResult->getIterator()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this std::next needed - having inserted FinalReductionResult before IP implies that the latter already follows the former?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed in the current version, it is sufficient to use the original IP
@@ -9006,6 +8956,9 @@ void LoopVectorizationPlanner::adjustRecipesForReductions( | |||
} | |||
} | |||
Builder.setInsertPoint(&*LatchVPBB->begin()); | |||
VPBasicBlock *MiddleVPBB = | |||
cast<VPBasicBlock>(VectorLoopRegion->getSingleSuccessor()); | |||
VPBasicBlock::iterator IP = MiddleVPBB->getFirstNonPhi(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could (some of) this be pushed independently, earlier, to clarify change in insertion points, if any?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 8299bfa, thanks
@@ -448,11 +448,13 @@ VPBasicBlock::createEmptyBasicBlock(VPTransformState::CFGState &CFG) { | |||
} | |||
|
|||
void VPIRBasicBlock::execute(VPTransformState *State) { | |||
assert(getHierarchicalSuccessors().empty() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update to assert empty-or-single?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks!
; CHECK-NEXT: Live-in vp<[[VTC:%.+]]> = vector-trip-count | ||
; CHECK-NEXT: Live-in ir<8> = original trip-count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For outer-loop vectorization.
; CHECK: for.body.preheader13: | ||
; CHECK-NEXT: [[INDVARS_IV_PH:%.*]] = phi i64 [ 0, [[FOR_BODY_PREHEADER10]] ], [ [[N_VEC:%.*]], [[MIDDLE_BLOCK:%.*]] ] | ||
; CHECK-NEXT: [[SUM_07_PH:%.*]] = phi i32 [ 0, [[FOR_BODY_PREHEADER10]] ], [ [[TMP7:%.*]], [[MIDDLE_BLOCK]] ] | ||
; CHECK-NEXT: br label [[FOR_BODY:%.*]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiple changes to this test combined with the following sum_prefix_with_sum() test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes to the input was left over from debugging, restored, thanks!
; O2: for.body4.preheader9: | ||
; O2-NEXT: [[J_05_PH:%.*]] = phi i64 [ 0, [[FOR_BODY4_PREHEADER]] ], [ [[N_VEC]], [[MIDDLE_BLOCK:%.*]] ] | ||
; O2-NEXT: br label [[FOR_BODY4:%.*]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting: basic-block hoisted from below.
auto Plan = VPlan::createInitialVPlan(SE->getBackedgeTakenCount(L), *SE, | ||
L->getLoopPreheader()); | ||
VPlanHCFGBuilder HCFGBuilder(L, LI.get(), *Plan); | ||
auto Plan = VPlan::createInitialVPlan( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better not?
auto Plan = VPlan::createInitialVPlan(SE->getBackedgeTakenCount(L), *SE, | ||
L->getLoopPreheader()); | ||
VPlanHCFGBuilder HCFGBuilder(L, LI.get(), *Plan); | ||
auto Plan = VPlan::createInitialVPlan( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
Split off from #92651 as suggested.
nit: the last paragraph of the summary is obsolete: "Note that this patch adds and uses a new helper reorderIncomingBlocks ..." |
VPBlockBase *MiddleBB = | ||
IRBB->getPlan()->getVectorLoopRegion()->getSingleSuccessor(); | ||
if (IRBB != IRBB->getPlan()->getPreheader() && | ||
IRBB->getSinglePredecessor() != MiddleBB) { | ||
errs() << "VPIRBasicBlock can only be used as pre-header at the moment!\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
errs() << "VPIRBasicBlock can only be used as pre-header at the moment!\n"; | |
errs() << "VPIRBasicBlock can only be used as pre-header or a successor of middle-block at the moment!\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjusted thanks!
assert(getHierarchicalSuccessors().empty() && | ||
"VPIRBasicBlock cannot have successors at the moment"); | ||
|
||
assert(getHierarchicalSuccessors().size() <= 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert(getHierarchicalSuccessors().size() <= 2); | |
assert(getHierarchicalSuccessors().size() <= 2 && | |
"VPIRBasicBlock can have at most two successors at the moment"); |
can any block have more than 2? ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at the moment 2 is the maximum for any block, would probably be good to also check in the verifier.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/1270 Here is the relevant piece of the build log for the reference:
|
Expensive checks failure were caused by now obsolete DT updates during legacy skeleton creation, which have been removed in eedc2c8 |
Split off from llvm#92651 as suggested.
) This patch moves branch condition creation to enter the scalar epilogue loop to VPlan. Modeling the branch in the middle block also requires modeling the successor blocks. This is done using the recently introduced VPIRBasicBlock. Note that the middle.block is still created as part of the skeleton and then patched in during VPlan execution. Unfortunately the skeleton needs to create the middle.block early on, as it is also used for induction resume value creation and is also needed to properly update the dominator tree during skeleton creation. After this patch lands, I plan to move induction resume value and phi node creation in the scalar preheader to VPlan. Once that is done, we should be able to create the middle.block in VPlan directly. This is a re-worked version based on the earlier https://reviews.llvm.org/D150398 and the main change is the use of VPIRBasicBlock. Depends on llvm#92525 PR: llvm#92651
Use VPIRBasicBlock to wrap the middle block and implement patching up branches in predecessors in VPIRBasicBlock::execute. The IR middle block is only created after skeleton creation. Initially a regular VPBasicBlock is created, which will later be replaced by a VPIRBasicBlock once the middle IR basic block has been created. Note that this slightly changes the order of instructions created in the middle block; code generated by recipe execution in the middle block will now be inserted before the terminator (and in between the compare to used by the terminator). The original order will be restored in llvm#92651. PR: llvm#95816
Reoder code to exit early if the BranchOnCond isn't in an exiting block. This delays retrieving the parent region, which may not be present. Split off from llvm#92651.
Split off from llvm#92651 as suggested.
Rewrite cloneSESE to perform 2 depth-first passes with the first one cloning blocks and the second one updating the predecessors and successors. This is needed to preserve the correct predecessor/successor ordering with llvm#92651 and has been split off as suggested.
…#94760) This patch introduces a new ResumePhi VPInstruction which creates a phi in a leaf block of a VPlan. The first use is to create the phi node for fixed-order recurrence resume values in the scalar preheader. The VPInstruction takes 2 operands: 1) the incoming value from the middle-block and a default value to be used for all other incoming blocks. In follow-up changes, it will also be used to create phis for reduction and induction resume values. Depends on #92651 PR: #94760
…llvm#94760) This patch introduces a new ResumePhi VPInstruction which creates a phi in a leaf block of a VPlan. The first use is to create the phi node for fixed-order recurrence resume values in the scalar preheader. The VPInstruction takes 2 operands: 1) the incoming value from the middle-block and a default value to be used for all other incoming blocks. In follow-up changes, it will also be used to create phis for reduction and induction resume values. Depends on llvm#92651 PR: llvm#94760
This patch moves branch condition creation to enter the scalar epilogue
loop to VPlan. Modeling the branch in the middle block also requires
modeling the successor blocks. This is done using the recently introduced
VPIRBasicBloc.
Note that the middle.block is still created as part of the skeleton and
then patched in during VPlan execution. Unfortunately the skeleton needs
to create the middle.block early on, as it is also used for induction
resume value creation and is also needed to properly update the
dominator tree during skeleton creation.
After this patch lands, I plan to move induction resume value and phi
node creation in the scalar preheader to VPlan. Once that is done, we
should be able to create the middle.block in VPlan directly.
This is a re-worked version based on the earlier
https://reviews.llvm.org/D150398 and the main change is the use of
VPIRBasicBlock.
Depends on #92525