[LV] Use SCEV to check if minimum iteration check is known. #111310

Merged · 13 commits · Oct 18, 2024 · Changes from 1 commit
21 changes: 15 additions & 6 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -2438,12 +2438,21 @@ void InnerLoopVectorizer::emitIterationCountCheck(BasicBlock *Bypass) {
};

TailFoldingStyle Style = Cost->getTailFoldingStyle();
-  if (Style == TailFoldingStyle::None)
-    CheckMinIters =
-        Builder.CreateICmp(P, Count, CreateStep(), "min.iters.check");
-  else if (VF.isScalable() &&
-           !isIndvarOverflowCheckKnownFalse(Cost, VF, UF) &&
-           Style != TailFoldingStyle::DataAndControlFlowWithoutRuntimeCheck) {
+  if (Style == TailFoldingStyle::None) {
+    Value *Step = CreateStep();
+    ScalarEvolution &SE = *PSE.getSE();
+    // Check if we can prove that the trip count is >= the step.
+    const SCEV *TripCountSCEV = SE.getTripCountFromExitCount(
+        PSE.getBackedgeTakenCount(), CountTy, OrigLoop);
Collaborator: Could this be simplified into (P)SE.getSCEV(Count)?

Contributor Author: Updated, thanks! Initially I thought this might produce worse results, but at least for the existing tests the results don't get pessimized.

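For reference, a sketch of the two alternatives discussed above (names taken from the hunk; not necessarily the final committed form):

    // Alternative in this commit: derive the trip-count SCEV from the
    // backedge-taken count.
    const SCEV *TripCountSCEV = SE.getTripCountFromExitCount(
        PSE.getBackedgeTakenCount(), CountTy, OrigLoop);
    // Reviewer's suggestion: take the SCEV of the already-computed
    // trip-count value directly.
    const SCEV *TripCountSCEV2 = SE.getSCEV(Count);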
+    if (SE.isKnownPredicate(CmpInst::getInversePredicate(P),
Collaborator: The other direction of this check is also interesting: when the entry guard is known to hold, the bypass around the loop is never taken.

Collaborator: The complementary case, when Count is known to be smaller than Step, is presumably avoided when setting MaxVF (and UF); perhaps worth asserting?

Contributor Author: There is one case where this improves things: when we version the stride in LoopAccessAnalysis. Those predicates won't be used when retrieving the max trip count from SCEV, but they do trigger here. The test case is in llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll.

+                            SE.applyLoopGuards(TripCountSCEV, OrigLoop),
+                            SE.getSCEV(Step)))
+      CheckMinIters = Builder.getFalse();
Collaborator: Worth leaving a TODO to simplify the skeleton by emitting an unconditional branch to the vector preheader, instead of the conditional branch on false below?

Contributor Author: Done, thanks!

Collaborator: This is redundant: CheckMinIters is initialized to false above, serving the tail-folding case.

Contributor Author: Dropped, thanks!

+    else
+      CheckMinIters = Builder.CreateICmp(P, Count, Step, "min.iters.check");
+  } else if (VF.isScalable() &&
+             !isIndvarOverflowCheckKnownFalse(Cost, VF, UF) &&
+             Style != TailFoldingStyle::DataAndControlFlowWithoutRuntimeCheck) {
Collaborator: The optimization you're doing here applies to the check in this if-block as well. Maybe factor out a getOptimizedCompare lambda or something?

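A hypothetical sketch of the suggested helper (the name getOptimizedCompare comes from the review comment; the signature and body are illustrative only, not code from this PR):

    // Illustrative sketch: fold the compare to a constant when SCEV can
    // prove the inverse predicate, otherwise emit the runtime icmp.
    auto GetOptimizedCompare = [&](CmpInst::Predicate P, Value *LHS, Value *RHS,
                                   const Twine &Name) -> Value * {
      ScalarEvolution &SE = *PSE.getSE();
      const SCEV *LHSSCEV = SE.applyLoopGuards(SE.getSCEV(LHS), OrigLoop);
      // If the inverse predicate is provably true, the check never fires.
      if (SE.isKnownPredicate(CmpInst::getInversePredicate(P), LHSSCEV,
                              SE.getSCEV(RHS)))
        return Builder.getFalse();
      return Builder.CreateICmp(P, LHS, RHS, Name);
    };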
Collaborator: The runtime comparison introduced below checks for overflow, in case the overflow check is not known (to be false) at compile time. Perhaps worth asserting that this predicate is indeed unknown to SCEV.

Contributor Author: Added an assert, thanks!

// vscale is not necessarily a power-of-2, which means we cannot guarantee
// an overflow to zero when updating induction variables and so an
// additional overflow check is required before entering the vector loop.
@@ -11,8 +11,7 @@ define void @f1(ptr %A) #0 {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 4
-; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1024, [[TMP1]]
-; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
Collaborator: Step of 4 * vscale is known to be smaller than count of 1024, based on the vscale_range(1,16) attribute?

Contributor Author: Yep.

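Spelled out (my arithmetic, based on the test's vscale_range(1,16) attribute):

    Step  = 4 * vscale <= 4 * 16 = 64
    Count = 1024, and 1024 u< Step can never hold since Step <= 64
    =>    the min-iters bypass folds to br i1 false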
; CHECK: vector.ph:
; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 4
44 changes: 9 additions & 35 deletions llvm/test/Transforms/LoopVectorize/AArch64/masked-call.ll
@@ -11,10 +11,7 @@ target triple = "aarch64-unknown-linux-gnu"
define void @test_widen(ptr noalias %a, ptr readnone %b) #4 {
; TFNONE-LABEL: @test_widen(
; TFNONE-NEXT: entry:
-; TFNONE-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; TFNONE-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 2
-; TFNONE-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1025, [[TMP1]]
-; TFNONE-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; TFNONE-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; TFNONE: vector.ph:
; TFNONE-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; TFNONE-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 2
@@ -146,10 +143,7 @@ for.cond.cleanup:
define void @test_if_then(ptr noalias %a, ptr readnone %b) #4 {
; TFNONE-LABEL: @test_if_then(
; TFNONE-NEXT: entry:
-; TFNONE-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; TFNONE-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 2
-; TFNONE-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1025, [[TMP1]]
-; TFNONE-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; TFNONE-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; TFNONE: vector.ph:
; TFNONE-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; TFNONE-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 2
@@ -310,10 +304,7 @@ for.cond.cleanup:
define void @test_widen_if_then_else(ptr noalias %a, ptr readnone %b) #4 {
; TFNONE-LABEL: @test_widen_if_then_else(
; TFNONE-NEXT: entry:
-; TFNONE-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; TFNONE-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 2
-; TFNONE-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1025, [[TMP1]]
-; TFNONE-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; TFNONE-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; TFNONE: vector.ph:
; TFNONE-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; TFNONE-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 2
@@ -490,10 +481,7 @@ for.cond.cleanup:
define void @test_widen_nomask(ptr noalias %a, ptr readnone %b) #4 {
; TFNONE-LABEL: @test_widen_nomask(
; TFNONE-NEXT: entry:
-; TFNONE-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; TFNONE-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 2
-; TFNONE-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1025, [[TMP1]]
-; TFNONE-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; TFNONE-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; TFNONE: vector.ph:
; TFNONE-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; TFNONE-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 2
@@ -548,11 +536,6 @@ define void @test_widen_nomask(ptr noalias %a, ptr readnone %b) #4 {
;
; TFFALLBACK-LABEL: @test_widen_nomask(
; TFFALLBACK-NEXT: entry:
-; TFFALLBACK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; TFFALLBACK-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 2
-; TFFALLBACK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1025, [[TMP1]]
-; TFFALLBACK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
-; TFFALLBACK: vector.ph:
; TFFALLBACK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; TFFALLBACK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 2
; TFFALLBACK-NEXT: [[N_MOD_VF:%.*]] = urem i64 1025, [[TMP3]]
@@ -561,20 +544,17 @@ define void @test_widen_nomask(ptr noalias %a, ptr readnone %b) #4 {
; TFFALLBACK-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 2
; TFFALLBACK-NEXT: br label [[VECTOR_BODY:%.*]]
; TFFALLBACK: vector.body:
-; TFFALLBACK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; TFFALLBACK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; TFFALLBACK-NEXT: [[TMP6:%.*]] = getelementptr i64, ptr [[B:%.*]], i64 [[INDEX]]
; TFFALLBACK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 2 x i64>, ptr [[TMP6]], align 8
; TFFALLBACK-NEXT: [[TMP7:%.*]] = call <vscale x 2 x i64> @foo_vector_nomask(<vscale x 2 x i64> [[WIDE_LOAD]])
; TFFALLBACK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]
; TFFALLBACK-NEXT: store <vscale x 2 x i64> [[TMP7]], ptr [[TMP8]], align 8
; TFFALLBACK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP5]]
; TFFALLBACK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; TFFALLBACK-NEXT: br i1 [[TMP9]], label [[SCALAR_PH]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
-; TFFALLBACK: scalar.ph:
-; TFFALLBACK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[N_VEC]], [[VECTOR_BODY]] ]
-; TFFALLBACK-NEXT: br label [[FOR_BODY:%.*]]
+; TFFALLBACK-NEXT: br i1 [[TMP9]], label [[FOR_BODY:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
; TFFALLBACK: for.body:
-; TFFALLBACK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
+; TFFALLBACK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[N_VEC]], [[VECTOR_BODY]] ]
; TFFALLBACK-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[B]], i64 [[INDVARS_IV]]
; TFFALLBACK-NEXT: [[LOAD:%.*]] = load i64, ptr [[GEP]], align 8
; TFFALLBACK-NEXT: [[CALL:%.*]] = call i64 @foo(i64 [[LOAD]]) #[[ATTR5:[0-9]+]]
@@ -626,10 +606,7 @@ for.cond.cleanup:
define void @test_widen_optmask(ptr noalias %a, ptr readnone %b) #4 {
; TFNONE-LABEL: @test_widen_optmask(
; TFNONE-NEXT: entry:
-; TFNONE-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; TFNONE-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 2
-; TFNONE-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1025, [[TMP1]]
-; TFNONE-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; TFNONE-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; TFNONE: vector.ph:
; TFNONE-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; TFNONE-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 2
@@ -791,10 +768,7 @@ for.cond.cleanup:
define double @test_widen_fmuladd_and_call(ptr noalias %a, ptr readnone %b, double %m) #4 {
; TFNONE-LABEL: @test_widen_fmuladd_and_call(
; TFNONE-NEXT: entry:
-; TFNONE-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; TFNONE-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 2
-; TFNONE-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1025, [[TMP1]]
-; TFNONE-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; TFNONE-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
Collaborator: Ditto, with vscale_range(2,16).

Contributor Author: Yep.

; TFNONE: vector.ph:
; TFNONE-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; TFNONE-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 2
@@ -10,8 +10,7 @@ define void @test_invar_gep(ptr %dst) #0 {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 4
-; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 100, [[TMP1]]
-; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 4
@@ -757,8 +757,7 @@ define void @simple_memset_trip1024(i32 %val, ptr %ptr, i64 %n) #0 {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 4
-; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1024, [[TMP1]]
-; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 4
@@ -7,10 +7,7 @@ target triple = "aarch64-unknown-linux-gnu"
define void @test_widen(ptr noalias %a, ptr readnone %b) #1 {
; WIDE-LABEL: @test_widen(
; WIDE-NEXT: entry:
-; WIDE-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; WIDE-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 4
-; WIDE-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1025, [[TMP1]]
-; WIDE-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; WIDE-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; WIDE: vector.ph:
; WIDE-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; WIDE-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 4
3 changes: 1 addition & 2 deletions llvm/test/Transforms/LoopVectorize/if-reduction.ll
@@ -1668,8 +1668,7 @@ define i32 @fcmp_0_sub_select1(ptr noalias %x, i32 %N) nounwind readonly {
; CHECK: [[FOR_HEADER]]:
; CHECK-NEXT: [[ZEXT:%.*]] = zext i32 [[N]] to i64
; CHECK-NEXT: [[TMP0:%.*]] = sub i64 0, [[ZEXT]]
-; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP0]], 4
-; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
Collaborator: Somewhat confusing (min) iter check here: bumping %indvars.iv.next = sub nuw nsw i64 %indvars.iv, 1 repeatedly, starting with %indvars.iv set to zero?

Contributor Author: Agreed, might be worth fixing independently. The simplification is fine for this input, I think: the BTC is (-1 + (-1 * (zext i32 %N to i64))<nsw>)<nsw>, and the trip count with info from the dominating loop guard is (-1 * (zext i32 (1 smax %N) to i64))<nsw>, which should be u>= 4. https://llvm.godbolt.org/z/1EMWbGb81

Collaborator: Sure, worth fixing the test independently, before or after. Subtracting 1 from 0 on the first iteration, and implicitly casting the above negative BTC and trip count to unsigned, defy the claimed nuw.

Collaborator: Worth leaving behind a FIXME note.

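A quick check of that u>= 4 claim (my arithmetic, not from the thread):

    x = zext i32 (1 smax %N) to i64         =>  1 <= x <= 2^31 - 1
    trip count = (-1 * x) as unsigned i64   =   2^64 - x >= 2^64 - (2^31 - 1)

which is far greater than 4, so the guarded "icmp ult i64 [[TMP0]], 4" folds to false.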
; CHECK: [[VECTOR_PH]]:
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP0]], 4
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]]
@@ -423,8 +423,7 @@ define void @zext_of_i1_stride(i1 %g, ptr %dst) mustprogress {
; CHECK-NEXT: [[G_64:%.*]] = zext i1 [[G]] to i64
; CHECK-NEXT: [[TMP0:%.*]] = udiv i64 15, [[G_64]]
Collaborator: Better to divide 15 by G_64 after scevcheck'ing below that G is 1 (not 0), than before?

Contributor Author: Yep, that would probably be better for this particular check. There are other SCEV checks that are much more expensive (like wrapping checks), so we would probably need to distinguish between them.

Collaborator: Sure, guards should be ordered according to cost and frequency, but in this case a potential division by zero is introduced, unguarded.

Collaborator: Worth leaving behind a FIXME note.

; CHECK-NEXT: [[TMP1:%.*]] = add nuw nsw i64 [[TMP0]], 1
-; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP1]], 4
-; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_SCEVCHECK:%.*]]
+; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_SCEVCHECK:%.*]]
Collaborator: Count of 16 (assuming G = 1) is known to be greater than step of 4.

Contributor Author: Yes, the step of 4 is used here, based on the versioned G.

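Worked through (my arithmetic, assuming the versioned stride G = 1 that vector.scevcheck establishes at runtime):

    G_64 = zext i1 true to i64 = 1
    TMP0 = udiv i64 15, 1      = 15
    TMP1 = 15 + 1              = 16, and 16 u>= 4

so the min-iters check folds to br i1 false, while the scevcheck guard remains.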
; CHECK: vector.scevcheck:
; CHECK-NEXT: [[IDENT_CHECK:%.*]] = icmp ne i1 [[G]], true
; CHECK-NEXT: br i1 [[IDENT_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]