-
Notifications
You must be signed in to change notification settings - Fork 256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Hierarchical Cohorts in Cache/Snapshot #3006
Conversation
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
|
/assign @mimowo |
/cc |
pkg/cache/algorithm.go
Outdated
// getRoot returns the root of the Cohort tree IFF there was not a | ||
// cycle. | ||
func getRoot(c *cohort) (*cohort, bool) { | ||
checker := cycleChecker{cycles: make(map[string]bool, 8)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clarify in a comment that 8 is a reasonable upper-bound for hierarchy height
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted in this revision.
pkg/cache/algorithm.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hierarchy.go
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted in this revision. We now have cycle.go
in hierarchy
package
pkg/cache/algorithm.go
Outdated
return getRootHelper(c), true | ||
} | ||
|
||
func getRootHelper(c *cohort) *cohort { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func getRootHelper(c *cohort) *cohort { | |
func getRoot(c *cohort) *cohort { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect the scheduling cycle to call this function directly, as opposed to check for cycles again.
pkg/cache/algorithm.go
Outdated
|
||
// getRoot returns the root of the Cohort tree IFF there was not a | ||
// cycle. | ||
func getRoot(c *cohort) (*cohort, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func getRoot(c *cohort) (*cohort, bool) { | |
func getRootUnsafe(c *cohort) (*cohort, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't the other function be called unsafe, as it will enter an infinite loop if called with cycle?
pkg/cache/algorithm.go
Outdated
if c.cycles[cohort.GetName()] { | ||
return true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if c.cycles[cohort.GetName()] { | |
return true | |
} | |
if cycle, seen := c.cycles[cohort.GetName()]; seen { | |
return cycle | |
} |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is better, thanks :)
pkg/cache/snapshot.go
Outdated
@@ -84,12 +84,19 @@ func (c *Cache) Snapshot() Snapshot { | |||
ResourceFlavors: make(map[kueue.ResourceFlavorReference]*kueue.ResourceFlavor, len(c.resourceFlavors)), | |||
InactiveClusterQueueSets: sets.New[string](), | |||
} | |||
cycleChecker := cycleChecker{cycles: make(map[string]bool, len(c.hm.Cohorts))} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need to check at this point?
I think the information should be stored in the cache somehow. We need to communicate it to the CQ and cohort statuses anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm storing this structure in the hierarchy.Manager (and thus cache) now, but it will still be called again during snapshotting - using the memoized result.
2578618
to
f7c6a41
Compare
pkg/cache/cache_test.go
Outdated
_ = cache.AddCohort(cohort) | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_ = cache.AddCohort(cohort) | |
return nil | |
return cache.AddCohort(cohort) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pkg/cache/cache_test.go
Outdated
t.Fatal("Expected success") | ||
} | ||
|
||
t.Run("before move", func(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not use t.Run
for non-independent tests cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
/retest |
@@ -104,7 +104,9 @@ func (r *CohortReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctr | |||
} | |||
|
|||
log.V(2).Info("Cohort is being created or updated", "resources", cohort.Spec.ResourceGroups) | |||
r.cache.AddOrUpdateCohort(&cohort) | |||
if err := r.cache.AddOrUpdateCohort(&cohort); err != nil { | |||
return ctrl.Result{}, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this cause a requeue into the reconciler?
Maybe we should log the issue, but then don't return the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
LGTM label has been added. Git tree hash: c3d67eb01af97818021dc8b9379cf9317d75678c
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alculquicondor, gabesaba The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Only attaching release note to a single PR, where we define the API #2693 /release-note-edit
|
What type of PR is this?
/kind feature
What this PR does / why we need it:
Support HierarchicalCohorts (#79), with cycle-detection, in the cache and snapshot.
Special notes for your reviewer:
I wrote tests which check resource usage. I will add them to this PR once #3005 merges.
Does this PR introduce a user-facing change?