queue-manager: reorganize into strategies #2
I realized that we need flexibility in defining queue strategies: not just in how the worker is designed, but also in how the queue strategy handles the schedule function. This overhaul (not quite done yet) adds that. I still need to plug the final query back in to move provisional pods to the worker queue. Also note that it looks like we have priority, pending, and other insert params to play with. And since things get lost in Slack, here is a visual of the design:
The queue strategy I'm starting with is FCFS with backfill, which is (sort of) what Kubernetes can do, assuming it would schedule groups without clogging (that is, allowing smaller groups that can be scheduled to fill in). This work is almost done: I need to finish the query that selects the provisional pods whose groups are at quorum and then adds them to the worker queues. I've already tested this step: once a group hits the worker queue, at least for this strategy, that is where we call "AskFlux" to do an allocation. It's FCFS with backfill because that allocation request can be denied if resources aren't ready; in that case the job goes back into the queue and the next group is tried.
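To make the flow above concrete, here is a minimal, language-agnostic sketch in Python (not the code in this PR). It assumes provisional pods are simple `(group, pod)` records, and every name in it (`groups_at_quorum`, `schedule_pass`, `make_toy_allocator`) is hypothetical. The toy allocator only stands in for the "AskFlux" call and says nothing about the real Flux interface.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Tuple


def groups_at_quorum(
    provisional: List[Tuple[str, str]], sizes: Dict[str, int]
) -> List[str]:
    """Return groups whose provisional pod count has reached the group size.

    `provisional` holds (group, pod) records in arrival order. Dicts keep
    insertion order, so the result is ordered by first arrival (FCFS).
    """
    counts: Dict[str, int] = {}
    for group, _pod in provisional:
        counts[group] = counts.get(group, 0) + 1
    return [g for g, n in counts.items() if n >= sizes[g]]


def schedule_pass(
    queue: Deque[str], ask: Callable[[str], Optional[List[str]]]
) -> Dict[str, List[str]]:
    """Try every queued group once, in FCFS order.

    A denied group goes back into the queue, and the pass continues with
    the next group, which is what lets smaller groups backfill.
    """
    assigned: Dict[str, List[str]] = {}
    for _ in range(len(queue)):
        group = queue.popleft()
        nodes = ask(group)
        if nodes is None:
            queue.append(group)  # denied: retry on a later pass
        else:
            assigned[group] = nodes
    return assigned


def make_toy_allocator(
    free_nodes: List[str], needs: Dict[str, int]
) -> Callable[[str], Optional[List[str]]]:
    """Hypothetical stand-in for the allocation request (not the Flux API)."""

    def ask(group: str) -> Optional[List[str]]:
        n = needs[group]
        if n > len(free_nodes):
            return None  # resources aren't ready
        granted = free_nodes[:n]
        del free_nodes[:n]
        return granted

    return ask
```

In this sketch, a group that needs more nodes than are free stays at the front of the queue for the next pass, while a smaller group behind it can still be allocated in the same pass.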
The events (subscriptions) are also working. Updating the args with the node assignment is how we will send the signal back to the scheduler and then call the binding. I haven't yet removed the original fluence in-tree design, but that is happening slowly, and when the functionality is fully working here, I will remove it entirely in favor of this one. I will need to think about how to properly handle the current in-tree plugins, because having two different scheduling strategies doesn't make sense. My hope is that I can move the functionality of the current (essential) in-tree plugins into our new framework, whatever that might look like. 👀
Note that this branch goes into another branch that doesn't have a PR open yet.
Needs before merge here