-
-
Notifications
You must be signed in to change notification settings - Fork 191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Adding the ability to prioritize tasks #519
base: master
Are you sure you want to change the base?
Conversation
Thanks for updating the PR 👍. Will try and find some time to have a look at it soon! |
Did a quick look through. I can see most lines come from the opt-in toggle 😅. Good job realigning the PR with
Another thing. It would be great to see the numbers for how polling performs, specifically for postgres, how many buffers are read to satisfy the query, with and without priority order. A test like:
Run the due-query in postgres with (analyze on, buffers on). Possibly with index-variations also (priority,execution_time) or (execution_time,priority). Not sure if this is something you or someone else are up to, otherwise I need to run it through myself. |
Thanks for looking into it! In near days I will update the PR according to your suggestions, but got a question regarding the
So the schema allows null values, but maybe make it just clearer that in this upgrade note we are talking about existing values? About that testing of changes, didn't do int earlier, but I'm happy to try. If any problems occur for me, I will let you know. Overall, it's a good exercise diving into this code, helped recently with understanding exactly how to improve performance 😄 |
Ah, I didn't check all schemas, just assumed after I think I saw one. Checked now, and looks like mssql has |
...heduler/src/test/java/com/github/kagkarlsson/scheduler/functional/PriorityExecutionTest.java
Outdated
Show resolved
Hide resolved
...heduler/src/test/java/com/github/kagkarlsson/scheduler/functional/PriorityExecutionTest.java
Outdated
Show resolved
Hide resolved
I think I read a discussion about the index usage with prioritization somewhere on this repo, but I can't find it. In case it's still relevant: Have you considered splitting the "due task detection" from the picking step? I.e. something like this: UPDATE scheduled_tasks
SET due = TRUE
WHERE NOT due
AND NOT picked
AND execution_time < NOW();
SELECT * FROM schedules_tasks
WHERE due
AND NOT picked
ORDER BY priority, execution_time; Both queries could be optimized (even for arbitrary priority cardinality) using indices over Also, this PR uses descending priorities. Older versions of MySQL/MariaDB don't support direction on index columns, which would prevent them from using the index when sorting by |
I made some tests @kagkarlsson for the 10M executions due, wasn't certain what did you mean with "not due" executions, so happy to add them later 😄 TL;DR I conducted tests on:
Scheduler was configured with lock-and-fetch:
I have scheduled 10M executions that were due, with random priority And tested two types of indexes:
ResultsResults for scheduler without prioritization
Results for prioritization with index
Results for prioritization with index
Query plans - explain (analyze, buffers)EXPLAIN (ANALYZE, BUFFERS)
SELECT task_name, task_instance
FROM scheduled_tasks WHERE picked = false and execution_time <= now()
ORDER BY priority desc, execution_time ASC FOR UPDATE SKIP LOCKED
LIMIT 100 Query plan without index on priority
Query plan with index
Query plan with index
Query plan when prioritization is disabled (
|
Sorry I haven't followed up earlier. Good job on the testing! Excellent with a full test using concurrent schedulers and detailed statistics 👏.
To make the testing more realistic we have to assume that there are a large number of executions which are not due (also high priority executions that are not due yet). Index (priority,execution_time) will be better when most executions are due, i.e. execution-time have passed. My assumption would be that the realistic scenario would be that there are more future executions than due. If there are throughput problems, there will however eventually be a significant amount of due executions, which is the scenario where priority is useful. So for the testing, I think we should add at least the same amount of future executions to the table as there are due (maybe even a factor higher). |
@GeorgEchterling not really. That would require an additional update and roundtrip to the database 🤔 (on the other hand, the performance will likely be more predictable) |
Thanks for your explanation @kagkarlsson. I ran the tests again, started the same instances and filled the scheduler with 1M due tasks (random -60 minutes) and 10M in the future (random +60 minutes), with random priorities from 1 to 10.
ResultsResults for scheduler without prioritization
Results for prioritization with index
Results for prioritization with index
Query plansQuery plan with index
Query plan with index
|
Basically, I would say that the usage of prioritization depends on the usage scenario of the scheduler. I got an instance of scheduler where there are millions of recurring tasks with a persistent schedule and a few millions of one-time tasks added with execution time But in instances where we operate only on one-time tasks that are always added with execution time |
@kagkarlsson What would we do next with this PR? :) |
I did some testing on my own and I think we probably need to add both indexes (or at least supply them). (i.e both (pririty,execution_time) and (execution_time,priority) With some luck I can review your changes (and possibly contribute some) next week 🤞 |
How do you feel about |
Thanks for looking into this :) I'm good with changing the name to |
…is currently the same for all databases
…curring, and MEDIUM for the rest)
…now have a TaskInstance.Builder.
I pushed some changes, addressing One big question I have is: Should a high int-value for priority mean higher priority or the reverse? 🤯 |
The changes are good for me @kagkarlsson, anything else I could help with? |
I think it is very close. I started a refactoring that I feel is a bit unfinished still. Slightly unrelated 😬 See 26e0492 e.g. ( scheduler.schedule(
MY_TASK
.instanceWithId("1045")
.data(new MyTaskData(1001L))
.scheduledTo(Instant.now().plusSeconds(5))); i.e. stop using And also, if it makes sense, deprecated/reduce use of Update: I think I have completed this refactoring now. |
How do we feel about these "defaults"? Users are free to use what suits them, as longs a it fits in the column.. public class Priority {
public static final int HIGH = 90;
public static final int MEDIUM = 50;
public static final int LOW = 10;
} |
…eprecating TaskWithDataDescriptor and TaskWithoutDataDescriptor. Updated examples.
Changes for Those predefined priorities are great for default usage. Also, there are good description how value of field translates into priority. For my part, it's important to have an ability to dynamically set value of priority. |
Hopefully I will get some time on Friday to go through this PR one last time. If everything checks out, I will try to release it. You are happy with the current version of the PR @simeq ? |
I confirm my happiness with the current version @kagkarlsson :) |
…o upgrade without modifying the schema)
I am quickly going to check how hard it is to avoid touching |
I am adding some missing tests for result ordering for the different cases |
On one hand, I think it's a good idea to be able to upgrade db-scheduler without database changes. On another, there would need to be explicit instructions on how to ALTER tables to work with priority, and should we have |
We will still update all the schema files with column |
(I am doing this as a service to long-time users that do not need priority. This way they can simply bump major and keep going.) |
…eduledExecutions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good now 👍
Thanks for the changes and comment it's clear for me now, and looks good @kagkarlsson 🙂 |
Will release soon! |
This change introduces the possibility of using a task prioritization mechanism. This mechanism is disabled by default, and can be enabled using the
enablePrioritization()
method.Reminders
mvn spotless:apply
cc @kagkarlsson open to any suggestions and comments regarding tests that would be worth adding