Commit
Speed up clear_task_instances by doing a single sql delete for TaskReschedule (#14048)

Clearing a large number of tasks takes a long time. Most of the time (more than 95%) is spent at this line in clear_task_instances. This slowness sometimes causes the webserver to time out because web_server_worker_timeout is hit.

```
# Clear all reschedules related to the ti to clear
session.query(TR).filter(
    TR.dag_id == ti.dag_id,
    TR.task_id == ti.task_id,
    TR.execution_date == ti.execution_date,
    TR.try_number == ti.try_number,
).delete()
```

This line was very slow because it deletes TaskReschedule rows one by one in a for loop.

This PR simply changes this code to delete TaskReschedule rows in a single SQL query with a bunch of OR conditions. It's effectively doing the same thing, but now it's much faster.

Some profiling showed great speed improvement (something like 40 to 50 times faster) compared to the first iteration. So the overall performance should now be 300 times faster than the original for loop deletion.

(cherry picked from commit 9036ce2)
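
To illustrate the "single query with a bunch of OR conditions" idea, here is a minimal sketch. It is not the code from this commit: it uses the standard-library sqlite3 module instead of SQLAlchemy so that it runs on its own, and the table name `task_reschedule`, the helper `delete_reschedules`, and the sample rows are all hypothetical. With SQLAlchemy, the same WHERE clause could presumably be composed from `or_` and `and_`.

```python
import sqlite3


def delete_reschedules(conn, keys):
    """Delete rows matching any (dag_id, task_id, execution_date, try_number)
    key with one DELETE statement instead of one statement per key.

    Hypothetical helper for illustration only.
    """
    keys = list(keys)
    if not keys:
        return 0  # nothing to delete; avoid emitting an empty WHERE clause
    # One AND group per task instance, all groups joined with OR.
    clause = "(dag_id = ? AND task_id = ? AND execution_date = ? AND try_number = ?)"
    where = " OR ".join([clause] * len(keys))
    # Flatten the key tuples into one parameter list matching the placeholders.
    params = [value for key in keys for value in key]
    cur = conn.execute(f"DELETE FROM task_reschedule WHERE {where}", params)
    return cur.rowcount


# Stand-in table with the four columns used in the original filter.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_reschedule "
    "(dag_id TEXT, task_id TEXT, execution_date TEXT, try_number INTEGER)"
)
rows = [
    ("dag_a", "t1", "2021-01-01", 1),
    ("dag_a", "t1", "2021-01-01", 1),  # one task instance may have several rows
    ("dag_a", "t2", "2021-01-01", 1),
    ("dag_b", "t1", "2021-01-02", 2),
]
conn.executemany("INSERT INTO task_reschedule VALUES (?, ?, ?, ?)", rows)

# Clear two task instances in a single round trip to the database.
to_clear = [("dag_a", "t1", "2021-01-01", 1), ("dag_b", "t1", "2021-01-02", 2)]
deleted = delete_reschedules(conn, to_clear)
remaining = conn.execute("SELECT dag_id, task_id FROM task_reschedule").fetchall()
```

The speedup comes from replacing one DELETE per task instance with a single statement. For very large batches, a database's limits on bound parameters or expression size may require splitting the keys into chunks.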
Showing 2 changed files with 77 additions and 7 deletions.