[AIRFLOW-2516] Fix mysql deadlocks #6988
Codecov Report
@@ Coverage Diff @@
## master #6988 +/- ##
=========================================
Coverage ? 85.03%
=========================================
Files ? 707
Lines ? 39361
Branches ? 0
=========================================
Hits ? 33472
Misses ? 5889
Partials ? 0
Continue to review full report at Codecov.
Did we confirm with any MySQL user facing this issue that it solved the issue for them?
Not yet @kaxil -> that's why it's still Draft. But I provided the users with patched versions of jobs.py/scheduled_job.py for 1.9, 1.10.6, 1.10.3 and asked them to test it. See https://issues.apache.org/jira/browse/AIRFLOW-2516?focusedCommentId=17006364&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17006364 and https://issues.apache.org/jira/browse/AIRFLOW-4498?focusedCommentId=17006370&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17006370 They tested it before for 1.9 and 1.10.6, so I expect they will come back after the New Year holidays.
And I keep my fingers crossed that it's going to help 🤞
🤞
Deadlocks were occurring in MySQL when task_instance was modified by two queries at the same time. One query used state as a selection criterion and updated it in the same query, while the second query just updated the state in the same table. The first query locked the state index first and the primary index afterwards; the second query locked the primary index first and the state index afterwards, leading to deadlocks. This change splits the first query into two independent ones. The first query does a SELECT FOR UPDATE and selects all the task instances to act on (this locks the primary index only), and the second updates all affected task instances. Note that the performance impact is negligible because this query is only run once every scheduler loop, and the second part of it (looping through task instances) only happens when there are some manually modified DagRun states, so it is only run to correct some wrong DagRun states. This should happen very infrequently.
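The lock-ordering problem described in this commit message can be illustrated with a toy model. This is only a sketch: the two `threading.Lock` objects stand in for the state index and the primary index, the names `worker` and `deadlocks` are hypothetical, and real InnoDB locking is considerably more involved. Timeouts are used so that the opposite-order case reports a deadlock instead of hanging.

```python
import threading
import time


def worker(first, second, timed_out, hold=0.1, timeout=0.5):
    """Take two locks in the given order; record a timeout instead of hanging."""
    if not first.acquire(timeout=timeout):
        timed_out.append(True)
        return
    try:
        # Hold the first lock briefly so the other worker can take its own first lock.
        time.sleep(hold)
        if second.acquire(timeout=timeout):
            second.release()
        else:
            timed_out.append(True)
    finally:
        first.release()


def deadlocks(order_a, order_b):
    """Return True if either worker gave up waiting for its second lock."""
    # Hypothetical stand-ins for the two indexes on task_instance.
    locks = {"state_index": threading.Lock(), "primary_index": threading.Lock()}
    timed_out = []
    threads = [
        threading.Thread(target=worker, args=(locks[a], locks[b], timed_out))
        for a, b in (order_a, order_b)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bool(timed_out)


if __name__ == "__main__":
    # Before the change: the two queries take the locks in opposite order.
    print(deadlocks(("state_index", "primary_index"), ("primary_index", "state_index")))
    # After the change: both queries take the primary index lock first.
    print(deadlocks(("primary_index", "state_index"), ("primary_index", "state_index")))
```

When both workers take the locks in the same order, one simply waits for the other to finish, which is the property the split query is meant to restore.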
8084255
to
706b8cc
@kaxil @ashb @mik-laj @nuclearpinguin -> It's been 10 days without a deadlock for our customer https://issues.apache.org/jira/browse/AIRFLOW-2516 so it looks like the problem is solved. Please approve and I will merge it and cherry-pick it to 1.10.8.
Deadlocks were occurring in MySQL when task_instance was modified by two queries at the same time. One query used state as a selection criterion and updated it in the same query, while the second query just updated the state in the same table. The first query locked the state index first and the primary index afterwards; the second query locked the primary index first and the state index afterwards, leading to deadlocks. This change splits the first query into two independent ones. The first query does a SELECT FOR UPDATE and selects all the task instances to act on (this locks the primary index only), and the second updates all affected task instances. Note that the performance impact is negligible because this query is only run once every scheduler loop, and the second part of it (looping through task instances) only happens when there are some manually modified DagRun states, so it is only run to correct some wrong DagRun states. This should happen very infrequently. (cherry picked from commit 1a52182)
Deadlocks were occurring in MySQL when task_instance was modified
by two queries at the same time. One query used state as a selection
criterion and updated it in the same query, while the second query just
updated the state in the same table. The first query locked the state
index first and the primary index afterwards; the second query locked
the primary index first and the state index afterwards, leading to
deadlocks.
This change splits the first query into two independent ones.
The first query does a SELECT FOR UPDATE and selects all the task
instances to act on (this locks the primary index only),
and the second updates all affected task instances.
Note that the performance impact is negligible because this
query is only run once every scheduler loop, and the second part
of it (looping through task instances) only happens when
there are some manually modified DagRun states, so it is only
run to correct some wrong DagRun states. This should happen
very infrequently.
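A minimal sketch of the two-step pattern described above, using the standard-library `sqlite3` module so it runs anywhere. This is not the code from the PR: the schema, the state values, and the function names `make_demo_db` and `reset_states_without_running_dagrun` are simplified assumptions for illustration. SQLite has no `FOR UPDATE` clause, so the locking part of the first query is only indicated in a comment; on MySQL the first statement would carry `FOR UPDATE`.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a simplified, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dag_run (
            dag_id TEXT, execution_date TEXT, state TEXT,
            PRIMARY KEY (dag_id, execution_date)
        );
        CREATE TABLE task_instance (
            task_id TEXT, dag_id TEXT, execution_date TEXT, state TEXT,
            PRIMARY KEY (task_id, dag_id, execution_date)
        );
        CREATE INDEX ti_state ON task_instance (state);
        """
    )
    return conn


def reset_states_without_running_dagrun(conn, old_states, new_state):
    """Two independent statements instead of one UPDATE filtered on state."""
    placeholders = ",".join("?" for _ in old_states)
    # Step 1: select only the primary keys of the rows to act on.
    # On MySQL this SELECT would end with FOR UPDATE (omitted: SQLite lacks it).
    rows = conn.execute(
        "SELECT ti.task_id, ti.dag_id, ti.execution_date "
        "FROM task_instance ti "
        "LEFT JOIN dag_run dr "
        "  ON dr.dag_id = ti.dag_id AND dr.execution_date = ti.execution_date "
        f"WHERE ti.state IN ({placeholders}) "
        "AND (dr.state IS NULL OR dr.state != 'running')",
        list(old_states),
    ).fetchall()
    # Step 2: update each selected row by primary key only.
    conn.executemany(
        "UPDATE task_instance SET state = ? "
        "WHERE task_id = ? AND dag_id = ? AND execution_date = ?",
        [(new_state, *row) for row in rows],
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = make_demo_db()
    conn.execute("INSERT INTO dag_run VALUES ('d1', '2020-01-01', 'failed')")
    conn.execute("INSERT INTO task_instance VALUES ('t1', 'd1', '2020-01-01', 'queued')")
    print(reset_states_without_running_dagrun(conn, ["queued"], "failed"))
```

The second step loops only over rows returned by the first, so when no task instances need correcting it does no work at all, which matches the performance argument in the description.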
Issue link: AIRFLOW-2516
[AIRFLOW-NNNN], where AIRFLOW-NNNN = JIRA ID*
(*) For document-only changes, no JIRA issue is needed; the commit message is [AIRFLOW-XXXX].
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.