"task_fail" contains duplicates for FK to "task_instance" table #26627
I tend to believe this is not intentional, so feel free to submit fixes for the places where duplicates can be created. (Can you provide examples? You didn't explain when duplicates happen.)
Sorry, I didn't state it clearly: basically, multiple failures of the same task instance produce duplicate rows in the task_fail table.
Hmm, yeah, that makes sense. I'm guessing the duplicated entries should be expected, and deleting them on upgrade could be an accident…? Not sure. cc @dstandish
Yes, I think that removing the duplicates is not necessary.
I'm taking a look at how we use TaskFail though. If the effort is not super great, it would be best if we can figure out what the actual key of this table is and enforce it. I know it's used in the UI, but from a purely DB-level referential-integrity perspective there's no issue, because the FK is TF -> TI and not the other way around.
From a “recording the history” perspective, I'd expect repeated TaskFail events to all be kept, and if the UI can only show one, it should have logic to pull in only the last failure for a TI.
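The "only pull in the last failure" logic mentioned above can be sketched in plain Python. This is not Airflow code, just an illustration; the record shape (task_id, run_id, map_index, end_date) is an assumption based on the key discussed in this issue:

```python
# Sketch: collapse repeated TaskFail records to the most recent failure
# per task instance. Record fields here are assumed, not Airflow's actual model.
from operator import itemgetter

fails = [
    {"task_id": "t", "run_id": "r", "map_index": -1, "end_date": "2022-01-01T00:05"},
    {"task_id": "t", "run_id": "r", "map_index": -1, "end_date": "2022-01-01T00:10"},
]

latest = {}
for f in sorted(fails, key=itemgetter("end_date")):
    # Later failures overwrite earlier ones for the same task-instance key.
    latest[(f["task_id"], f["run_id"], f["map_index"])] = f

print([f["end_date"] for f in latest.values()])  # → ['2022-01-01T00:10']
```

All history is kept in `fails`; only the view layer deduplicates, which matches the "recording the history" expectation above.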
Yeah, sigh… PR to remove the check: #26714. I think in my head the FK ref direction was flipped, so in my mind the TF records had to be deduped before adding the key. My mistake. Fortunately, it seems of little consequence.

I looked at the code in the task duration and gantt views, the two locations where some processing of TaskFail records is done. Gantt actually seems to work correctly if you have retries enabled on your task. But in the task duration view, task fails are only accounted for in the cumulative view (they are ignored in the non-cumulative view), and the cumulative view seems broken because it can go down with increasing time.
Maybe this belongs in a new issue, but I think it stems from the same problem. During the upgrade, the duplicates are moved to _airflow_moved__2_3__duplicates__task_fail. However, the schema of this table is quite different from the task_fail table. Here are the columns from the
And here are the columns from the new (?)
So it's not clear how I could revert this delete if I wanted to, since I'm missing data like map_index and run_id.
Something like this should work:
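The original snippet isn't reproduced in this thread, but the idea can be sketched: recover run_id by joining the moved rows back to dag_run on execution_date, and default map_index to -1 for unmapped tasks. The column sets below are assumptions (simplified pre-2.3 and 2.3 schemas), simulated with SQLite:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Hypothetical, simplified schemas -- not the full Airflow tables.
cur.executescript("""
CREATE TABLE dag_run  (dag_id TEXT, run_id TEXT, execution_date TEXT);
CREATE TABLE task_fail (dag_id TEXT, task_id TEXT, run_id TEXT, map_index INT,
                        start_date TEXT, end_date TEXT, duration INT);
CREATE TABLE _airflow_moved__2_3__duplicates__task_fail (
    dag_id TEXT, task_id TEXT, execution_date TEXT,
    start_date TEXT, end_date TEXT, duration INT);
""")
cur.execute("INSERT INTO dag_run VALUES ('d', 'scheduled__2022-01-01', '2022-01-01')")
cur.execute("""INSERT INTO _airflow_moved__2_3__duplicates__task_fail
               VALUES ('d', 't', '2022-01-01', 's', 'e', 60)""")

# Restore: derive run_id via dag_run; -1 is the map_index for unmapped tasks.
cur.execute("""
INSERT INTO task_fail (dag_id, task_id, run_id, map_index, start_date, end_date, duration)
SELECT m.dag_id, m.task_id, dr.run_id, -1, m.start_date, m.end_date, m.duration
FROM _airflow_moved__2_3__duplicates__task_fail m
JOIN dag_run dr ON dr.dag_id = m.dag_id AND dr.execution_date = m.execution_date
""")
print(cur.execute("SELECT dag_id, task_id, run_id, map_index FROM task_fail").fetchall())
# → [('d', 't', 'scheduled__2022-01-01', -1)]
```

Against a real metadata database you would run the equivalent `INSERT … SELECT` directly; verify the column lists against your actual schemas first.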
But you could also just delete the table. The task fail records are only used in the task duration and gantt views, so the only way you'd notice their impact at all is if you're looking at that specific run in the web UI, which you're probably unlikely to do. The failures show up as additional boxes in the gantt view.
Do not purge duplicates from TaskFail table on db upgrade. It seems that duplicates in this table are expected and there is no sense in purging them, i.e. it was implemented by mistake. Purging duplicates violates the integrity of this table. See apache/airflow#26627 for details.
Change-Id: I280789879bc69f3e3e30228db2b263a63954739c
GitOrigin-RevId: 809668b733ce3113ce0e6de7b6a3189bc11bac4d
Closing this issue; the check was removed in #26714.
Apache Airflow version
Other Airflow 2 version
What happened
Airflow 2.3.3.
Task instance failures produce duplicates in the "task_fail" table for the constraint (dag_id, task_id, run_id, map_index).
What you think should happen instead
Recently FK constraint between task_fail and task_instance tables was introduced:
#22260
And then there was a change to purge duplicates for this constraint from task_fail table (on db upgrade):
#22769
Removing duplicates on db upgrade to Airflow 2.3+ before establishing the FK between the tables makes sense; however, these duplicates can also occur in a running Airflow 2.3+ instance (see "How to reproduce").
What is the rationale for removing duplicates once, on upgrading to Airflow 2.3+, while keeping the possibility to generate duplicates again? Isn't this going to break the foreign key and the integrity of these two tables?
How to reproduce
Trigger a DAG with a task that fails multiple times across tries; it will produce duplicates in the "task_fail" table.
Example (two tries with 5 mins interval for retries):
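The example output isn't reproduced here, but the mechanism can be simulated without Airflow: each failed try records its own task_fail row, and those rows share the same (dag_id, task_id, run_id, map_index). The record shape below is an illustration, not Airflow's actual model:

```python
# Simulate two failed tries of one task instance. Each try appends a
# task_fail row; all rows collide on the would-be unique key
# (dag_id, task_id, run_id, map_index). Column set here is an assumption.
task_fail = []

def record_failure(dag_id, task_id, run_id, map_index, try_number):
    task_fail.append(
        {"dag_id": dag_id, "task_id": task_id, "run_id": run_id,
         "map_index": map_index, "try_number": try_number}
    )

# Two tries (e.g. retry after 5 minutes) for the same task instance:
record_failure("my_dag", "always_fails", "manual__2022-01-01", -1, try_number=1)
record_failure("my_dag", "always_fails", "manual__2022-01-01", -1, try_number=2)

keys = {(r["dag_id"], r["task_id"], r["run_id"], r["map_index"]) for r in task_fail}
print(len(task_fail), len(keys))  # → 2 1  (two rows, one distinct key)
```

This is exactly why a unique constraint on those four columns cannot hold once retries produce more than one failure.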
Operating System
Linux
Versions of Apache Airflow Providers
No response
Deployment
Composer
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
Code of Conduct