Write backfill script for Airflow parent runs #1944

Closed
collado-mike opened this issue Apr 11, 2022 · 1 comment

@collado-mike
Collaborator

As described in OpenLineage/OpenLineage#664, the OpenLineage Airflow implementation has always sent the ParentRunFacet with incorrect values in the runId and job name fields. This information can be backfilled in existing databases, and doing so is necessary for a complete and correct implementation of #1928.

The approach would be to split the parent job name on the . character to determine the DAG name, then concatenate the DAG name with the runId (usually something like scheduled__2022-03-14T01:40:10+00:00) and generate a UUID from the result (see the UUIDv3 generation in the OpenLineage implementation here). Note that some job names consist of the DAG name followed by the task group and then the task id, all separated by . characters, so we should always take the left-most component as the DAG name.
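
A minimal sketch of that derivation in Python, for illustration only. The function names are hypothetical, and both the UUID namespace (`uuid.NAMESPACE_URL`) and the `.` separator between DAG name and runId are assumptions; they must be checked against the OpenLineage Airflow implementation before backfilling, otherwise the generated ids will not match.

```python
import uuid


def dag_name_from_job_name(job_name: str) -> str:
    """Return the left-most '.'-separated component of a parent job name.

    Covers both 'dag.task' and 'dag.task_group.task' shaped names.
    """
    return job_name.split(".", 1)[0]


def parent_run_uuid(
    job_name: str,
    run_id: str,
    namespace: uuid.UUID = uuid.NAMESPACE_URL,  # assumption, not confirmed by the issue
) -> str:
    """Derive a deterministic UUIDv3 for the parent (DAG) run.

    The '<dag_name>.<run_id>' input format is an assumption for this sketch.
    """
    dag_name = dag_name_from_job_name(job_name)
    return str(uuid.uuid3(namespace, f"{dag_name}.{run_id}"))


if __name__ == "__main__":
    run_id = "scheduled__2022-03-14T01:40:10+00:00"
    # Both tasks belong to the same DAG run, so they map to the same parent id.
    print(parent_run_uuid("my_dag.my_task", run_id))
    print(parent_run_uuid("my_dag.my_group.my_task", run_id))
```

Because UUIDv3 is deterministic, every task row that shares a DAG name and runId resolves to the same parent run id, which is what lets the backfill be re-run safely.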

@collado-mike collado-mike self-assigned this Apr 11, 2022
@collado-mike collado-mike added this to the 0.22.0 milestone Apr 11, 2022
@conorbev
Collaborator

Done


2 participants