[OpenLineage] fix: Fix parent id macro and remove unused utils #37877

Merged (1 commit) on Mar 5, 2024


kacpermuda

This PR originates from @blacklight's PR for the openlineage-airflow package. Thanks, @blacklight.

When using lineage_parent_id, a user should receive all the information needed to create a ParentRunFacet. Currently, the lineage_parent_id function does this:

job_name = OpenLineageAdapter.build_task_instance_run_id(
    dag_id=task_instance.dag_id,
    task_id=task_instance.task.task_id,
    execution_date=task_instance.execution_date,
    try_number=task_instance.try_number,
)
return f"{_JOB_NAMESPACE}/{job_name}/{run_id}"

As a result, the job name is a UUID instead of dag_id.task_id. After this change, creating a ParentRunFacet becomes straightforward:

parent_id = lineage_parent_id(task_instance)
parent_namespace, parent_job_name, parent_run_id = parent_id.split("/")
parent_run_facet = ParentRunFacet.create(parent_run_id, parent_job_name, parent_namespace)
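
To make the intended `<namespace>/<dag_id>.<task_id>/<run_id>` shape concrete, here is a minimal, self-contained sketch. It does not use Airflow or the OpenLineage client: `FakeTaskInstance`, `JOB_NAMESPACE`, `build_run_id`, `lineage_parent_id_sketch` and `split_parent_id` are hypothetical stand-ins, and the way the run UUID is derived here is an assumption made for illustration, not the provider's actual implementation.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical stand-in for the namespace the adapter module already resolves.
JOB_NAMESPACE = "default"


@dataclass
class FakeTaskInstance:
    """Hypothetical minimal substitute for an Airflow TaskInstance."""
    dag_id: str
    task_id: str
    execution_date: datetime
    try_number: int


def build_run_id(ti: FakeTaskInstance) -> str:
    # Assumption: a deterministic UUID derived from the identifying fields.
    key = (
        f"{JOB_NAMESPACE}.{ti.dag_id}.{ti.task_id}."
        f"{ti.execution_date.isoformat()}.{ti.try_number}"
    )
    return str(uuid.uuid3(uuid.NAMESPACE_URL, key))


def lineage_parent_id_sketch(ti: FakeTaskInstance) -> str:
    # The job name is dag_id.task_id; the UUID goes in the run_id slot.
    job_name = f"{ti.dag_id}.{ti.task_id}"
    return f"{JOB_NAMESPACE}/{job_name}/{build_run_id(ti)}"


def split_parent_id(parent_id: str) -> tuple[str, str, str]:
    # rsplit with maxsplit=2 keeps any "/" inside the namespace intact.
    namespace, job_name, run_id = parent_id.rsplit("/", 2)
    return namespace, job_name, run_id


ti = FakeTaskInstance("etl", "load", datetime(2024, 3, 5, tzinfo=timezone.utc), 1)
namespace, job_name, run_id = split_parent_id(lineage_parent_id_sketch(ti))
```

Splitting from the right is a defensive choice in this sketch: a plain `split("/")` only yields exactly three parts when none of the components contains a slash.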

Also, the macros module extracts the namespace again even though the adapter module has already extracted one. Since the adapter's namespace is the one used in events, the macros should use it too.

Important

I am not sure whether removing the run_id argument from the macro is a breaking change. We could probably leave it there, but I think it has no real use.



@blacklight

LGTM - if this PR extends mine shall I proceed with closing the other one?

@kacpermuda

> if this PR extends mine shall I proceed with closing the other one?

This one is for the provider package, and yours is for the openlineage-airflow package, so we can proceed with both 😄

@eladkal eladkal requested a review from mobuchowski March 5, 2024 11:12
blacklight added a commit to blacklight/OpenLineage that referenced this pull request Mar 5, 2024
- Both `lineage_run_id` and `lineage_parent_id` should expose the same
  interface - only a `TaskInstance` object is now required as argument.

- Import `_DAG_NAMESPACE` instead of inferring it again.

Signed-off-by: Fabio Manganiello <[email protected]>
@mobuchowski mobuchowski merged commit 2852976 into apache:main Mar 5, 2024
58 checks passed
@kacpermuda kacpermuda deleted the fix/ol-macros branch March 5, 2024 13:16
blacklight added a commit to blacklight/OpenLineage that referenced this pull request Apr 4, 2024
…arent_id`.

- Returned format: `<namespace>/<name>/<run_id>`.

- `name` should be `<dag_id>.<task_id>`, not a UUID.

- `run_id` should be a UUID, not `<run_timestamp>.<try_number>`.

- Both `lineage_run_id` and `lineage_parent_id` should expose the same
  interface - only a `TaskInstance` object is now required as argument.

- Import `_DAG_NAMESPACE` instead of inferring it again.

Airflow-Reference: apache/airflow#37877
Signed-off-by: Fabio Manganiello <[email protected]>
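
The format rules listed in this commit message can be expressed as a small check. This is only a sketch under stated assumptions: `is_valid_parent_id` and `_is_uuid` are hypothetical helpers written for illustration and are not part of Airflow or OpenLineage. It treats a value as well-formed when the name looks like `<dag_id>.<task_id>` rather than a UUID and the run ID parses as a UUID.

```python
import uuid


def _is_uuid(value: str) -> bool:
    """Return True if value parses as a UUID (hypothetical helper)."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


def is_valid_parent_id(parent_id: str) -> bool:
    """Check the <namespace>/<name>/<run_id> shape described above."""
    parts = parent_id.rsplit("/", 2)
    if len(parts) != 3:
        return False
    namespace, name, run_id = parts
    return (
        bool(namespace)
        and "." in name          # name should be <dag_id>.<task_id>
        and not _is_uuid(name)   # ...and not a UUID
        and _is_uuid(run_id)     # run_id should be a UUID
    )


new_style = f"default/etl.load/{uuid.uuid4()}"
# Old shape: UUID in the name slot, <run_timestamp>.<try_number> as the run ID.
old_style = f"default/{uuid.uuid4()}/2024-03-05T00:00:00.1"
```

A check like this could sit in a test for the macro; it is intentionally loose about what a namespace may contain.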
kacpermuda pushed a commit to OpenLineage/OpenLineage that referenced this pull request Apr 5, 2024
#2578)

* [#2488] Fixed format returned by `airflow.macros.lineage_parent_id`.

- Returned format: `<namespace>/<name>/<run_id>`.

- `name` should be `<dag_id>.<task_id>`, not a UUID.

- `run_id` should be a UUID, not `<run_timestamp>.<try_number>`.

- Both `lineage_run_id` and `lineage_parent_id` should expose the same
  interface - only a `TaskInstance` object is now required as argument.

- Import `_DAG_NAMESPACE` instead of inferring it again.

Airflow reference: apache/airflow#37877

Signed-off-by: Fabio Manganiello <[email protected]>

* Fixed redefinition of `get_unknown_source_attribute_run_facet` introduced upon merge.

Signed-off-by: Fabio Manganiello <[email protected]>

* Addressed #2578 (comment)

Signed-off-by: Fabio Manganiello <[email protected]>

* Fixed failing macro test

Signed-off-by: Fabio Manganiello <[email protected]>

---------

Signed-off-by: Fabio Manganiello <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
utkarsharma2 pushed a commit to astronomer/airflow that referenced this pull request Apr 22, 2024
fafnirZ pushed a commit to fafnirZ/OpenLineage that referenced this pull request Jul 3, 2024