Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Airflow] Fixed run format returned by the lineage_parent_id macro #2489

Closed

Conversation

blacklight
Copy link
Contributor

Problem

The macro currently returns a string in the format <namespace>/<name>/<run_id>.

However, in this string:

  • name should be <dag_id>.<task_id>, not a UUID.

  • run_id should be a UUID, not <run_timestamp>.<try_number>.

This is to comply with the OpenLineage conventions used everywhere else.

Closes: #2488

Solution

As stated above, name and run_id should be populated in the expected format.

The change may break back-compatibility in cases where lineage_parent_id is used without keyword-arguments, since the signature has changed from:

def lineage_parent_id(run_id, task: "BaseOperator", task_instance: "TaskInstance"):
  ...

To:

def lineage_parent_id(
  task: "BaseOperator", task_instance: "TaskInstance", run_id: typing.Optional[str] = None
):
  ...

That makes the signature of lineage_parent_id consistent with that of lineage_run_id.

run_id is now an optional parameter - it defaults to the one calculated via lineage_run_id for the given instance, and it can optionally be overridden.

  • Your change modifies the core OpenLineage model
  • Your change modifies one or more OpenLineage facets

One-line summary:

Fixed run format returned by the lineage_parent_id Airflow macro.

Checklist

  • You've signed-off your work
  • Your pull request title follows our guidelines
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • Your comment includes a one-liner for the changelog about the specific purpose of the change (if necessary)
  • You've versioned the core OpenLineage model or facets according to SchemaVer (if relevant)
  • You've added a header to source files (if relevant)

SPDX-License-Identifier: Apache-2.0
Copyright 2018-2023 contributors to the OpenLineage project

@boring-cyborg boring-cyborg bot added the area:integration/airflow openlineage-airflow label Mar 4, 2024
Copy link

boring-cyborg bot commented Mar 4, 2024

Thanks for opening your first OpenLineage pull request! We appreciate your contribution. If you haven't already, please make sure you've reviewed our guide for new contributors (https://github.com/OpenLineage/OpenLineage/blob/main/CONTRIBUTING.md).

Copy link
Contributor

@kacpermuda kacpermuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, if the CI goes through i think we can merge that @blacklight

@mobuchowski
Copy link
Member

@blacklight still want to contribute that? I think this

PLUGIN_MACRO = "{{ macros.OpenLineagePlugin.lineage_parent_id(run_id, task, task_instance) }}"
breaks tests.

@blacklight
Copy link
Contributor Author

blacklight commented Apr 4, 2024

@mobuchowski yes I'm still planning to merge it - I just didn't have much bandwidth in these days to look into the failing tests.

I'll try and get the tests to pass today.

@mobuchowski
Copy link
Member

@blacklight yes.

@mobuchowski
Copy link
Member

blacklight added a commit to blacklight/OpenLineage that referenced this pull request Apr 4, 2024
…ents.

Some pieces of documentation and tests still referenced the previous
arguments for the `lineage_run_id` and `lineage_parent_id` macros.

Signed-off-by: Fabio Manganiello <[email protected]>
@boring-cyborg boring-cyborg bot added the area:documentation Improvements or additions to documentation label Apr 4, 2024
blacklight added a commit to blacklight/OpenLineage that referenced this pull request Apr 4, 2024
…ents.

Some pieces of documentation and tests still referenced the previous
arguments for the `lineage_run_id` and `lineage_parent_id` macros.

Signed-off-by: Fabio Manganiello <[email protected]>
blacklight and others added 16 commits April 4, 2024 22:52
…arent_id`.

- Returned format: `<namespace>/<name>/<run_id>`.

- `name` should be `<dag_id>.<task_id>`, not a UUID.

- `run_id` should be a UUID, not `<run_timestamp>.<try_number>`.

Signed-off-by: Fabio Manganiello <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
- Both `lineage_run_id` and `lineage_parent_id` should expose the same
  interface - only a `TaskInstance` object is now required as argument.

- Import `_DAG_NAMESPACE` instead of inferring it again.

Signed-off-by: Fabio Manganiello <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
…in listener.on_task_instance_running (OpenLineage#2492)

Signed-off-by: Kacper Muda <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
Signed-off-by: Kacper Muda <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
Signed-off-by: Julien Le Dem <[email protected]>
Signed-off-by: Julien Le Dem <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
Signed-off-by: Zhenqiu Huang <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
…efactored the existing loadOpenLineageYaml(InputStream) method (OpenLineage#2490)

* Added a loadOpenLineageJson(InputStream) method, and refactored the loadOpenLineageYaml(InputStream) method
* Improved the error handling for loadOpenLineageYaml(ConfigPathProvider)
* Explicitly state the exceptions, despite being unchecked exceptions

Signed-off-by: Damien Hawes <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
…penLineage#2478)

* Add the MSK IAM transport to support AWS MSK cluster instances without additional custom ones

Signed-off-by: Mattia Bertorello <[email protected]>

* Remove support to get the default from instance metadata

Signed-off-by: Mattia Bertorello <[email protected]>

* Remove test for instance metadata

Signed-off-by: Mattia Bertorello <[email protected]>

* Use only debug level logs for the transport

Signed-off-by: Mattia Bertorello <[email protected]>

* Remove redundant checks

Signed-off-by: Mattia Bertorello <[email protected]>

---------

Signed-off-by: Mattia Bertorello <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
Signed-off-by: Zhenqiu Huang <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
…ge#2502)

Bumps [com.amazonaws:amazon-kinesis-producer](https://github.com/awslabs/amazon-kinesis-producer) from 0.15.9 to 0.15.10.
- [Release notes](https://github.com/awslabs/amazon-kinesis-producer/releases)
- [Changelog](https://github.com/awslabs/amazon-kinesis-producer/blob/master/CHANGELOG.md)
- [Commits](awslabs/amazon-kinesis-producer@v0.15.9...v0.15.10)

---
updated-dependencies:
- dependency-name: com.amazonaws:amazon-kinesis-producer
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Fabio Manganiello <[email protected]>
…OpenLineage#2501)

Bumps plugin.serialization from 1.9.22 to 1.9.23.

---
updated-dependencies:
- dependency-name: plugin.serialization
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Fabio Manganiello <[email protected]>
…OpenLineage#2500)

Bumps plugin.serialization from 1.9.22 to 1.9.23.

---
updated-dependencies:
- dependency-name: plugin.serialization
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Fabio Manganiello <[email protected]>
Signed-off-by: Zhenqiu Huang <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
JDarDagran and others added 21 commits April 4, 2024 22:52
Signed-off-by: Jakub Dardzinski <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
…ents.

Some pieces of documentation and tests still referenced the previous
arguments for the `lineage_run_id` and `lineage_parent_id` macros.

Signed-off-by: Fabio Manganiello <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
* Add missing changes to changelog.

---------

Signed-off-by: merobi-hub <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
Signed-off-by: merobi-hub <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
Signed-off-by: merobi-hub <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
Signed-off-by: merobi-hub <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
Signed-off-by: merobi-hub <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
Signed-off-by: merobi-hub <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
…in it's release job (OpenLineage#2575)

Signed-off-by: Maciej Obuchowski <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
Signed-off-by: merobi-hub <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
Signed-off-by: merobi-hub <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
Signed-off-by: merobi-hub <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
…t itself (OpenLineage#2576)

Signed-off-by: Maciej Obuchowski <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
…terfaces (OpenLineage#2577)

Signed-off-by: Maciej Obuchowski <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
Signed-off-by: merobi-hub <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
Signed-off-by: merobi-hub <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
Signed-off-by: merobi-hub <[email protected]>
Signed-off-by: Fabio Manganiello <[email protected]>
@boring-cyborg boring-cyborg bot added area:ci CI area:client/java openlineage-java area:tests Testing code labels Apr 4, 2024
@blacklight
Copy link
Contributor Author

@mobuchowski I have opened a new PR here.

I had conflicting commits pushed both with my work and personal account, then upon git rebase HEAD~n --signoff git happened to eagerly also sign a bunch of merge commits - ops.

I'll close this one and wait for the tests to complete on the other side.

@blacklight blacklight closed this Apr 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:ci CI area:client/java openlineage-java area:documentation Improvements or additions to documentation area:integration/airflow openlineage-airflow area:tests Testing code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update the logic of openlineage.airflow.macros.lineage_parent_id