AIP-62: Add HookLineageCollector #40335

Merged: 2 commits merged into apache:main on Jul 15, 2024


JDarDagran (Contributor)

Add HookLineageCollector, which registers and holds lineage sent from hooks during task execution.
Add HookLineageReader, which determines whether HookLineageCollector is enabled to process lineage sent from hooks.
Add Dataset factories to ensure that Datasets registered with HookLineageCollector are AIP-60 compliant.

Closes: #38766
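To make the division of labor among these three pieces concrete, here is a minimal, self-contained Python sketch. It is not Airflow's implementation: the names `LineageCollector`, `NoOpLineageCollector`, `LineageReader`, `get_collector`, `create_dataset`, `DATASET_FACTORIES` and `FakeS3Hook`, as well as the `s3`/`file` URI formats, are hypothetical stand-ins that only mirror the roles described above. Hooks report datasets to a collector, the collector does real work only when at least one reader is registered, and datasets are built through per-scheme factories so their URIs follow one consistent format.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for an AIP-60 Dataset, identified by its URI."""

    uri: str


# Hypothetical per-scheme factories: each turns scheme-specific arguments
# into a Dataset with a consistently formatted URI.
DATASET_FACTORIES: dict[str, Callable[..., Dataset]] = {
    "s3": lambda bucket, key: Dataset(f"s3://{bucket}/{key}"),
    "file": lambda path: Dataset(f"file://{path}"),
}


def create_dataset(scheme: str, **kwargs: Any) -> Dataset | None:
    """Build a Dataset via the factory for `scheme`; unknown schemes yield None."""
    factory = DATASET_FACTORIES.get(scheme)
    return factory(**kwargs) if factory is not None else None


class LineageCollector:
    """Holds datasets reported by hooks while a task runs (sketch only)."""

    def __init__(self) -> None:
        self.inputs: list[tuple[Dataset, Any]] = []
        self.outputs: list[tuple[Dataset, Any]] = []

    def add_input_dataset(self, hook: Any, scheme: str, **dataset_kwargs: Any) -> None:
        self._add(self.inputs, hook, scheme, dataset_kwargs)

    def add_output_dataset(self, hook: Any, scheme: str, **dataset_kwargs: Any) -> None:
        self._add(self.outputs, hook, scheme, dataset_kwargs)

    def _add(self, target: list[tuple[Dataset, Any]], hook: Any, scheme: str,
             dataset_kwargs: dict[str, Any]) -> None:
        dataset = create_dataset(scheme, **dataset_kwargs)
        if dataset is not None:  # skip schemes with no registered factory
            target.append((dataset, hook))

    @property
    def has_collected(self) -> bool:
        return bool(self.inputs or self.outputs)


class NoOpLineageCollector(LineageCollector):
    """Used when no reader is registered: hook reports are accepted and discarded."""

    def add_input_dataset(self, hook: Any, scheme: str, **dataset_kwargs: Any) -> None:
        pass

    def add_output_dataset(self, hook: Any, scheme: str, **dataset_kwargs: Any) -> None:
        pass


class LineageReader:
    """Marker for a consumer of collected lineage (hypothetical)."""


def get_collector(readers: list[LineageReader]) -> LineageCollector:
    # Collection is only worth doing if something will read the results.
    return LineageCollector() if readers else NoOpLineageCollector()


class FakeS3Hook:
    """Hypothetical hook that reports what it reads to the collector."""

    def __init__(self, collector: LineageCollector) -> None:
        self.collector = collector

    def read_key(self, bucket: str, key: str) -> bytes:
        self.collector.add_input_dataset(self, scheme="s3", bucket=bucket, key=key)
        return b""  # a real hook would fetch the object here


collector = get_collector([LineageReader()])
FakeS3Hook(collector).read_key("my-bucket", "data.csv")
print(collector.inputs[0][0].uri)  # s3://my-bucket/data.csv
```

In this sketch, handing hooks a no-op collector when no reader exists keeps hook code unconditional: hooks always report, and nothing is retained unless some reader will consume it.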



@mobuchowski (Contributor) left a comment:

Overall looks good; however, I'd add some more "practical tests" that show how the mechanism works.

Would also love to see better documentation, though I think it can just link to something that will be populated later in the provider docs in a follow-up PR.

Review thread on airflow/providers_manager.py (resolved)
Review thread on airflow/lineage/hook.py (outdated, resolved)
Review thread on airflow/lineage/hook.py (outdated, resolved)
@JDarDagran force-pushed the aip-62/hook-lineage-collector branch from 1293a07 to 4e81003 (July 7, 2024 22:44)
@JDarDagran requested a review from potiuk as a code owner (July 7, 2024 22:44)
@JDarDagran force-pushed the aip-62/hook-lineage-collector branch 2 times, most recently from ce0994c to 75ec801 (July 7, 2024 23:44)
@mobuchowski requested a review from uranusjr (July 8, 2024 16:35)
@mobuchowski (Contributor) commented:

@uranusjr, want to take another look?

@JDarDagran force-pushed the aip-62/hook-lineage-collector branch 2 times, most recently from 585ec21 to f6af11a (July 9, 2024 15:16)
@JDarDagran force-pushed the aip-62/hook-lineage-collector branch 2 times, most recently from b651b1a to 6c9e124 (July 10, 2024 14:08)
@mobuchowski added the AIP-62 label (Tasks tracking implementation of AIP-62: Getting Lineage from Hook Instrumentation) on Jul 11, 2024
@JDarDagran force-pushed the aip-62/hook-lineage-collector branch 3 times, most recently from 0380e68 to 5f6c253 (July 15, 2024 12:19)
@JDarDagran force-pushed the aip-62/hook-lineage-collector branch 2 times, most recently from 1f54e98 to 53f6a97 (July 15, 2024 13:55)
Review thread on airflow/lineage/hook.py (outdated, resolved)
Review thread on airflow/lineage/hook.py (outdated, resolved)
@JDarDagran force-pushed the aip-62/hook-lineage-collector branch from 53f6a97 to adecb55 (July 15, 2024 14:13)
Add HookLineageCollector that during task execution
should register and hold lineage sent from hooks.

Add HookLineageReader that defines whether HookLineageCollector
should be enabled to process lineage sent from hooks.

Add Dataset factories to make sure Datasets registered with
HookLineageCollector are AIP-60 compliant.

Signed-off-by: Jakub Dardzinski <[email protected]>
@JDarDagran force-pushed the aip-62/hook-lineage-collector branch from adecb55 to 89fed8e (July 15, 2024 21:28)
Remove default `create_dataset` method.

Add section in experimental lineage docs.

Signed-off-by: Jakub Dardzinski <[email protected]>
@JDarDagran force-pushed the aip-62/hook-lineage-collector branch from 89fed8e to 351d2bc (July 15, 2024 22:26)
@mobuchowski merged commit cd68840 into apache:main on Jul 15, 2024 (48 checks passed)
@ephraimbuddy added the type:new-feature label (Changelog: New Features) on Jul 22, 2024
@ephraimbuddy added this to the Airflow 2.10.0 milestone on Jul 23, 2024
@ephraimbuddy added the changelog:skip label (Changes that should be skipped from the changelog: CI, tests, etc.) and removed the type:new-feature label (Changelog: New Features) on Jul 24, 2024
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request on Jul 26, 2024:
* Add HookLineageCollector that during task execution
should register and hold lineage sent from hooks.

Add HookLineageReader that defines whether HookLineageCollector
should be enabled to process lineage sent from hooks.

Add Dataset factories to make sure Datasets registered with
HookLineageCollector are AIP-60 compliant.

Signed-off-by: Jakub Dardzinski <[email protected]>

* Remove default `create_dataset` method.

Add section in experimental lineage docs.

Signed-off-by: Jakub Dardzinski <[email protected]>

---------

Signed-off-by: Jakub Dardzinski <[email protected]>
Labels: AIP-62 (Tasks tracking implementation of AIP-62: Getting Lineage from Hook Instrumentation), area:lineage, changelog:skip (Changes that should be skipped from the changelog: CI, tests, etc.)
Development: merging this pull request closes the issue "Implement HookLineageCollector for collection of Hook-generated datasets".
5 participants