[Integration][Airflow] Support OL Datasets in manual lineage inputs/outputs #1015

Merged
merged 9 commits into from
Aug 17, 2022

conorbev
Contributor

Signed-off-by: Conor Beverland [email protected]

Problem

Manual lineage definition currently requires the user to create an Airflow airflow.lineage.entities.Table, which is then converted to an OpenLineage Dataset.

It would be good if users could create OpenLineage Dataset objects directly in their DAGs, with no special conversion necessary.

Solution

This extends the current implementation so that Datasets specified in inlets or outlets are passed through without modification.
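
The pass-through can be sketched roughly as below. This is only an illustration, not the code in this PR: `Table` and `Dataset` are simplified stand-ins for `airflow.lineage.entities.Table` and the OpenLineage Dataset, and `convert_table`, `to_datasets`, and the field mapping inside them are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Table:
    """Simplified stand-in for airflow.lineage.entities.Table."""
    database: str
    cluster: str
    name: str


@dataclass
class Dataset:
    """Simplified stand-in for an OpenLineage Dataset."""
    namespace: str
    name: str


def convert_table(table: Table) -> Dataset:
    # Assumed mapping for illustration only; the real conversion may differ.
    return Dataset(namespace=table.cluster, name=f"{table.database}.{table.name}")


def to_datasets(entities: List[Any]) -> List[Dataset]:
    """Turn manually declared inlets or outlets into Datasets."""
    result = []
    for entity in entities:
        if isinstance(entity, Dataset):
            # Already a Dataset: pass it through unmodified.
            result.append(entity)
        elif isinstance(entity, Table):
            # Airflow Table: convert as before.
            result.append(convert_table(entity))
        # Anything else is skipped in this sketch.
    return result
```

With this shape, a user-supplied Dataset comes out as the same object it went in as, while Table entries still go through the conversion step.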

In addition, it makes the BashOperatorExtractor behave more like its PythonOperator counterpart when source code collection is disabled, which allows it to work with the manual lineage collection feature.
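
One way to read the BashOperatorExtractor change is sketched below, assuming the extractor should still return task metadata (just without a source code facet) when collection is turned off, so that manually declared inlets and outlets can still be attached. `TaskMetadata` here is a simplified stand-in, and `extract_bash` and its `collect_source` flag are hypothetical names, not the actual extractor API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TaskMetadata:
    """Simplified stand-in for the metadata an extractor returns."""
    name: str
    job_facets: Dict[str, dict] = field(default_factory=dict)


def extract_bash(task_name: str, bash_command: str, collect_source: bool) -> TaskMetadata:
    """Always return metadata; include the source code facet only when enabled."""
    job_facets: Dict[str, dict] = {}
    if collect_source:
        # Attach the command as a source code facet (shape assumed for illustration).
        job_facets["sourceCode"] = {"language": "bash", "source": bash_command}
    # Metadata is returned even when collection is disabled.
    return TaskMetadata(name=task_name, job_facets=job_facets)
```

The point of the sketch is that disabling source code collection only drops the facet; it does not stop the extractor from producing metadata for the task.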

  • Your change modifies the core OpenLineage model
  • Your change modifies one or more OpenLineage facets

Checklist

  • [x] You've signed-off your work
  • [x] Your pull request title follows our guidelines
  • [x] Your changes are accompanied by tests (if relevant)
  • [x] Your change contains a small diff and is self-contained

codecov-commenter commented Aug 15, 2022

Codecov Report

Merging #1015 (437b6a6) into main (46efab1) will increase coverage by 0.61%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##               main    #1015      +/-   ##
============================================
+ Coverage     84.59%   85.20%   +0.61%     
============================================
  Files            79       57      -22     
  Lines          3537     3143     -394     
  Branches         20        0      -20     
============================================
- Hits           2992     2678     -314     
+ Misses          517      465      -52     
+ Partials         28        0      -28     
Impacted Files Coverage Δ
...w/openlineage/airflow/extractors/bash_extractor.py 100.00% <100.00%> (ø)
.../airflow/openlineage/airflow/extractors/manager.py 97.18% <100.00%> (+0.40%) ⬆️
...ntegration/dagster/openlineage/dagster/__init__.py
.../openlineage/client/transports/KafkaTransport.java
.../io/openlineage/client/OpenLineageClientUtils.java
integration/dagster/openlineage/dagster/sensor.py
integration/dagster/openlineage/dagster/utils.py
...penlineage/client/transports/TransportFactory.java
.../java/io/openlineage/client/OpenLineageClient.java
...va/io/openlineage/client/transports/Transport.java
... and 14 more

@JDarDagran JDarDagran merged commit a8673b1 into OpenLineage:main Aug 17, 2022
3 participants