docs(airflow): add docs on custom operators (datahub-project#7913)
Co-authored-by: John Joyce <[email protected]>
2 people authored and tusharm committed Jun 20, 2023
1 parent ff6eb4a commit 1398b58
docs/lineage/airflow.md: 31 additions & 3 deletions
### How to validate installation
1. Go and check in Airflow at the Admin -> Plugins menu to see if the DataHub plugin is listed.
2. Run an Airflow DAG. In the task logs, you should see DataHub-related log messages like:

```
Emitting DataHub ...
```
### Emitting lineage via a custom operator to the Airflow Plugin
If you have created a [custom Airflow operator](https://airflow.apache.org/docs/apache-airflow/stable/howto/custom-operator.html) that inherits from the `BaseOperator` class, set inlets and outlets via `context['ti'].task.inlets` and `context['ti'].task.outlets` when overriding the `execute` function.
The DataHub Airflow plugin will then pick up those inlets and outlets after the task runs.
```python
class DbtOperator(BaseOperator):
    ...

    def execute(self, context):
        # do something
        inlets, outlets = self._get_lineage()
        # inlets/outlets are lists of either datahub_provider.entities.Dataset
        # or datahub_provider.entities.Urn
        context['ti'].task.inlets = inlets
        context['ti'].task.outlets = outlets

    def _get_lineage(self):
        # Do some processing to get inlets/outlets
        return inlets, outlets
```
If you override the `pre_execute` and `post_execute` functions, ensure they include the `@prepare_lineage` and `@apply_lineage` decorators respectively. [source](https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/lineage.html#lineage)
## Using DataHub's Airflow lineage backend (deprecated)

:::caution

Take a look at this sample DAG:

In order to use this example, you must first configure the DataHub hook. Like in ingestion, we support a DataHub REST hook and a Kafka-based hook. See step 1 above for details.
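As a hedged reminder of what that configuration can look like via the Airflow CLI (the connection IDs and hosts below are illustrative defaults and should be adjusted to your deployment):

```shell
# REST-based hook: point at your DataHub GMS endpoint (host is an assumption)
airflow connections add --conn-type 'datahub_rest' 'datahub_rest_default' \
  --conn-host 'http://localhost:8080'

# Kafka-based hook: point at your Kafka broker (host is an assumption)
airflow connections add --conn-type 'datahub_kafka' 'datahub_kafka_default' \
  --conn-host 'broker:9092'
```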


## Debugging

### Incorrect URLs
