docs(airflow): example query to get datajobs for a dataflow (#11034)
eboneil authored Jul 31, 2024
1 parent 27e1130 commit f73149a
Showing 2 changed files with 29 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/api/graphql/getting-started.md
@@ -27,6 +27,7 @@ For more information on, please refer to the following links."
- [Querying for Domain of a Dataset](/docs/api/tutorials/domains.md#read-domains)
- [Querying for Glossary Terms of a Dataset](/docs/api/tutorials/terms.md#read-terms)
- [Querying for Deprecation of a dataset](/docs/api/tutorials/deprecation.md#read-deprecation)
- [Querying for all DataJobs that belong to a DataFlow](/docs/lineage/airflow.md#get-all-datajobs-associated-with-a-dataflow)

### Search

28 changes: 28 additions & 0 deletions docs/lineage/airflow.md
@@ -266,6 +266,34 @@ with DAG(
- ingest this DAG, and it will remove all obsolete pipelines and tasks from DataHub based on the `cluster` value set in `airflow.cfg`


## Get all DataJobs associated with a DataFlow

To find all tasks (DataJobs) that belong to a specific pipeline (DataFlow), you can use the following GraphQL query:

```graphql
query {
  dataFlow(urn: "urn:li:dataFlow:(airflow,db_etl,prod)") {
    childJobs: relationships(
      input: {
        types: ["IsPartOf"],
        direction: INCOMING,
        start: 0,
        count: 100
      }
    ) {
      total
      relationships {
        entity {
          ... on DataJob {
            urn
          }
        }
      }
    }
  }
}
```
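Each request returns at most `count` relationships (100 in the query above), so DataFlows with more tasks need several requests. The following is a minimal Python sketch, using only the standard library, that sends a parameterized version of the same query in a loop and advances `start` until `total` is reached. The helper names (`data_flow_urn`, `make_http_executor`, `fetch_child_job_urns`) are hypothetical, and the endpoint URL and bearer-token authentication in the usage comment are assumptions about your deployment.

```python
import json
import urllib.request
from typing import Callable, Dict, List, Optional

# Same query as above, with the URN and paging values passed as GraphQL variables.
CHILD_JOBS_QUERY = """
query childJobs($urn: String!, $start: Int!, $count: Int!) {
  dataFlow(urn: $urn) {
    childJobs: relationships(
      input: { types: ["IsPartOf"], direction: INCOMING, start: $start, count: $count }
    ) {
      total
      relationships {
        entity {
          ... on DataJob {
            urn
          }
        }
      }
    }
  }
}
"""

# An executor takes (query, variables) and returns the decoded JSON response body.
Executor = Callable[[str, Dict], Dict]


def data_flow_urn(orchestrator: str, flow_id: str, cluster: str = "prod") -> str:
    """Hypothetical helper: build a DataFlow URN in the format used above."""
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{cluster})"


def make_http_executor(endpoint: str, token: Optional[str] = None) -> Executor:
    """Hypothetical helper: POST GraphQL requests to `endpoint` (assumed URL)."""
    def execute(query: str, variables: Dict) -> Dict:
        body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
        headers = {"Content-Type": "application/json"}
        if token:
            # Assumption: the server accepts a bearer access token.
            headers["Authorization"] = f"Bearer {token}"
        req = urllib.request.Request(endpoint, data=body, headers=headers, method="POST")
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return execute


def fetch_child_job_urns(execute: Executor, flow_urn: str, page_size: int = 100) -> List[str]:
    """Collect the URNs of all DataJobs that are IsPartOf the given DataFlow."""
    urns: List[str] = []
    start = 0
    while True:
        result = execute(CHILD_JOBS_QUERY, {"urn": flow_urn, "start": start, "count": page_size})
        if result.get("errors"):
            raise RuntimeError(result["errors"])
        flow = result["data"]["dataFlow"]
        if flow is None:  # no DataFlow with this URN
            return urns
        page = flow["childJobs"]
        rels = page["relationships"]
        for rel in rels:
            # Entities that are not DataJobs come back without a `urn` field.
            urn = (rel.get("entity") or {}).get("urn")
            if urn:
                urns.append(urn)
        start += len(rels)
        # Stop when the server returns an empty page or everything has been read.
        if not rels or start >= page["total"]:
            return urns


# Usage (assumed local deployment and access token):
# execute = make_http_executor("http://localhost:8080/api/graphql", token="<token>")
# print(fetch_child_job_urns(execute, data_flow_urn("airflow", "db_etl", "prod")))
```

Passing the executor in as a function keeps the paging logic separate from the transport, so the same loop works with any HTTP client or with a stub in tests.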

## Emit Lineage Directly

If you can't use the plugin or annotate inlets/outlets, you can also emit lineage using the `DatahubEmitterOperator`.