
feat(docs) Add feature guide for Manual Lineage #6933

Merged
6 changes: 5 additions & 1 deletion docs-website/sidebars.js
@@ -198,7 +198,10 @@ module.exports = {
"docs/glossary/business-glossary",
"docs/tags",
{
- Lineage: ["docs/lineage/intro", "docs/lineage/sample_code"],
+ Lineage: [
+ "docs/lineage/intro",
+ "docs/lineage/sample_code",
+ ],
},
],

@@ -478,6 +481,7 @@ module.exports = {
"docs/features/dataset-usage-and-query-history",
"docs/posts",
"docs/sync-status",
"docs/lineage/lineage-feature-guide",
// "docs/wip/ui-ingestion-guide", -- not needed
// "docs/wip/personal-access-tokens-guide", -- not needed

1 change: 1 addition & 0 deletions docs/authorization/policies.md
@@ -100,6 +100,7 @@ We currently support the following:
| Edit Domain | Allow actor to edit the Domain of an entity. |
| Edit Deprecation | Allow actor to edit the Deprecation status of an entity. |
| Edit Assertions | Allow actor to add and remove assertions from an entity. |
| Edit Lineage | Allow actor to add and remove upstream and downstream lineage edges. |
| Edit All | Allow actor to edit any information about an entity. Super user privileges. |

**Specific entity-level privileges** that are not generalizable.
2 changes: 1 addition & 1 deletion docs/domains.md
@@ -51,7 +51,7 @@ key to be human-readable. Proceed with caution: once you select a custom id, it
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/datahub/master/docs/imgs/set-domain-id.png"/>
</p>

- By default, you don't need to worry about this. DataHub will auto-generate an unique Domain id for you.
+ By default, you don't need to worry about this. DataHub will auto-generate a unique Domain id for you.
Collaborator

Thank you!


Once you've chosen a name and a description, click 'Create' to create the new Domain.

129 changes: 129 additions & 0 deletions docs/lineage/lineage-feature-guide.md
@@ -0,0 +1,129 @@
import FeatureAvailability from '@site/src/components/FeatureAvailability';

# About DataHub Lineage

<FeatureAvailability/>

In DataHub, lineage describes how data flows within and between your source systems. For a given entity, lineage lets you see where its data comes from (what's upstream) and where it goes (what's downstream). For more information, see [this video](https://www.youtube.com/watch?v=rONGpsndzRw&ab_channel=DataHub) for Lineage 101 in DataHub.
Collaborator

@jjoyce0510 Jan 9, 2023

Alternative intro:

Lineage is used to capture data dependencies within an organization. It allows you to track the inputs from which a data asset is derived, along with the data assets that depend on it downstream.

Lineage can be useful for proactive change management, e.g. when a data producer is interested in notifying data consumers before a significant change is made to a data asset. It can also be valuable in reactive cases, e.g. for quickly understanding the upstream dependencies and origin of a data asset when an unexpected problem arises.

If you're using an ingestion source that supports extraction of Lineage (e.g. the "Table Lineage Capability"), then lineage information can be extracted automatically. For detailed instructions, refer to the source documentation for the source you are using.

If you are not using a Lineage-support ingestion source, you can also manage lineage connections by hand inside the DataHub web application. The remainder of this guide will focus on managing Lineage as done within DataHub directly.

Collaborator Author

i think this is real nice


Starting in version `0.9.5`, DataHub supports manual editing of lineage between entities. Data experts can add or remove upstream and downstream lineage edges from both the Lineage Visualization screen and the Lineage tab on entity pages. Use this feature to supplement automatic lineage extraction during ingestion, or to establish important entity relationships in sources for which automatic extraction is not yet supported. For now, editing lineage by hand is supported only for Datasets, Charts, Dashboards, and Data Jobs.

:::note

Lineage added by hand and lineage added programmatically may conflict with one another and cause unwanted overwrites. It is strongly recommended that lineage be edited manually only in cases where lineage information is not also extracted in an automated fashion, e.g. by running an ingestion source.

:::

## Manual Lineage Setup, Prerequisites, and Permissions
Collaborator

At some point we'll also want to generalize this. Maybe we just call this "Lineage Setup, Prerequisite..."

Collaborator Author

yup! i'll call it that here


To edit lineage for an entity, you'll need the following [Metadata Privilege](../authorization/policies.md):

* **Edit Lineage** metadata privilege to edit lineage at the entity level

It is important to know that the **Edit Lineage** privilege is required for all entities whose lineage is affected by the changes. For example, in order to add "Dataset B" as an upstream dependency of "Dataset A", you'll need the **Edit Lineage** privilege for both Dataset A and Dataset B.
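
The rule above can be sketched as a small helper. This is only an illustration of the stated requirement, not DataHub's authorization code: the function name `urns_requiring_edit_lineage`, the `Edge` tuple shape, and the two example dataset URNs are all hypothetical.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation of a lineage edge: (upstream_urn, downstream_urn).
Edge = Tuple[str, str]


def urns_requiring_edit_lineage(edges: Iterable[Edge]) -> Set[str]:
    """Return every entity URN touched by the given edges.

    Per the guide, the Edit Lineage privilege is needed on both ends of
    each edge that is added or removed.
    """
    required: Set[str] = set()
    for upstream_urn, downstream_urn in edges:
        required.add(upstream_urn)
        required.add(downstream_urn)
    return required


# Example URNs (made up for illustration).
dataset_a = "urn:li:dataset:(urn:li:dataPlatform:hive,DatasetA,PROD)"
dataset_b = "urn:li:dataset:(urn:li:dataPlatform:hive,DatasetB,PROD)"

# Adding "Dataset B" as an upstream of "Dataset A" touches both entities.
needed = urns_requiring_edit_lineage([(dataset_b, dataset_a)])
```

Here `needed` contains both URNs, which matches the Dataset A / Dataset B example: the actor needs **Edit Lineage** on each of them.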

## Using Manual Lineage
Collaborator

Using Lineage?

Now that we've generalized... Maybe above in the introduction section we can say that this is limited in scope to editing lineage by hand from the DataHub web application, and mention that most commonly lineage will be extracted automatically by DataHub ingestion sources. Otherwise this change in context may be confusing to the reader


### Editing from Lineage Graph View

The first place that you can edit lineage for entities is from the Lineage Visualization screen. Click on the "Lineage" button on the top right of an entity's profile to get to this view.

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/lineage/lineage-viz-button.png"/>
</p>

Once you find the entity that you want to edit the lineage of, click on the three-dot menu dropdown to select whether you want to edit lineage in the upstream direction or the downstream direction.

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/lineage/edit-lineage-menu.png"/>
</p>

If you want to edit upstream lineage for entities downstream of the center node or downstream lineage for entities upstream of the center node, you can simply re-center to focus on the node you want to edit. Once focused on the desired node, you can edit lineage in either direction.

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/lineage/focus-to-edit.png"/>
</p>

#### Adding Lineage Edges

Once you click "Edit Upstream" or "Edit Downstream", a modal opens that lets you manage lineage for the selected entity in the chosen direction. To add a lineage edge to a new entity, search for it by name in the provided search bar and select it. Once you're satisfied with everything you've added, click "Save Changes". If you change your mind, you can always cancel or exit without saving the changes you've made.

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/lineage/add-upstream.png"/>
</p>

#### Removing Lineage Edges

You can also remove lineage edges from the same modal used to add them. Find the edge(s) that you want to remove and click the "X" on the right side of each. As with adding, you need to click "Save Changes" to save; if you exit without saving, your changes won't be applied.

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/lineage/remove-lineage-edge.png"/>
</p>

#### Reviewing Changes

Any time lineage is edited manually, we keep track of who made the change and when they made it. You can see this information in the modal where you add and remove edges. If an edge was added manually, a user avatar will be in line with the edge that was added. You can hover over this avatar in order to see who added it and when.

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/lineage/lineage-edge-audit-stamp.png"/>
</p>

### Editing from Lineage Tab

The other place that you can edit lineage for entities is from the Lineage Tab on an entity's profile. Click on the "Lineage" tab in an entity's profile and then find the "Edit" dropdown that allows you to edit upstream or downstream lineage for the given entity.

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/lineage/edit-from-lineage-tab.png"/>
</p>

Using the modal from this view will work the same as described above for editing from the Lineage Visualization screen.

## Additional Resources

### Videos

**DataHub Basics: Lineage 101**

<p align="center">
<iframe width="560" height="315" src="https://www.youtube.com/embed/rONGpsndzRw" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</p>

**DataHub November Town Hall - Including Manual Lineage Demo**

<p align="center">
<iframe width="560" height="315" src="https://www.youtube.com/embed/BlCLhG8lGoY" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</p>

### GraphQL

* [updateLineage](../../graphql/mutations.md#updatelineage)
* [searchAcrossLineage](../../graphql/queries.md#searchacrosslineage)

#### Examples

**Updating Lineage**

```graphql
mutation updateLineage {
  updateLineage(input: {
    edgesToAdd: [
      {
        downstreamUrn: "urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)",
        upstreamUrn: "urn:li:dataset:(urn:li:dataPlatform:datahub,Dataset,PROD)"
      }
    ],
    edgesToRemove: [
      {
        downstreamUrn: "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)",
        upstreamUrn: "urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)"
      }
    ]
  })
}
```
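
The same mutation could be sent from a script. The following is a minimal sketch using only the Python standard library, under several assumptions that are not stated in this guide: the endpoint URL `http://localhost:9002/api/graphql`, bearer-token authentication with a personal access token, and the GraphQL input type name `UpdateLineageInput`. Verify each against your own deployment and its GraphQL schema. The helper names `build_update_lineage_request` and `send_graphql` are hypothetical.

```python
import json
import urllib.request
from typing import Dict, List

# Assumption: GraphQL endpoint of a local DataHub deployment; adjust for yours.
GRAPHQL_URL = "http://localhost:9002/api/graphql"

# Assumption: the input type is named UpdateLineageInput in the schema.
UPDATE_LINEAGE_MUTATION = """
mutation updateLineage($input: UpdateLineageInput!) {
  updateLineage(input: $input)
}
"""


def build_update_lineage_request(
    edges_to_add: List[Dict[str, str]],
    edges_to_remove: List[Dict[str, str]],
) -> dict:
    """Build the JSON body for the updateLineage mutation.

    Each edge is a dict with "upstreamUrn" and "downstreamUrn" keys,
    mirroring the GraphQL example above.
    """
    return {
        "query": UPDATE_LINEAGE_MUTATION,
        "variables": {
            "input": {
                "edgesToAdd": edges_to_add,
                "edgesToRemove": edges_to_remove,
            }
        },
    }


def send_graphql(payload: dict, token: str, url: str = GRAPHQL_URL) -> dict:
    """POST a GraphQL payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the server accepts a bearer token.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Build (but do not send) a request that adds one upstream edge.
payload = build_update_lineage_request(
    edges_to_add=[
        {
            "downstreamUrn": "urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)",
            "upstreamUrn": "urn:li:dataset:(urn:li:dataPlatform:datahub,Dataset,PROD)",
        }
    ],
    edges_to_remove=[],
)
```

To apply the change, call `send_graphql(payload, token="<your-token>")`. As noted earlier in this guide, the caller needs the **Edit Lineage** privilege on every entity that an edge touches.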

*Need more help? Join the conversation in [Slack](http://slack.datahubproject.io)!*

### Related Features

* [Lineage](./intro.md)