
feat(Impact Analysis): Support impact analysis to check all downstreams of a given entity #4322

Merged (49 commits) on Mar 5, 2022


@dexter-mh-lee (Contributor) commented Mar 4, 2022

Support impact analysis! This PR adds the ability to find all downstream entities of a given entity.

To make this happen, we added the following functionality:

  1. Standardize how we fetch lineage.

Traditionally, we enable relationships via the Relationship annotation in the PDL definition, which defines a (source, relationship type, destination) edge. However, some of these edges show up on the lineage view and some do not. To make things more complicated, for some relationships the edge direction is flipped in the lineage graph, e.g. Consumes vs. Produces edges.
We added a lineage registry that keeps track of this information from the Relationship annotations. Add "isLineage" to a Relationship annotation to denote that the edge is a lineage edge.
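To illustrate the lineage registry idea, here is a minimal Python sketch. It assumes each Relationship annotation can be reduced to a name, an `isLineage` flag, and whether the stored edge points against the flow of data. `RelationshipSpec`, `LineageRegistry`, and the `flipped` field are hypothetical names, not DataHub's actual classes, and `OwnedBy` is only an illustrative non-lineage relationship.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationshipSpec:
    """Hypothetical summary of one Relationship annotation."""

    name: str  # relationship type, e.g. "Consumes"
    is_lineage: bool  # mirrors the "isLineage" flag on the annotation
    # True if the stored edge (source -> destination) points against the
    # flow of data, so it must be reversed when read as lineage.
    flipped: bool = False


class LineageRegistry:
    """Hypothetical registry mapping relationship types to lineage semantics."""

    def __init__(self, specs):
        # Only relationships explicitly marked as lineage are registered.
        self._specs = {spec.name: spec for spec in specs if spec.is_lineage}

    def is_lineage(self, rel_type):
        return rel_type in self._specs

    def to_lineage_edge(self, source, rel_type, destination):
        """Normalize a stored edge to (upstream, downstream), or return None
        if the relationship is not a lineage edge."""
        spec = self._specs.get(rel_type)
        if spec is None:
            return None
        if spec.flipped:
            return destination, source
        return source, destination


registry = LineageRegistry([
    # A job "Consumes" its input dataset, but the dataset is upstream of the job.
    RelationshipSpec("Consumes", is_lineage=True, flipped=True),
    # A job "Produces" its output dataset, which is downstream of the job.
    RelationshipSpec("Produces", is_lineage=True),
    # Ownership is a relationship, but not a lineage edge.
    RelationshipSpec("OwnedBy", is_lineage=False),
])

print(registry.to_lineage_edge("job", "Consumes", "input_ds"))  # ('input_ds', 'job')
print(registry.to_lineage_edge("job", "Produces", "output_ds"))  # ('job', 'output_ds')
print(registry.to_lineage_edge("dataset", "OwnedBy", "alice"))  # None
```

Keeping the direction flag next to the `isLineage` flag means callers never special-case individual relationship types: every lineage edge is normalized to (upstream, downstream) in one place.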

  2. Add multi-hop capability to the Elasticsearch-based graph service

Added breadth-first search (BFS) logic to the Elasticsearch graph service to support impact analysis. Timeouts are in place so that traversal of extremely large graphs does not run indefinitely.
This feature is not yet available for the Neo4j and Dgraph based implementations. These backends support only single-hop queries, and the impact analysis button does not show up if the underlying graph service is not Elasticsearch.

  3. Add LineageSearchService, which first fetches the downstream entities of a given entity and then adds them as a filter to the search request, so that the full search experience (query, filters, etc.) works across the downstream entities.

  4. Add the ability to download search results as CSV!
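The multi-hop traversal, lineage search, and CSV export described above can be sketched together in a few lines of Python. This is an in-memory illustration only, not DataHub's Elasticsearch-backed implementation: `get_downstreams` is a hypothetical callback standing in for one graph-service query per hop, the `documents` dict stands in for the search index, and the function names, the `max_hops` limit (added here alongside the timeout), and the CSV columns are all assumptions.

```python
import csv
import io
import time


def downstream_bfs(start, get_downstreams, max_hops=10, timeout_s=5.0):
    """Breadth-first walk collecting every downstream entity of `start`.

    `get_downstreams(urns)` is a hypothetical callback that returns a dict
    mapping each urn to its direct downstream urns (one batched lookup per hop).
    Returns (urn -> hop distance, complete); `complete` is False when the walk
    stopped early because of the timeout or the hop limit.
    """
    deadline = time.monotonic() + timeout_s
    distance = {start: 0}
    frontier = [start]
    complete = True
    for hop in range(1, max_hops + 1):
        if not frontier:
            break  # nothing left to expand: the walk finished naturally
        if time.monotonic() > deadline:
            complete = False  # give up instead of traversing indefinitely
            break
        neighbors = get_downstreams(frontier)
        next_frontier = []
        for urn in frontier:
            for child in neighbors.get(urn, []):
                if child not in distance:  # visited check also breaks cycles
                    distance[child] = hop
                    next_frontier.append(child)
        frontier = next_frontier
    else:
        # Hop limit reached; anything left in the frontier was never expanded.
        complete = not frontier
    del distance[start]
    return distance, complete


def lineage_search(query, start, documents, get_downstreams, **bfs_options):
    """Keyword search restricted to the downstream entities of `start`.

    The downstream set plays the role of an extra "urn in [...]" filter on
    an ordinary search over `documents` (urn -> searchable text).
    """
    downstream, complete = downstream_bfs(start, get_downstreams, **bfs_options)
    hits = [
        {"urn": urn, "hops": downstream[urn]}
        for urn, text in documents.items()
        if urn in downstream and query.lower() in text.lower()
    ]
    hits.sort(key=lambda hit: (hit["hops"], hit["urn"]))
    return hits, complete


def hits_to_csv(hits):
    """Serialize search hits as CSV text, e.g. for a download button."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["urn", "hops"])
    writer.writeheader()
    writer.writerows(hits)
    return buffer.getvalue()


# Toy graph with a cycle (d -> a) to show that the walk terminates.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a"]}


def lookup(urns):
    return {urn: graph.get(urn, []) for urn in urns}


docs = {"a": "orders raw", "b": "orders clean", "c": "revenue", "d": "orders report"}
hits, complete = lineage_search("orders", "a", docs, lookup)
print(hits, complete)  # [{'urn': 'b', 'hops': 1}, {'urn': 'd', 'hops': 2}] True
print(hits_to_csv(hits))
```

Batching one lookup per hop keeps the number of backend round trips proportional to graph depth rather than to the number of entities, and returning a `complete` flag lets a UI warn that results may be partial when the timeout fires.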

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

@github-actions bot commented Mar 4, 2022

Unit Test Results (build & test)

76 files (±0), 76 suites (±0), 25m 58s ⏱️ (+4m 17s)
624 tests (-6): 564 ✔️ passed (-7), 59 💤 skipped (±0), 1 failed (+1)

For more details on these failures, see this check.

Results for commit 8872640. ± Comparison against base commit 787a7e6.

♻️ This comment has been updated with latest results.

@github-actions bot commented Mar 4, 2022

Unit Test Results (metadata ingestion)

5 files, 5 suites, 44m 14s ⏱️
347 tests: 347 ✔️ passed, 0 💤 skipped, 0 failed
1 579 runs: 1 548 ✔️ passed, 31 💤 skipped, 0 failed

Results for commit 8872640.

♻️ This comment has been updated with latest results.

@shirshanka merged commit 18dd5b6 into datahub-project:master on Mar 5, 2022
@dexter-mh-lee deleted the impact-analysis branch on Mar 5, 2022 (00:10)
maggiehays pushed a commit to maggiehays/datahub that referenced this pull request on Aug 1, 2022

3 participants