feat(ingest/snowflake): optionally emit all upstreams irrespective of recipe pattern #7842
…tern - added tests, for current behavior next: - remove pattern check and update tests
@@ -228,36 +228,32 @@ def _populate_table_lineage(self):
    def get_table_upstream_workunits(self, discovered_tables):
        if self.config.include_table_lineage:
            for dataset_name in discovered_tables:
                if self._is_dataset_pattern_allowed(
This pattern check is not required, as it is already applied to discovered_tables here.
    def get_view_upstream_workunits(self, discovered_views):
        if self.config.include_view_lineage:
            for view_name in discovered_views:
                if self._is_dataset_pattern_allowed(view_name, "view"):
This pattern check is not required, as it is already applied to discovered_views here.
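To illustrate why the second check is redundant, here is a minimal sketch. It is not the connector's code: `discover` and `upstream_workunits` are hypothetical helpers, and the pattern is a plain regex standing in for the recipe's allow pattern.

```python
import re
from typing import Iterator, List


def discover(names: List[str], allow_regex: str) -> List[str]:
    """Hypothetical discovery step: keep only names matching the recipe pattern."""
    return [n for n in names if re.match(allow_regex, n)]


def upstream_workunits(discovered: List[str], include_lineage: bool = True) -> Iterator[str]:
    """Hypothetical lineage step: no pattern check, the input is already filtered."""
    if not include_lineage:
        return
    for name in discovered:
        yield f"lineage:{name}"


all_views = ["TEST_DB.TEST_SCHEMA.VIEW_1", "OTHER_DB.PUBLIC.VIEW_9"]
allow = r"TEST_DB\..*"

discovered_views = discover(all_views, allow)
workunits = list(upstream_workunits(discovered_views))

# Re-applying the pattern to the discovered list removes nothing.
rechecked = [v for v in discovered_views if re.match(allow, v)]
```

Because every entry in `discovered_views` already passed the pattern, filtering it again inside the lineage loop cannot change the result.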
@@ -285,6 +285,30 @@ def default_query_results(query):  # noqa: C901
),
}
for op_idx in range(1, NUM_OPS + 1)
] + [
This change is in the mocked query result for the legacy lineage method.
@@ -307,7 +331,11 @@ def default_query_results(query):  # noqa: C901
{
"upstream_object_name": "TEST_DB.TEST_SCHEMA.VIEW_1",
"upstream_object_domain": "VIEW",
}
},
This change is in the mocked query result for the new optimised lineage method.
]
+ (  # This additional upstream is only for TABLE_1
This change is in the mocked query result for the new optimised lineage method, table lineage only.
Do we want to support customers making multiple ingestion pipelines for the same source? I thought our solution here was to combine them into one. However, if this is something we want to support, then this looks good to me.
Combining recipes into one is the recommended solution, hence this config is disabled by default. The config option is meant only for cases where, for some reason, recipes per database need to be kept separate.
Since the upstreams are not minted automatically, this change does not create ghost entities. Also, the UI currently hides the non-minted upstreams.
If separate recipes are used to ingest different Snowflake databases in the same Snowflake account, the config below can be set to emit all upstreams:
validate_upstreams_against_patterns: false
However, having a single recipe per Snowflake account remains the preferred solution.
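The effect of the flag can be sketched roughly as follows. This is an illustration under stated assumptions, not the connector's implementation: `SimplePattern` and `filter_upstreams` are hypothetical names written for this example, the database and table names are made up, and the default of `True` is inferred from the statement that the new behavior is disabled by default. Only the option name `validate_upstreams_against_patterns` comes from this PR.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimplePattern:
    """Hypothetical stand-in for a recipe allow/deny pattern (regex based)."""

    allow: List[str] = field(default_factory=lambda: [".*"])
    deny: List[str] = field(default_factory=list)

    def allowed(self, name: str) -> bool:
        # Deny rules win over allow rules.
        if any(re.match(p, name, re.IGNORECASE) for p in self.deny):
            return False
        return any(re.match(p, name, re.IGNORECASE) for p in self.allow)


def filter_upstreams(
    upstreams: List[str],
    pattern: SimplePattern,
    validate_upstreams_against_patterns: bool = True,  # assumed default
) -> List[str]:
    """Return the upstreams that would be emitted as lineage.

    With validation on, upstreams outside the recipe pattern are dropped;
    with it off, every upstream is kept.
    """
    if not validate_upstreams_against_patterns:
        return list(upstreams)
    return [u for u in upstreams if pattern.allowed(u)]


# A recipe that only ingests DB_A, with a table that also reads from DB_B.
pattern = SimplePattern(allow=[r"DB_A\..*"])
upstreams = ["DB_A.SCHEMA.TABLE_1", "DB_B.SCHEMA.TABLE_2"]

default_result = filter_upstreams(upstreams, pattern)
all_result = filter_upstreams(
    upstreams, pattern, validate_upstreams_against_patterns=False
)
```

In this sketch, the DB_B upstream is dropped under the default setting and kept once validation is turned off, which is the case the per-database recipe setup needs.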
Also added tests for Snowflake legacy lineage (the default lineage method as of now).