-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingest/lookml): support views with derived_table
.explore_source
#7704
feat(ingest/lookml): support views with derived_table
.explore_source
#7704
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good, except schema field urn generated for fine grained lineage.
@@ -771,6 +772,11 @@ def _get_fields( | |||
matched_field.replace('"', "").replace("`", "").lower() | |||
) | |||
upstream_fields.append(matched_field) | |||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there any known facts that contradict this assumption ? Like - is it possible that maybe sql is missing due to missing permissions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or that, the field is named differently than the upstream column name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's an alias
option in lookml, but we don't support that one yet https://cloud.google.com/looker/docs/reference/param-field-alias
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is getting pretty long, was a bit hard to follow. But def doesn't have to be cleaned up here
fields, | ||
use_external_process=process_isolation_for_sql_parsing, | ||
) | ||
# Derived tables can either be a SQL query or a LookML explore. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, adding link to this looker doc would be helpful here - https://cloud.google.com/looker/docs/derived-tables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
# We want this to render the full lkml block | ||
# e.g. explore_source: source_name { ... } | ||
# As such, we use the full derived_table instead of the explore_source. | ||
view_logic = str(lkml.dump(derived_table))[:max_file_snippet_length] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Limiting length by max_file_snippet_length
is new change and is okay. I wonder if we also required it when setting view_logic for SQL derived table's sql.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a new change - we already do it above in the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I am not wrong, earlier view_logic for derived table was simply this:
view_logic = str(derived_table["explore_source"])
https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py#L871
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep and that logic still remains, we only special case it for derived_table
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it. We do not special case it for SQL derived tables, that was the comment, but definitely not directly related to this PR :)
upstream_dataset_urn = LookerExplore( | ||
name=upstream_explore, model_name=looker_view.id.model_name | ||
).get_explore_urn(self.source_config) | ||
upstream_dataset_urns.append(upstream_dataset_urn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice !
So we will now get table level and column level lineage for native derived tables, i.e. below edge ?
looker explore(upstream) -> looker view (derived)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep
{ | ||
"upstreamType": "FIELD_SET", | ||
"upstreams": [ | ||
"urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,data.explore.my_view_explore,PROD),my_view_explore.country)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, the correct schema field urns would be
"urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,data.explore.my_view_explore,PROD),country)"
Probably need to strip the starting explore name from column name ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually this is correct - explore fields have the explore name as part of the schema
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, You mean this -
https://github.com/datahub-project/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/schema/SchemaMetadataKey.pdl#L17
I wasn't aware that is is considered when generating schema field urns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, although this is only applicable for looker explores
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, okay. Curious, if we have tested this on UI, that column lineage edge shows up correctly, or is if any change required there. Otherwise looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I tested it locally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't really follow the changes but trusting this is safe enough
@@ -771,6 +772,11 @@ def _get_fields( | |||
matched_field.replace('"', "").replace("`", "").lower() | |||
) | |||
upstream_fields.append(matched_field) | |||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is getting pretty long, was a bit hard to follow. But def doesn't have to be cleaned up here
Also includes some minor refactoring.
Checklist