feat(ingest/lookml): support views with `derived_table`.`explore_source` #7704

hsheth2 · 2023-03-28T23:24:25Z

Also includes some minor refactoring.

Checklist

The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
Links to related issues (if applicable)
Tests for the changes have been added/updated (if applicable)
Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

mayurinehate

Overall looks good, except for the schema field URN generated for fine-grained lineage.

mayurinehate · 2023-03-31T04:20:09Z

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py

@@ -771,6 +772,11 @@ def _get_fields(
                            matched_field.replace('"', "").replace("`", "").lower()
                        )
                        upstream_fields.append(matched_field)
+                else:


Are there any known cases that contradict this assumption? For example, could the sql be missing because of missing permissions?

Or the field could be named differently from the upstream column.

There's an alias option in LookML, but we don't support it yet: https://cloud.google.com/looker/docs/reference/param-field-alias

This function is getting pretty long and was a bit hard to follow. But it definitely doesn't have to be cleaned up here.
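To make the assumption under discussion concrete, here is a minimal sketch of the fallback as I read this thread. It is not the code in `_get_fields`: `guess_upstream_fields` and the `${TABLE}` regex are hypothetical, and only the quote-stripping and lowercasing are taken from the diff context.

```python
import re
from typing import List, Optional

# Hypothetical pattern for `${TABLE}.column` references inside a field's `sql`.
_TABLE_COLUMN = re.compile(r'\$\{TABLE\}\.(["`\w]+)')


def guess_upstream_fields(field_name: str, sql: Optional[str]) -> List[str]:
    """Return the upstream column names a LookML field most likely reads from."""
    if sql:
        # Strip quoting and lowercase, mirroring the diff context above.
        return [
            match.replace('"', "").replace("`", "").lower()
            for match in _TABLE_COLUMN.findall(sql)
        ]
    # No `sql`: assume the field is named after the upstream column.
    # This is the assumption questioned above; it does not hold if the
    # field was renamed or aliased.
    return [field_name]
```

With no `sql`, the field name is passed through unchanged, which is exactly where an alias or a renamed column would produce a wrong edge.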

mayurinehate · 2023-03-31T04:44:02Z

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py

-                fields,
-                use_external_process=process_isolation_for_sql_parsing,
-            )
+            # Derived tables can either be a SQL query or a LookML explore.


I think adding a link to this Looker doc would be helpful here: https://cloud.google.com/looker/docs/derived-tables

mayurinehate · 2023-03-31T05:05:01Z

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py

+                # We want this to render the full lkml block
+                # e.g. explore_source: source_name { ... }
+                # As such, we use the full derived_table instead of the explore_source.
+                view_logic = str(lkml.dump(derived_table))[:max_file_snippet_length]


Limiting the length by max_file_snippet_length is a new change and is okay. I wonder if we also need it when setting view_logic from a SQL derived table's sql.

not a new change - we already do it above in the code

If I'm not wrong, view_logic for a derived table was previously just this:

view_logic = str(derived_table["explore_source"])
https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py#L871

yep and that logic still remains, we only special case it for derived_table

Got it. We do not special-case it for SQL derived tables; that was my point, but it's definitely not directly related to this PR :)

mayurinehate · 2023-03-31T05:26:17Z

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py

+            upstream_dataset_urn = LookerExplore(
+                name=upstream_explore, model_name=looker_view.id.model_name
+            ).get_explore_urn(self.source_config)
+            upstream_dataset_urns.append(upstream_dataset_urn)


Nice!
So we will now get table-level and column-level lineage for native derived tables, i.e. the edge below?
looker explore (upstream) -> looker view (derived)
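The edge above can be sketched roughly as follows. This is an illustration, not the PR's implementation: `explore_source_lineage` is a hypothetical helper, and the dict shape (an `explore_source` entry with `name` and a `columns` list) is my assumption about how the parsed LookML looks.

```python
from typing import List, Tuple


def explore_source_lineage(derived_table: dict) -> Tuple[str, List[Tuple[str, str]]]:
    """Return (upstream explore name, [(upstream explore field, view column), ...])."""
    explore_source = derived_table["explore_source"]
    upstream_explore = explore_source["name"]

    edges = []
    for column in explore_source.get("columns", []):
        # Assumption: when `field` is omitted, the column reuses its own name upstream.
        upstream_field = column.get("field", column["name"])
        edges.append((upstream_field, column["name"]))
    return upstream_explore, edges


# Hypothetical parsed derived_table for a native derived table.
example = {
    "explore_source": {
        "name": "my_view_explore",
        "columns": [{"name": "country", "field": "my_view_explore.country"}],
    }
}
upstream, column_edges = explore_source_lineage(example)
```

The explore name gives the table-level upstream, and each pair gives one column-level edge from the explore field to the derived view's column.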

mayurinehate · 2023-03-31T05:52:42Z

metadata-ingestion/tests/integration/lookml/expected_output.json

+                            {
+                                "upstreamType": "FIELD_SET",
+                                "upstreams": [
+                                    "urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,data.explore.my_view_explore,PROD),my_view_explore.country)"


I think the correct schema field URN would be

"urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,data.explore.my_view_explore,PROD),country)"

We probably need to strip the leading explore name from the column name?

actually this is correct - explore fields have the explore name as part of the schema

Oh, you mean this:
https://github.com/datahub-project/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/schema/SchemaMetadataKey.pdl#L17

https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/tests/integration/looker/golden_looker_mces.json#L141

I wasn't aware that it is considered when generating schema field URNs.

yup, although this is only applicable for looker explores

Ah, okay. Curious whether we have tested in the UI that the column lineage edge shows up correctly, or whether any change is required there. Otherwise looks good.

yes I tested it locally
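For reference, the URN from the golden file can be reproduced with plain string formatting. The helpers below are hypothetical and only mirror the string layout visible in expected_output.json; the `{model}.explore.{explore}` dataset name is inferred from that one example.

```python
def explore_dataset_urn(model: str, explore: str, env: str = "PROD") -> str:
    # Layout inferred from the golden file: "<model>.explore.<explore>".
    return f"urn:li:dataset:(urn:li:dataPlatform:looker,{model}.explore.{explore},{env})"


def schema_field_urn(dataset_urn: str, field_path: str) -> str:
    return f"urn:li:schemaField:({dataset_urn},{field_path})"


dataset_urn = explore_dataset_urn("data", "my_view_explore")
# Explore fields keep the explore name as a prefix in the field path,
# so the path is "my_view_explore.country" rather than just "country".
field_urn = schema_field_urn(dataset_urn, "my_view_explore.country")
```

Stripping the prefix would produce a URN that no longer matches the explore's own schema fields, which is why the prefixed form is the expected one here.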

asikowitz

Didn't really follow the changes but trusting this is safe enough

asikowitz · 2023-04-03T23:15:08Z

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py

@@ -771,6 +772,11 @@ def _get_fields(
                            matched_field.replace('"', "").replace("`", "").lower()
                        )
                        upstream_fields.append(matched_field)
+                else:


This function is getting pretty long and was a bit hard to follow. But it definitely doesn't have to be cleaned up here.

hsheth2 added 8 commits March 24, 2023 17:09

add native explore source test

8de7dab

update view logic for lkml code

d9970aa

start refactoring derived_table parsing

5f359a1

Merge branch 'master' into lookml-explore-source

5d47e5d

refactor fine-grained lineage logic

41d71d0

refactor looker view parsing

a3b4b37

finish updating for derived_table explore_source lineage

a104d85

update lineages in other tests

595524e

github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Mar 28, 2023

vercel bot deployed to Preview March 28, 2023 23:32 View deployment

update lint

01828b1

vercel bot deployed to Preview March 28, 2023 23:48 View deployment

mayurinehate requested changes Mar 31, 2023

add comment

58d2f7f

vercel bot deployed to Preview March 31, 2023 18:49 View deployment

hsheth2 enabled auto-merge (squash) April 3, 2023 18:01

hsheth2 assigned treff7es and asikowitz and unassigned treff7es Apr 3, 2023

asikowitz approved these changes Apr 3, 2023

hsheth2 merged commit f780da4 into datahub-project:master Apr 3, 2023

hsheth2 deleted the lookml-explore-source branch April 3, 2023 23:18
