fix(ingest/metabase): Fix for query template expressions and invalid URNs for Text Cards #10381

Merged

pulsar256
Contributor

@pulsar256 pulsar256 commented Apr 25, 2024

Fixes #10380
Fixes #9767

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the ingestion and community-contribution labels Apr 25, 2024
query_patched = re.sub(r"\[\[.+\]\]", r" ", raw_query)

# replace {{FILTER}} with 1
query_patched = re.sub(r"\{\{.+\}\}", r"1", query_patched)
Contributor

Small improvement idea (which I assume you already indicated in the issue comment): we could theoretically put `.+` into a capture group and only replace it if it's a known parameter, by matching it against `card_details["parameters"]`.

Not sure if it's worth the effort, though, as non-replaced parameters would only cause the parser to fail and ignore the lineage for this card and there's likely no harm in globally replacing everything.
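
A minimal sketch of the selective replacement suggested above, for illustration only. The helper name `replace_known_parameters` is hypothetical, and how the parameter names would be extracted from `card_details["parameters"]` is not shown in this thread, so the sketch simply takes a set of names. It also uses a non-greedy `.+?` so that each `{{...}}` expression is matched on its own, which differs from the greedy pattern in the snippet under review.

```python
import re
from typing import Set


def replace_known_parameters(raw_query: str, known_names: Set[str]) -> str:
    """Replace {{name}} with 1 only when name is a known parameter.

    Hypothetical helper; not part of the PR.
    """

    def _substitute(match: "re.Match") -> str:
        name = match.group(1).strip()
        # Unknown expressions are left untouched, so the SQL parser would
        # still fail on them and the lineage for that card would be skipped.
        return "1" if name in known_names else match.group(0)

    # Non-greedy capture group: one match per {{...}} expression.
    return re.sub(r"\{\{(.+?)\}\}", _substitute, raw_query)
```

Compared with replacing everything globally, this keeps unknown expressions in place at the cost of an extra lookup, which is the trade-off discussed in this thread.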

Contributor Author

Yes, I considered doing that but decided against it, to avoid introducing additional complexity without any practical benefit as far as I could see, other than being more "correct" and pedantic.

The correct place to fix that is upstream; all we can do downstream is apply workarounds and sanitization. An upstream "fix" should rather be a feature to query the dashboard and card characteristics, such as the dependencies / database objects being used, but that is not trivial and IMHO unlikely to be implemented any time soon.

@pulsar256 pulsar256 changed the title Metabase metadata-ingestion fixes fix(ingestion/metabase) Fix for query template expressions and invalid URNs for Text Cards Apr 26, 2024
@pulsar256 pulsar256 force-pushed the bugfix/metabase_ingestion_fixes branch 2 times, most recently from 296cf37 to 0803993 Compare April 29, 2024 06:47
@pulsar256 pulsar256 force-pushed the bugfix/metabase_ingestion_fixes branch from 0803993 to ff7d6e0 Compare April 30, 2024 06:11
@pulsar256 pulsar256 force-pushed the bugfix/metabase_ingestion_fixes branch from ff7d6e0 to 9b9d059 Compare May 8, 2024 07:31
Collaborator

@hsheth2 hsheth2 left a comment

The code looks good, but this needs some tests

@pulsar256
Contributor Author

> The code looks good, but this needs some tests

Please take a look and let me know whether the approach is OK. If it is fine, I will add another set of tests for the other issue and perhaps cover more variants of the templated queries.

@pulsar256
Contributor Author

@hsheth2 ready for round 2.

Collaborator

@hsheth2 hsheth2 left a comment

For tests for #9767, it would be sufficient to just add some unit tests that call strip_template_expressions

For #10380, since it's a fairly small change, it would also be better to modify the existing test instead of creating an entirely new one. The duplication will be harder to manage down the line
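
A rough sketch of what such unit tests could look like. The `strip_template_expressions` defined here is only a local stand-in that copies the two `re.sub` calls quoted earlier in this review; the real method's location and signature are not shown in this thread, and the sample queries are made up for illustration.

```python
import re


def strip_template_expressions(raw_query: str) -> str:
    """Local stand-in mirroring the two substitutions quoted in the review."""
    # Drop optional clauses such as [[ WHERE col = {{FILTER}} ]].
    query_patched = re.sub(r"\[\[.+\]\]", r" ", raw_query)
    # Replace {{FILTER}} with 1.
    query_patched = re.sub(r"\{\{.+\}\}", r"1", query_patched)
    return query_patched


def test_optional_clause_is_dropped() -> None:
    stripped = strip_template_expressions(
        "SELECT * FROM orders [[WHERE status = {{status}}]]"
    )
    # Compare tokens so the test does not depend on leftover whitespace.
    assert stripped.split() == ["SELECT", "*", "FROM", "orders"]


def test_filter_is_replaced_with_constant() -> None:
    stripped = strip_template_expressions("SELECT * FROM orders WHERE id = {{id}}")
    assert stripped == "SELECT * FROM orders WHERE id = 1"
```

Tests of this shape exercise only the string handling, so they need no mocked Metabase responses.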

@pulsar256
Contributor Author

> For tests for #9767, it would be sufficient to just add some unit tests that call strip_template_expressions
>
> For #10380, since it's a fairly small change, it would also be better to modify the existing test instead of creating an entirely new one. The duplication will be harder to manage down the line

I have no problem undoing / removing the Metabase mocked responses. I considered monkey patching the existing mocked Metabase responses, but I found that rather problematic given that I had no access to the Metabase state to (re)generate them. In the end, we want to test the ingestion with response payloads as close as possible to what Metabase generates, rather than hand-crafted or patched responses. Hence I refactored the test suite to allow side-effect-free (between the individual tests) usage of independent response mocks, and I was also looking into automating the capture process.

Do you want me to keep the refactored fixture-based approach for the response map in place, or should we go back to the global state in the test suite?
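
To illustrate the difference between a shared global response map and independent per-test maps, here is a small stdlib-only sketch. The names `BASE_RESPONSES` and `build_response_map`, the endpoint key, and the payload are all hypothetical; the actual fixtures in the PR are not shown in this thread.

```python
import copy
from typing import Any, Dict, Optional

# Hypothetical baseline: request path -> mocked JSON payload.
# A real suite would load captured Metabase responses instead.
BASE_RESPONSES: Dict[str, Any] = {
    "/api/card": [{"id": 1, "name": "Example card"}],
}


def build_response_map(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return a fresh map so one test's changes cannot leak into another."""
    # Deep copy so nested payloads are not shared with the baseline.
    responses = copy.deepcopy(BASE_RESPONSES)
    responses.update(overrides or {})
    return responses
```

Each test would call `build_response_map` (or receive its result through a test fixture) and pass only the payloads it wants to override, leaving the baseline untouched for the other tests.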

Collaborator

@hsheth2 hsheth2 left a comment

The test setup changes look great

Left some tiny nits, but otherwise looks good



@freeze_time(FROZEN_TIME)
def test_9767_templated_query_is_stripped(
Collaborator

Suggested change
- def test_9767_templated_query_is_stripped(
+ def test_strip_template_expressions(

Contributor Author

done

@pulsar256 pulsar256 force-pushed the bugfix/metabase_ingestion_fixes branch from c2e4b2a to a5f21b1 Compare May 21, 2024 05:37
@pulsar256
Contributor Author

@hsheth2 thanks for the review. I included both suggested changes and rebased.

@hsheth2 hsheth2 added the merge-pending-ci label May 21, 2024
@pulsar256
Contributor Author

@hsheth2 thanks, I think the workflows need another approval. Perhaps rebasing resets that?

@hsheth2
Collaborator

hsheth2 commented May 21, 2024

@pulsar256 yup I believe it does. No need to rebase/merge master - we'll merge it once CI is green.

@hsheth2 hsheth2 changed the title fix(ingestion/metabase) Fix for query template expressions and invalid URNs for Text Cards fix(ingest/metabase): Fix for query template expressions and invalid URNs for Text Cards May 24, 2024
@hsheth2 hsheth2 merged commit 1c1450e into datahub-project:master May 24, 2024
58 checks passed
sleeperdeep pushed a commit to sleeperdeep/datahub that referenced this pull request Jun 25, 2024
Labels
community-contribution PR or Issue raised by member(s) of DataHub Community ingestion PR or Issue related to the ingestion of metadata merge-pending-ci A PR that has passed review and should be merged once CI is green.
3 participants