feat(data quality): update models, add assertions cli with snowflake integration #10602

Merged
merged 11 commits into from
May 31, 2024

Conversation

mayurinehate
Collaborator

@mayurinehate mayurinehate commented May 28, 2024

Define data quality rules in DataHub

Step 1: Create an assertions specification file.
Step 2: Upsert these assertions into DataHub to view data quality expectations on the entity page.
Step 3: Compile and schedule these assertions to run on the assertion backend of your choice (here, Snowflake).
Step 4: Ingest the assertion results back into DataHub to view the current status of the data quality checks.

Assertions Specification File:

A simple YAML file that defines assertions declaratively.

version: 1
assertions:
    - type: volume
      entity: urn:li:dataset:(urn:li:dataPlatform:snowflake,test_db.public.test_assertions_all_times,PROD)
      volume_metric: row_count
      condition:
          type: between
          min: 5
          max: 15
      schedule:
          type: on_table_change
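
To make the `between` condition concrete, here is a minimal Python sketch that mirrors the specification above as a plain dictionary and checks a row count against it. The helper `volume_condition_holds` is hypothetical and not part of DataHub. It also assumes that both bounds are inclusive, which the specification above does not state.

```python
# Mirror of the YAML specification above as a Python dict (illustrative only).
spec = {
    "version": 1,
    "assertions": [
        {
            "type": "volume",
            "entity": (
                "urn:li:dataset:(urn:li:dataPlatform:snowflake,"
                "test_db.public.test_assertions_all_times,PROD)"
            ),
            "volume_metric": "row_count",
            "condition": {"type": "between", "min": 5, "max": 15},
            "schedule": {"type": "on_table_change"},
        }
    ],
}


def volume_condition_holds(condition: dict, row_count: int) -> bool:
    """Hypothetical helper: evaluate a `between` condition on a row count.

    Assumption: both `min` and `max` are inclusive bounds.
    """
    if condition["type"] != "between":
        raise ValueError(f"unsupported condition type: {condition['type']}")
    return condition["min"] <= row_count <= condition["max"]


condition = spec["assertions"][0]["condition"]
print(volume_condition_holds(condition, 10))  # row count inside the range
print(volume_condition_holds(condition, 20))  # row count above the range
```

The actual evaluation happens on the assertion backend (Snowflake in this PR), so this sketch only illustrates what the condition expresses.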

Assertions CLI:

  • Upsert command to add assertions to DataHub
    datahub assertions upsert -f examples/library/assertions_configuration.yml

  • Compile command to compile the assertions specification for the chosen assertion backend.
    datahub assertions compile -f examples/library/assertions_configuration.yml -p snowflake -x DMF_SCHEMA=test_db.datahub_dmfs
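
For scripting, the two commands above could be assembled as argument lists in Python. This is a sketch, not part of the PR: the helper names `upsert_command` and `compile_command` are hypothetical, and passing one `-x KEY=VALUE` pair per extra setting is an assumption, since the example above shows only a single `-x`.

```python
import shlex
from typing import Dict, List


def upsert_command(spec_path: str) -> List[str]:
    """Build the argv for `datahub assertions upsert` (hypothetical helper)."""
    return ["datahub", "assertions", "upsert", "-f", spec_path]


def compile_command(spec_path: str, platform: str, extras: Dict[str, str]) -> List[str]:
    """Build the argv for `datahub assertions compile` (hypothetical helper).

    Assumption: each extra setting is passed as its own `-x KEY=VALUE` pair.
    """
    cmd = ["datahub", "assertions", "compile", "-f", spec_path, "-p", platform]
    for key, value in extras.items():
        cmd += ["-x", f"{key}={value}"]
    return cmd


spec_path = "examples/library/assertions_configuration.yml"
upsert = upsert_command(spec_path)
compile_cmd = compile_command(
    spec_path, "snowflake", {"DMF_SCHEMA": "test_db.datahub_dmfs"}
)

# Print the commands as shell-safe strings for inspection.
print(shlex.join(upsert))
print(shlex.join(compile_cmd))
```

With the DataHub CLI installed, either list could then be handed to `subprocess.run(cmd, check=True)`.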

Ingestion

For Snowflake, run ingestion with include_assertion_results: true to ingest assertion results back into DataHub.
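
As a rough illustration, the flag could be switched on in a recipe that is held as a Python dictionary. The `source.type` / `source.config` layout below is assumed from the usual DataHub recipe shape, connection settings are omitted, and `with_assertion_results` is a hypothetical helper rather than a DataHub API.

```python
import copy


def with_assertion_results(recipe: dict) -> dict:
    """Return a copy of the recipe with include_assertion_results enabled.

    Hypothetical helper; assumes the flag lives under source.config.
    """
    updated = copy.deepcopy(recipe)
    config = updated.setdefault("source", {}).setdefault("config", {})
    config["include_assertion_results"] = True
    return updated


# Placeholder recipe: real recipes also need Snowflake connection settings.
base_recipe = {"source": {"type": "snowflake", "config": {}}}
recipe = with_assertion_results(base_recipe)
print(recipe)
```

Copying the recipe first keeps the original dictionary unchanged, which is convenient when the same base recipe is reused for runs that do not need assertion results.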

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions bot added the ingestion, product, and devops labels May 28, 2024
Constants.GLOBAL_TAGS_ASPECT_NAME);

Constants.GLOBAL_TAGS_ASPECT_NAME,
Constants.ASSERTION_ACTIONS_ASPECT_NAME);
Collaborator


Do we need this?

@@ -48,6 +66,14 @@ public static Assertion map(@Nullable QueryContext context, final EntityResponse
result.setInfo(
mapAssertionInfo(context, new AssertionInfo(envelopedAssertionInfo.getValue().data())));
}

final EnvelopedAspect envelopedAssertionActions =
Collaborator


Can we please remove this?

@jjoyce0510
Collaborator

CI is passing! Going to go ahead and merge this in. We can continue to iterate from here. Nice MVP!

@jjoyce0510 jjoyce0510 merged commit 81b655c into datahub-project:master May 31, 2024
62 of 63 checks passed
sleeperdeep pushed a commit to sleeperdeep/datahub that referenced this pull request Jun 25, 2024
yoonhyejin pushed a commit that referenced this pull request Jul 16, 2024
…odels, add assertions cli with snowflake integration (#10602)