feat(glue): allow resource links to be ignored #7639

Merged: 4 commits, Apr 21, 2023

YusufMahtab (Contributor) commented Mar 20, 2023

Description

This PR adds a config option to the Glue recipe that detects when a database is actually a resource link and automatically filters it out.
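A minimal sketch of the kind of check this option implies, not the code from this PR. It assumes that a resource link can be recognised by the TargetDatabase entry that the Glue API includes on such databases. The flag name ignore_resource_links, the helper names, and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def is_resource_link(database: Dict[str, Any]) -> bool:
    # Assumption: a Glue database entry that is a resource link carries a
    # "TargetDatabase" key pointing at the original database.
    return "TargetDatabase" in database


def filter_databases(
    databases: List[Dict[str, Any]], ignore_resource_links: bool
) -> List[Dict[str, Any]]:
    # Hypothetical helper: drop resource links only when the flag is set.
    if not ignore_resource_links:
        return list(databases)
    return [db for db in databases if not is_resource_link(db)]


# Hypothetical sample shaped like entries of a get_databases() response.
sample = [
    {"Name": "sales", "CatalogId": "111111111111"},
    {
        "Name": "sales_link",
        "CatalogId": "222222222222",
        "TargetDatabase": {"CatalogId": "111111111111", "DatabaseName": "sales"},
    },
]

kept = [db["Name"] for db in filter_databases(sample, ignore_resource_links=True)]
print(kept)
```

With the flag off, both entries pass through unchanged, so existing recipes would behave as before.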

The get_all_tables_and_databases() method has also been refactored (and renamed) so that the boto3 calls are in their own methods.
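As a rough illustration of that refactoring, each boto3 call can sit in its own small function, sketched here under assumptions. The function names iter_databases and iter_tables are hypothetical and are not the names used in the PR. The sketch relies on the Glue client's get_databases and get_tables paginators and their DatabaseList and TableList response keys.

```python
from typing import Any, Dict, Iterator


def iter_databases(glue_client: Any, catalog_id: str = "") -> Iterator[Dict[str, Any]]:
    # One function per boto3 call: page through get_databases.
    paginator = glue_client.get_paginator("get_databases")
    kwargs = {"CatalogId": catalog_id} if catalog_id else {}
    for page in paginator.paginate(**kwargs):
        yield from page["DatabaseList"]


def iter_tables(
    glue_client: Any, database_name: str, catalog_id: str = ""
) -> Iterator[Dict[str, Any]]:
    # Page through get_tables for a single database.
    paginator = glue_client.get_paginator("get_tables")
    kwargs = {"DatabaseName": database_name}
    if catalog_id:
        kwargs["CatalogId"] = catalog_id
    for page in paginator.paginate(**kwargs):
        yield from page["TableList"]
```

Passing the client in (for example one created with boto3.client("glue")) keeps each call easy to stub in unit tests.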

Context

There are at least two ways of sharing Glue databases across accounts:

  • external catalog sharing
  • resource linking - the resource link may have a different name to the original database

If

  1. a database is shared via both methods
  2. the resource link has a different name to the original database
  3. the Glue recipe filter allows for both of them

then a KeyError is thrown. An example is shown in this Slack thread: https://datahubspace.slack.com/archives/CUMUWQU66/p1674044495369769.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

github-actions bot added the "ingestion" label (PR or Issue related to the ingestion of metadata) on Mar 20, 2023
hsheth2 merged commit fa10256 into datahub-project:master on Apr 21, 2023
iprentic pushed a commit that referenced this pull request Apr 27, 2023