feat(ingestion/trino): Add sibling support in ingestion #9853

Merged

shubhamjagtap639
Contributor

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions bot added the ingestion (PR or Issue related to the ingestion of metadata) and community-contribution (PR or Issue raised by member(s) of DataHub Community) labels Feb 14, 2024
class TrinoConfig(BasicSQLAlchemyConfig):
    # defaults
    scheme: str = Field(default="trino", description="", hidden_from_docs=True)

    catalog_to_connector_details: Dict[str, ConnectorDetail] = Field(
Collaborator

let's make sure we're using the terms catalog and connector in a way that's consistent with https://trino.io/docs/current/overview/concepts.html#data-sources

The docs for this are also pretty unclear - terms like "three tier connector" are not commonplace

Contributor Author

@shubhamjagtap639 Feb 15, 2024

Let me know if it's still unclear.

catalog_name, ConnectorDetail()
)

if connector_platform_details:
Collaborator

why can't we just always try to generate this relationship?

Contributor Author

Because there can be Trino connector platforms that DataHub doesn't support yet.
E.g. https://trino.io/docs/current/connector/hudi.html

Collaborator

@hsheth2 left a comment

few small things, but mostly looks good

"""
).strip()
res = connection.execute(sql.text(query))
catalog_connector_dict = {row.catalog_name: row.connector_name for row in res}
Collaborator

split this up into two methods - one to generate catalog_connector_dict, and the second doing the lookup

the lru_cache annotation should be on the first one
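A minimal sketch of the split suggested above, not the code that was merged: one cached function builds the catalog-to-connector mapping, and a second function does the per-catalog lookup. The function names and the `_FAKE_ROWS` stand-in are hypothetical; in the PR the rows come from a SQL query run through `connection.execute`.

```python
from functools import lru_cache
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for the (catalog_name, connector_name) rows that the
# PR obtains from a SQL query; real code would query Trino here instead.
_FAKE_ROWS: List[Tuple[str, str]] = [
    ("hive_catalog", "hive"),
    ("pg_catalog", "postgresql"),
]


@lru_cache(maxsize=None)
def get_catalog_connector_dict() -> Dict[str, str]:
    """Build the catalog -> connector mapping; lru_cache ensures it runs once."""
    return {catalog_name: connector_name for catalog_name, connector_name in _FAKE_ROWS}


def get_connector_name(catalog_name: str) -> Optional[str]:
    """Look up one catalog in the cached mapping; None if the catalog is unknown."""
    return get_catalog_connector_dict().get(catalog_name)


if __name__ == "__main__":
    for name in ("hive_catalog", "pg_catalog", "unknown_catalog"):
        print(name, "->", get_connector_name(name))
    # The mapping is built on the first call only; later calls are cache hits.
    print(get_catalog_connector_dict.cache_info())
```

Putting the cache on the mapping builder rather than on the lookup means the expensive step runs once regardless of how many catalogs are looked up. Any arguments added to the cached function would need to be hashable.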

platform_instance=connector_details.platform_instance,
env=connector_details.env,
)
elif connector_details.connector_database: # else connector is three tier
Collaborator

add an else clause that reports a warning that the connector_database is missing
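The requested branch structure could look roughly like the sketch below. It is illustrative only: `TWO_TIER_CONNECTORS`, `sibling_dataset_name`, and the simplified `ConnectorDetail` dataclass are assumptions made for this example, and the warning goes to the standard `logging` module rather than to whatever reporting mechanism the ingestion source actually uses.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)

# Hypothetical set of connectors whose datasets are named schema.table.
TWO_TIER_CONNECTORS = {"hive", "glue"}


@dataclass
class ConnectorDetail:
    """Simplified stand-in for the PR's ConnectorDetail config model."""

    connector_database: Optional[str] = None


def sibling_dataset_name(
    connector_name: str, details: ConnectorDetail, schema: str, table: str
) -> Optional[str]:
    """Return the upstream dataset name, or None (with a warning) if it cannot be built."""
    if connector_name in TWO_TIER_CONNECTORS:
        return f"{schema}.{table}"
    elif details.connector_database:  # three-tier connector with a configured database
        return f"{details.connector_database}.{schema}.{table}"
    else:
        # Without connector_database a three-tier name cannot be formed, so warn
        # instead of silently skipping the sibling relationship.
        logger.warning(
            "connector_database is missing for connector %r; skipping sibling for %s.%s",
            connector_name,
            schema,
            table,
        )
        return None
```

With the explicit `else`, a missing `connector_database` shows up as a warning instead of a silently absent relationship.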

)
connector_platform_name = KNOWN_CONNECTOR_PLATFORM_MAPPING.get(
connector_details.connector_platform
if connector_details.connector_platform
Collaborator

connector_details.connector_platform or connector_name
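As a sketch of why this shorter form works: `a if a else b` and `a or b` both fall back to `b` when `a` is `None` or an empty string. The mapping contents and the `resolve_platform` helper below are hypothetical and only illustrate the lookup; they are not taken from the PR.

```python
from typing import Dict, Optional

# Hypothetical subset of a connector -> DataHub platform mapping.
KNOWN_CONNECTOR_PLATFORM_MAPPING: Dict[str, str] = {
    "postgresql": "postgres",
    "hive": "hive",
}


def resolve_platform(
    configured_platform: Optional[str], connector_name: str
) -> Optional[str]:
    """Prefer the configured platform; fall back to the connector name."""
    # `x or y` yields y whenever x is falsy (None or ""), which matches
    # `x if x else y` without naming x twice.
    return KNOWN_CONNECTOR_PLATFORM_MAPPING.get(configured_platform or connector_name)


if __name__ == "__main__":
    print(resolve_platform(None, "postgresql"))
    print(resolve_platform("hive", "postgresql"))
    print(resolve_platform(None, "hudi"))
```

A connector that is absent from the mapping resolves to `None`, which the caller can use to skip the sibling relationship.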

@asikowitz removed the community-contribution (PR or Issue raised by member(s) of DataHub Community) label Feb 22, 2024
Collaborator

@hsheth2 left a comment

LGTM

@shubhamjagtap639 marked this pull request as ready for review February 26, 2024 07:20
@anshbansal merged commit 5921a33 into datahub-project:master Feb 26, 2024
54 checks passed
@shubhamjagtap639 deleted the Trino-add-sibling-support branch March 14, 2024 11:44
Labels
ingestion PR or Issue related to the ingestion of metadata

4 participants