Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ingest/druid) Handling gracefully if no table returned in a schema #8203

Merged
merged 5 commits into from
Jun 15, 2023
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 17 additions & 1 deletion metadata-ingestion/src/datahub/ingestion/source/sql/druid.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
# This import verifies that the dependencies are available.
import pydruid # noqa: F401
from pydantic.fields import Field
from pydruid.db.sqlalchemy import DruidDialect
from sqlalchemy import text

from datahub.configuration.common import AllowDenyPattern
from datahub.ingestion.api.decorators import (
Expand All @@ -15,11 +17,25 @@
from datahub.ingestion.source.sql.sql_config import BasicSQLAlchemyConfig


def get_table_names(self, connection, schema=None, **kwargs):
query = "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES"
if schema:
query = "{query} WHERE TABLE_SCHEMA = '{schema}'".format(
query=query, schema=schema
)

result = connection.execute(text(query))
return [row.TABLE_NAME for row in result] if result.rowcount > 0 else []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly is this result.rowcount > 0 check doing? Does sqlalchemy normally error if the row count is 0?

Copy link
Contributor Author

@treff7es treff7es Jun 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I threw an error (This result object does not return rows. It has been closed automatically.) if no rows were returned and it tried to iterate over the results.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a druid sqlalchemy bug to me... or something that should be configurable. Why would returning no rows result in an error?

Would at least prefer if we referenced the existing code (e.g. wrapped with a try catch) rather than overwriting it, if possible



DruidDialect.get_table_names = get_table_names


class DruidConfig(BasicSQLAlchemyConfig):
# defaults
scheme = "druid"
schema_pattern: AllowDenyPattern = Field(
default=AllowDenyPattern(deny=["^(lookup|sys).*"]),
default=AllowDenyPattern(deny=["^(lookup|sys|view).*"]),
description="regex patterns for schemas to filter in ingestion.",
)

Expand Down