feat(ingest): Add metabase name to platform instance mapping
k-popov committed Jul 3, 2023
1 parent cfa864e commit 01a5717
Showing 2 changed files with 29 additions and 0 deletions.
7 changes: 7 additions & 0 deletions metadata-ingestion/docs/sources/metabase/metabase.md
@@ -9,6 +9,13 @@ the underlying datasets in the `glue` platform, the following snippet can be use
DataHub will try to determine database name from Metabase [api/database](https://www.metabase.com/docs/latest/api-documentation.html#database)
payload. However, the name can be overridden from `database_alias_map` for a given database connected to Metabase.

If DataHub contains several platform instances of the same platform (e.g. several distinct ClickHouse clusters),
the mapping from a database name in Metabase to a platform instance in DataHub can be configured with the following map:
```yml
name_to_instance_map:
  DataBaseNameInMetabase: platform_instance_in_datahub
```
If `name_to_instance_map` is not specified, the platform instance is left out of the `urn` that is constructed when searching for dataset relations.
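
As a rough illustration of the lookup described above, the following Python sketch resolves a platform instance from the map and folds it into a dataset URN. The helper names `resolve_platform_instance` and `build_dataset_urn` are hypothetical, and the URN layout shown is an assumption about DataHub's format, not code taken from the source.

```python
from typing import Dict, Optional


def resolve_platform_instance(
    name_to_instance_map: Optional[Dict[str, str]], metabase_db_name: str
) -> Optional[str]:
    """Return the mapped platform instance, or None if there is no map or no entry."""
    if not name_to_instance_map:
        return None
    return name_to_instance_map.get(metabase_db_name)


def build_dataset_urn(
    platform: str, name: str, platform_instance: Optional[str], env: str = "PROD"
) -> str:
    """Hypothetical helper: prefix the dataset name with the instance when one is set.

    The URN layout here is an assumption made for illustration only.
    """
    qualified_name = f"{platform_instance}.{name}" if platform_instance else name
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{qualified_name},{env})"


# Example map, matching the YAML snippet above.
example_map = {"DataBaseNameInMetabase": "platform_instance_in_datahub"}
instance = resolve_platform_instance(example_map, "DataBaseNameInMetabase")
print(build_dataset_urn("clickhouse", "db.table", instance))
```

A database name that is absent from the map resolves to `None`, so the URN is then built without an instance prefix, the same as when the map is not configured at all.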
## Compatibility

Metabase version [v0.41.2](https://www.metabase.com/start/oss/)
22 changes: 22 additions & 0 deletions metadata-ingestion/src/datahub/ingestion/source/metabase.py
@@ -61,6 +61,10 @@ class MetabaseConfig(DatasetLineageProviderConfigBase):
        default=None,
        description="Custom mappings between metabase database engines and DataHub platforms",
    )
    name_to_instance_map: Optional[Dict[str, str]] = Field(
        default=None,
        description="Custom mappings between metabase database name and DataHub platform instance",
    )
    default_schema: str = Field(
        default="public",
        description="Default schema name to use when schema is not provided in an SQL query",
@@ -273,6 +277,16 @@ def _get_ownership(self, creator_id: int) -> Optional[OwnershipClass]:
            user_info_response.raise_for_status()
            user_details = user_info_response.json()
        except HTTPError as http_error:
            if (
                hasattr(http_error, "response")
                and http_error.response.status_code == 404
            ):
                self.report.report_warning(
                    key=f"metabase-user-{creator_id}",
                    reason=f"User {creator_id} is blocked in Metabase or missing",
                )
                return None
            # For cases when the error is not 404 but something else
            self.report.report_failure(
                key=f"metabase-user-{creator_id}",
                reason=f"Unable to retrieve User info. " f"Reason: {str(http_error)}",
@@ -572,6 +586,14 @@ def get_datasource_from_id(self, datasource_id):
            else None
        )

        # For cases when metabase has several platform instances (e.g. several individual ClickHouse clusters)
        datasource_name_in_metabase = dataset_json.get("name", "")
        platform_instance = (
            self.config.name_to_instance_map.get(datasource_name_in_metabase)
            if self.config.name_to_instance_map
            else None
        )

        field_for_dbname_mapping = {
            "postgres": "dbname",
            "sparksql": "dbname",
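
The 404 branch added to `_get_ownership` can be sketched in isolation. This is a minimal, defensive variant and not the commit's code: `classify_user_lookup_error` is a hypothetical helper, and it uses `getattr` so that an error whose `response` attribute is `None` falls through to the failure path instead of raising `AttributeError`. The `SimpleNamespace` objects are stand-ins for HTTP errors, used here only to keep the example free of network calls.

```python
from types import SimpleNamespace
from typing import Any


def classify_user_lookup_error(error: Any) -> str:
    """Hypothetical helper: decide how a failed user lookup should be reported.

    Returns "warning" for a 404 (user blocked or missing) and "failure" otherwise.
    """
    response = getattr(error, "response", None)
    status_code = getattr(response, "status_code", None)
    if status_code == 404:
        return "warning"
    return "failure"


# Stand-ins for HTTP errors carrying different responses (assumed shapes).
missing_user = SimpleNamespace(response=SimpleNamespace(status_code=404))
server_error = SimpleNamespace(response=SimpleNamespace(status_code=500))
no_response = SimpleNamespace(response=None)

for err in (missing_user, server_error, no_response):
    print(classify_user_lookup_error(err))
```

Separating the decision from the reporting calls keeps the 404 case easy to unit-test without a live Metabase instance.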
