feat(ingest): Add metabase name to platform instance mapping
k-popov committed Jul 13, 2023
1 parent 83ebeb2 commit c79984f
Showing 3 changed files with 33 additions and 1 deletion.
8 changes: 8 additions & 0 deletions metadata-ingestion/docs/sources/metabase/metabase.md
@@ -9,6 +9,14 @@ the underlying datasets in the `glue` platform, the following snippet can be use
DataHub will try to determine database name from Metabase [api/database](https://www.metabase.com/docs/latest/api-documentation.html#database)
payload. However, the name can be overridden from `database_alias_map` for a given database connected to Metabase.

If DataHub contains several platform instances of the same platform (e.g. several distinct ClickHouse clusters),
the mapping from a Metabase database id to a DataHub platform instance can be configured with the following map:
```yml
database_id_to_instance_map:
  "42": platform_instance_in_datahub
```
Each key in this map must be a string, not an integer, even though the Metabase API returns `id` as a number.
If `database_id_to_instance_map` is not specified, `platform_instance_map` is used for platform instance mapping. If neither is specified, no platform instance is used when constructing the `urn` used to look up dataset relations.
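
The precedence described above can be sketched in Python roughly as follows. This is only an illustration, not the code from this commit: `resolve_platform_instance` is a hypothetical helper, and falling back to `platform_instance_map` when a database id is absent from `database_id_to_instance_map` is an assumption the text does not spell out.

```python
from typing import Dict, Optional


def resolve_platform_instance(
    database_id: Optional[int],
    platform: str,
    database_id_to_instance_map: Optional[Dict[str, str]] = None,
    platform_instance_map: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper illustrating the documented lookup order."""
    # Metabase reports `id` as a number, but the map keys are strings,
    # so the id is converted with str() before the lookup.
    if database_id is not None and database_id_to_instance_map:
        instance = database_id_to_instance_map.get(str(database_id))
        if instance is not None:
            return instance
    # Assumption: fall back to the per-platform map when the id map is
    # missing or has no entry for this database id.
    if platform_instance_map:
        return platform_instance_map.get(platform)
    # Neither map is configured: no platform instance is used in the urn.
    return None
```

With `database_id_to_instance_map={"42": "platform_instance_in_datahub"}`, a Metabase database whose `id` is the number `42` would resolve to `platform_instance_in_datahub`, while an integer key `42` in the map would never match.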
## Compatibility

Metabase version [v0.41.2](https://www.metabase.com/start/oss/)
4 changes: 3 additions & 1 deletion metadata-ingestion/docs/sources/metabase/metabase.yml
@@ -15,6 +15,8 @@ source:
    # Optional mapping of platform types to instance ids
    platform_instance_map: # optional
      postgres: test_postgres # optional
    database_id_to_instance_map: # optional
      "42": platform_instance_in_datahub # optional

sink:
  # sink configs
22 changes: 22 additions & 0 deletions metadata-ingestion/src/datahub/ingestion/source/metabase.py
@@ -61,6 +61,10 @@ class MetabaseConfig(DatasetLineageProviderConfigBase):
        default=None,
        description="Custom mappings between metabase database engines and DataHub platforms",
    )
    database_id_to_instance_map: Optional[Dict[str, str]] = Field(
        default=None,
        description="Custom mappings between metabase database id and DataHub platform instance",
    )
    default_schema: str = Field(
        default="public",
        description="Default schema name to use when schema is not provided in an SQL query",
@@ -273,6 +277,16 @@ def _get_ownership(self, creator_id: int) -> Optional[OwnershipClass]:
            user_info_response.raise_for_status()
            user_details = user_info_response.json()
        except HTTPError as http_error:
            if (
                http_error.response is not None
                and http_error.response.status_code == 404
            ):
                self.report.report_warning(
                    key=f"metabase-user-{creator_id}",
                    reason=f"User {creator_id} is blocked in Metabase or missing",
                )
                return None
            # For cases when the error is not 404 but something else
            self.report.report_failure(
                key=f"metabase-user-{creator_id}",
                reason=f"Unable to retrieve User info. " f"Reason: {str(http_error)}",
Expand Down Expand Up @@ -572,6 +586,14 @@ def get_datasource_from_id(self, datasource_id):
            else None
        )

        # For cases when metabase has several platform instances (e.g. several individual ClickHouse clusters)
        datasource_id_in_metabase = dataset_json.get("id")
        platform_instance = (
            self.config.database_id_to_instance_map.get(str(datasource_id_in_metabase))
            if datasource_id_in_metabase and self.config.database_id_to_instance_map
            else None
        )

        field_for_dbname_mapping = {
            "postgres": "dbname",
            "sparksql": "dbname",