-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingest): Add metabase database id to platform instance mapping #8359
feat(ingest): Add metabase database id to platform instance mapping #8359
Conversation
6ccbfbe
to
01a5717
Compare
Another option for mapping would be using |
01a5717
to
5b32b3b
Compare
@k-popov thanks for the contribution. @mayurinehate could you take a look? |
Hey @k-popov can you not use
Or do you have multiple platform instances of clickhouse accessed via same metabase instance ? |
@mayurinehate thanks for feedback! Yes, in my case there are multiple platform instances of clickhouse all accessed by the same metabase instance. I first thought of "re-using" the |
5b32b3b
to
5dba990
Compare
Got it. I am inclined towards using id than name as that would solve the problem permanently and handle corner cases as well. About user friendliness, can you confirm if this is still true - #5647 (comment) ? If yes, I'm comfortable with using database id. - and the config becomes |
if ( | ||
hasattr(http_error, "response") | ||
and http_error.response.status_code == 404 | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if ( | |
hasattr(http_error, "response") | |
and http_error.response.status_code == 404 | |
): | |
if ( | |
hasattr(http_error, "response") | |
and http_error.response is not None | |
and http_error.response.status_code == 404 | |
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have replaced the hasattr
check with http_error.response is not None
as http_error.response
is always set to None
in exception constructor: https://github.com/psf/requests/blob/cdbc2e271529f467b278b2760f12ee0b5d6930d3/requests/exceptions.py#L19
@mayurinehate as discussed, I've replaced key for mapping from |
d9f39c8
to
fa9cc74
Compare
I am not familiar with such limitation. This should be supported. Do you happen to have stack trace of the error, i.e. which line in code gave the error ? I am fine with keeping these as strings. I believe, it is not mandatory to enclose the id integer within quotes in recipe yml . |
c79984f
to
54d931e
Compare
Here is the stacktrace I see when using
Actually ingestion continues after this stacktrace but having any exception seems bad. |
# Set platform_instance if configuration provides a mapping from platform name to instance | ||
platform_instance = ( | ||
self.config.platform_instance_map.get(platform) | ||
if self.config.platform_instance_map | ||
if self.config.platform_instance_map and platform_instance is None | ||
else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if needs to be written as below for this to behave correctly for case platform_instance was chosen from datasource_id_in_metabase
.
if self.config.platform_instance_map and platform_instance is None:
platform_instance = self.config.platform_instance_map.get(platform)
Can you please refractor the entire get platform instance logic in separate method and add unit tests for that method ?
def get_platform_instance_for_datasource(datasource_id: str, platform: Optional[str])->Optional[str]:
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have moved platform_instance detection into method.
Attempted to add a unit-test for this function but due to connection checks being done by __init__()
of MetabaseSource
I need to also mock http communication. Will continue working on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mayurinehate would you please check the PR once again? I've implemented the unit-test for the function and it passes normally.
I've also moved network initialization of MetabaseSource
class into separate function which is called in MetabaseSource.__init__()
. There are no changes in behavior but now init is logically split into "local" initialization and "network" initialization.
I see some tests are failing but these are not related to the change I made.
defd20a
to
ec76fe8
Compare
ec76fe8
to
ef4d65d
Compare
3d1c3bc
to
7064bf2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
7064bf2
to
0468033
Compare
0468033
to
51a8fb5
Compare
@mayurinehate , thank you for your response. I've removed redundant attribute definition. Related tests look fine. Would you please check the MR once again? |
51a8fb5
to
88acb48
Compare
88acb48
to
5efbf6a
Compare
@mayurinehate , sorry to bother you once again but I might be missing something. After you approved the PR I still didn't have a button to merge it. I first thought that's because |
Hi @k-popov you don't need to do any action after approval. One of the project maintainers will merge the PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I'm not a huge fan of having both platform_instance_map
and database_id_to_instance_map
-- in the future, does it make sense to deprecate the former in favor of the latter?
Checklist
Since DataHub supports platform instances there may be several distinct instances with the same platform in the same environment. When ingesting charts from Metabase it should be possible to find the corresponding datasets (tables) in DataHub.
For example, the dataset URN searched during ingestion is:
urn:li:dataPlatform:clickhouse,event.new_app,PROD
urn:li:dataPlatform:clickhouse,clichouse_analytics_prod.event.new_app,PROD
with only the latter URN being valid for dataset ingested with
platform_instance
specified.The change still leaves
name_to_instance_map
optional and ifplatform_instance
is not used, the mapping may be skipped.Also the change adds a case for Metabase user not being found which is normal for deactivated users. I believe this case should not be considered as failure as people in companies quit or change their roles while charts and dashboards they created are still used by the company. I believe this behavior has already been mentioned in #5294. Unfortunately Metabase API does not support GET'ting individual user details for deactivated users, only has options to list them. Thus checking for 404 looks like a reasonable and yet simple workaround.