-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(ingest/tableau): split table columns query from datasources query #8217
fix(ingest/tableau): split table columns query from datasources query #8217
Conversation
@@ -388,15 +419,7 @@ class TableauProject: | |||
@capability(SourceCapability.TAGS, "Requires recipe configuration") | |||
@capability(SourceCapability.LINEAGE_COARSE, "Enabled by default") | |||
class TableauSource(StatefulIngestionSourceBase): | |||
config: TableauConfig |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed at class level.
self.config = config | ||
self.report = StaleEntityRemovalSourceReport() | ||
self.server = None | ||
self.upstream_tables = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed upstream_tables
variable. Replaced with database_tables
which is better typed.
@@ -189,17 +189,15 @@ class MetadataQueryException(Exception): | |||
upstreamTables { | |||
id | |||
name | |||
isEmbedded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to tables query
database_table_id_to_urn_map[table[tableau_constant.ID]] | ||
] | ||
columns = table.get(tableau_constant.COLUMNS, []) | ||
is_embedded = table.get(tableau_constant.IS_EMBEDDED) or False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
table.get(tableau_constant.IS_EMBEDDED, False)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isEmbedded is optional boolean. If its returned as None in graphql, then I would like to set is_embedded as False, which is not the case if we use suggested construct(which outputs None). I'd prefer to keep it as is.
https://help.tableau.com/current/api/metadata_api/en-us/reference/databasetable.doc.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh, sure, thanks
type=SchemaFieldDataType(type=TypeClass()), | ||
description="", | ||
nativeDataType=nativeDataType, | ||
dataset_snapshot = DatasetSnapshot( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should get rid of DatasetSnapshot if possible in the long term
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I just moved the existing pieces around, will track and address this in followup, if its okay.
schema_field = SchemaField( | ||
fieldPath=field[tableau_constant.NAME], | ||
type=SchemaFieldDataType(type=TypeClass()), | ||
description="", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is description a non-nullable field otherwise, I think omitting it would be better.
fields.append(schema_field) | ||
|
||
schema_metadata = SchemaMetadata( | ||
schemaName="test", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why schemaName
is hardcoded test
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK schemaName is not used anywhere. Not clear about the impact of changing this. Also not sure what should be set as value of schemaName .
If datasource has multiple upstream tables and upstream tables have large number of columns, NodeLimitExceeded Error occurs even with page_size:1. This PR splits the query into two - first to get upstream tables of datasource and then to get columns of upstream table so that less number of nodes are fetched in query.
Checklist