feat(ingest/snowflake): support for more operation types #8158
@@ -43,6 +43,20 @@
     "CREATE": OperationTypeClass.CREATE,
     "CREATE_TABLE": OperationTypeClass.CREATE,
     "CREATE_TABLE_AS_SELECT": OperationTypeClass.CREATE,
+    "MERGE": OperationTypeClass.CUSTOM,
+    "COPY": OperationTypeClass.CUSTOM,
+    "TRUNCATE_TABLE": OperationTypeClass.CUSTOM,
+    # TODO: Dataset for below query types are not detected by snowflake in snowflake.access_history.objects_modified.
+    # However it seems possible to support these using sql parsing in future.
+    # When this support is added, snowflake_query.operational_data_for_time_window needs to be updated.
+    # "CREATE_VIEW": OperationTypeClass.CREATE,
+    # "CREATE_EXTERNAL_TABLE": OperationTypeClass.CREATE,
+    # "ALTER_TABLE_MODIFY_COLUMN": OperationTypeClass.ALTER,
+    # "ALTER_TABLE_ADD_COLUMN": OperationTypeClass.ALTER,
+    # "RENAME_COLUMN": OperationTypeClass.ALTER,
+    # "ALTER_SET_TAG": OperationTypeClass.ALTER,
+    # "ALTER_TABLE_DROP_COLUMN": OperationTypeClass.ALTER,
+    # "ALTER": OperationTypeClass.ALTER,
 }
@@ -328,12 +342,14 @@ def _check_usage_date_ranges(self) -> Any:
     def _get_operation_aspect_work_unit(
         self, event: SnowflakeJoinedAccessEvent, discovered_datasets: List[str]
     ) -> Iterable[MetadataWorkUnit]:
-        if event.query_start_time and event.query_type in OPERATION_STATEMENT_TYPES:
+        if event.query_start_time and event.query_type:
             start_time = event.query_start_time
             query_type = event.query_type
             user_email = event.email
             user_name = event.user_name
-            operation_type = OPERATION_STATEMENT_TYPES[query_type]
+            operation_type = OPERATION_STATEMENT_TYPES.get(
+                query_type, OperationTypeClass.CUSTOM
+            )
             reported_time: int = int(time.time() * 1000)
             last_updated_timestamp: int = int(start_time.timestamp() * 1000)
             user_urn = make_user_urn(self.get_user_identifier(user_name, user_email))
@@ -363,6 +379,9 @@ def _get_operation_aspect_work_unit(
                     lastUpdatedTimestamp=last_updated_timestamp,
                     actor=user_urn,
                     operationType=operation_type,
+                    customOperationType=query_type
+                    if operation_type is OperationTypeClass.CUSTOM
+                    else None,
Comment on lines +382 to +384
Is there harm in passing this if the operation type class is not
hmm, I thought that myself as well. Then customOperationType just behaves more like
                 )
                 mcp = MetadataChangeProposalWrapper(
                     entityUrn=dataset_urn,
I see that putting CUSTOM allows us to show the customOperationType, but for other sources we try to label these: (unity) (bigquery)
MERGE is in fact UPDATE and/or DELETE and/or INSERT, so I was hesitant to mark it as UPDATE. The CUSTOM operationType is not great to use either. cc: @jjoyce0510. Maybe we could add more operation types to the model? I am proposing to add MERGE, COPY and TRUNCATE as new operation types. Does that make sense?
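For reference, the lookup behaviour this diff introduces can be sketched in isolation. This is only an illustration, not the source's code: `OperationType` is a hypothetical stand-in for DataHub's `OperationTypeClass`, `resolve_operation` is a made-up helper name, and the mapping is abbreviated to the entries visible in the diff.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class OperationType(Enum):
    """Hypothetical stand-in for OperationTypeClass (assumption, not the real model)."""

    CREATE = "CREATE"
    CUSTOM = "CUSTOM"


# Abbreviated copy of the mapping shown in the diff.
OPERATION_STATEMENT_TYPES: Dict[str, OperationType] = {
    "CREATE": OperationType.CREATE,
    "CREATE_TABLE": OperationType.CREATE,
    "CREATE_TABLE_AS_SELECT": OperationType.CREATE,
    "MERGE": OperationType.CUSTOM,
    "COPY": OperationType.CUSTOM,
    "TRUNCATE_TABLE": OperationType.CUSTOM,
}


def resolve_operation(query_type: str) -> Tuple[OperationType, Optional[str]]:
    """Return (operationType, customOperationType) for a Snowflake query type.

    Unmapped query types fall back to CUSTOM instead of being skipped, and the
    raw query type is kept as the custom label only in the CUSTOM case.
    """
    operation_type = OPERATION_STATEMENT_TYPES.get(query_type, OperationType.CUSTOM)
    custom_type = query_type if operation_type is OperationType.CUSTOM else None
    return operation_type, custom_type


if __name__ == "__main__":
    for qt in ("CREATE_TABLE", "MERGE", "ALTER_SESSION"):
        print(qt, resolve_operation(qt))
```

Note the behavioural change relative to the old `in OPERATION_STATEMENT_TYPES` guard: a query type that is absent from the mapping (here `ALTER_SESSION`, chosen purely as an example) now yields a CUSTOM operation rather than no operation at all.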