Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ingestion/unity-catalog): fixed issue with profiling with GE turned on #10752

Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 15 additions & 4 deletions metadata-ingestion/src/datahub/ingestion/source/unity/source.py
Original file line number Diff line number Diff line change
Expand Up @@ -262,7 +262,7 @@ def get_workunit_processors(self) -> List[Optional[MetadataWorkUnitProcessor]]:
def get_workunits_internal(self) -> Iterable[MetadataWorkUnit]:
self.report.report_ingestion_stage_start("Ingestion Setup")
wait_on_warehouse = None
if self.config.is_profiling_enabled() or self.config.include_hive_metastore:
if self.config.include_hive_metastore:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove redundant variable initialization.

The variable wait_on_warehouse is initialized to None but is immediately assigned a value if the condition is met.

-        wait_on_warehouse = None
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if self.config.include_hive_metastore:
if self.config.include_hive_metastore:

self.report.report_ingestion_stage_start("Start warehouse")
# Can take several minutes, so start now and wait later
wait_on_warehouse = self.unity_catalog_api_proxy.start_warehouse()
Expand Down Expand Up @@ -309,9 +309,20 @@ def get_workunits_internal(self) -> Iterable[MetadataWorkUnit]:
)

if self.config.is_profiling_enabled():
self.report.report_ingestion_stage_start("Wait on warehouse")
assert wait_on_warehouse
wait_on_warehouse.result()
self.report.report_ingestion_stage_start("Start warehouse")
# Need to start the warehouse again for profiling,
# as it may have been stopped after ingestion might take
# longer time to complete
wait_on_warehouse = self.unity_catalog_api_proxy.start_warehouse()
if wait_on_warehouse is None:
self.report.report_failure(
"initialization",
f"SQL warehouse {self.config.profiling.warehouse_id} not found",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this happens when permission for sql warehouse is missing Or incorrect sql warehouse is specified ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to check, as this was the existing codebase

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this happens when there is no warehouse with specified id.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense

)
Comment on lines +317 to +321
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplify the check for warehouse availability.

The check for wait_on_warehouse being None can be simplified.

-            if wait_on_warehouse is None:
+            if not wait_on_warehouse:
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if wait_on_warehouse is None:
self.report.report_failure(
"initialization",
f"SQL warehouse {self.config.profiling.warehouse_id} not found",
)
if not wait_on_warehouse:
self.report.report_failure(
"initialization",
f"SQL warehouse {self.config.profiling.warehouse_id} not found",
)

return
else:
# wait until warehouse is started
wait_on_warehouse.result()

self.report.report_ingestion_stage_start("Profiling")
if isinstance(self.config.profiling, UnityCatalogAnalyzeProfilerConfig):
Expand Down
Loading