Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert information into dict #752

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

Fokko
Copy link
Contributor

@Fokko Fokko commented May 8, 2023

resolves #751

Description

Checklist

@Fokko Fokko requested a review from a team as a code owner May 8, 2023 20:09
@Fokko Fokko requested a review from nssalian May 8, 2023 20:09
@cla-bot cla-bot bot added the cla:yes label May 8, 2023
@Fokko Fokko force-pushed the fd-information-dict branch 2 times, most recently from 17ad88f to af5ea1f Compare May 8, 2023 20:11
@Fokko Fokko changed the title Convert information into dict Convert information into dict May 8, 2023
@Fokko
Copy link
Contributor Author

Fokko commented May 9, 2023

I'll fix the UT's later today

@Fokko Fokko force-pushed the fd-information-dict branch 3 times, most recently from 457b311 to 723729c Compare May 9, 2023 20:23
@Fokko Fokko force-pushed the fd-information-dict branch 2 times, most recently from 61d68fa to 06cfeda Compare May 10, 2023 14:45
@@ -139,13 +144,54 @@ def add_schema_to_cache(self, schema) -> str:
def _get_relation_information(self, row: agate.Row) -> RelationInfo:
"""relation info was fetched with SHOW TABLES EXTENDED"""
try:
_schema, name, _, information = row
table_properties = {}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed this by directly separating the columns and properties when we parse the output. I think that makes more sense than splitting this later on because the SHOW TABLES EXTENDED output is different from the DESCRIBE TABLE EXTENDED output.

@Fokko
Copy link
Contributor Author

Fokko commented Jun 26, 2023

@mikealfare would you have time to look into this one? :)

dbt/adapters/spark/impl.py Outdated Show resolved Hide resolved
@mikealfare mikealfare removed the request for review from nssalian July 5, 2023 18:56
@mikealfare
Copy link
Contributor

@mikealfare would you have time to look into this one? :)

This one appears to be a bit more involved and I see that @Fleid labelled this as "pr_tracked". That means it has its own item in our (internal) backlog (= we're aware of it, but it hasn't necessarily been slotted for a sprint yet). I would look to him to get capacity for this issue.

Copy link
Collaborator

@JCZuurmond JCZuurmond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I very much like the goal of this PR: converting the information string to a columns and properties dict!

But ... the code changes are lot. If we can test it well and with the suggested changes, I think it is the good enough to move forward to the features that benefit from the columns and properties dict.

@@ -320,10 +322,15 @@ def test_parse_relation(self):
input_cols = [Row(keys=["col_name", "data_type"], values=r) for r in plain_rows]

config = self._get_target_http(self.project_cfg)
rows = SparkAdapter(config).parse_describe_extended(relation, input_cols)
self.assertEqual(len(rows), 4)
adapter = SparkAdapter(config)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fokko : Could you rewrite this code to have a similar flow as before, i.e. (re)move the three build-up lines and have one call to get the to-be-tested object

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've condensed the code quite a bit. Let me know what you think!

@@ -33,8 +33,8 @@ class SparkRelation(BaseRelation):
is_delta: Optional[bool] = None
is_hudi: Optional[bool] = None
is_iceberg: Optional[bool] = None
# TODO: make this a dict everywhere
information: Optional[str] = None
columns: List[Tuple[str, str]] = field(default_factory=list)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this! Much better than the information string ❤️

@@ -138,13 +143,54 @@ def quote(self, identifier: str) -> str: # type: ignore
def _get_relation_information(self, row: agate.Row) -> RelationInfo:
"""relation info was fetched with SHOW TABLES EXTENDED"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fokko : Could you extend the docstring to explain that the SHOW TABLES EXTENDED is preferred because fetching multiple tables at once is faster than fetching tables one by one. And, that we except the downside of parsing the |--- string given the performance gains?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly. I've added a docstring. Let me know what you think

self, table_results: agate.Table
) -> Tuple[List[Tuple[str, str]], Dict[str, str]]:
# Wrap it in an iter, so we continue reading the properties from where we stopped reading columns
table_results_itr = iter(table_results)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤩

dbt/adapters/spark/impl.py Outdated Show resolved Hide resolved
Copy link
Contributor Author

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh oh, the comment was still pending

@@ -320,10 +322,15 @@ def test_parse_relation(self):
input_cols = [Row(keys=["col_name", "data_type"], values=r) for r in plain_rows]

config = self._get_target_http(self.project_cfg)
rows = SparkAdapter(config).parse_describe_extended(relation, input_cols)
self.assertEqual(len(rows), 4)
adapter = SparkAdapter(config)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've condensed the code quite a bit. Let me know what you think!

@@ -138,13 +143,54 @@ def quote(self, identifier: str) -> str: # type: ignore
def _get_relation_information(self, row: agate.Row) -> RelationInfo:
"""relation info was fetched with SHOW TABLES EXTENDED"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly. I've added a docstring. Let me know what you think

Copy link
Contributor

github-actions bot commented Feb 7, 2024

This PR has been marked as Stale because it has been open with no activity as of late. If you would like the PR to remain open, please comment on the PR or else it will be closed in 7 days.

Copy link
Contributor

github-actions bot commented Aug 6, 2024

This PR has been marked as Stale because it has been open with no activity as of late. If you would like the PR to remain open, please comment on the PR or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Aug 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ADAP-518] Convert information to a dict
4 participants