feat: create db_dtypes JSONDtype and JSONArray #284

Merged: chelsea-lin merged 28 commits into main from main_chelsealin_dbjson on Aug 8, 2024


@chelsea-lin (Contributor) commented Jul 22, 2024

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Fixes internal bug b/312728178 (design doc: go/bf-json)
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes internal bug b/312728178 🦕

@chelsea-lin requested a review from tswast July 22, 2024 22:52
@chelsea-lin requested review from a team as code owners July 22, 2024 22:52
@chelsea-lin requested a review from ohmayr July 22, 2024 22:52
@product-auto-label bot added the labels "size: l" (pull request size is large) and "api: bigquery" (issues related to the googleapis/python-db-dtypes-pandas API) Jul 22, 2024
@tswast (Collaborator) left a comment

Not quite finished with the review, but sending comments now since I have a meeting and want to share what I have so far before I get back to it.

@property
def type(self) -> type[str]:
    """Return the scalar type for the array, e.g. int."""
    return dict
@tswast (Collaborator) commented Jul 23, 2024

Could this be a Union[dict, list, str, int, float]?

Please create a JSONScalarType = Union[dict, list, str, int, float] module-level variable and use it here, if so.
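A minimal sketch of the suggested module-level alias, in plain Python. The helper `is_json_scalar` is hypothetical and only illustrates how such an alias could back a runtime check; it is not part of db_dtypes.

```python
import typing

# Module-level alias as suggested in review: the Python types a decoded
# JSON value can take. JSON null (None) is not part of this union.
JSONScalarType = typing.Union[dict, list, str, int, float]


def is_json_scalar(value: object) -> bool:
    """Hypothetical helper: True if value is an instance of a JSONScalarType member."""
    # typing.get_args unpacks the Union into a plain tuple of classes,
    # which isinstance() accepts directly.
    return isinstance(value, typing.get_args(JSONScalarType))
```

Because `bool` is a subclass of `int`, `True` also passes this check, while `None` does not.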

@chelsea-lin (Contributor, Author)

I tried both JSONScalarType and typing.Any, but both hit errors inside the pandas source code. I also couldn't find an existing ExtensionDtype that declares multiple scalar types. For now, I'll leave it as str, which reflects both its storage type and the type returned by to_numpy. Here are the call stacks for both:
https://screenshot.googleplex.com/6GiKkNU9T2e8GPm
https://screenshot.googleplex.com/9QfDFsrUzD7PMrT
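The linked call stacks are not reproduced here, so the exact failure is unknown. One plausible cause, offered only as an assumption, is that pandas treats ExtensionDtype.type as a real class (for example, by passing it to issubclass), and a typing.Union alias is not a class. The stdlib-only sketch below shows that difference; the helper `accepts_as_class` is hypothetical.

```python
import typing

JSONScalarType = typing.Union[dict, list, str, int, float]


def accepts_as_class(candidate: object) -> bool:
    """Hypothetical helper: True if candidate works as issubclass()'s first argument."""
    try:
        # issubclass() raises TypeError when its first argument is not a class.
        issubclass(candidate, object)
    except TypeError:
        return False
    return True


print(accepts_as_class(str))             # → True  (a plain class)
print(accepts_as_class(JSONScalarType))  # → False (a typing.Union alias)
```

This is consistent with keeping str as the declared scalar type, though it does not by itself show which pandas call failed.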

db_dtypes/json.py: 4 resolved review threads (outdated)
@tswast (Collaborator) left a comment

Another round of comments before I go to lunch. Thanks for your patience with the review.

db_dtypes/json.py: 6 resolved review threads (outdated)
tests/compliance/json/conftest.py: 3 resolved review threads (outdated)
db_dtypes/__init__.py: 1 resolved review thread (outdated)
@chelsea-lin force-pushed the main_chelsealin_dbjson branch 3 times, most recently from 865f93b to 790f257 August 6, 2024 19:11
db_dtypes/json.py: 1 resolved review thread
db_dtypes/json.py: 2 resolved review threads (outdated)
@chelsea-lin force-pushed the main_chelsealin_dbjson branch 2 times, most recently from 98debff to b4cfcd9 August 7, 2024 22:02
db_dtypes/json.py: 1 resolved review thread (outdated)
@tswast (Collaborator) left a comment

LGTM once the test coverage requirement is met.

@chelsea-lin merged commit 76790a8 into main Aug 8, 2024
27 checks passed
@chelsea-lin deleted the main_chelsealin_dbjson branch August 8, 2024 18:21
3 participants