Fix broken JSON encoding #729

Merged
forman merged 11 commits into master from forman-688-update_env_and_encode_json on Sep 16, 2022

@forman forman commented Sep 15, 2022

  • The xcube Python environment now requires xarray >= 2022.6 and zarr >= 2.11 to ensure sparse Zarr datasets can be written using dataset.to_zarr(store). (xcube to write sparse zarrs #688)
  • Added new module xcube.util.jsonencoder that offers the class NumpyJSONEncoder, used to serialize numpy-like scalar values to JSON. It also offers the function to_json_value() to convert Python objects into JSON-serializable versions. The new functionality is required to ensure that dataset attributes are JSON-serializable. For example, the latest version of the rioxarray package generates a _FillValue attribute with datatype np.uint8.
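To illustrate the problem behind the second bullet: the standard json module rejects NumPy scalars such as np.uint8, and a JSONEncoder subclass can convert them. The following is only a minimal sketch under that assumption, not the xcube.util.jsonencoder code; the class name SketchNumpyEncoder and the sample attribute values are made up for illustration.

```python
import json

import numpy as np


class SketchNumpyEncoder(json.JSONEncoder):
    """Hypothetical encoder for illustration; NOT xcube's NumpyJSONEncoder."""

    def default(self, obj):
        # default() is only called for objects json cannot serialize itself.
        if isinstance(obj, np.generic):
            # item() returns the closest native Python scalar.
            return obj.item()
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        # Keep the standard behaviour (TypeError) for everything else.
        return super().default(obj)


# Sample attributes (made up) similar to what rioxarray may produce.
attrs = {'_FillValue': np.uint8(255), 'add_offset': np.float32(0.5)}

try:
    json.dumps(attrs)
    plain_dump_failed = False
except TypeError:
    # The plain encoder does not know how to handle np.uint8.
    plain_dump_failed = True

encoded = json.dumps(attrs, cls=SketchNumpyEncoder)
```

With the custom encoder the same dictionary serializes, and the values come back as plain Python numbers after json.loads(encoded).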

Closes #688.

EDIT

It is likely that other places where we call json.dump() are affected too! In particular, places where numeric metadata attributes such as _FillValue, add_offset, valid_max, etc. are serialized.
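For existing json.dump() call sites, one possible way to handle such attributes without touching the data is the default= hook of the json module. This is a hedged sketch: the function name numpy_default and the attribute values are assumptions, and it is not taken from this PR.

```python
import io
import json

import numpy as np


def numpy_default(obj):
    """Hypothetical fallback for json.dump(); not part of xcube."""
    if isinstance(obj, (np.generic, np.ndarray)):
        # tolist() yields native Python scalars or nested lists.
        return obj.tolist()
    raise TypeError(
        f'Object of type {type(obj).__name__} is not JSON serializable'
    )


# Made-up numeric metadata attributes of a data variable.
var_attrs = {
    '_FillValue': np.uint8(255),
    'add_offset': np.float32(1.5),
    'valid_max': np.int16(1000),
}

buffer = io.StringIO()
json.dump(var_attrs, buffer, default=numpy_default)
dumped = buffer.getvalue()
```

The hook is only consulted for objects the encoder cannot handle itself, so call sites that never see NumPy values behave as before.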

Checklist:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/source/*
  • Changes documented in CHANGES.md
  • AppVeyor CI passes
  • Test coverage remains or increases (target 100%)

@forman forman self-assigned this Sep 15, 2022
@forman forman marked this pull request as ready for review September 15, 2022 17:25
Member Author

forman commented Sep 16, 2022

Member

@pont-us pont-us left a comment

Looks good and tests pass locally. I made some nitpicking suggestions, but I don't insist on any of them :).

Comment on lines 33 to 48
    "np_bool": np.bool_(True),
    "np_int8": np.int8(1),
    "np_uint8": np.uint8(2),
    "np_int16": np.int16(3),
    "np_uint16": np.uint16(4),
    "np_int32": np.int32(5),
    "np_uint32": np.uint32(6),
    "np_int64": np.int64(7),
    "np_uint64": np.uint64(8),
    "np_float32": np.float32(9.1),
    "np_float64": np.float64(9.2),
    "py_bool": True,
    "py_int": 11,
    "py_float": 12.3,
    "py_str": "Hallo",
    "py_null": None,
Member

Quotation marks should be single per current xcube dev guide (or use dict(...) to avoid quotes entirely in the keys).
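As a small illustration of the two spellings mentioned in this comment (only a few of the keys above are repeated here, with plain Python values), both forms build the same dictionary:

```python
# Literal form with single-quoted keys.
expected_literal = {'np_int8': 1, 'py_str': 'Hallo', 'py_null': None}

# Keyword form: no quotes needed, but keys must be valid identifiers.
expected_call = dict(np_int8=1, py_str='Hallo', py_null=None)

same = expected_literal == expected_call
```

The keyword form only works when every key is a valid Python identifier and not a reserved word.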

Member Author

> Quotation marks should be single

Right.

Comment on lines 67 to 82
    "np_bool": bool(np.bool_(True)),
    "np_int8": int(np.int8(1)),
    "np_uint8": int(np.uint8(2)),
    "np_int16": int(np.int16(3)),
    "np_uint16": int(np.uint16(4)),
    "np_int32": int(np.int32(5)),
    "np_uint32": int(np.uint32(6)),
    "np_int64": int(np.int64(7)),
    "np_uint64": int(np.uint64(8)),
    "np_float32": float(np.float32(9.1)),
    "np_float64": float(np.float64(9.2)),
    "py_bool": True,
    "py_int": 11,
    "py_float": 12.3,
    "py_str": "Hallo",
    "py_null": None,
Member

Again, single quote marks / dict(...) preferred.

Member Author

Right.


def default(self, obj):
    if hasattr(obj, 'dtype') and hasattr(obj, 'ndim'):
        if obj.ndim == 0:
Member

I'd prefer to chain this condition onto the "if" above with another "and" rather than adding another indentation level. Python guarantees short-circuit evaluation in this case, so we're safe from AttributeErrors.
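The short-circuit guarantee referred to here can be sketched with two stand-in classes. NoNdim, Scalar and is_scalar_like are hypothetical names used only for this illustration and do not appear in the PR.

```python
class NoNdim:
    """Hypothetical object that has a dtype but no ndim attribute."""
    dtype = 'uint8'


class Scalar:
    """Hypothetical zero-dimensional, numpy-like object."""
    dtype = 'uint8'
    ndim = 0


def is_scalar_like(obj):
    # 'and' evaluates left to right and stops at the first falsy operand,
    # so obj.ndim is only read when both hasattr() checks have succeeded.
    return hasattr(obj, 'dtype') and hasattr(obj, 'ndim') and obj.ndim == 0


results = (
    is_scalar_like(NoNdim()),  # stops at hasattr(obj, 'ndim')
    is_scalar_like(Scalar()),  # all three operands are truthy
    is_scalar_like(42),        # stops at hasattr(obj, 'dtype')
)
```

The sketch only demonstrates the language guarantee. It says nothing about whether the nested form is needed in the reviewed code for other reasons, for example a separate branch for ndim > 0.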

Member Author

No - see my code comment.

Comment on lines 36 to 41
if np.issubdtype(obj.dtype, np.bool_):
    return bool(obj)
if np.issubdtype(obj.dtype, np.integer):
    return int(obj)
elif np.issubdtype(obj.dtype, np.floating):
    return float(obj)
Member

As far as I can see, if and elif are functionally equivalent for these conditions, since they're mutually exclusive and every clause exits the function. I don't have a stylistic preference for one or the other, but I would prefer them all to be the same :).
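The claim that these conditions are mutually exclusive can be checked directly against NumPy's type hierarchy. This is a small sketch; the selection of dtypes is arbitrary.

```python
import numpy as np

# A few concrete scalar types and the three kinds tested in the snippet.
dtypes = [np.bool_, np.int8, np.uint64, np.float32, np.float64]
abstract_kinds = (np.bool_, np.integer, np.floating)

# For each dtype, record which of the three checks succeed.
matches = {
    np.dtype(dt).name: [bool(np.issubdtype(dt, kind)) for kind in abstract_kinds]
    for dt in dtypes
}

# Exactly one check succeeds per dtype, so branch order does not matter.
exactly_one_each = all(sum(flags) == 1 for flags in matches.values())
```

Note that this differs from built-in Python types, where isinstance(True, int) is true, so an equivalent isinstance() chain would depend on the order of the checks.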

Member Author

Right.

elif np.issubdtype(obj.dtype, np.floating):
    return float(obj)
else:
    return str(obj)
Member

@pont-us pont-us Sep 16, 2022

I'm not sure which types are covered by the else case, but I guess it can't hurt.

Member Author

@forman forman Sep 16, 2022

I just discovered that my implementation is possibly wrong because JSONEncoder.default(self, obj) raises TypeError by default. Adding extra test...
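The difference between the two behaviours can be sketched with the standard library alone. StrictEncoder and LenientEncoder are hypothetical names for illustration, and neither is the implementation from this PR.

```python
import json


class StrictEncoder(json.JSONEncoder):
    """Handles one extra type and defers to the base class otherwise."""

    def default(self, obj):
        if isinstance(obj, complex):
            return [obj.real, obj.imag]
        # JSONEncoder.default() raises TypeError for unknown objects.
        return super().default(obj)


class LenientEncoder(json.JSONEncoder):
    """Falls back to str(), so unsupported objects never raise."""

    def default(self, obj):
        return str(obj)


value = {'tags': {1}}  # a set is not JSON-serializable

try:
    json.dumps(value, cls=StrictEncoder)
    strict_failed = False
except TypeError:
    strict_failed = True

lenient = json.dumps(value, cls=LenientEncoder)
```

A str() fallback silently turns unsupported objects into strings, which may hide errors that the TypeError of the base class would have surfaced.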

@forman forman requested review from pont-us and removed request for TonioF September 16, 2022 11:00
Member Author

forman commented Sep 16, 2022

FYI @TonioF, I need to merge today

@codecov-commenter

Codecov Report

Base: 92.45% // Head: 92.47% // Increases project coverage by +0.01% 🎉

Coverage data is based on head (4166bd2) compared to base (bafda23).
Patch coverage: 95.83% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #729      +/-   ##
==========================================
+ Coverage   92.45%   92.47%   +0.01%     
==========================================
  Files         321      323       +2     
  Lines       30763    30881     +118     
==========================================
+ Hits        28443    28556     +113     
- Misses       2320     2325       +5     
Impacted Files                          Coverage Δ
xcube/util/jsonencoder.py               90.56% <90.56%> (ø)
test/util/test_jsonencoder.py           100.00% <100.00%> (ø)
xcube/cli/io.py                         91.07% <100.00%> (+0.02%) ⬆️
xcube/core/store/fs/impl/dataset.py     90.53% <100.00%> (+0.40%) ⬆️


Member

@pont-us pont-us left a comment

Tests pass, and code makes sense to me.

My one requested change just involves adding a TODO comment for a possible useful expansion, so ignore it if you're in a hurry.

Comment on lines 75 to 76
converted_obj = {k: to_json_value(v) for k, v in obj.items()}
if any(converted_obj[k] is not obj[k] for k in obj.keys()):
Member

Keys can be non-serializable too, so converted_obj = {to_json_value(k): to_json_value(v)... would be useful. And then of course we'd also need to check whether the keys are identical in the if statement on the next line.

No need to implement now if it's working for our current needs, but a TODO comment might be appropriate.
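For context, the two reviewed lines follow a pattern in which a dictionary is copied only if at least one value actually changed. Below is a minimal sketch of that pattern, assuming only NumPy scalars need converting; sketch_to_json_value is a hypothetical name and not xcube's to_json_value(). Keys are passed through untouched, which is the gap this comment points out.

```python
import numpy as np


def sketch_to_json_value(obj):
    """Hypothetical helper for illustration; NOT xcube's to_json_value()."""
    if isinstance(obj, (np.bool_, np.integer, np.floating)):
        # Convert NumPy scalars into native Python scalars.
        return obj.item()
    if isinstance(obj, dict):
        # Values are converted recursively; keys are left as they are.
        converted = {k: sketch_to_json_value(v) for k, v in obj.items()}
        # Return the new dict only if at least one value was replaced.
        if any(converted[k] is not obj[k] for k in obj):
            return converted
        return obj
    return obj


plain = {'a': 1, 'b': {'c': 'x'}}
mixed = {'a': np.uint8(2), 'b': {'c': 'x'}}

plain_out = sketch_to_json_value(plain)
mixed_out = sketch_to_json_value(mixed)
```

In this sketch an already serializable dictionary is returned as the very same object, and unchanged nested dictionaries are shared between input and output.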

Member Author

@forman forman Sep 16, 2022

converted_obj = {to_json_value(k): to_json_value(v)... is wrong because keys must be strings in JSON. I agree to change that, but later.
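How Python's standard json module treats non-string keys can be sketched as follows (sample dictionaries are made up). A few basic key types are coerced to strings, and anything else raises TypeError unless skipkeys=True is passed.

```python
import json

# int, float and None keys are coerced to JSON strings.
coerced = json.dumps({2: 'a', 1.5: 'b', None: 'c'})

# Other key types, such as tuples, are rejected.
try:
    json.dumps({(1, 2): 'x'})
    tuple_key_failed = False
except TypeError:
    tuple_key_failed = True

# skipkeys=True drops unsupported keys instead of raising.
skipped = json.dumps({(1, 2): 'x', 'ok': 1}, skipkeys=True)
```

Because the coercion is one-way, a round trip through json.loads() returns string keys, so converting keys explicitly would need a deliberate decision on how they are turned into strings.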

Member

@pont-us pont-us left a comment

Review revised to "Approve" after discussion in chat.

@forman forman merged commit 71a96e7 into master Sep 16, 2022
@forman forman deleted the forman-688-update_env_and_encode_json branch September 16, 2022 13:59