Fix broken JSON encoding #729

forman · 2022-09-15T16:31:46Z

The xcube Python environment now requires xarray >= 2022.6 and zarr >= 2.11 to ensure sparse Zarr datasets can be written using dataset.to_zarr(store). (xcube to write sparse zarrs #688)
Added new module xcube.util.jsonencoder that offers the class NumpyJSONEncoder, used to serialize numpy-like scalar values to JSON. It also offers the function to_json_value() to convert Python objects into JSON-serializable versions. The new functionality is required to ensure that dataset attributes are JSON-serializable. For example, the latest version of the rioxarray package generates a _FillValue attribute with datatype np.uint8.
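
For readers without the diff at hand, the failure mode can be reproduced with a minimal sketch. The class name `SketchNumpyEncoder` and the attribute values are made up for illustration; this is not the code of xcube's NumpyJSONEncoder, only a stand-in that follows the description above.

```python
import json

import numpy as np


class SketchNumpyEncoder(json.JSONEncoder):
    """Hypothetical stand-in, NOT xcube's NumpyJSONEncoder."""

    def default(self, obj):
        # numpy-like values expose both ``dtype`` and ``ndim``.
        if hasattr(obj, 'dtype') and hasattr(obj, 'ndim'):
            if obj.ndim == 0:
                # np.bool_ is used here because the plain np.bool alias
                # is not available in every NumPy release.
                if np.issubdtype(obj.dtype, np.bool_):
                    return bool(obj)
                if np.issubdtype(obj.dtype, np.integer):
                    return int(obj)
                if np.issubdtype(obj.dtype, np.floating):
                    return float(obj)
        # Everything else keeps the standard behaviour (TypeError).
        return super().default(obj)


# Made-up attributes resembling what a raster package might produce.
attrs = {'_FillValue': np.uint8(255), 'add_offset': np.float32(0.5)}

try:
    json.dumps(attrs)
    plain_dump_failed = False
except TypeError:
    # The standard encoder does not know np.uint8.
    plain_dump_failed = True

encoded = json.dumps(attrs, cls=SketchNumpyEncoder)
print(plain_dump_failed, encoded)
```

Passing the encoder via `cls=` leaves call sites unchanged apart from one keyword argument.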

Closes #688.

EDIT

It is likely that other places where we call json.dump() are affected too, in particular where numeric metadata attributes such as _FillValue, add_offset, or valid_max are serialized.

Checklist:

Add unit tests and/or doctests in docstrings
Add docstrings and API docs for any new/modified user-facing classes and functions
~~New/modified features documented in docs/source/*~~
Changes documented in CHANGES.md
AppVeyor CI passes
Test coverage remains or increases (target 100%)

forman · 2022-09-16T06:18:39Z

FYI @AliceBalfanz @thomasstorm @TejasMorbagal

pont-us

Looks good and tests pass locally. I made some nitpicking suggestions, but I don't insist on any of them :).

pont-us · 2022-09-16T07:04:05Z

test/util/test_jsonencoder.py

+        "np_bool": np.bool(True),
+        "np_int8": np.int8(1),
+        "np_uint8": np.uint8(2),
+        "np_int16": np.int16(3),
+        "np_uint16": np.uint16(4),
+        "np_int32": np.int32(5),
+        "np_uint32": np.uint32(6),
+        "np_int64": np.int64(7),
+        "np_uint64": np.uint64(8),
+        "np_float32": np.float32(9.1),
+        "np_float64": np.float64(9.2),
+        "py_bool": True,
+        "py_int": 11,
+        "py_float": 12.3,
+        "py_str": "Hallo",
+        "py_null": None,


Quotation marks should be single per current xcube dev guide (or use dict(...) to avoid quotes entirely in the keys).

> Quotation marks should be single

Right.

pont-us · 2022-09-16T07:04:34Z

test/util/test_jsonencoder.py

+                "np_bool": bool(np.bool(True)),
+                "np_int8": int(np.int8(1)),
+                "np_uint8": int(np.uint8(2)),
+                "np_int16": int(np.int16(3)),
+                "np_uint16": int(np.uint16(4)),
+                "np_int32": int(np.int32(5)),
+                "np_uint32": int(np.uint32(6)),
+                "np_int64": int(np.int64(7)),
+                "np_uint64": int(np.uint64(8)),
+                "np_float32": float(np.float32(9.1)),
+                "np_float64": float(np.float64(9.2)),
+                "py_bool": True,
+                "py_int": 11,
+                "py_float": 12.3,
+                "py_str": "Hallo",
+                "py_null": None,


Again, single quote marks / dict(...) preferred.

pont-us · 2022-09-16T07:17:21Z

xcube/util/jsonencoder.py

+
+    def default(self, obj):
+        if hasattr(obj, 'dtype') and hasattr(obj, 'ndim'):
+            if obj.ndim == 0:


I'd prefer to chain this condition onto the `if` above with another `and` rather than adding another indentation level. Python guarantees short-circuit evaluation in this case, so we're safe from AttributeErrors.

No - see my code comment.
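
The short-circuit point from the suggestion can be checked in isolation. The helper name `is_numpy_like_scalar` is hypothetical; whether chaining suits the actual module depends on the code comment mentioned in the reply, which is not reproduced in this thread.

```python
import numpy as np


def is_numpy_like_scalar(obj) -> bool:
    # ``obj.ndim`` is only evaluated if both hasattr() checks passed,
    # because ``and`` stops at the first falsy operand.
    return hasattr(obj, 'dtype') and hasattr(obj, 'ndim') and obj.ndim == 0


print(is_numpy_like_scalar(np.uint8(2)))     # 0-d numpy scalar
print(is_numpy_like_scalar(np.zeros(3)))     # 1-d array
print(is_numpy_like_scalar('no ndim here'))  # plain str, no AttributeError
```

The nested form has one advantage that chaining loses: it leaves room for a separate branch that handles objects with `ndim > 0`.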

pont-us · 2022-09-16T07:20:38Z

xcube/util/jsonencoder.py

+                if np.issubdtype(obj.dtype, np.bool):
+                    return bool(obj)
+                if np.issubdtype(obj.dtype, np.integer):
+                    return int(obj)
+                elif np.issubdtype(obj.dtype, np.floating):
+                    return float(obj)


As far as I can see, `if` and `elif` are functionally equivalent for these conditions, since they're mutually exclusive and every clause exits the function. I don't have a stylistic preference for one or the other, but I would prefer them all to be the same :).
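
The mutual-exclusivity claim can be spot-checked with a small sketch. The sample values are chosen for illustration, and `np.bool_` stands in for the `np.bool` alias used in the diff.

```python
import numpy as np

# The three dtype categories tested in the diff.
CATEGORIES = (np.bool_, np.integer, np.floating)

# A handful of sample scalars; not an exhaustive list of numpy types.
SAMPLES = (np.bool_(True), np.int8(1), np.uint16(4), np.int64(7),
           np.float32(9.1), np.float64(9.2))

# For every sample, collect the categories its dtype belongs to.
matches = {
    type(value).__name__: [
        category.__name__
        for category in CATEGORIES
        if np.issubdtype(value.dtype, category)
    ]
    for value in SAMPLES
}

for type_name, categories in matches.items():
    print(type_name, categories)
```

Each sample falls into exactly one category, so the order of the branches and the choice between `if` and `elif` do not change the result for these types.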

pont-us · 2022-09-16T07:21:05Z

xcube/util/jsonencoder.py

+                elif np.issubdtype(obj.dtype, np.floating):
+                    return float(obj)
+                else:
+                    return str(obj)


I'm not sure which types are covered by the else case, but I guess it can't hurt.

I just discovered that my implementation is possibly wrong because JSONEncoder.default(self, obj) raises TypeError by default. Adding extra test...
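
The difference between the two fallbacks can be illustrated with a sketch. Both encoder classes below are hypothetical and exist only to contrast a `str(obj)` fallback with deferring to `JSONEncoder.default()`.

```python
import json


class StrFallbackEncoder(json.JSONEncoder):
    """Hypothetical encoder that stringifies everything it does not know."""

    def default(self, obj):
        return str(obj)


class StrictEncoder(json.JSONEncoder):
    """Hypothetical encoder that defers unknown objects to the base class."""

    def default(self, obj):
        # The base implementation raises TypeError.
        return super().default(obj)


# complex is not JSON-serializable by the standard encoder.
value = {'z': complex(1, 2)}

# Succeeds, but the number silently becomes a string.
lenient = json.dumps(value, cls=StrFallbackEncoder)

try:
    json.dumps(value, cls=StrictEncoder)
    strict_raised = False
except TypeError:
    strict_raised = True

print(lenient, strict_raised)
```

The lenient variant never fails, but it hides unsupported types from the caller; the strict variant surfaces them as errors.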

forman · 2022-09-16T11:01:25Z

FYI @TonioF, I need to merge today

xcube/util/jsonencoder.py

test/util/test_jsonencoder.py

CHANGES.md

codecov-commenter · 2022-09-16T12:15:45Z

Codecov Report

Base: 92.45% // Head: 92.47% // Increases project coverage by +0.01% 🎉

Coverage data is based on head (4166bd2) compared to base (bafda23).
Patch coverage: 95.83% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #729      +/-   ##
==========================================
+ Coverage   92.45%   92.47%   +0.01%     
==========================================
  Files         321      323       +2     
  Lines       30763    30881     +118     
==========================================
+ Hits        28443    28556     +113     
- Misses       2320     2325       +5

| Impacted Files | Coverage Δ | |
|---|---|---|
| xcube/util/jsonencoder.py | `90.56% <90.56%> (ø)` | |
| test/util/test_jsonencoder.py | `100.00% <100.00%> (ø)` | |
| xcube/cli/io.py | `91.07% <100.00%> (+0.02%)` | ⬆️ |
| xcube/core/store/fs/impl/dataset.py | `90.53% <100.00%> (+0.40%)` | ⬆️ |

pont-us

Tests pass, and code makes sense to me.

My one requested change just involves adding a TODO comment for a possible useful expansion, so ignore it if you're in a hurry.

pont-us · 2022-09-16T12:48:35Z

xcube/util/jsonencoder.py

+        converted_obj = {k: to_json_value(v) for k, v in obj.items()}
+        if any(converted_obj[k] is not obj[k] for k in obj.keys()):


Keys can be non-serializable too, so converted_obj = {to_json_value(k): to_json_value(v)... would be useful. And then of course we'd also need to check whether the keys are identical in the if statement on the next line.

No need to implement now if it's working for our current needs, but a TODO comment might be appropriate.

converted_obj = {to_json_value(k): to_json_value(v)... is wrong because keys must be strings in JSON. I agree to change that, but later.
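
Both points from this exchange can be shown in a sketch: the standard encoder rejects a numpy integer as a dict key, and a converter can turn keys into strings while still returning the original dict when nothing changed. The function `sketch_to_json_value` is hypothetical and is not xcube's to_json_value(); its use of `str()` for keys is a simplifying assumption.

```python
import json

import numpy as np


def sketch_to_json_value(obj):
    """Hypothetical helper, NOT xcube's to_json_value()."""
    if isinstance(obj, np.generic):
        # numpy scalar -> plain Python scalar
        return obj.item()
    if isinstance(obj, dict):
        changed = False
        converted = {}
        for k, v in obj.items():
            # JSON object keys must be strings (assumption: str() suffices).
            new_k = k if isinstance(k, str) else str(sketch_to_json_value(k))
            new_v = sketch_to_json_value(v)
            changed = changed or new_k is not k or new_v is not v
            converted[new_k] = new_v
        # Hand back the original object if no key or value was replaced.
        return converted if changed else obj
    return obj


try:
    json.dumps({np.int64(3): 'band'})
    raw_key_failed = False
except TypeError:
    # np.int64 is not accepted as a key by the standard encoder.
    raw_key_failed = True

attrs = {np.int64(3): 'band', 'fill': np.uint8(255)}
clean = {'title': 'demo'}

converted_attrs = sketch_to_json_value(attrs)
print(raw_key_failed, json.dumps(converted_attrs))
print(sketch_to_json_value(clean) is clean)
```

The identity check lets callers detect cheaply whether a conversion was needed at all.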

pont-us

Review revised to "Approve" after discussion in chat.

new env required fixing tests

217b29d

forman requested review from TonioF and pont-us September 15, 2022 16:31

forman self-assigned this Sep 15, 2022

forman marked this pull request as ready for review September 15, 2022 17:25

Update jsonencoder.py

0d98375

pont-us requested changes Sep 16, 2022

forman added 2 commits September 16, 2022 12:59

update

99cb2bc

Added general JSON converter to_json_value()

88b972d

forman requested review from pont-us and removed request for TonioF September 16, 2022 11:00

forman commented Sep 16, 2022

forman added 6 commits September 16, 2022 13:14

Update xcube/util/jsonencoder.py

9ca1b3d

Update xcube/util/jsonencoder.py

6734b5d

Update test/util/test_jsonencoder.py

cd6f621

Update CHANGES.md

26f0607

Update

2829608

Using to_json_value() where it is needed

4166bd2

forman requested a review from TejasMorbagal September 16, 2022 11:25

pont-us requested changes Sep 16, 2022

Checking keys too.

2e5d239

pont-us approved these changes Sep 16, 2022

forman merged commit 71a96e7 into master Sep 16, 2022

forman deleted the forman-688-update_env_and_encode_json branch September 16, 2022 13:59
