Fix broken JSON encoding #729

Merged
merged 11 commits into from
Sep 16, 2022
7 changes: 7 additions & 0 deletions CHANGES.md
@@ -28,6 +28,13 @@
is used to select dataset labels along a given dimension using
user-defined predicate functions.

* The xcube Python environment now requires
  `xarray >= 2022.6` and `zarr >= 2.11` to ensure sparse
  Zarr datasets can be written using `dataset.to_zarr(store)`. (#688)

* Added new class `xcube.util.jsonencoder.NumpyJSONEncoder` that
  is used to serialize numpy-like scalar values to JSON.

### Fixes

* The filesystem-based data stores for the "s3", "file", and "memory"
85 changes: 85 additions & 0 deletions test/util/test_jsonencoder.py
@@ -0,0 +1,85 @@
# The MIT License (MIT)
# Copyright (c) 2022 by the xcube development team and contributors
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.

import json
import unittest

import numpy as np
import pytest

from xcube.util.jsonencoder import NumpyJSONEncoder


class NumpyJSONEncoderTest(unittest.TestCase):
    TEST_DATA = {
        # np.bool_ works across NumPy versions; plain np.bool does not.
        "np_bool": np.bool_(True),
        "np_int8": np.int8(1),
        "np_uint8": np.uint8(2),
        "np_int16": np.int16(3),
        "np_uint16": np.uint16(4),
        "np_int32": np.int32(5),
        "np_uint32": np.uint32(6),
        "np_int64": np.int64(7),
        "np_uint64": np.uint64(8),
        "np_float32": np.float32(9.1),
        "np_float64": np.float64(9.2),
        "py_bool": True,
        "py_int": 11,
        "py_float": 12.3,
        "py_str": "Hallo",
        "py_null": None,
Member

Quotation marks should be single per the current xcube dev guide (or use `dict(...)` to avoid quotes entirely in the keys).

Member Author

> Quotation marks should be single

Right.
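
For reference, the two spellings suggested in the review build equal dictionaries. This is a minimal sketch in which plain Python values stand in for the NumPy scalars; the keys are taken from `TEST_DATA`, while the variable names `with_literal` and `with_call` are illustrative only.

```python
# Dict literal with single-quoted keys.
with_literal = {
    'np_int8': 1,
    'py_str': 'Hallo',
    'py_null': None,
}

# dict(...) call: keys are written as keyword arguments, without quotes.
# This only works when every key is a valid Python identifier.
with_call = dict(
    np_int8=1,
    py_str='Hallo',
    py_null=None,
)

print(with_literal == with_call)
```

Both forms keep the keys in the order they were written, so the choice between them is purely stylistic here.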

    }

    def test_fail_without_encoder(self):
        with pytest.raises(TypeError):
            json.dumps(
                self.TEST_DATA,
                indent=2,
            )

    def test_encoder_encodes_all(self):
        text = json.dumps(
            self.TEST_DATA,
            indent=2,
            cls=NumpyJSONEncoder
        )
        data = json.loads(text)
        self.assertEqual(
            {
                "np_bool": bool(np.bool_(True)),
                "np_int8": int(np.int8(1)),
                "np_uint8": int(np.uint8(2)),
                "np_int16": int(np.int16(3)),
                "np_uint16": int(np.uint16(4)),
                "np_int32": int(np.int32(5)),
                "np_uint32": int(np.uint32(6)),
                "np_int64": int(np.int64(7)),
                "np_uint64": int(np.uint64(8)),
                "np_float32": float(np.float32(9.1)),
                "np_float64": float(np.float64(9.2)),
                "py_bool": True,
                "py_int": 11,
                "py_float": 12.3,
                "py_str": "Hallo",
                "py_null": None,
Member

Again, single quote marks / `dict(...)` preferred.

Member Author

Right.

            },
            data
        )
4 changes: 3 additions & 1 deletion xcube/cli/io.py
@@ -370,7 +370,9 @@ def format_cell_value(value: Any) -> str:

     with open(output_file_path, 'w') as fp:
         if output_format == 'json':
-            json.dump(dict(stores=store_list), fp, indent=2)
+            from xcube.util.jsonencoder import NumpyJSONEncoder
+            json.dump(dict(stores=store_list), fp, indent=2,
+                      cls=NumpyJSONEncoder)
         else:
             yaml.dump(dict(stores=store_list), fp, indent=2)

45 changes: 45 additions & 0 deletions xcube/util/jsonencoder.py
@@ -0,0 +1,45 @@
# The MIT License (MIT)
# Copyright (c) 2022 by the xcube development team and contributors
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.

import json

import numpy as np


class NumpyJSONEncoder(json.JSONEncoder):
    """A JSON encoder that converts numpy-like
    scalars into corresponding serializable Python objects.
    """

    def default(self, obj):
        if hasattr(obj, 'dtype') and hasattr(obj, 'ndim'):
            if obj.ndim == 0:
Member

I'd prefer to chain this condition onto the `if` above with another `and` rather than adding another indentation level. Python guarantees short-circuit evaluation in this case, so we're safe from `AttributeError`s.

Member Author

No - see my code comment.
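
The short-circuit guarantee mentioned in the review can be checked with a small, self-contained sketch. The classes `NoNdim` and `Scalar` and the helper `is_scalar_like` are hypothetical stand-ins invented for this illustration; they are not part of xcube or NumPy.

```python
class NoNdim:
    """Hypothetical object that has a dtype but no ndim attribute."""
    dtype = 'int8'


class Scalar:
    """Hypothetical stand-in for a 0-d numpy-like value."""
    dtype = 'int8'
    ndim = 0


def is_scalar_like(obj) -> bool:
    # `and` evaluates left to right and stops at the first falsy operand,
    # so `obj.ndim` is only read after both hasattr() checks have passed.
    return hasattr(obj, 'dtype') and hasattr(obj, 'ndim') and obj.ndim == 0


print(is_scalar_like(NoNdim()))  # no AttributeError is raised
print(is_scalar_like(Scalar()))
print(is_scalar_like(42))
```

The chained form is therefore safe. The nested form in the diff still has a use of its own: it leaves a branch where objects with `ndim > 0` could be handled later, which appears to be what the code comment refers to.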

                # For the time being just handle scalars.
                # np.bool_ works across NumPy versions; plain np.bool does not.
                if np.issubdtype(obj.dtype, np.bool_):
                    return bool(obj)
                if np.issubdtype(obj.dtype, np.integer):
                    return int(obj)
                elif np.issubdtype(obj.dtype, np.floating):
                    return float(obj)
Member

As far as I can see, `if` and `elif` are functionally equivalent for these conditions, since they're mutually exclusive and every clause exits the function. I don't have a stylistic preference for one or the other, but I would prefer them all to be the same :).

Member Author

Right.
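
The equivalence can be illustrated without NumPy. In this sketch, `classify_if` and `classify_elif` are hypothetical helpers that mirror the shape of the branches in `default`; because every branch returns, later conditions are never reached once an earlier one matches, whichever keyword is used.

```python
def classify_if(value):
    # Independent `if` statements; each one returns immediately.
    if isinstance(value, bool):
        return 'bool'
    if isinstance(value, int):
        return 'int'
    if isinstance(value, float):
        return 'float'
    return 'other'


def classify_elif(value):
    # Same checks in the same order, chained with `elif`/`else`.
    if isinstance(value, bool):
        return 'bool'
    elif isinstance(value, int):
        return 'int'
    elif isinstance(value, float):
        return 'float'
    else:
        return 'other'


samples = [True, 7, 9.2, 'Hallo', None]
print([classify_if(v) for v in samples])
print([classify_if(v) == classify_elif(v) for v in samples])
```

Note that `bool` is a subclass of `int` in Python, so the order of the checks matters in both variants; only the choice of keyword does not.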

                else:
                    return str(obj)
Member

@pont-us pont-us Sep 16, 2022

I'm not sure which types are covered by the else case, but I guess it can't hurt.

Member Author

@forman forman Sep 16, 2022

I just discovered that my implementation is possibly wrong because `JSONEncoder.default(self, obj)` raises `TypeError` by default. Adding extra test...
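
The base-class behaviour mentioned here can be reproduced with the standard library alone. The following is only a sketch: `SetEncoder`, `Opaque`, and `try_dumps` are hypothetical names made up for this illustration, and the encoder handles sets rather than NumPy values so that the example has no third-party dependency.

```python
import json


class Opaque:
    """Hypothetical type that no encoder here knows how to serialize."""


class SetEncoder(json.JSONEncoder):
    """Illustrative encoder: handles sets and delegates everything else."""

    def default(self, obj):
        if isinstance(obj, (set, frozenset)):
            return sorted(obj)
        # The base implementation raises TypeError for unsupported objects.
        return super().default(obj)


def try_dumps(value):
    """Return the JSON text, or the name of the exception that was raised."""
    try:
        return json.dumps(value, cls=SetEncoder)
    except TypeError as error:
        return type(error).__name__


print(try_dumps({'ids': {3, 1, 2}}))
print(try_dumps({'thing': Opaque()}))
```

A test for an encoder of this kind would therefore typically cover two cases: supported values are converted, and unsupported values still raise `TypeError`.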

            # We may add serialization for N-D arrays here.
        return json.JSONEncoder.default(self, obj)
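
The code comment leaves N-D arrays for later. One possible way to cover them is sketched below, under the assumption that plain nested lists are an acceptable JSON representation. `ArrayAwareEncoder` is a hypothetical variant and not the encoder from this PR; it also simplifies the duck-typed `hasattr` checks to `isinstance` checks against NumPy's own types.

```python
import json

import numpy as np


class ArrayAwareEncoder(json.JSONEncoder):
    """Hypothetical variant that also serializes N-D arrays as lists."""

    def default(self, obj):
        if isinstance(obj, np.generic):
            # NumPy scalar: convert to the closest built-in Python value.
            return obj.item()
        if isinstance(obj, np.ndarray):
            # N-D array: convert to (nested) lists of Python values.
            return obj.tolist()
        # Anything else: let the base class raise TypeError.
        return super().default(obj)


payload = {
    'a': np.arange(3),
    'b': np.float32(0.5),
    'c': np.bool_(True),
}
decoded = json.loads(json.dumps(payload, cls=ArrayAwareEncoder))
print(decoded)
```

Converting large arrays to lists can produce very large JSON documents, so a real implementation might want a size limit or a different representation.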