-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend padding functionalities #9353
Changes from 8 commits
7998afb
44e7d26
fd3e304
9f66ee2
2f33b40
b70b25d
d939030
48a972f
098706b
afda62d
8bdb6f1
992e20e
d3f0275
0a51b7c
3d40800
a848351
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6689,17 +6689,45 @@ def test_polyfit_warnings(self) -> None: | |
ds.var1.polyfit("dim2", 10, full=True) | ||
assert len(ws) == 1 | ||
|
||
def test_pad(self) -> None: | ||
@pytest.mark.parametrize( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does this work if you want to pad along a dimension coordinate (aka. a variable that is called the same as it's dimension)? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good question, I'll investigate :D There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So, as I understand it most dims in the test dataset are dimension coordinates and they pad correctly, so I think so. In any case I've extended the tests to pad all dimensions just to be sure nothing is behaving incorrectly. Let me know if I missed any case. |
||
["constant_values", "expected"], | ||
[ | ||
pytest.param(None, {"var1": np.nan}, id="default"), | ||
pytest.param(42, {"var1": 42, "var2": 42}, id="scalar"), | ||
pytest.param((42, 43), {"var1": (42, 43), "var2": (42, 43)}, id="tuple"), | ||
pytest.param({"dim2": 42}, {"var1": 42, "var2": 42}, id="per dim scalar"), | ||
pytest.param( | ||
{"dim2": (42, 43)}, | ||
{"var1": (42, 43), "var2": (42, 43)}, | ||
id="per dim tuple", | ||
), | ||
pytest.param( | ||
{"var1": 42, "var2": (42, 43)}, | ||
{"var1": 42, "var2": (42, 43)}, | ||
id="per var", | ||
), | ||
pytest.param( | ||
{"var1": 42, "dim2": (42, 43)}, | ||
{"var1": 42, "var2": (42, 43)}, | ||
id="mixed", | ||
), | ||
], | ||
) | ||
def test_pad(self, constant_values, expected) -> None: | ||
ds = create_test_data(seed=1) | ||
padded = ds.pad(dim2=(1, 1), constant_values=42) | ||
padded = ds.pad(dim2=(1, 1), constant_values=constant_values) | ||
|
||
assert padded["dim2"].shape == (11,) | ||
assert padded["var1"].shape == (8, 11) | ||
assert padded["var2"].shape == (8, 11) | ||
assert padded["var3"].shape == (10, 8) | ||
assert dict(padded.sizes) == {"dim1": 8, "dim2": 11, "dim3": 10, "time": 20} | ||
|
||
np.testing.assert_equal(padded["var1"].isel(dim2=[0, -1]).data, 42) | ||
for var, expected_value in expected.items(): | ||
np.testing.assert_equal( | ||
np.unique(padded[var].isel(dim2=[0, -1]).data), expected_value | ||
) | ||
# np.testing.assert_equal(padded["var1"].isel(dim2=[0, -1]).data, 42) | ||
np.testing.assert_equal(padded["dim2"][[0, -1]].data, np.nan) | ||
|
||
@pytest.mark.parametrize( | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The typing claims that any mapping works, but here you are only checking dicts.
I would propose to change to
utils.is_dict_like
and find an alternative to pop (Mapping does not define pop because it is read-only). The simplest way would be to transform it into a dict first.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fair point, to be honest I just followed the way it is done in the implementation of
pad
for the variables invariable.py
. Maybe I should just change all these type hints todict
rather thanMapping
. What do you think?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a typing perspective dict is always problematic because it is invariant.
I would try to stick with Mapping instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay, so just I'll just change this one
isinstance(constant_values, dict)
toutils.is_dict_like(constant_values)
. Thanks for the clarifications!There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and no.
For the check it should suffice, but the general Mapping class has no
pop
defined because it is read-only (non-mutable).Anyway, I guess the user would find it weird that this input is changed in-place, so remove the
pop
method and replace it with a get item call (not sure if the Mapping ABC defines aget
method).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the check needs to be more thorough? It is totally possible to create
Dataset
objects with a data variable that has the same name as a dimension:so we could potentially break the original use case if we remove all data variables from the dictionary, because we could also remove a dimension name.
In general, I believe we need to be thorough in defining the new feature:
dict
and decide it's the padding for a specific variable?Dataset
, so either we need to specify a fallback (by dimension names), or we need to require specifying the padding for all affected variables (and raise otherwise).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a very fair point, always forget that about dicts 🙃 , I'll figure out a different way to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alrighty, changed, now the input
constant_values
is not mutated in place.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a fair point, I have rewritten this section so the original
constant_variables
mapping is not mutated.Regarding the bullet points:
data var
has adim
with the same name and the user sets a value with this name in theconstant_values
then thedata var
will be padded with those values (regardless of the dim being padded) and all otherdata vars
that also contain thatdim
will be padded with that value, provided that it is thedim
being padded (a.k.a. the dim provided inpad_width
). In the tests I also show that one can mixdata var
anddim
in the same dict. In that case the value for thedata var
has priority over the value of thedim
.dims
that are being padded then they are be padded with 0, so I kept that. For example, if one sets aconstant_value
forvar1
and is padding adim
that is contained in in othervars
,var1
will be padded with that value along thedim
while all othervars
will be padded with 0 along the samedim
.I hope I didn't make it too confusing. Let me know if it answers your questions and if the logic makes sense :D