Pint support for variables #3706

keewis · 2020-01-17T23:36:45Z

I realized that most of the operations depend on Variable, which means that this should be the first step to making the other tests pass. This PR copies from #3611, making that PR a work-in-progress again until I remove those parts.

Closes #3783 (New TestVariable.test_pad failure with pint 0.11)
Passes black . && mypy . && flake8
Fully documented, including whats-new.rst for all changes and api.rst for new API

These are the current failures:

np.prod: not implemented by pint yet, but should work once it is
comparisons: identical fails to detect that the same value in different units (like 1 m and 100 cm) is different. This is hard to implement without isinstance or hasattr checks.
rank: not implemented for non-ndarrays, so maybe we should mark this as skip?
rolling_window: nputils._rolling_window uses np.lib.stride_tricks.as_strided, which cannot be overridden by pint. We probably have to use something different?
shift: ~~this tries to trim, then concatenate a filled array, but I think this should use np.pad after trimming? Are there any disadvantages to that?~~ it uses numpy.pad now which is supported since dask==0.18 or dask==0.19
concat: this was a misconception on my part: I didn't realize this was a classmethod. After fixing that, it still failed because I assumed Variable.concat used the dimension names to reshape the arrays. It does not, so something like the following fails (I rewrote the test to pass arrays that don't cause the failure):

xr.Variable.concat([xr.Variable(("x", "y"), ...), xr.Variable(("y", "z"), ...)], dim="y")

pad_with_fill_value: ~~I think this is a bug in pint (see non-quantity constant_values parameter to pad hgrecco/pint#992)~~ fixed on pint master
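The shift approach described in the list above (trim first, then pad back to the original length instead of concatenating a filled array) can be sketched with plain NumPy. This is a minimal illustration under assumptions, not xarray's Variable.shift: the name shift_with_pad is hypothetical, and the sketch shifts along a single axis only.

```python
import numpy as np


def shift_with_pad(data, offset, axis=0, fill_value=np.nan):
    """Shift ``data`` by ``offset`` positions along ``axis``.

    Hypothetical helper: drop the elements that fall off one end, then let
    np.pad add ``fill_value`` back at the other end so the shape is unchanged.
    """
    data = np.asarray(data)
    # promote the dtype so it can hold fill_value (e.g. int data with NaN)
    data = data.astype(np.result_type(data, fill_value), copy=False)
    n = data.shape[axis]
    offset = max(-n, min(n, offset))  # shifting by more than n empties the axis

    if offset >= 0:
        keep, pad = slice(0, n - offset), (offset, 0)
    else:
        keep, pad = slice(-offset, n), (0, -offset)

    index = [slice(None)] * data.ndim
    index[axis] = keep
    pad_width = [(0, 0)] * data.ndim
    pad_width[axis] = pad
    return np.pad(data[tuple(index)], pad_width, mode="constant", constant_values=fill_value)
```

np.pad is one of the functions duck arrays can override through __array_function__, which is presumably what makes this route friendlier to wrapped arrays than building the filled array by hand.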

Does anyone have any comments on these before I start fixing them? @dcherian?

keewis · 2020-01-19T23:02:40Z

I fixed the easier tests, but I have no idea how to handle rolling_window / as_strided because I don't understand how it works. identical would be easy to fix if we special-cased pint, but I'd like to avoid that.

Edit: seems there are some issues with np.pad support in dask==1.2: __array_function__ support is probably only in later versions so we'd need to use da.pad instead, but that raises an error, too.

keewis · 2020-01-20T02:05:56Z

I'm not sure why, but dask==1.2 fails where later versions work:

ValueError: operands could not be broadcast together with shapes (2,0) (2,2)

Edit: Seems we can just bump dask to dask==2.1 or dask==2.2 and not worry about this. I'd wait on #3660 with this.

keewis · 2020-01-23T19:18:38Z

the easiest way to fix identical would be to pass an equiv function that forwards to duck_array_ops.array_equiv but also checks the units:

from xarray.core import duck_array_ops


def units_equiv(a, b):
    if hasattr(a, "units") or hasattr(b, "units"):
        units_a = getattr(a, "units", None)
        units_b = getattr(b, "units", None)

        # use the registry of whichever operand has units
        registry = getattr(units_a, "_REGISTRY", None) or getattr(units_b, "_REGISTRY", None)

        # treat a missing unit as dimensionless
        if units_a is None:
            units_a = registry.dimensionless
        if units_b is None:
            units_b = registry.dimensionless

        return units_a == units_b
    else:
        # no units at all
        return True


def equiv_with_units(a, b):
    return duck_array_ops.array_equiv(a, b) and units_equiv(a, b)

However, I'm out of ideas on how to automatically use that (I'd like to avoid having to call a.identical(b, equiv=equiv_with_units)).

There is also rolling_window that strips the units due to calling numpy.lib.stride_tricks.as_strided, but I don't know how to fix that unless we rewrite _rolling_window or dispatch differently in rolling_window.

There seems to have been some work to add a function named numpy.rolling_window or numpy.sliding_window that could then be overridden by pint, but I think that effort has stalled?
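To see why the units get lost, here is a minimal sketch of the as_strided technique (last axis only, no padding or centering, so it is simpler than nputils._rolling_window; the name rolling_window_view is hypothetical). as_strided builds a view from raw memory strides, so it needs a real ndarray, and any wrapper such as pint.Quantity has to be converted away first:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def rolling_window_view(a, window):
    """Return a read-only view of shape a.shape[:-1] + (n - window + 1, window)."""
    # as_strided only works on real ndarrays, so duck arrays such as
    # pint.Quantity are converted here and lose their wrapper (and units)
    a = np.asarray(a)
    if not 1 <= window <= a.shape[-1]:
        raise ValueError("window must be between 1 and the length of the last axis")

    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    # stepping along the new window axis moves one element along the last axis
    strides = a.strides + (a.strides[-1],)
    return as_strided(a, shape=shape, strides=strides, writeable=False)
```

Because the result is a view, no data is copied; the price is that the function operates on raw memory, which a wrapper type cannot intercept.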

dcherian · 2020-01-23T21:29:14Z

I think your units_equiv thing could go in duck_array_ops.lazy_array_equiv which tries to check everything but the actual values themselves.

but I don't know how to fix that unless we rewrite _rolling_window or dispatch differently in rolling_window.

This may be necessary but perhaps not the most important thing right now?

keewis · 2020-01-23T21:41:50Z

I think your units_equiv thing could go in duck_array_ops.lazy_array_equiv

that could work for Dataset.identical but not Variable.identical, which I believe by default directly calls array_equiv

re rolling_window: should I leave it as xfail for now?

dcherian · 2020-01-23T21:51:39Z

array_equiv calls lazy_array_equiv before doing much else:

xarray/xarray/core/duck_array_ops.py

Lines 215 to 227 in 6d1434e

    
def array_equiv(arr1, arr2):
    """Like np.array_equal, but also allows values to be NaN in both arrays
    """
    arr1 = asarray(arr1)
    arr2 = asarray(arr2)
    lazy_equiv = lazy_array_equiv(arr1, arr2)
    if lazy_equiv is None:
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", "In the future, 'NAT == x'")
            flag_array = (arr1 == arr2) | (isnull(arr1) & isnull(arr2))
            return bool(flag_array.all())
    else:
        return lazy_equiv

dcherian · 2020-01-23T21:51:52Z

re rolling_window: should I leave it as xfail for now?

Fine by me. :)

keewis · 2020-01-23T23:19:20Z

hrmm... I had investigated this before and thought I remembered correctly. I can't put it in lazy_array_equiv, though, since array_equiv is also used by equals (which should not check the units, only the dimensionality)

dcherian · 2020-01-24T03:34:04Z

I guess we need to either pass a kwarg regarding checking units/dimensionality down to lazy_array_equiv or add a units_equiv(only_dimensionality=True/False) check to

xarray/xarray/core/variable.py

Lines 1645 to 1648 in 6d1434e

    
try:
    return self.dims == other.dims and (
        self._data is other._data or equiv(self.data, other.data)
    )

I think a separate units_equiv function may be cleaner?
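A separate helper along the lines proposed here might look like the sketch below. It assumes unit-aware arrays expose units and dimensionality attributes, as pint.Quantity does; the function and its only_dimensionality keyword are hypothetical and come from the proposal, not from xarray. Unlike the units_equiv snippet earlier in the thread, it does not treat a missing unit as dimensionless.

```python
def units_equiv(a, b, only_dimensionality=False):
    """Compare the units of two (possibly unit-aware) arrays.

    Hypothetical helper: with only_dimensionality=True, 1 m and 100 cm
    compare equal; otherwise the units themselves must match.
    """
    attr = "dimensionality" if only_dimensionality else "units"
    value_a = getattr(a, attr, None)
    value_b = getattr(b, attr, None)

    if value_a is None or value_b is None:
        # unit-less objects only match other unit-less objects
        return value_a is None and value_b is None
    return value_a == value_b
```

In this sketch, identical would call it with the default (units must match) and equals with only_dimensionality=True, so that 1 m and 100 cm are equal but not identical.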

Note that we explicitly use lazy_array_equiv in concat so it'd be nice to have something that could be easily used there too:

xarray/xarray/core/concat.py

Lines 194 to 208 in 6d1434e

    
for k in getattr(datasets[0], subset):
    if k not in concat_over:
        equals[k] = None
        variables = [ds.variables[k] for ds in datasets]
        # first check without comparing values i.e. no computes
        for var in variables[1:]:
            equals[k] = getattr(variables[0], compat)(
                var, equiv=lazy_array_equiv
            )
            if equals[k] is not True:
                # exit early if we know these are not equal or that
                # equality cannot be determined i.e. one or all of
                # the variables wraps a numpy array
                break

keewis · 2020-01-24T14:07:08Z

I think a separate units_equiv function may be cleaner?

I agree. However, I'm reluctant to add that function since it would be the first pint-dependent code we have apart from the unit tests (but do tell me if that's not an issue).

I'm inclined to provide some kind of hook for wrapped libraries instead (allow them to define a method like metadata_identical or something that does the check for us) so the code in identical would become something like

def metadata_identical(arr1, arr2):
    if hasattr(arr1, "metadata_identical"):
        return arr1.metadata_identical(arr2)
    elif hasattr(arr2, "metadata_identical"):
        return arr2.metadata_identical(arr1)
    else:
        return True

# inside Variable.identical:
return (
    self.dims == other.dims
    and (self._data is other._data or equiv(self.data, other.data))
    and metadata_identical(self.data, other.data)
)

Note that we explicitly use lazy_array_equiv in concat so it'd be nice to have something that could be easily used there too

I don't think that's a problem, since lazy_array_equiv is called through Variable.identical there: if identical works correctly, this should, too.

dcherian · 2020-01-24T15:31:32Z

OK let's see what @shoyer thinks

keewis · 2020-02-02T18:03:50Z

gentle ping, @shoyer

keewis · 2020-02-05T14:53:09Z

these issues (rolling_window and identical) seem too big to discuss here, so in order to keep moving forward let's mark these tests as xfail and discuss this in their own issues / PRs.

dcherian · 2020-02-05T15:57:10Z

let's mark these tests as xfail and discuss this in their own issues / PRs.

👍 to smaller PRs!

xarray/core/variable.py

xarray/core/duck_array_ops.py

shoyer · 2020-02-07T06:33:50Z

xarray/core/variable.py

+            if isinstance(mask, bool):
+                mask = not mask
+            else:
+                mask = ~mask


Could you flip the argument order rather than adding this? I'm a little puzzled here.

If the concern here is about consistency when applying ~ to bool objects and boolean dtype arrays, explicitly calling np.logical_not is a good alternative.

But it does feel a little weird to me to see this here. Maybe changing duck_array_ops.notnull would have the same effect?

same as the fillna issue above: in order to get the results in the units of data, we need to flip the arguments, and for that I need to invert the mask (if there is a different way to flip the arguments without inverting, please do tell me).

I tried to use mask = ~mask, but ~ does not work as expected for bool. I'll use np.logical_not instead.
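The reason ~ misbehaves is that Python's bool is a subclass of int, so ~True is the bitwise complement of 1, i.e. -2 (which is truthy), whereas on NumPy boolean arrays ~ is an element-wise logical NOT. np.logical_not gives the logical result for both, so a single call can replace the isinstance branch from the diff above. A minimal sketch (invert_mask is a hypothetical name):

```python
import warnings

import numpy as np


def invert_mask(mask):
    """Logical NOT for a Python bool or a boolean array.

    Stands in for the branch from the diff:
        if isinstance(mask, bool): mask = not mask
        else: mask = ~mask
    """
    return np.logical_not(mask)


flag = True
with warnings.catch_warnings():
    # Python 3.12+ deprecates ~ on bool; silence it to show the result
    warnings.simplefilter("ignore", DeprecationWarning)
    bitwise = ~flag  # bool is an int subclass, so this is ~1 == -2 (truthy!)

print(bitwise)                               # -2
print(invert_mask(flag))                     # False
print(invert_mask(np.array([True, False])))  # [False  True]
```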

shoyer · 2020-02-07T06:52:26Z

Thanks for pinging me again here (I get a lot of GitHub notifications). identical is an interesting case!

I think the current behavior (1 meter is identical to 100 centimeters) is arguably consistent with how identical currently works, which only checks equality between array elements.

Right now, identical considers numbers of different data types equal, e.g., int 1 is identical to float 1.0. I think units arguably have a similar role to data types; hopefully libraries like pint could eventually be implemented via custom NumPy dtypes, rather than needing to reimplement all of NumPy.
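A small NumPy illustration of this point, assuming an element-wise comparison similar to the one identical relies on (NaN handling omitted): int and float arrays holding the same values compare equal, so dtype (and, by analogy, units) only matters if it is checked explicitly. Both helper names are hypothetical.

```python
import numpy as np


def values_equal(a, b):
    """Element-wise equality that ignores dtype, like the behaviour described."""
    a, b = np.asarray(a), np.asarray(b)
    return a.shape == b.shape and bool((a == b).all())


def values_and_dtype_equal(a, b):
    """Hypothetical stricter check that also requires matching dtypes."""
    a, b = np.asarray(a), np.asarray(b)
    return a.dtype == b.dtype and values_equal(a, b)


ints = np.array([1, 2, 3])
floats = np.array([1.0, 2.0, 3.0])
```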

Did this come up in the context of some other downstream use-case, or is this just something that occurred to you for the sake of consistency?

keewis · 2020-02-07T11:21:55Z

when writing the unit tests, we decided that equals should be defined differently from identical (see #3238 (comment) and #3238 (comment)), which would be consistent with the behaviour of identical when a "units" attribute is set.

As far as I know there was no real use case (apart from being able to use assert_identical in the unit tests), so we can change that.

cc @jthielen

jthielen · 2020-02-07T19:17:08Z

@keewis For what it is worth, as far as identical goes, I think it makes the most sense to treat unit matching like dtype matching as @shoyer mentioned. Although, I had interpreted @max-sixty's comment #3238 (comment) to mean that dtypes are compared, it appears from @shoyer's comment #3706 (comment) that this not the case. If strict unit checking is required, I think that may be better served by an additional assert unit == "meter" type statement.

keewis · 2020-02-19T17:56:09Z

If strict unit checking is required, I think that may be better served by an additional assert unit == "meter" type statement.

which is what I've been doing with assert_units_equal. I'll change the tests for identical, then.

Also, concerning the commutative operations: should we wait for hgrecco/pint#1019 and remove the flipped parameters or should we merge as is and possibly revert once pint implements a type casting hierarchy?

shoyer · 2020-02-21T22:23:31Z

Also, concerning the commutative operations: should we wait for hgrecco/pint#1019 and remove the flipped parameters or should we merge as is and possibly revert once pint implements a type casting hierarchy?

I don't anticipate any performance cost to this, just a small decrease in readability. So I think this is fine to merge for now with comments in the relevant sections and we can revert it later. My only suggestion is to add a note like TODO: revert after https://github.com/hgrecco/pint/issues/1019 is fixed to each comment.
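For reference, the flipped-argument workaround rests on the identity np.where(cond, x, y) == np.where(~cond, y, x): swapping the two branches while inverting the condition selects the same values but puts data first, which according to the discussion above is what makes the result take the units of data. A NumPy-only sketch with the suggested TODO note (fillna_flipped is a hypothetical name; the sketch does not exercise pint itself):

```python
import numpy as np


def fillna_flipped(data, fill_value):
    """Replace NaNs in ``data`` with ``fill_value``."""
    missing = np.isnan(data)
    # equivalent to np.where(missing, fill_value, data), but with ``data`` as
    # the first array argument so a unit-aware result keeps its units
    # TODO: revert after https://github.com/hgrecco/pint/issues/1019 is fixed
    return np.where(np.logical_not(missing), data, fill_value)
```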

max-sixty · 2020-02-21T23:05:37Z

Although, I had interpreted @max-sixty's comment #3238 (comment) to mean that dtypes are compared, it appears from @shoyer's comment #3706 (comment) that this not the case.

I was wrong; I should have at least realized I didn't know. Apologies if that caused wasted time @jthielen

Separately: should assert_identical assert that the dtypes are the same? I'd have thought there should be some way of testing whether dtypes are consistent with expectations, and I'd have thought assert_identical would be it?

xarray/tests/test_units.py

keewis · 2020-02-23T15:55:31Z

@shoyer, I added the notes

@max-sixty, @jthielen: the identical tests will be skipped for now. At the moment that does not make any difference since identical is the same as equals with additionally checking the attributes (any changes to it are not limited to pint, so I guess we should open a new issue for discussion).

dcherian · 2020-02-23T19:13:03Z

Test failures look unrelated. Thanks @keewis


keewis added 7 commits January 17, 2020 14:33

get fillna tests to pass

e5a6632

get the _getitem_with_mask tests to pass

9414fe3

silence the behavior change warning of pint

12d2fe4

don't use 0 as fill value since that has special behaviour

077d67a

use concat as a class method

e03fac8

use np.pad after trimming instead of concatenating a filled array

6b65a76

rewrite the concat test to pass appropriate arrays

88320ef

use da.pad when dealing with dask arrays

1e07dce

keewis requested a review from dcherian January 20, 2020 17:16

keewis added 2 commits January 22, 2020 19:35

Merge branch 'master' into pint-support-variables

3942193

mark the failing pad tests as xfail when on a current pint version

ce572de

This was referenced Jan 26, 2020

support for units with pint #3594

Open

Pint support for DataArray #3643

Merged

keewis added 3 commits February 5, 2020 15:55

Merge branch 'master' into pint-support-variables

8bfc347

update whats-new.rst

3d16f2e

fix the import order

a8cf968

keewis changed the title ~~WIP: Pint support for variables~~ Pint support for variables Feb 5, 2020

shoyer reviewed Feb 7, 2020


keewis added 3 commits February 7, 2020 12:27

use np.logical_not instead

f6eca88

use duck_array_ops to provide pad

72241e5

add comments explaining the order of the arguments to where

2ebfa6b

jthielen mentioned this pull request Feb 7, 2020

Make commutative operations commutative in both magnitude and units hgrecco/pint#1019

Open

keewis mentioned this pull request Feb 21, 2020

New TestVariable.test_pad failure with pint 0.11 #3783

Closed

keewis commented Feb 23, 2020


xarray/tests/test_units.py (outdated, resolved)

keewis added 3 commits February 23, 2020 16:26

mark the flipped parameter changes with a todo

0a0c2d3

skip the identical tests

4cadc52

remove the warnings filter

b51caa4

dcherian merged commit 47476eb into pydata:master Feb 23, 2020

keewis deleted the pint-support-variables branch February 24, 2020 00:09

This was referenced Mar 7, 2020

Pint support for top-level functions #3611

Merged

Add DataArray.pad, Dataset.pad, Variable.pad #3596

Merged

keewis mentioned this pull request Mar 23, 2020

reword the whats-new entry for unit support #3878

Merged


keewis mentioned this pull request Jul 17, 2020

Add cupy support #4212

Open

keewis mentioned this pull request Dec 23, 2020

xr.testing.assert_equal does not test for dtype #4727

Open

TomNicholas mentioned this pull request Jul 3, 2021

assert_equal does not handle wrapped duck arrays well #5570

Open
