Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pipe, apply should call maybe_wrap_array, possibly resolve dim->axis #1130

Closed
smartass101 opened this issue Nov 17, 2016 · 6 comments
Closed
Labels

Comments

@smartass101
Copy link

While pipe and Dataset.apply (btw, why not call them both the same?) specify that they expected DataArray returning functions, it would be very convenient to have them call maybe_wrap_array anyways.

I've often tried piping functions which at first looked like ufuncs only to find out that they forgot to call __array_wrap__ (I'm looking at you np.angle). The extra call to maybe_wrap_array is cheap, does not break anything and would be very useful. It would greatly enlarge the set of functions that can be readily applied to DataArray objects without any need for writing function wrappers (motivated in part by #1080).

Since many such functions expect an axis argument, some syntax for dim -> axis resolution could be also added. I see some options

  1. check if axis argument is a string and coerce it to a number, something like
axis = kwargs.get('axis')
if axis is not None:
    if isinnstance(axis, str):
        kwargs['axis'] = darray.get_axis_num(axis)

Simple, but specifying axis='smth' is not very explicit and may mean something else for certain funcs, it assumes a lot about function signatures.

  1. similar to 1., but only if both dim and axis='dim' are specified. Still possible conflict of specific meaning, but less likely.
if kwargs.get('axis') == 'dim':
    kwargs['axis'] = darray.get_axis_num(kwargs['dim'])

Other coding might be possible.

  1. use some syntax similar to pipe((f, 'arg2', ('axis', dim)), *args, **kwargs), but that's getting complicated and less readable.

Let me know what you think and perhaps you'll come up with some nicer syntax for dim-> axis resolution.

@shoyer
Copy link
Member

shoyer commented Nov 17, 2016

On Dataset, .pipe and .apply work differently -- pipe passes the entire Dataset whereas apply passes each of the variables in turn.

I'm OK with calling maybe_wrap_array in DataArray.apply, but I'd rather not adjust DataArray.pipe, which is intentionally very simple.

@shoyer
Copy link
Member

shoyer commented Nov 17, 2016

As for dim/axis, my inclination would be to unilaterally remap any dim keyword argument to .apply to axis using the get_axis_num. Something like:

dim = kwargs.pop('dim', None)
if dim is not None:
    if 'axis' in kwargs:
        raise ValueError('cannot set both `dim` and `axis`')
    kwargs['axis'] = self.get_axis_num(dim)

In the unlikely event your function takes a dim argument that should not be remapped, you can pass in a lambda or functools.partial instead.

@smartass101
Copy link
Author

... rather not adjust DataArray.pipe, which is intentionally very simple.

It would be just one extra call to a funciton which is very simple. As I commented in #1074, I think it makes more sense to have pipe wrap arrays, because otherwise the pipe-chain might get broken, whereas with apply I'd be ok with it behaving as the python apply function which simply applies a function and nothing more. But maybe_wrap_array is simple and does not break anything, it wraps it only when it's safe.

In the unlikely event your function takes a dim argument

I think that could be quite likely as one might want to apply a DataArray-compatible function. This would force users to remember which type of "function applier" to use for a given function and might be confusing.

@shoyer
Copy link
Member

shoyer commented Nov 18, 2016

But maybe_wrap_array is simple and does not break anything, it wraps it only when it's safe.

It's safe in the sense that the xarray.DataArray contains the information from the numpy array, but it's not safe in terms of a stable type signature.

I think that could be quite likely as one might want to apply a DataArray-compatible function. This would force users to remember which type of "function applier" to use for a given function and might be confusing.

I think this is unavoidable. It's impossible to design a single entry point that works smoothly for both use cases.

@stale
Copy link

stale bot commented Jan 24, 2019

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity
If this issue remains relevant, please comment here; otherwise it will be marked as closed automatically

@stale stale bot added the stale label Jan 24, 2019
@shoyer
Copy link
Member

shoyer commented Jan 24, 2019

Closing this in favor of #1130

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants