Reusing coordinate doesn't show in the dimensions #1499

Closed
lewisacidic opened this issue Aug 3, 2017 · 10 comments

@lewisacidic
Contributor

For a DataArray, when a coordinate is reused for multiple dimensions (is this expected usage?), the dimension only shows up once in the repr:

>>> import numpy as np
>>> import xarray as xr
>>> x = xr.IndexVariable(data=range(5), dims='x')
>>> da = xr.DataArray(data=np.random.randn(5, 5), coords={'x': x}, dims=('x', 'x'))
>>> da
<xarray.DataArray (x: 5)>
array([[ 0.704139,  0.135638, -0.84717 , -0.580167,  0.95755 ],
       [ 0.966196, -0.126107,  0.547461,  1.075547, -0.477495],
       [-0.507956, -0.671571,  1.271085,  0.007741, -0.37878 ],
       [-0.969021, -0.440854,  0.062914, -0.3337  , -0.775898],
       [ 0.86893 ,  0.227861,  1.831021,  0.702769,  0.868767]])
Coordinates:
  * x        (x) int64 0 1 2 3 4

I think it should be

<xarray.DataArray (x: 5, x: 5)>
array([[ ... ]])
Coordinates:
  * x        (x) int64 0 1 2 3 4

Otherwise, everything appears to work exactly as I would expect.

This isn't an issue for Datasets:

>>> xr.Dataset({'da': da})
<xarray.Dataset>
Dimensions:  (x: 5)
Coordinates:
  * x        (x) int64 0 1 2 3 4
Data variables:
    da       (x, x) float64 0.08976 0.1049 -1.291 -0.4605 -0.005165 -0.3259 ...
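One plausible reason the DataArray header collapses while the Dataset's `da (x, x)` line does not: if the header is built from a dimension-name-to-size mapping (an assumption about xarray's internals, not confirmed in this thread), a repeated name becomes a single dict key. The helper names below are hypothetical; this is a minimal sketch of the effect, not xarray's repr code.

```python
# Hypothetical sketch: why a name->size mapping hides a repeated dimension.
# It mimics, but is not copied from, xarray's repr code.
dims = ("x", "x")
shape = (5, 5)


def summary_from_mapping(dims, shape):
    # dict() keeps one entry per key, so the repeated "x" appears only once
    sizes = dict(zip(dims, shape))
    return ", ".join(f"{k}: {v}" for k, v in sizes.items())


def summary_from_pairs(dims, shape):
    # iterating the (dim, size) pairs directly preserves every axis
    return ", ".join(f"{k}: {v}" for k, v in zip(dims, shape))


print(f"<xarray.DataArray ({summary_from_mapping(dims, shape)})>")
# <xarray.DataArray (x: 5)>
print(f"<xarray.DataArray ({summary_from_pairs(dims, shape)})>")
# <xarray.DataArray (x: 5, x: 5)>
```

Building the header from the ordered `(dims, shape)` pairs would give the `(x: 5, x: 5)` form proposed above.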

Thanks!

@shoyer shoyer added the bug label Aug 3, 2017
@shoyer
Member

shoyer commented Aug 3, 2017

Yes, this is a bug.

But note that you will probably run into other issues with repeated dimension names. It's not explicitly forbidden by the data model, but various operations (like broadcasting) will fail, possibly in unpredictable ways.

Another example would be da.mean('x'), which may or may not do the right thing. I don't think anyone has thought carefully about these operations.

There's no reason why you shouldn't be able to support these for at least some subset of operations, but it will need some attention from an interested party.
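To make the `da.mean('x')` ambiguity concrete: with dims `('x', 'x')`, reducing over `'x'` could reasonably mean only the first axis named `'x'` or every axis named `'x'`, and the two readings give results of different shape. A minimal NumPy sketch of both readings; it does not call xarray and says nothing about which reading xarray actually applies.

```python
import numpy as np

# A 5x5 array whose two axes would both be labelled "x"
data = np.arange(25, dtype=float).reshape(5, 5)

# Reading 1: "x" means only the first axis named "x" (axis 0)
mean_first = data.mean(axis=0)  # one value per column, shape (5,)

# Reading 2: "x" means every axis named "x" (axes 0 and 1)
mean_all = data.mean(axis=(0, 1))  # a single scalar

print(mean_first)  # [10. 11. 12. 13. 14.]
print(mean_all)  # 12.0
```

Positional axes make the choice explicit; a name alone cannot.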

@jhamman
Member

jhamman commented Aug 3, 2017

I'm concerned about trying to support any of this behavior. I think any use of the data model with duplicate dimensions will be very buggy. Logic like this is all over the place:

def get_axis_num(self, dim):
    # a single dimension name (basestring in the Python 2-era code) maps to one axis
    if isinstance(dim, str):
        return self._get_axis_num(dim)

In your example, you'd want this method to return (0, 1) but it would just return 0. Unless we have a strong argument against, I would think we should deprecate any of this behavior.
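The root of the `0` versus `(0, 1)` mismatch is that `tuple.index` stops at the first match. A standalone sketch contrasting that lookup with one that collects every matching axis; both helper names are hypothetical and this is not xarray's implementation.

```python
# Hypothetical standalone versions of a dimension-name lookup; not xarray code.
def axis_num_first(dims, dim):
    # tuple.index returns only the first position, so a repeated name is lost
    return dims.index(dim)


def axis_nums_all(dims, dim):
    # collect every position whose name matches; raise if there are none
    matches = tuple(i for i, d in enumerate(dims) if d == dim)
    if not matches:
        raise ValueError(f"{dim!r} not found in array dimensions {dims!r}")
    return matches


dims = ("x", "x")
print(axis_num_first(dims, "x"))  # 0
print(axis_nums_all(dims, "x"))  # (0, 1)
```

Returning a tuple changes the method's return type, which is one reason supporting repeated names would ripple through every caller.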

@acrosby

acrosby commented Aug 25, 2017

This is important for some of my use cases; it's something we do a lot. As noted, it is not forbidden by the netCDF format or the CF conventions.

@jhamman
Member

jhamman commented Aug 30, 2017

@acrosby - any interest in helping add support for this use case in xarray? A starting point would be to write some tests that target this use case. Then there will be bugs to fix...

@acrosby

acrosby commented Aug 31, 2017

@jhamman I'm interested in helping, but I definitely can't be timely unless we can get a project here at work that I can leverage for time. What were you thinking for test coverage?

@shoyer
Member

shoyer commented Aug 31, 2017

What were you thinking for test coverage?

Simple things that match your use-cases. For example: indexing, arithmetic, aggregating over an axis, transpose, concatenating, saving/loading files, plotting. The xarray docs could be a good place to start to make sure you aren't missing anything big.

@stale

stale bot commented Aug 1, 2019

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

@stale stale bot added the stale label Aug 1, 2019
@dcherian dcherian removed the stale label Aug 1, 2019
@acrosby

acrosby commented Aug 2, 2019

I think I side with @jhamman now on this issue. Labeled access to this kind of DataArray and broadcasting operations are not useful in my cases, and working in straight numpy or pandas ends up making more sense. We just drop the appropriate variables upon opening the relevant files.

@chrisroat
Contributor

chrisroat commented Sep 6, 2019

Just flying by and dropping a note because I just ran into this with Imaris Open files being created by a microscope camera. I wanted to use one of my favorite packages (xarray) to dig into the data, and noted the dimension reuse. Not a big blocker, but this functionality of the data format might be growing in usage.

More details on Imaris.
http://open.bitplane.com/Default.aspx?tabid=268

@max-sixty
Collaborator

Closed by #8491
