Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Future of DataArray.rename #6704

Open
4 tasks
headtr1ck opened this issue Jun 18, 2022 · 11 comments
Open
4 tasks

Future of DataArray.rename #6704

headtr1ck opened this issue Jun 18, 2022 · 11 comments

Comments

@headtr1ck
Copy link
Collaborator

What is your issue?

In #6665 the question came up what to do with DataArray.rename in light of the new index refactor.

To be consistent with Dataset we should introduce a

  • DataArray.rename_dims
  • DataArray.rename_vars
  • DataArray.rename

Several open questions about the behavior (Similar things apply to Dataset.rename{, _dims, _vars}):

  • Should rename_dims also rename indexes (dimension coordinates)?
  • Should rename_vars also rename the DataArray?
  • What to do if the DataArray has the same name as one of its coordinates?
  • Should rename still rename everything (like it is now) or only the name (Possibly with some deprecation cycle)?

The current implementation of DataArray.rename is a bit inconsistent:

As stated by @max-sixty in #6665 (comment)_:

  • rename operates on DataArray as described in Update DataArray.rename + docu #6665 (comment) I'm less keen on "different types have different semantics", and here a positional arg would mean a DataArray rename, and kwarg would mean var rename. But it does work locally to DataArray quite well.
  • rename only exists on DataArrays for the name of the DataArray, and we use rename_vars & rename_dims for both DataArrays & Datasets. So Dataset.rename is soft-deprecated.
@headtr1ck headtr1ck added the needs triage Issue that has not been reviewed by xarray team member label Jun 18, 2022
@max-sixty max-sixty added needs discussion and removed needs triage Issue that has not been reviewed by xarray team member labels Jun 19, 2022
@max-sixty
Copy link
Collaborator

This is the sort of thing we can discuss briefly on our dev call — which you would be more than welcome to attend and participate in @headtr1ck.

It's at 8.30AM PT every second Wednesday, including this coming Wednesday. I think there's an issue about it (though am on a plane wifi connection so currently difficult to search...). Unfortunately I've been poor at attendance recently given my schedule, but I have been trying to rearrange things so I can start attending again.

@keewis
Copy link
Collaborator

keewis commented Jun 19, 2022

that's #4001

@headtr1ck
Copy link
Collaborator Author

I feel honored thanks! But given that I will become a father any day now, I don't think I will be able to join regularly...

@max-sixty
Copy link
Collaborator

I feel honored thanks! But given that I will become a father any day now, I don't think I will be able to join regularly...

Congrats on the forthcoming arrival!

Ofc, no stress if you can't make the meeting, and obv zero stress if you can't make lots of them.

@mathause
Copy link
Collaborator

There is also anther issue discussing this: #4825

@shoyer
Copy link
Member

shoyer commented Jun 22, 2022

Should we call it rename_vars or rename_coords?

The later might make more sense, but then it wouldn't mirror Dataset.

@max-sixty
Copy link
Collaborator

As discussed on the call:

  • We'd like to move to .rename and .rename_vars as separate methods on DataArray; the latter renaming coords.
  • We would "soft-deprecate" .rename for renaming coords — i.e. remove documentation, but not give any warnings, at least for a while. This was a successful approach for .drop.
  • .rename_vars is a slightly odd name given there are only coords. But it's consistent with xr.Dataset.

After the call, I realize that I didn't mention that Dataset.rename is a synonym for Dataset.rename_vars. Would be also soft deprecate Dataset.rename?

@shoyer
Copy link
Member

shoyer commented Jun 22, 2022

Dataset.rename does both variables and dimensions. That seems useful in many cases. I think it also makes more sense than Dataset.drop does, given that variables and dimensions often use the same names -- whereas drop mixed up variable names and index values.

@benbovy
Copy link
Member

benbovy commented Jun 27, 2022

Should rename_dims also rename indexes (dimension coordinates)?

Yes, renaming dimensions and/or coordinates should be properly handled by indexes. Xarray's Index abstract class provides a rename method that may (should) be implemented in subclasses.

For example, PandasIndex and PandasMultiIndex both have a dim attribute that must match the dimension name (this attribute is required as in a near future it will be possible to set indexes for non-dimension coordinates).

PandasIndex.rename (PandasMultiIndex.rename) also renames the underlying pandas index (levels) according to the new coordinate name(s).

@benbovy
Copy link
Member

benbovy commented Jun 28, 2022

Should rename_dims also rename indexes (dimension coordinates)?

Sorry I think I misunderstood the question. If that means "should rename_dims also rename the (dimension) coordinates", then I think the answer should be no. With the explicit indexes refactor coordinates and indexes are less tightly coupled, i.e., in the mid/long term we want to drop the concept of a "dimension coordinate with an implicit index". So I think rename_dims should only rename dimensions (and update indexes metadata accordingly, cf. my previous comment).

@max-sixty
Copy link
Collaborator

Found this after hitting the issue (and realize I have commented a few times a year ago...)

Note that there's some overlap between this & #6238 & #4825.

@benbovy 's comments at #6238 (comment) & #4825 (comment) seem compelling to me...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants