Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ensure rename does not change index type #3532

Merged
merged 4 commits into from
Nov 15, 2019

Conversation

mathause
Copy link
Collaborator

Comment on lines 2670 to 2671
index = v
index.name = new_name
Copy link
Collaborator Author

@mathause mathause Nov 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively there is a index = v._shallow_copy(name=new_name) method. It is "an internal non-public method" in pandas. Should I use it anyway?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I think it's fairly stable, you're right—best practice is indeed to avoid.

I think the current code will mutate the existing index name, is that correct? We should at least test to ensure that doesn't happen. I think .rename rather than direct assignment will avoid this (and calls _shallow_copy from within pandas so we're safe)

FWIW I think we can replace https://github.com/pydata/xarray/pull/3532/files#diff-921db548d18a549f6381818ed08298c9R2662-R2667 also with .rename, but it's minor and doesn't need to be in this PR

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, there is even rename - I did not see that, will switch to it.

@mathause
Copy link
Collaborator Author

FWIW I think we can replace https://github.com/pydata/xarray/pull/3532/files#diff-921db548d18a549f6381818ed08298c9R2662-R2667 also with .rename, but it's minor and doesn't need to be in this PR

I have to look how this is done in the MultiIndex.

@mathause
Copy link
Collaborator Author

mathause commented Nov 14, 2019

I think the current code will mutate the existing index name, is that correct?

Currently the index is recreated with the same name (even if it does not change it's name). We could change the following lines

xarray/xarray/core/dataset.py

Lines 2649 to 2651 in 40588dc

new_name = name_dict.get(k, k)
if new_name not in dims_set:
continue

 new_name = name_dict.get(k, None) 
 if new_name is None or new_name not in dims_set: 
     continue

to skip doing this (not sure if there is an edge cas I am missing here).

@mathause
Copy link
Collaborator Author

No, this probably does not work if only one of the names of a MultiIndex changes...

@max-sixty
Copy link
Collaborator

Currently the index is recreated with the same name (even if it does not change it's name). We could change the following lines

I was unclear when I wrote "current code"; I was referring to proposed code in this PR. I don't think the code on master mutates the existing index, because it's making a copy. But I think this does:

                index = v
                index.name = new_name

@mathause
Copy link
Collaborator Author

Thanks, yes my approach was indeed wrong. I updated the code and the tests. I was confused for a while because orig.rename_dims(time="time_new") drops indexes while orig.rename_dims() does not.

Copy link
Collaborator

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super! Any other comments before merge?

@mathause
Copy link
Collaborator Author

Fine for me. I think MultiIndex could be treated the same. However, it does conserve CFTimeIndex already, and the test would require a bit of thinking, so let's leave that for now.

@max-sixty max-sixty merged commit 68b004f into pydata:master Nov 15, 2019
@max-sixty
Copy link
Collaborator

Perfect, thanks again @mathause

max-sixty added a commit to max-sixty/xarray that referenced this pull request Nov 16, 2019
dcherian pushed a commit that referenced this pull request Nov 16, 2019
dcherian added a commit to dcherian/xarray that referenced this pull request Nov 17, 2019
* upstream/master:
  Added fill_value for unstack (pydata#3541)
  Add DatasetGroupBy.quantile (pydata#3527)
  ensure rename does not change index type (pydata#3532)
  Leave empty slot when not using accessors
  interpolate_na: Add max_gap support. (pydata#3302)
  units & deprecation merge (pydata#3530)
  Fix set_index when an existing dimension becomes a level (pydata#3520)
  add Variable._replace (pydata#3528)
  Tests for module-level functions with units (pydata#3493)
  Harmonize `FillValue` and `missing_value` during encoding and decoding steps (pydata#3502)
  FUNDING.yml (pydata#3523)
  Allow appending datetime & boolean variables to zarr stores (pydata#3504)
  warn if dim is passed to rolling operations. (pydata#3513)
  Deprecate allow_lazy (pydata#3435)
  Recursive tokenization (pydata#3515)
dcherian added a commit to dcherian/xarray that referenced this pull request Nov 17, 2019
* upstream/master: (22 commits)
  Added fill_value for unstack (pydata#3541)
  Add DatasetGroupBy.quantile (pydata#3527)
  ensure rename does not change index type (pydata#3532)
  Leave empty slot when not using accessors
  interpolate_na: Add max_gap support. (pydata#3302)
  units & deprecation merge (pydata#3530)
  Fix set_index when an existing dimension becomes a level (pydata#3520)
  add Variable._replace (pydata#3528)
  Tests for module-level functions with units (pydata#3493)
  Harmonize `FillValue` and `missing_value` during encoding and decoding steps (pydata#3502)
  FUNDING.yml (pydata#3523)
  Allow appending datetime & boolean variables to zarr stores (pydata#3504)
  warn if dim is passed to rolling operations. (pydata#3513)
  Deprecate allow_lazy (pydata#3435)
  Recursive tokenization (pydata#3515)
  format indexing.rst code with black (pydata#3511)
  add missing pint integration tests (pydata#3508)
  DOC: update bottleneck repo url (pydata#3507)
  add drop_sel, drop_vars, map to api.rst (pydata#3506)
  remove syntax warning (pydata#3505)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Nov 22, 2019
* master: (24 commits)
  Tweaks to release instructions (pydata#3555)
  Clarify conda environments for new contributors (pydata#3551)
  Revert to dev version
  0.14.1 whatsnew (pydata#3547)
  sparse option to reindex and unstack (pydata#3542)
  Silence sphinx warnings (pydata#3516)
  Numpy 1.18 support (pydata#3537)
  tweak whats-new. (pydata#3540)
  small simplification of rename from pydata#3532 (pydata#3539)
  Added fill_value for unstack (pydata#3541)
  Add DatasetGroupBy.quantile (pydata#3527)
  ensure rename does not change index type (pydata#3532)
  Leave empty slot when not using accessors
  interpolate_na: Add max_gap support. (pydata#3302)
  units & deprecation merge (pydata#3530)
  Fix set_index when an existing dimension becomes a level (pydata#3520)
  add Variable._replace (pydata#3528)
  Tests for module-level functions with units (pydata#3493)
  Harmonize `FillValue` and `missing_value` during encoding and decoding steps (pydata#3502)
  FUNDING.yml (pydata#3523)
  ...
@mathause mathause deleted the fix/rename_changes_index branch August 19, 2020 13:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CFTimeIndex changes to normal Index after renaming
2 participants