Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ensure rename does not change index type #3532

Merged
merged 4 commits into from
Nov 15, 2019
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions doc/whats-new.rst
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,10 @@ New Features

Bug fixes
~~~~~~~~~
- Ensure an index of type ``CFTimeIndex`` is not converted to a ``DatetimeIndex`` when
calling :py:meth:`Dataset.rename` (also :py:meth:`Dataset.rename_dims`
and :py:meth:`xr.Dataset.rename_vars`). By `Mathias Hauser <https://github.com/mathause>`_
(:issue:`3522`).
- Fix a bug in `set_index` in case that an existing dimension becomes a level variable of MultiIndex. (:pull:`3520`)
By `Keisuke Fujii <https://github.com/fujiisoup>`_.
- Harmonize `_FillValue`, `missing_value` during encoding and decoding steps. (:pull:`3502`)
Expand Down
3 changes: 2 additions & 1 deletion xarray/core/dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -2667,7 +2667,8 @@ def _rename_indexes(self, name_dict, dims_set):
verify_integrity=False,
)
else:
index = pd.Index(v, name=new_name)
index = v
index.name = new_name
Copy link
Collaborator Author

@mathause mathause Nov 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively there is a index = v._shallow_copy(name=new_name) method. It is "an internal non-public method" in pandas. Should I use it anyway?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I think it's fairly stable, you're right—best practice is indeed to avoid.

I think the current code will mutate the existing index name, is that correct? We should at least test to ensure that doesn't happen. I think .rename rather than direct assignment will avoid this (and calls _shallow_copy from within pandas so we're safe)

FWIW I think we can replace https://github.com/pydata/xarray/pull/3532/files#diff-921db548d18a549f6381818ed08298c9R2662-R2667 also with .rename, but it's minor and doesn't need to be in this PR

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, there is even rename - I did not see that, will switch to it.

indexes[new_name] = index
return indexes

Expand Down
25 changes: 25 additions & 0 deletions xarray/tests/test_dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
import numpy as np
import pandas as pd
import pytest
from pandas.core.indexes.datetimes import DatetimeIndex

import xarray as xr
from xarray import (
Expand All @@ -22,6 +23,7 @@
open_dataset,
set_options,
)
from xarray.coding.cftimeindex import CFTimeIndex
from xarray.core import dtypes, indexing, utils
from xarray.core.common import duck_array_ops, full_like
from xarray.core.npcompat import IS_NEP18_ACTIVE
Expand Down Expand Up @@ -2458,6 +2460,29 @@ def test_rename_vars(self):
with pytest.raises(ValueError):
original.rename_vars(names_dict_bad)

def test_rename_does_not_change_index_type(self):
# make sure CFTimeIndex is not converted to DatetimeIndex #3522

time = xr.cftime_range(start="2000", periods=6, freq="2MS", calendar="noleap")

ds = Dataset(coords={"time": time})

# make sure renaming does not change the type of the index
assert isinstance(ds.indexes["time"], CFTimeIndex)
assert isinstance(ds.rename().indexes["time"], CFTimeIndex)
assert isinstance(ds.rename_dims().indexes["time"], CFTimeIndex)
assert isinstance(ds.rename_vars().indexes["time"], CFTimeIndex)

time = pd.date_range(start="2000", periods=6, freq="2MS")

ds = Dataset(coords={"time": time})

# make sure renaming does not change the type of the index
assert isinstance(ds.indexes["time"], DatetimeIndex)
assert isinstance(ds.rename().indexes["time"], DatetimeIndex)
assert isinstance(ds.rename_dims().indexes["time"], DatetimeIndex)
assert isinstance(ds.rename_vars().indexes["time"], DatetimeIndex)

def test_swap_dims(self):
original = Dataset({"x": [1, 2, 3], "y": ("x", list("abc")), "z": 42})
expected = Dataset({"z": 42}, {"x": ("y", [1, 2, 3]), "y": list("abc")})
Expand Down