Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: the pd.to_timedelta() incorrectly parses the 'M' as minutes. #34968

Closed
2 of 3 tasks
IcToxi opened this issue Jun 24, 2020 · 9 comments
Closed
2 of 3 tasks

BUG: the pd.to_timedelta() incorrectly parses the 'M' as minutes. #34968

IcToxi opened this issue Jun 24, 2020 · 9 comments
Labels
Bug Docs Frequency DateOffsets Timedelta Timedelta data type

Comments

@IcToxi
Copy link

IcToxi commented Jun 24, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

>> pd.Timestamp.today() + pd.to_timedelta('1 M')

Problem description

The documentation in version 1.0.5 is described as follows:

Possible values: (‘Y’, ‘M’, ‘W’, ‘D’, ‘days’, ‘day’, ‘hours’, hour’, ‘hr’, ‘h’,...

Since in such a particular order, it should be understood as year, month, and week.

But running my code gives the result of a minute for 'M'. However, 'Y' produces the correct parsing result.

I've seen some discussion there about parameters optimization, and whatever the outcome of the discussion, you shouldn't continue to place this bug here, please either change the function or change the documentation.

And perhaps you might consider separating to_datedelta api to reduce complexity.

Thanks.

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-1127.10.1.el7.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.5
numpy : 1.18.1

@IcToxi IcToxi added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 24, 2020
@MJafarMashhadi
Copy link
Contributor

MJafarMashhadi commented Jun 24, 2020

I see that 'M' unit is depricated for the same ambiguity (#16344) and the docs are updated in 095ad41 which will appear in version 1.1.0. But it raises an error for pd.to_timedelta(1, unit='M') but not for pd.to_timedelta('1 M').

@selasley
Copy link
Contributor

From timedeltas.pyx

        if unit == 'M':
            # To parse ISO 8601 string, 'M' should be treated as minute,
            # not month
            unit = 'm'

I'm working on an update to the to_timedelta docs to document this behavior

@IcToxi
Copy link
Author

IcToxi commented Jun 25, 2020

Thanks for the reply.

Sorry, I would like to ask before the issue is closed. Do you no longer consider supporting the delta calculation of the month?

I thought I no longer need to import dateutil every time.

@MJafarMashhadi
Copy link
Contributor

MJafarMashhadi commented Jun 25, 2020

Correct me if I'm wrong; based on the replies on that issue, I guess the largest supported unit is a week now (?). One reason is months can have different lengths, 28 to 31 days; which one should we pick? It's ambiguous!

@jreback
Copy link
Contributor

jreback commented Jun 25, 2020

Timedelta are fixed time representations. You can simply use offsets for any real date begin/end manipulatinos which is why they exist. see https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects

I thought we deprecated M, Y here but maybe not.

I'll repurpose this issue to be a doc one if you'd like to submit a PR to update.

@jreback jreback added Docs Frequency DateOffsets Timedelta Timedelta data type labels Jun 25, 2020
@jreback jreback added this to the Contributions Welcome milestone Jun 25, 2020
@MJafarMashhadi
Copy link
Contributor

@selasley, you're already working on a PR I guess you might want to comment the word "take" to assign this issue to yourself.
Also, note that Y is deprecated

@selasley
Copy link
Contributor

I submitted PR #34979. Haven't submitted often so I'm not 100% sure I'm doing it correctly. Comments and improvements are welcome.

Using M, y or Y for the units argument in to_datetime raises a ValueError in pandas 1.0.5 and in the 1.1.0 master branch. They are accepted in the arg argument in both versions but their behavior has confused some users. M and Y are no longer listed as accepted units in the docstring for to_timedelta in the 1.1 master branch. The PR is an attempt to document the behavior of M and Y in the arg argument.

>>> pd.__version__
'1.0.5'
>>> pd.to_timedelta('1M')
Timedelta('0 days 00:01:00')
>>> pd.to_timedelta('1y')
Timedelta('365 days 05:49:12')
>>> pd.to_timedelta('1Y')
Timedelta('365 days 05:49:12')
>>> pd.to_timedelta(1, unit='M')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/selasley/venvs/py38/lib/python3.8/site-packages/pandas/core/tools/timedeltas.py", line 86, in to_timedelta
    raise ValueError(
ValueError: Units 'M' and 'Y' are no longer supported, as they do not represent unambiguous timedelta values durations.
>>> pd.to_timedelta(1, unit='Y')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/selasley/venvs/py38/lib/python3.8/site-packages/pandas/core/tools/timedeltas.py", line 86, in to_timedelta
    raise ValueError(
ValueError: Units 'M' and 'Y' are no longer supported, as they do not represent unambiguous timedelta values durations.

>>> pd.__version__
'1.1.0.dev0+1946.g36b781872'
>>> help(pd.to_timedelta)
...
    unit : str, optional
        Denotes the unit of the arg for numeric `arg`. Defaults to ``"ns"``.
    
        Possible values:
    
        * 'W'
        * 'D' / 'days' / 'day'
        * 'hours' / 'hour' / 'hr' / 'h'
        * 'm' / 'minute' / 'min' / 'minutes' / 'T'
        * 'S' / 'seconds' / 'sec' / 'second'
        * 'ms' / 'milliseconds' / 'millisecond' / 'milli' / 'millis' / 'L'
        * 'us' / 'microseconds' / 'microsecond' / 'micro' / 'micros' / 'U'
        * 'ns' / 'nanoseconds' / 'nano' / 'nanos' / 'nanosecond' / 'N'
...

@MJafarMashhadi
Copy link
Contributor

MJafarMashhadi commented Jun 26, 2020

I'm getting confused too, I believe having Y depreciated means that pd.to_timedelta('1Y') should raise an error. Just like pd.to_timedelta(1, unit='Y') does!

@TomAugspurger TomAugspurger removed the Needs Triage Issue that has not been reviewed by a pandas team member label Sep 4, 2020
MarcoGorelli pushed a commit that referenced this issue Sep 26, 2020
…#34979)

* DOC: Add notes about M and Y to to_timedelata documentation.  (#34968)

* DOC: Update notes about M and Y to to_timedelata documentation.  (#34968)

* DOC: Add notes about M and Y to to_timedelata documentation.  (#34968)
@MarcoGorelli
Copy link
Member

closed by #34979

kesmit13 pushed a commit to kesmit13/pandas that referenced this issue Nov 2, 2020
…dev#34968) (pandas-dev#34979)

* DOC: Add notes about M and Y to to_timedelata documentation.  (pandas-dev#34968)

* DOC: Update notes about M and Y to to_timedelata documentation.  (pandas-dev#34968)

* DOC: Add notes about M and Y to to_timedelata documentation.  (pandas-dev#34968)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Docs Frequency DateOffsets Timedelta Timedelta data type
Projects
None yet
Development

No branches or pull requests

6 participants