Inconsistent date parsing when year is 2- or 4-digits and order "mdy" #556

Closed
malwinare opened this issue Jun 22, 2017 · 2 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@malwinare

Hi all,

For dates with a mix of 2- and 4-digit years, parse_date_time() parses inconsistently when orders = "mdy". It works fine when, for example, orders = "ymd".

Examples

library(lubridate)

parse_date_time("apr.12.50", orders = "mdy")   # "2050-04-12" 

# inconsistent
parse_date_time(c("apr.12.50","apr.2.2016"), orders = "mdy")  #  "0050-04-12"  "2016-04-02"

# works fine:
parse_date_time(c("50.apr.12","2016.apr.2"), orders = "ymd")  # "2050-04-12" "2016-04-02"

R version: 3.4.0
lubridate: lubridate_1.6.0.9009

Thanks in advance.
Best,
Malwina

@vspinu vspinu added the bug an unexpected problem or unintended behavior label Jun 23, 2017
@cderv
Contributor

cderv commented Jun 27, 2017

I tried to find where it could come from but it is not obvious.

lubridate's format training finds two candidate formats, %b.%d.%Y and %b.%d.%y, and lubridate:::.select_formats prioritizes the first one. lubridate::.striptime then passes the parsing to base::strptime, which does not return NA for "apr.12.50" when used with the format %b.%d.%Y. As a result, %b.%d.%y is never tried and %b.%d.%Y is applied to both elements.
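
To make the strptime point concrete, here is a minimal base R sketch (no lubridate involved). It assumes a numeric month instead of "apr" so the result does not depend on the locale; the variable names are only for illustration.

```r
# Base R only. A numeric month replaces "apr" to avoid locale dependence.
x <- "04.12.50"

# %Y (year with century) still accepts the two-digit "50" and reads it as year 50.
as_Y <- strptime(x, format = "%m.%d.%Y", tz = "UTC")
is.na(as_Y)        # FALSE: no NA, so nothing signals that another format should be tried
as_Y$year + 1900   # 50

# %y (year without century) maps 00-68 to 20xx, which is the intended reading here.
as_y <- strptime(x, format = "%m.%d.%y", tz = "UTC")
as_y$year + 1900   # 2050
```

Since the %Y attempt succeeds silently, whichever format is ranked first decides the result for every element.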

This does not happen with orders = "ymd". The difference is that it uses c_parser with the formats %y.%Om.%d and %Y.%Om.%d. With orders = "mdy", guess_format does not replace m with Om because of grepl("[^O][mbB]", orders).
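
The regexp behavior can be checked in isolation with base R. This sketch only looks at the pattern quoted above, not at the surrounding guess_format logic; the alternative pattern is a hypothetical of mine, not lubridate's actual code or fix.

```r
# Pattern quoted from the investigation: a non-"O" character followed by m, b or B.
pattern <- "[^O][mbB]"

in_ymd <- grepl(pattern, "ymd")   # TRUE: "m" is preceded by "y"
in_mdy <- grepl(pattern, "mdy")   # FALSE: "m" is the first character, nothing precedes it

# Hypothetical alternative that also accepts a match at the start of the string.
pattern_start <- "(^|[^O])[mbB]"

alt_ymd <- grepl(pattern_start, "ymd")   # TRUE
alt_mdy <- grepl(pattern_start, "mdy")   # TRUE
alt_Om  <- grepl(pattern_start, "Omdy")  # FALSE: "m" directly follows "O"
```

So the original pattern needs one character before the month letter, and a month in first position slips through.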

Hope this little investigation could help fix the bug.

@vspinu
Member

vspinu commented Oct 2, 2017

Thanks @cderv for spotting this. This was a corner case bug due to the scoring in .select_formats indeed. When y occurred at the end of the format, it wasn't detected by that regexp. Now fixed!
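
For readers wondering how a trailing y can escape a regexp, here is a small base R illustration. Both patterns are hypothetical and chosen only to show the mechanism; they are not copied from .select_formats or from the fixing commit.

```r
# Hypothetical scoring pattern: "%y" followed by a character other than "%".
pattern <- "%y[^%]"

y_first <- grepl(pattern, "%y.%Om.%d")  # TRUE: "%y" is followed by "."
y_last  <- grepl(pattern, "%b.%d.%y")   # FALSE: "%y" ends the string, nothing follows it

# Hypothetical variant that also accepts end-of-string after "%y".
pattern_end <- "%y([^%]|$)"

y_first_end <- grepl(pattern_end, "%y.%Om.%d")  # TRUE
y_last_end  <- grepl(pattern_end, "%b.%d.%y")   # TRUE
```

A pattern that requires a character after %y cannot match when %y is the last token, which is exactly the position y takes in "mdy".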

@vspinu vspinu closed this as completed in 6468bd7 Oct 2, 2017