Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Date and DateTime support for Missing is missing #28570

Closed
schnirz opened this issue Aug 10, 2018 · 25 comments · Fixed by #37222
Closed

Date and DateTime support for Missing is missing #28570

schnirz opened this issue Aug 10, 2018 · 25 comments · Fixed by #37222
Labels
dates Dates, times, and the Dates stdlib module missing data Base.missing and related functionality

Comments

@schnirz
Copy link

schnirz commented Aug 10, 2018

Currently, there is no support for missing values when it comes to working with Date and DateTime.

For example, the Date constructor throws an error when called on a missing input value:

Date(missing, "YY-mm-dd")

MethodError: Cannot `convert` an object of type Missings.Missing to an object of type Int64
This may have arisen from a call to the constructor Int64(...),
since type constructors fall back to convert methods.

Stacktrace:
 [1] Date(::Missings.Missing, ::String) at ./dates/types.jl:312
 [2] include_string(::String, ::String) at ./loading.jl:522

Similarly, computing the difference between two dates fails of one of them is missing:

Date(2018,8,10) - missing

MethodError: no method matching -(::Date, ::Missings.Missing)
Closest candidates are:
  -(::DateTime, ::Missings.Missing) at In[112]:8
  -(::Date, ::Base.Dates.Day) at dates/arithmetic.jl:78
  -(::Date, ::Base.Dates.Week) at dates/arithmetic.jl:76
  ...

Stacktrace:
 [1] include_string(::String, ::String) at ./loading.jl:522

Presumably, all unary and binary operators need to be overloaded, either in Base or in Dates. Any ideas and suggestions for how to address this issue would be more than welcome.

@pdeffebach
Copy link
Contributor

It's worth noting that other constructors don't have overloads for missing

Int(missing)
>ERROR: MethodError: Cannot `convert` an object of type Missings.Missing to an object of type Int64
This may have arisen from a call to the constructor Int64(...),
since type constructors fall back to convert methods.

String(missing)
> ERROR: MethodError: Cannot `convert` an object of type Missings.Missing to an object of type String
This may have arisen from a call to the constructor String(...),
since type constructors fall back to convert methods.

So it is likely that the fix is to add an overload just to the - function and whatever other unary operators are defined for Dates, but leave the parsing up to the user.

@schnirz
Copy link
Author

schnirz commented Aug 10, 2018

I think that there is a case to be made for overloading Date and DateTime constructors.

As an example, consider a user is trying to parse a an array or a column of a data frame containing strings as well as missing values to dates. Asking the user to explicitly handle missing values for this particular case while all other operations handle missing values quietly without the user having to explicitly deal with them would constitute an inconsistent user experience.

@pdeffebach
Copy link
Contributor

Could you clarify which operations handle missing values quietly? For constructors. If, for example, you have an array ["4", "5", missing, "7"], you wouldn't have anything work automatically with parsing missings.

Deliberate handling of missings is something Julia makes you more sensitive to than in other languages. For instance this is (probably) recommended behavior.

df2 = @linq df |>
       transform(date = [ismissing(x) ? missing : Date(x) for x in :a])

This might be a pain, but there is a lot to be said for this really being a feature because it encourages people to be aware of how missing values propogate.

R constructors do propagate missing values however, so this is new for people coming over from there.

@pdeffebach
Copy link
Contributor

But we should still work on a PR for the Unary operators!

@pdeffebach
Copy link
Contributor

pdeffebach commented Aug 10, 2018

Interestingly, it looks like someone snuck in a bunch of broadcasting behavior that should probably throw errors in 1.0

(-)(x::AbstractArray{T}, y::T) where {T<:TimeType} = x .- y

(-)(x::AbstractArray{T}, y::T) where {T<:TimeType} = x .- y
(-)(y::T, x::AbstractArray{T}) where {T<:TimeType} = y .- x

We might want to learn more about that before a PR to avoid having to write out so many overloadings of unary operators.

Posted on discourse here

@nalimilan nalimilan added the missing data Base.missing and related functionality label Aug 15, 2018
@nalimilan
Copy link
Member

I don't think we should have a special case for dates: either all constructors should accept missing, or none of them. Since it would be weird to have a constructor return an object not of the requested type (as @pdeffebach) noted, I don't think it would be appropriate to accept missing.

However, we should make it easier to enable functions/constructors to accept missing and return missing, an operation which is often called "lifting". See #26661.

@nalimilan nalimilan added the dates Dates, times, and the Dates stdlib module label Aug 15, 2018
@pdeffebach
Copy link
Contributor

Could you clarify more about the functions issue? Should I file a distinct issue for + and - with dates, even if we don't touch the constructor? Or should we start idiomatically using lift?

It seems like a lift statement everywhere is at odds with the idea of missings propagating easily through basic operators.

@nalimilan
Copy link
Member

Yes, + and - should (most probably) propagate missing for all types. Reopening so that we track this part of the original description.

@nalimilan nalimilan reopened this Aug 15, 2018
@schnirz
Copy link
Author

schnirz commented Aug 16, 2018

I'm still not 100% convinced that accepting missing values for the Date and DateTime constructors is problematic, since these constructors are used differently from most other constructors (for most types using the parse function seems to be the way to go from my experience).

I am, however, ok with using a lift statement/functor. I have, in fact, done so myself recently for dealing with missing input to DateTime and it worked just fine. It would seem necessary to provide a generic lift statement via Base.missing and to provide some documentation around this approach to explain to new users that this is the way you are supposed to handle missing values when a function/constructor doesn't explicitly handle them itself.

I also agree with pdeffebach that this approach, while feasible, is at odds with the promises made when missing was announced as the new de facto missing value type in Julia. As a matter of fact, the need to lift values was levelled as a complaint against Nullable or Option types and was presented as a motivation for the new missing type! (link)

@KristofferC
Copy link
Member

KristofferC commented Aug 16, 2018

As far as I recall, the argument was that a few basic operations would propagate missing but for anything other than that one would manually lift.

@pdeffebach
Copy link
Contributor

As far as I recall, the argument was that a few basic operations would propagate missing

Without the convert and constructor change we might have been able to include parse and convert in our "list of basic functions" that propagate missing. However I think that overloading parse and convert wouldn't propagate seemlessly now, right?

I think lift is a good idea. Given that missing is in base, a ? syntax would be nice.

@nalimilan
Copy link
Member

Without the convert and constructor change we might have been able to include parse and convert in our "list of basic functions" that propagate missing. However I think that overloading parse and convert wouldn't propagate seemlessly now, right?

I don't think the removal of the constructor fallback on convert changes anything. convert(Union{T, Missing}, x) should work fine, what we need is a short syntax to write Union{T,Missing}. Regarding parse, we could also define methods so that parse(Union{T,Missing}, x) works, possibly with an argument (keyword?) to specify which string(s) should be considered as missing. See discussion at #25131 and linked issue.

@pdeffebach
Copy link
Contributor

pdeffebach commented Aug 16, 2018

Thanks for the feedback.

Taking Dates as the example here. Overloading parse only works If Date() exclusively calls parse and convert, and not _tryparse or other custom functions. Or we establish best practices where one should always use parse when creating dates. But that seems less than ideal.

But we can't expect to track down every intermediate function that a constructor calls if it isn't just parse all the way down. We would have to use lift or something similar.

@schnirz
Copy link
Author

schnirz commented Aug 16, 2018

If lift is the way to go, then I suppose only the binary operators in Dates need to be overloaded in order to be consistent with binary operators defined in Base, correct?

Also, +1 for the ? syntax for lift!

@nalimilan
Copy link
Member

Taking Dates as the example here. Overloading parse only works If Date() exclusively calls parse and convert, and not _tryparse or other custom functions. Or we establish best practices where one should always use parse when creating dates. But that seems less than ideal.

But we can't expect to track down every intermediate function that a constructor calls if it isn't just parse all the way down. We would have to use lift or something similar.

Yes, my idea is that people will need to use parse(Union{Date,Missing}, x) or lift(Date)(x) (depending on whether missing values are represented as "missing" or missing, respectively), not that the Date constructor would propagate missingness.

@schnirz
Copy link
Author

schnirz commented Aug 16, 2018

I personally prefer lift(Date)(x) over parse(Union{Date,Missing}, x). I have never used parse for Date orDateTime before. Also, the documentation only discusses parsing Date and DateTime via their respective constructors, so expecting people to parse Date and DateTime using the parse function when missing values are a concern seems odd.

@quinnj
Copy link
Member

quinnj commented Oct 4, 2018

So are we going to do anything here? Define - and +? Or leave things to lifting?

@StefanKarpinski
Copy link
Member

This seems to demand a more general approach to me, rather than adding more and more methods for missing all over the place? When would it end? Never.

@JeffBezanson
Copy link
Member

As a matter of fact, the need to lift values was levelled as a complaint against Nullable or Option types and was presented as a motivation for the new missing type!

This is a slight confusion of terms --- lifting values in the case of Nullable or Option types means you can't just use e.g. an Int. You have to use Nullable(2) instead of 2. Lifting functions, on the other hand, refers to wrapping a function so that it propagates missing values. The need for that is common to both Nullable/Option and the Missing approaches.

Since + and - support missing for other types, it seems reasonable to define them for Dates too.

@nalimilan
Copy link
Member

I agree, once we have decided that a given operator should propagate missing values, we should follow this behavior for all types that implement it. The question of which operators should propagate missing is a separate one, and indeed something like lift (#26661) would be useful for those which don't propagate.

@pdeffebach
Copy link
Contributor

This seems to demand a more general approach to me, rather than adding more and more methods for missing all over the place? When would it end? Never.

Take a look at an issue here where I (misguidedly) bring up missings in NaNMath.jl. The package maintainer fairly says that they don't anticipate their package being used in a context where missing values would be a concern.

That is to say, the space of "values that should always propagate missings" might be quite small. But of course means we would inevitably have gotchas where we have to say "No, that type doesn't support missings, use lift." The inconsistency would be tiring.

@nalimilan
Copy link
Member

AFAICT NaNMath is a different question, as NaNMath.mean is a completely separate function from Statistics.mean. Better discuss that in the relevant issue.

@ExpandingMan
Copy link
Contributor

I've just rather unexpectedly run into the lack of methods for + and - involving TimeType.

Is there any consensus around this? I can make a PR if everyone agrees to support it for + and -. I'm certainly open to arguments about why we should not, but as the lack of these methods came as a surprise to me today, I suspect they'll come as a surprise to many other people as well.

@EconometricsBySimulation
Copy link
Contributor

EconometricsBySimulation commented Aug 14, 2019

@ExpandingMan I think there are some good methods using + or -. Only you need to make whatever type you are adding is a date type such as now()+Day(1). I think the reasoning is sound since you might want to add a second or a minute to the date/time. Whether it should be implied what the user is trying to do when you have a date object plus an integer might be a reasonable suggestion. Date(now()) + 1 Of course it is easy to add such methods: Base.:+(x::Date, y::Integer) = x + Day(y)

@nilshg
Copy link
Contributor

nilshg commented Aug 26, 2020

I'm in the same boat as @ExpandingMan here - I've actually been using passmissing(i)(x, y) for a while, but it seems uncontroversial (and largely agreed in this thread) that Date(2020, 8, 26) + missing should return missing?

Are there any reasons not to add this?

nilshg added a commit to nilshg/julia that referenced this issue Aug 26, 2020
@nalimilan nalimilan linked a pull request Aug 27, 2020 that will close this issue
staticfloat pushed a commit that referenced this issue Aug 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dates Dates, times, and the Dates stdlib module missing data Base.missing and related functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants