This repository has been archived by the owner on May 5, 2019. It is now read-only.

== does not compare columns of ZonedDateTimes correctly #84

Open
spurll opened this issue Aug 17, 2017 · 7 comments

Comments

@spurll

spurll commented Aug 17, 2017

Comparing two ZonedDateTimes that represent the same "instant" (but in different time zones) with == returns true, but comparing them with isequal returns false.

julia> using TimeZones, DataFrames, DataTables

julia> ZonedDateTime(2016, 1, 1, TimeZone("America/Winnipeg")) == ZonedDateTime(2016, 1, 1, 6, TimeZone("UTC"))
true

julia> isequal(ZonedDateTime(2016, 1, 1, TimeZone("America/Winnipeg")), ZonedDateTime(2016, 1, 1, 6, TimeZone("UTC")))
false

DataFrames.jl maintains this convention:

julia> using TimeZones, DataFrames

julia> df_1 = DataFrame(id=[1,2], date=[ZonedDateTime(2016, 1, 1, 0, TimeZone("America/Winnipeg")), ZonedDateTime(2016, 1, 1, 1, TimeZone("America/Winnipeg"))])
2×2 DataFrames.DataFrame
│ Row │ id │ date                      │
├─────┼────┼───────────────────────────┤
│ 1   │ 1  │ 2016-01-01T00:00:00-06:00 │
│ 2   │ 2  │ 2016-01-01T01:00:00-06:00 │

julia> df_2 = DataFrame(id=[1,2], date=[ZonedDateTime(2016, 1, 1, 6, TimeZone("UTC")), ZonedDateTime(2016, 1, 1, 7, TimeZone("UTC"))])
2×2 DataFrames.DataFrame
│ Row │ id │ date                      │
├─────┼────┼───────────────────────────┤
│ 1   │ 1  │ 2016-01-01T06:00:00+00:00 │
│ 2   │ 2  │ 2016-01-01T07:00:00+00:00 │

julia> df_1 == df_2
true

julia> isequal(df_1, df_2)
false

...but DataTables.jl doesn't:

julia> using TimeZones, DataTables

julia> dt_1 = DataTable(id=[1,2], date=[ZonedDateTime(2016, 1, 1, 0, TimeZone("America/Winnipeg")), ZonedDateTime(2016, 1, 1, 1, TimeZone("America/Winnipeg"))])
2×2 DataTables.DataTable
│ Row │ id │ date                      │
├─────┼────┼───────────────────────────┤
│ 1   │ 1  │ 2016-01-01T00:00:00-06:00 │
│ 2   │ 2  │ 2016-01-01T01:00:00-06:00 │

julia> dt_2 = DataTable(id=[1,2], date=[ZonedDateTime(2016, 1, 1, 6, TimeZone("UTC")), ZonedDateTime(2016, 1, 1, 7, TimeZone("UTC"))])
2×2 DataTables.DataTable
│ Row │ id │ date                      │
├─────┼────┼───────────────────────────┤
│ 1   │ 1  │ 2016-01-01T06:00:00+00:00 │
│ 2   │ 2  │ 2016-01-01T07:00:00+00:00 │

julia> dt_1 == dt_2
false

It's no real mystery why, given the fairly terse definition of ==:

@compat(Base.:(==))(dt1::AbstractDataTable, dt2::AbstractDataTable) = isequal(dt1, dt2)

I think that supporting == comparisons (rather than just calling isequal all the way down) would be preferable in this case.
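To make the distinction concrete, here is a rough sketch of what "== all the way down" could look like next to "isequal all the way down". ToyTable is a hypothetical stand-in invented for this example, not the DataTables.jl type, and 0.0 versus -0.0 is used in place of ZonedDateTime so the snippet needs no packages (the two floats are == but not isequal).

```julia
# Hypothetical stand-in for a table type; NOT the DataTables.jl implementation.
struct ToyTable
    names::Vector{Symbol}
    columns::Vector{Vector}
end

# `==` delegates to `==` on each column, so element-level `==` semantics are kept.
function Base.:(==)(a::ToyTable, b::ToyTable)
    a.names == b.names || return false
    length(a.columns) == length(b.columns) || return false
    return all(ca == cb for (ca, cb) in zip(a.columns, b.columns))
end

# `isequal` delegates to `isequal` on each column.
function Base.isequal(a::ToyTable, b::ToyTable)
    isequal(a.names, b.names) || return false
    length(a.columns) == length(b.columns) || return false
    return all(isequal(ca, cb) for (ca, cb) in zip(a.columns, b.columns))
end

t1 = ToyTable([:id, :x], Vector[[1, 2], [0.0, 1.0]])
t2 = ToyTable([:id, :x], Vector[[1, 2], [-0.0, 1.0]])

t1 == t2         # true:  0.0 == -0.0
isequal(t1, t2)  # false: isequal distinguishes 0.0 from -0.0
```

This only works when == on every column returns a plain Bool, which is an assumption of the sketch.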

Version information:

julia> Pkg.status("DataTables")
 - DataTables                    0.0.3

julia> versioninfo()
Julia Version 0.6.0-rc3.0
Commit ad290e93e4* (2017-06-07 11:53 UTC)
@ararslan
Member

Yeah, definitely. isequal and == are separate functions in Base for a reason.

@spurll
Author

spurll commented Aug 17, 2017

I'm sure I can get a PR in for this in fairly short order.

@ararslan
Member

That would be fantastic. Thanks!

@spurll
Author

spurll commented Aug 18, 2017

Well, I figured out why DataTables.jl has just been using isequal.

This is actually more complex to solve than I initially anticipated, owing to the fact that == checks between NullableArrays are broken.

julia> using NullableArrays

julia> a = NullableArray(1:3)
3-element NullableArrays.NullableArray{Int64,1}:
 1
 2
 3

julia> b = NullableArray(1:3)
3-element NullableArrays.NullableArray{Int64,1}:
 1
 2
 3

julia> a == b
ERROR: TypeError: non-boolean (Nullable{Bool}) used in boolean context
Stacktrace:
 [1] ==(::NullableArrays.NullableArray{Int64,1}, ::NullableArrays.NullableArray{Int64,1}) at ./abstractarray.jl:1527

This, in turn, is because == comparisons between Nullables return Nullable{Bool}, rather than Bool.
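A package-free sketch of that failure mode, for anyone who wants to poke at it: Maybe, lifted_eq, naive_array_eq, unwrap_or and collapsed_array_eq are all hypothetical names made up for this example, and Maybe only imitates the shape of Nullable; it is not the Base or NullableArrays code. The last function shows one possible convention for collapsing the wrapped result to a Bool, which is an assumption of the sketch rather than anything decided in this issue.

```julia
# Hypothetical minimal stand-in for Nullable{T}; NOT the Base definition.
struct Maybe{T}
    hasvalue::Bool
    value::T
end
Maybe(x) = Maybe(true, x)

# Lifted equality: the result is itself wrapped, so it is not a Bool.
lifted_eq(a::Maybe, b::Maybe) =
    (a.hasvalue && b.hasvalue) ? Maybe(a.value == b.value) : Maybe(false, false)

# A generic elementwise comparison that expects a Bool from each element.
function naive_array_eq(xs, ys)
    length(xs) == length(ys) || return false
    for (x, y) in zip(xs, ys)
        lifted_eq(x, y) || return false  # TypeError: Maybe{Bool} in boolean context
    end
    return true
end

# One possible convention (an assumption): a null result counts as "not equal".
unwrap_or(m::Maybe{Bool}, default::Bool) = m.hasvalue ? m.value : default

function collapsed_array_eq(xs, ys)
    length(xs) == length(ys) || return false
    return all(unwrap_or(lifted_eq(x, y), false) for (x, y) in zip(xs, ys))
end

a = [Maybe(1), Maybe(2), Maybe(3)]
b = [Maybe(1), Maybe(2), Maybe(3)]

collapsed_array_eq(a, b)  # true
# naive_array_eq(a, b)    # throws TypeError, like the NullableArray example
```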

In my opinion, the best fix for this would be to provide == for NullableArrays and work with that. There was a PR to fix this in 2015, but it was never merged: JuliaStats/NullableArrays.jl#84

I think I'm going to go ahead and take a shot at a PR here, but I suspect it isn't going to be pretty.

@nalimilan
Member

The problem is with Nullable, and fixing it in NullableArrays would require type piracy. == with NullableArray is kinda forced to be inconsistent or not to work at all because == throws an error for Nullable in Base. The solution to this will be to move either to Union{T, Null} (in DataFrames) or to DataValue{T}.

@davidanthoff
Contributor

I think the definition for == in DataValues.jl is ok at this point. I'm also in the process of adding a DataValueArray that also fixes this, and then I'm going to have a DataValueTable that is based on that. Essentially that will be exactly the same design as the current DataTable approach, except it will use DataValue instead of Nullable to get around the restrictions that we have due to Nullable being in Base and Nullable not being special-cased for the data science stack. I'm optimistic that I should be able to release soon, but on the flip side, classes start tomorrow, so who knows :)

@spurll
Author

spurll commented Aug 22, 2017

Makes sense to me. I've made changes to my code that's working with DataTables to work around this issue for the moment, and I won't spend time trying to make == work as expected (at least until Nullables behave themselves a little better). Thanks, folks!
