Skip to content
This repository has been archived by the owner on May 4, 2019. It is now read-only.

Add ==(::NullableArray, ::NullableArray) and ==(::NullableArray, ::AbstractArray) #84

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

nalimilan
Copy link
Member

Fixes #82.

@codecov-io
Copy link

Current coverage is 90.18%

Merging #84 into master will decrease coverage by -0.09% as of 0663641

@@            master     #84   diff @@
======================================
  Files           12      12       
  Stmts          586     611    +25
  Branches         0       0       
  Methods          0       0       
======================================
+ Hit            529     551    +22
  Partial          0       0       
- Missed          57      60     +3

Review entire Coverage Diff as of 0663641

Powered by Codecov. Updated on successful CI builds.

@nalimilan nalimilan changed the title Add ==(::NullableArray, ::NullableArray) Add ==(::NullableArray, ::NullableArray) and ==(::NullableArray, ::AbstractArray) Oct 12, 2015
@nalimilan
Copy link
Member Author

Let's merge this? I need it for JuliaData/DataFrames.jl#1008.

@davidagold
Copy link
Contributor

We should first define hash too, right?

@davidanthoff
Copy link

This is probably also effected by the discussion in #85? I.e. the semantics for == should be the same for scalar and arrays, right?

@davidagold
Copy link
Contributor

I.e. the semantics for == should be the same for scalar and arrays, right?

I don't think so, since NullableArray is not itself a Nullable type. So ==(X::NullableArray, Y::NullableArray) should return a Bool, whereas I think legitimate cases have been made for returning either a Bool or a Nullable{Bool} from ==(x::Nullable, y::Nullable).

@davidanthoff
Copy link

I don't think so, since NullableArray is not itself a Nullable type. So ==(X::NullableArray, Y::NullableArray) should return a Bool, whereas I think legitimate cases have been made for returning either a Bool or a Nullable{Bool} from ==(x::Nullable, y::Nullable).

@davidagold I agree with you (I'm always in favor of returning a Bool from any comparison operator), but that is not how this PR implements it. ==(X::NullableArray, Y::NullableArray) returns a Nullable{Bool} if this PR gets merged.

@davidagold
Copy link
Contributor

@davidanthoff Hmm, so it does. Good point. I don't think that's appropriate.

This may be unpopular, but I think the behavior described in this PR is more appropriately expressed by isequal, module returning a Nullable{Bool}. There is something of an analogy between NaN and Nullable, and for the former, isequal provides a more "relaxed" behavior than ==. So I think it is appropriate that this sort of relaxed behavior should belong to isequal in the case of comparing Nullables as well. Indeed, Base already implements this sort of relaxed comparison behavior for isequal(x::Nullable, y::Nullable), hence isequal(X::NullableArray, Y::NullableArray) inherits precisely the desired behavior from the AbstractArray interface. However, this is at the expense of having

julia> isequal(Nullable(), Nullable())
true

@nalimilan How do you feel about using isequal in place of == in JuliaData/DataFrames.jl#1008?

@nalimilan
Copy link
Member Author

The current behavior of isequal is indeed correct. What I'm proposing here is a definition of == which is consistent with that used for Nullable and for AbstractArray. Indeed, ==(::AbstractArray, ::AbstractArray) calls == on each pair of elements of the arrays; that definition must be extended to return a Nullable{Bool} to handle missing values. Else, we would be treating nulls as false, which we don't currently do anywhere else.

If we were to return a Bool from ==(::Nullable, ::Nullable), then we would also have to change the present PR. But if we keep the current definition, I think the most consistent choice is to return a Nullable{Bool} for both.

@nalimilan How do you feel about using isequal in place of == in JuliaData/DataFrames.jl#1008?

I'm actually using isequal now, which is mostly what is needed when writing tests. But == for DataFrame recursively calls == on its elements, which currently triggers an error for NullableArrays. (That said, that function probably isn't very useful outside testing, where isequal is more appropriate. It's mostly needed for completeness.)

@davidagold
Copy link
Contributor

I see. And the behavior implemented herein is different from that of isequal, anyway. But won't returning Nullable{Bool} break the DataFrame comparison, then? Or have you already taken that into account? And will comparing two DataFrames return a Nullable{Bool}?

@nalimilan
Copy link
Member Author

Yes, ==(::DataFrame, ::DataFrame) would also have to return a Nullable{Bool}, which requires some tweaking.

@davidanthoff
Copy link

Yes, ==(::DataFrame, ::DataFrame) would also have to return a Nullable{Bool}, which requires some tweaking.

I see why having ==(::Nullable,::Nullable) return a Nullable{Bool} can sometimes be useful (although I think it complicates the design way too much to be worth the hassle), but I can't come up with any scenario where comparing two DataFrames or two NullableArrays via == returning Nullable{Bool} would actually be useful. Maybe just lack of imagination, but is there any scenario where you would compare these objects and not be in a situation where you actually want a boolean value?

@nalimilan
Copy link
Member Author

Honestly, I don't even know when comparing two data frames could really be useful apart from testing or checking results. My proposal just tries to preserve consistency with ==(::Nullable, ::Nullable), which is IMO the fundamental decision.

@davidanthoff
Copy link

Yeah, but checking results might be quite a common and useful thing? I don't know, you load data and manipulate data in some way (via Query.jl or jplyr.jl), and then you change your query to be more concise and want to make sure you still produce the same result set. That is something that might be quite common, but it would require a Bool as a result to be useful.

@nalimilan
Copy link
Member Author

As I said, there's isequal for that already, which might be enough.

@davidagold
Copy link
Contributor

then you change your query to be more concise and want to make sure you still produce the same result set. That is something that might be quite common, but it would require a Bool as a result to be useful.

That's probably not the best way to convince oneself of the equality of those queries, anyway =p

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants