Element-wise in operator #5212

Closed
nalimilan opened this issue Dec 21, 2013 · 10 comments

Comments

@nalimilan
Member

This occurred to me while reading mbauman@d83dad3:

to select elements of ``A`` equal to 1 or 2 use ``(A .== 1) | (A .== 2)``).

What do you think about having an element-wise in operator? Despite not having a dedicated symbol, I think in is a true operator, and it's the only one without an element-wise version.

This would allow writing e.g. A .in (1, 2) instead of the convoluted form above (and thus reduce operator-precedence mistakes). It's very useful when working with data, where you often need to test for equality against a longer series of values, or simply when testing for equality against a set of values stored in a variable.

I grant you .in looks a little weird, but not more than .!=. ;-)

@JeffBezanson
Member

I don't think vectorizing everything is the way forward. For one thing, you might mean either

[ x in (1,2) for x in A]

or

[ A in x for x in (1,2) ]

Adding a dot is not expressive enough.
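To make the two readings concrete, here is a small sketch with made-up values (the names `A` and `sets` are illustrative and do not come from the thread). Written as comprehensions, the two loops iterate over different operands and answer different questions:

```julia
# Illustrative values only; A and sets are hypothetical.
A = [1, 2, 3]

# Reading 1: test each element of A against one collection.
elementwise_left = [x in (1, 2) for x in A]

# Reading 2: test A as a whole against each candidate collection.
sets = ([[1, 2, 3], [4]], [[5]])
elementwise_right = [A in s for s in sets]
```

The first result has one entry per element of `A`; the second has one entry per candidate collection. A bare dot does not say which of the two is meant.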

@StefanKarpinski
Member

Agreed. Vectorization is generally only safe when the base definition only makes sense for scalars. For operations where one or more arguments can be containers, it's a nightmare.

@nalimilan
Member Author

I can't find any context where [ A in x for x in (1,2) ] could make sense. Do you have an example of that? Since in is only useful when the second operand may contain more than one element, (1,2) would have to be replaced with something like a set of arrays. Sounds like a quite rare and complex situation, I don't think the ambiguity could arise.

@JeffBezanson
Member

Apologies, but I don't think there is going to be a .in operator.

@StefanKarpinski
Member

I can't find any context where [ A in x for x in (1,2) ] could make sense. Do you have an example of that? Since in is only useful when the second operand may contain more than one element, (1,2) would have to be replaced with something like a set of arrays. Sounds like a quite rare and complex situation, I don't think the ambiguity could arise.

Sure, you can see that it doesn't make sense, but it's still syntactically ambiguous.

@nalimilan
Member Author

Syntactically it's not ambiguous, since you could define .in to always apply over elements of the first operand. But if you don't like it, that's your call. :-) Anyway, this kind of thing could be added later if the need for it becomes clearer.

@StefanKarpinski
Member

Yes, let's see if it's a real pain-point and at that point, it will be clearer what to do.

@tristanmarkwell

I just bumped up against this while working with filtering a DataFrame. I spent 20 minutes or so searching online for a way to make it work because I didn't believe it wasn't possible. When I write
antecedent = events[events[:EVENT_TYPE] .== "A", :],
I also expected this to work:
consequent = events[events[:EVENT_TYPE] .in ["B", "C"], :].
Instead I fear I'm stuck with
consequent = events[(events[:EVENT_TYPE] .== "B") | (events[:EVENT_TYPE] .== "C"), :]
or
consequent = events[[x in ["B", "C"] for x in events[:EVENT_TYPE]], :].
The first doesn't scale well, and the second, while reasonably compact, seems like a large mental jump from the original. Has a better solution arisen in the last two years ("in" is a difficult word to search for)?
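One way to keep the filter close to the `.==` form without a new operator is to name the membership test once and apply it to the column. A minimal sketch, using a plain vector as a stand-in for the DataFrame column; `isin` is a hypothetical helper, not part of Base or DataFrames:

```julia
# Hypothetical helper: Bool mask marking the elements of xs found in allowed.
isin(xs, allowed) = Bool[x in allowed for x in xs]

event_types = ["A", "B", "C", "A", "B"]   # stand-in for events[:EVENT_TYPE]

# A Set keeps each membership test cheap when the list of values grows.
mask = isin(event_types, Set(["B", "C"]))
selected = event_types[mask]
```

With a DataFrame, the mask would go in the row position of the indexing expression, in the same place as the `.==` comparison above.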

@Ismael-VC
Contributor

@tristanmarkwell I would do something like this:

julia> using DataFrames

julia> events = DataFrame(EVENT_TYPE = rand('a':'z', 100));

julia> function Base.in{T1,T2<:Integer}(xs::AbstractArray{T1}, ys::Range{T2})
           Bool[any(x in ys) for x in xs]
       end
in (generic function with 19 methods)

julia> function Base.in{T1,T2}(xs::AbstractArray{T1}, ys::Range{T2})
           Bool[any(x in ys) for x in xs]
       end
in (generic function with 20 methods)

julia> function Base.in{T1,T2}(xs::AbstractArray{T1}, ys::AbstractArray{T2})
           Bool[any(x in ys) for x in xs]
       end
in (generic function with 21 methods)

julia> @time events[(events[:EVENT_TYPE] .== 'a') | (events[:EVENT_TYPE] .== 'z'), :];
  0.370084 seconds (96.39 k allocations: 4.353 MB, 8.23% gc time)

julia> @time events[(events[:EVENT_TYPE] .== 'a') | (events[:EVENT_TYPE] .== 'z'), :];
  0.000048 seconds (36 allocations: 10.109 KB)

julia> @time events[events[:EVENT_TYPE] in ['a', 'z'], :];
  0.046778 seconds (32.42 k allocations: 1.471 MB)

julia> @time events[events[:EVENT_TYPE] in ['a', 'z'], :];
  0.000028 seconds (24 allocations: 1.656 KB)

julia> @time events[events[:EVENT_TYPE] in 'a':'z', :];
  0.047705 seconds (70.72 k allocations: 2.992 MB)

julia> @time events[events[:EVENT_TYPE] in 'a':'z', :];
  0.000040 seconds (124 allocations: 5.891 KB)

Now imagine it's .in instead of in, so we don't break any code. However, .in is not a valid identifier name, and neither is .∈, which could be used exclusively for this purpose.

@Ismael-VC
Contributor

@StefanKarpinski

Yes, let's see if it's a real pain-point and at that point, it will be clearer what to do.

It is a bit cumbersome with large expressions, and slower because of the extra memory allocations.

@JeffBezanson for .in/.∈ to just work it would have to be something like:

  • Bool[any(x in ys) for x in xs]

.is/.=== and .isa would be the only two other good cases IMO.

Without the Bool prefix it returns a Vector{Any}.

So it's not ambiguous with either:

  • [ x in (1,2) for x in A]
  • [ A in x for x in (1,2) ]

Please excuse me if I'm wrong.
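For readers on later Julia releases: as far as I know, Julia 1.0 and newer accept dot-call broadcasting for `in`, and the ambiguity raised earlier in the thread is settled by the caller, who wraps the operand that should be treated as a single value. The sketch below assumes Julia 1.x; the array contents are made up for illustration:

```julia
A = [1, 2, 3, 1]

# Broadcast `in` over A only; Ref(...) makes the tuple act as a single value.
mask_call = in.(A, Ref((1, 2)))

# The same test written with the infix operator form.
mask_op = A .∈ Ref((1, 2))
```

Without the `Ref`, broadcasting would try to pair the elements of `A` with the elements of the tuple, which is not the membership test intended here.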


5 participants