Element-wise `in` operator #5212

nalimilan · 2013-12-21T22:45:31Z

This occurred to me while reading mbauman@d83dad3:

> to select elements of ``A`` equal to 1 or 2 use ``(A .== 1) | (A .== 2)``

What do you think about having an element-wise `in` operator? Despite not having a dedicated symbol, I think `in` is a true operator, and it's the only one without an element-wise version.

This would allow writing e.g. `A .in (1, 2)` instead of the convoluted form above (and thus reduce operator-precedence mistakes). It's very useful when working with data, where you often need to test for equality against a longer series of values, or simply against a set of values stored in a variable.

I grant you `.in` looks a little weird, but no more than `.!=`. ;-)

JeffBezanson · 2013-12-21T23:22:03Z

I don't think vectorizing everything is the way forward. For one thing, you might mean either

[ x in (1,2) for x in A]

or

[ A in x for x in (1,2) ]

Adding a dot is not expressive enough.
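
To make the two readings concrete, here is a minimal sketch written with plain comprehensions. The array `A` and the `containers` tuple are made-up illustration values, and the second reading is only meaningful when each `x` is itself a collection that could hold `A`.

```julia
A = [1, 3, 2, 5]

# Reading 1: test each element of A for membership in the tuple (1, 2).
each_element = [x in (1, 2) for x in A]

# Reading 2: test the whole array A for membership in each collection.
# `containers` is a hypothetical tuple of collections whose elements are arrays.
containers = ([A, [0]], [[7, 8]])
whole_array = [A in c for c in containers]

println(each_element)  # one Bool per element of A
println(whole_array)   # one Bool per collection in `containers`
```

The first result has one entry per element of `A`, the second has one entry per collection, so a single dotted spelling cannot express both without a rule about which operand is iterated.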

StefanKarpinski · 2013-12-22T01:09:42Z

Agreed. Vectorization is generally only safe when the base definition only makes sense for scalars. For operations where one or more arguments can be containers, it's a nightmare.

nalimilan · 2013-12-22T10:21:45Z

I can't find any context where `[ A in x for x in (1,2) ]` could make sense. Do you have an example of that? Since `in` is only useful when the second operand may contain more than one element, `(1,2)` would have to be replaced with something like a set of arrays. That sounds like quite a rare and complex situation, so I don't think the ambiguity could arise.

JeffBezanson · 2013-12-22T15:38:14Z

Apologies, but I don't think there is going to be a .in operator.

StefanKarpinski · 2013-12-22T15:54:05Z

> I can't find any context where `[ A in x for x in (1,2) ]` could make sense. Do you have an example of that? Since `in` is only useful when the second operand may contain more than one element, `(1,2)` would have to be replaced with something like a set of arrays. That sounds like quite a rare and complex situation, so I don't think the ambiguity could arise.

Sure, you can see that it doesn't make sense, but it's still syntactically ambiguous.

nalimilan · 2013-12-22T16:02:31Z

Syntactically it's not ambiguous, since you could define `.in` to always apply over elements of the first operand. But if you don't like it, that's your call. :-) Anyway, this kind of thing could be added later if the need for it becomes clearer.

StefanKarpinski · 2013-12-22T16:04:47Z

Yes, let's see if it's a real pain-point and at that point, it will be clearer what to do.

tristanmarkwell · 2015-12-31T19:00:55Z

I just bumped up against this while filtering a DataFrame. I spent 20 minutes or so searching online for a way to make it work because I didn't believe it wasn't possible. When I write

`antecedent = events[events[:EVENT_TYPE] .== "A", :]`

I also expected this to work:

`consequent = events[events[:EVENT_TYPE] .in ["B", "C"], :]`

Instead I fear I'm stuck with

`consequent = events[(events[:EVENT_TYPE] .== "B") | (events[:EVENT_TYPE] .== "C"), :]`

or

`consequent = events[[x in ["B", "C"] for x in events[:EVENT_TYPE]], :]`

The first doesn't scale well, and the second, while reasonably compact, seems like a large mental jump from the original. Has a better solution arisen in the last two years ("in" is a difficult word to search for)?
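
For reference, the comprehension form can be read as "build a Boolean mask, then index with it". The sketch below uses a plain vector in place of the DataFrame column so that it is self-contained; `event_types` and `wanted` are made-up names, and using a `Set` for the lookup values is an assumption that mainly pays off when the list of values is long.

```julia
# Stand-in for the events[:EVENT_TYPE] column.
event_types = ["A", "B", "C", "A", "D"]

# Values to keep; a Set gives hash-based membership tests.
wanted = Set(["B", "C"])

# One Bool per row: true where the event type is one of the wanted values.
mask = [t in wanted for t in event_types]

# Logical indexing with the mask, as in the DataFrame row selection above.
selected = event_types[mask]

println(mask)
println(selected)
```

The same mask could be passed as the row index of the DataFrame in place of the chained `.==` comparisons.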

Ismael-VC · 2015-12-31T20:46:14Z

@tristanmarkwell I would do something like this:

julia> using DataFrames

julia> events = DataFrame(EVENT_TYPE = rand('a':'z', 100));

julia> function Base.in{T1,T2<:Integer}(xs::AbstractArray{T1}, ys::Range{T2})
           Bool[any(x in ys) for x in xs]
       end
in (generic function with 19 methods)

julia> function Base.in{T1,T2}(xs::AbstractArray{T1}, ys::Range{T2})
           Bool[any(x in ys) for x in xs]
       end
in (generic function with 20 methods)

julia> function Base.in{T1,T2}(xs::AbstractArray{T1}, ys::AbstractArray{T2})
           Bool[any(x in ys) for x in xs]
       end
in (generic function with 21 methods)

julia> @time events[(events[:EVENT_TYPE] .== 'a') | (events[:EVENT_TYPE] .== 'z'), :];
  0.370084 seconds (96.39 k allocations: 4.353 MB, 8.23% gc time)

julia> @time events[(events[:EVENT_TYPE] .== 'a') | (events[:EVENT_TYPE] .== 'z'), :];
  0.000048 seconds (36 allocations: 10.109 KB)

julia> @time events[events[:EVENT_TYPE] in ['a', 'z'], :];
  0.046778 seconds (32.42 k allocations: 1.471 MB)

julia> @time events[events[:EVENT_TYPE] in ['a', 'z'], :];
  0.000028 seconds (24 allocations: 1.656 KB)

julia> @time events[events[:EVENT_TYPE] in 'a':'z', :];
  0.047705 seconds (70.72 k allocations: 2.992 MB)

julia> @time events[events[:EVENT_TYPE] in 'a':'z', :];
  0.000040 seconds (124 allocations: 5.891 KB)

Now imagine it's `.in` instead of `in`, so we don't break any code. But `.in` is not a valid identifier name, and neither is `.∈`, which could be used exclusively for this purpose.
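
A variant of the idea that does not add methods to `Base.in` is sketched below. The name `elementwise_in` is hypothetical, not an existing function. The second form assumes Julia 1.0 or later, where dot-call broadcasting is available and `Ref` marks an argument as a single value that should not be iterated.

```julia
# Hypothetical helper (not part of Base): apply the membership test to each
# element of `xs`, always iterating over the first argument only.
elementwise_in(xs, ys) = Bool[x in ys for x in xs]

A = [1, 3, 2, 5]
by_helper = elementwise_in(A, (1, 2))

# Assumes Julia 1.0 or later: broadcast `in` over A, with Ref keeping the
# tuple (1, 2) intact so it is treated as one collection.
by_broadcast = in.(A, Ref((1, 2)))

println(by_helper)
println(by_broadcast)
```

Both forms fix the convention that only the first operand is iterated, which is the rule proposed earlier in the thread.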

Ismael-VC · 2015-12-31T22:04:29Z

@StefanKarpinski

> Yes, let's see if it's a real pain-point and at that point, it will be clearer what to do.

It is a bit cumbersome with large expressions, and slower because of extra memory allocations.

@JeffBezanson for `.in`/`.∈` to just work it would have to be something like:

`Bool[any(x in ys) for x in xs]`

`.is`/`.===` and `.isa` would be the only two other good cases IMO.

Without the `Bool` prefix it returns a `Vector{Any}`.

So it's not ambiguous with either of:

[ x in (1,2) for x in A]
[ A in x for x in (1,2) ]

Please excuse me if I'm wrong.

JeffBezanson closed this as completed Dec 21, 2013

nalimilan mentioned this issue Jan 1, 2016

Support non-vectorized syntax in @where JuliaData/DataFramesMeta.jl#39

Closed

Ismael-VC mentioned this issue Jan 3, 2016

Allow users to define "dot" vectorized operators. #14544

Closed

nalimilan mentioned this issue Jan 3, 2016

Add vectorized "in" (.∈) and "notin" (.∉) #12406

Closed
