Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add indexing support to SkipMissing #31008

Merged
merged 4 commits into from
Feb 22, 2019
Merged

Add indexing support to SkipMissing #31008

merged 4 commits into from
Feb 22, 2019

Conversation

nalimilan
Copy link
Member

@nalimilan nalimilan commented Feb 8, 2019

Define IndexStyle, eachindex, keys and getindex to match indices of wrapped array, skipping missing entries. This notably allows finding the index of elements in the wrapped array using findall, findnext, argmin/max and findmin/max.

Fixes #29305. Part of #30606.

FWIW, iterating over SkipMissing using eachindex and getindex rather than iterate is slower, but not that much (and I see no reason why the compiler couldn't improve since the involved code is quite simple) EDIT: with @inbounds annotations both solutions have the same performance (but the performance difference persists without it).

julia> function f(x)
           s = 0
           @inbounds for v in x
               s += v
           end
           s
       end
f (generic function with 1 method)

julia> function g(x)
           s = 0
           @inbounds for i in eachindex(x)
               s += x[i]
           end
           s
       end
g (generic function with 1 method)

julia> x = skipmissing(ifelse.(rand(10_000) .> 0.8, missing, rand(10_000)))
Base.SkipMissing{Array{Union{Missing, Float64},1}}(Union{Missing, Float64}[0.764807, 0.225138, missing, 0.920196, missing, 0.782529, 0.936112,0.0176555, 0.581311, 0.624459    0.175387, missing, 0.393418, 0.717713, 0.898142, 0.899193, 0.177139, missing, 0.56281, 0.479712])

julia> using BenchmarkTools

julia> @assert f(x) == g(x)

julia> @btime f(x);
  24.347 μs (1 allocation: 16 bytes)

julia> @btime g(x);
  24.521 μs (1 allocation: 16 bytes)

Define IndexStyle, eachindex, keys and getindex to match indices of wrapped array,
skipping missing entries. This notably allows finding the index of elements
in the wrapped array using findall, findnext, argmin/max and findmin/max.
@nalimilan nalimilan added the missing data Base.missing and related functionality label Feb 8, 2019
@nalimilan nalimilan added needs docs Documentation for this change is required needs news A NEWS entry is required for this change labels Feb 8, 2019
@nalimilan nalimilan removed needs docs Documentation for this change is required needs news A NEWS entry is required for this change labels Feb 8, 2019
@StefanKarpinski
Copy link
Sponsor Member

I'm not sure that I'm the best person to review this. @mbauman, would you feel qualified to take a look at this and review it?

@nalimilan
Copy link
Member Author

Given the small amount of code introduced, I think what's needed is mainly a decision regarding the proposal at #30606.

Copy link
Sponsor Member

@mbauman mbauman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great as is — just a few optional comments on docs that I noted as I was reading through.

base/missing.jl Outdated Show resolved Hide resolved
doc/src/manual/missing.md Outdated Show resolved Hide resolved
@nalimilan
Copy link
Member Author

Thanks! I've pushed a commit to try to improve the wording.

@nalimilan
Copy link
Member Author

I've just realized that broadcast(f, skipmissing(x)) currently works and returns a vector of length count(!ismissing, x). This is somewhat inconsistent with the indexing support added in this PR: for example a[findall(skipmissing(a) .< 0)] is a bit misleading since it will always work but return values which don't correspond to anything. Probably not a big deal since it's quite convoluted and the only alternative is to throw an error, but maybe something to change in 2.0.

@mbauman
Copy link
Sponsor Member

mbauman commented Apr 16, 2019

Yup, that'll fit in nicely with my hope to do something about wider axis support beyond just AbstractUnitRanges. Linking to #24019 as a nice example of a type that's in-between associative and array. Also interesting that it's the first example that I've seen where the default broadcastable fallback does the wrong thing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
missing data Base.missing and related functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing value not working with argmax and findmax
3 participants