Make map() and broadcast() faster for Union{T, Missing/Nothing} element types #25828

Merged: 1 commit, Feb 6, 2018

nalimilan (Member) commented Jan 31, 2018

Storing typeof(el)/eltype(B) in a variable in the hot part of the loop kills performance because despite being inferred as equal to Union{Type{T}, Type{Missing/Nothing}}, it is marked as ::Any. Only using typeof/eltype when isa fails works around the problem and makes map/broadcast(x->x, ::Vector{Union{Int, Missing}}) about 10 times faster.

Fixes #25799.
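
To make the pattern described above concrete, here is a minimal sketch of a `collect_to!`-style loop that checks `el isa T` in the hot path and only calls `typeof(el)` once that check fails. It is not the Base implementation: the names `sketch_collect` and `sketch_collect_to!` are hypothetical, it uses the current `iterate` protocol rather than the `done`/`next` protocol shown in this PR, and it widens with a plain `Union` where Base uses its own promotion helper.

```julia
# Hypothetical, simplified stand-in for Base's collect_to!; not the actual Base code.
function sketch_collect_to!(dest::AbstractVector{T}, itr, offs, st) where T
    i = offs
    while true
        y = iterate(itr, st)
        y === nothing && break
        el, st = y
        if el isa T
            # Hot path: no type is stored in a local variable here.
            @inbounds dest[i] = el
            i += 1
        else
            # Slow path: typeof(el) is only queried once the isa check has failed.
            # Assumption: widening via Union; Base uses a promotion helper instead.
            R = Union{T, typeof(el)}
            new = similar(dest, R)
            copyto!(new, 1, dest, 1, i - 1)
            @inbounds new[i] = el
            return sketch_collect_to!(new, itr, i + 1, st)
        end
    end
    return dest
end

# Hypothetical driver: the first element decides the initial element type.
function sketch_collect(itr)
    y = iterate(itr)
    y === nothing && return []
    el, st = y
    dest = Vector{typeof(el)}(undef, length(itr))
    dest[1] = el
    return sketch_collect_to!(dest, itr, 2, st)
end
```

Calling `sketch_collect(Any[1, missing, 3])` starts with a `Vector{Int}` and widens once when it reaches `missing`, which mirrors the `z` case in the benchmarks.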

Before this PR:

using BenchmarkTools
x = rand(Int, 100_000);
y = convert(Vector{Union{Int,Missing}}, x);
z = copy(y); z[2] = missing;

julia> @btime map(identity, x);
  102.686 μs (3 allocations: 781.34 KiB)

julia> @btime map(identity, y);
  606.995 μs (4 allocations: 781.36 KiB)

julia> @btime map(identity, z);
  7.245 ms (7 allocations: 1.62 MiB)

# Use x->x rather than identity to avoid fast path
julia> @btime broadcast(x->x, x);
  108.737 μs (14 allocations: 781.95 KiB)

julia> @btime broadcast(x->x, y);
  1.225 ms (49 allocations: 783.25 KiB)

julia> @btime broadcast(x->x, z);
  7.318 ms (52 allocations: 1.62 MiB)

After this PR:

julia> @btime map(identity, x);
  104.782 μs (3 allocations: 781.34 KiB)

julia> @btime map(identity, y);
  505.128 μs (4 allocations: 781.36 KiB)

julia> @btime map(identity, z);
  623.634 μs (7 allocations: 1.62 MiB)

julia> @btime broadcast(x->x, x);
  102.317 μs (14 allocations: 781.95 KiB)

julia> @btime broadcast(x->x, y);
  607.368 μs (49 allocations: 783.25 KiB)

julia> @btime broadcast(x->x, z);
  703.995 μs (52 allocations: 1.62 MiB)

I will add these benchmarks to BaseBenchmarks to ensure we don't regress. EDIT: see JuliaCI/BaseBenchmarks.jl#176.

EDIT: FWIW, DataArray takes 11ms with map in the benchmarks above, but only 300μs with broadcast. R (with C implementation) takes about 400μs (measured with x^2 since there's no vectorized identity operation, but timings are similar to identity in Julia). So we're not far.

@nalimilan nalimilan added the missing data Base.missing and related functionality label Jan 31, 2018
@@ -573,12 +573,11 @@ function collect_to!(dest::AbstractArray{T}, itr, offs, st) where T
     i = offs
     while !done(itr, st)
         el, st = next(itr, st)
-        S = typeof(el)
-        if S === T || S <: T
+        if el isa T || typeof(el) === T
Member Author

The need for typeof(el) === T is a bit of a mystery, but it was weird to need it in the existing code already. See #25799.

Sponsor Member

I assume that was some attempt at providing a fast-path for uninferred code. But S === T is often much harder to infer than S <: T. And since isconcretetype(S), it won't be faster at runtime either.
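
As a small illustration of that point, the sketch below checks at runtime that `typeof(el)` is always concrete, that `el isa T` agrees with `typeof(el) <: T`, and that `typeof(el) === T` never holds without `el isa T` also holding. The helper name `checks_agree` is hypothetical and the sample values are arbitrary.

```julia
# Hypothetical helper: returns true when the three observations hold for every value.
function checks_agree(vals, ::Type{T}) where T
    for el in vals
        S = typeof(el)
        # typeof always returns a concrete type
        isconcretetype(S) || return false
        # for concrete S, the subtype check and the isa check give the same answer
        (el isa T) == (S <: T) || return false
        # S === T implies el isa T, so the extra === test adds no new cases
        (S === T && !(el isa T)) && return false
    end
    return true
end

checks_agree(Any[1, missing, 2.0, "a"], Union{Int, Missing})
```

Note that when `T` is a `Union` such as `Union{Int, Missing}`, `typeof(el) === T` is false for every value, since no value has a `Union` as its concrete type.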

Member Author

Yet if I remove that check I get a worse performance than before with map(identity, y) (see what I noted at #25799). That's really weird, but...

Sponsor Member

Keno recently changed the result of el isa T from Const(true) to Conditional(:el, T, Union{}) – maybe there was a mistake in the implementation of that?

Member Author

Where's the relevant commit? Can I try reverting it?

         # store the result
-        if S <: eltype(B)
+        if V isa eltype(B)
Sponsor Member

+1

This syntax allows the compiler to see / "prove" that S <: T if this branch is taken, and thus likely eliminate dispatch in the assignment below
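
A minimal sketch of that branch shape, assuming a hypothetical helper `store_if_fits!` that is not part of Base or of this PR: the `isa` test is written directly on the value, so inside the branch the compiler can treat `V` as already matching the destination's element type.

```julia
# Hypothetical helper, for illustration only.
function store_if_fits!(B::AbstractVector, i::Int, V)
    if V isa eltype(B)
        # In this branch V is known to fit eltype(B), so the store needs no widening.
        B[i] = V
        return true
    end
    # The caller would have to widen B before storing V.
    return false
end

B = Vector{Union{Int, Missing}}(undef, 2)
store_if_fits!(B, 1, missing)   # stored
store_if_fits!(B, 2, 1.5)       # rejected: Float64 does not fit
```

Running `@code_warntype` on a call such as `store_if_fits!(B, 1, missing)` is one way to check what inference concludes in a given Julia version.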

@@ -573,12 +573,11 @@ function collect_to!(dest::AbstractArray{T}, itr, offs, st) where T
     i = offs
     while !done(itr, st)
         el, st = next(itr, st)
-        S = typeof(el)
-        if S === T || S <: T
+        if el isa T || typeof(el) === T
             @inbounds dest[i] = el::T
Sponsor Member

if you remove the typeof(el) === T condition above, inference will be able to infer the el::T result and you can remove this redundant type-check

ararslan (Member) commented Feb 3, 2018

@nanosoldier runbenchmarks(ALL, vs=":master")

nanosoldier (Collaborator) commented

Something went wrong when running your job:

ProcessExitedException()

cc @ararslan

ararslan (Member) commented Feb 4, 2018

That was after upgrading to the more recent HTTP version, which was supposed to help the intermittent ignoring, but instead it segfaults... So I've pinned at an older HTTP version for now.

@nanosoldier runbenchmarks(ALL, vs=":master")

nanosoldier (Collaborator) commented

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

ararslan (Member) commented Feb 4, 2018

Those are some nice improvements!

@JeffBezanson JeffBezanson merged commit 2d39767 into master Feb 6, 2018
@JeffBezanson JeffBezanson deleted the nl/map branch February 6, 2018 16:33
aviatesk added a commit that referenced this pull request Apr 18, 2022
Most of these conditions were introduced in #25828 and #30480 for
performance reasons at the time, but they now seem unnecessary or even
harmful to inferrability.

There doesn't seem to be any performance difference in the benchmark
used at #25828:
```julia
using BenchmarkTools
x = rand(Int, 100_000);
y = convert(Vector{Union{Int,Missing}}, x);
z = copy(y); z[2] = missing;
```

> master:
```julia
julia> @btime map(identity, x);
  57.814 μs (3 allocations: 781.31 KiB)

julia> @btime map(identity, y);
  94.040 μs (3 allocations: 781.31 KiB)

julia> @btime map(identity, z);
  127.554 μs (5 allocations: 1.62 MiB)

julia> @btime broadcast(x->x, x);
  59.248 μs (2 allocations: 781.30 KiB)

julia> @btime broadcast(x->x, y);
  74.693 μs (2 allocations: 781.30 KiB)

julia> @btime broadcast(x->x, z);
  126.262 μs (4 allocations: 1.62 MiB)
```

> this commit:
```julia
julia> @btime map(identity, x);
  58.668 μs (3 allocations: 781.31 KiB)

julia> @btime map(identity, y);
  94.013 μs (3 allocations: 781.31 KiB)

julia> @btime map(identity, z);
  126.600 μs (5 allocations: 1.62 MiB)

julia> @btime broadcast(x->x, x);
  57.531 μs (2 allocations: 781.30 KiB)

julia> @btime broadcast(x->x, y);
  69.561 μs (2 allocations: 781.30 KiB)

julia> @btime broadcast(x->x, z);
  125.578 μs (4 allocations: 1.62 MiB)
```
aviatesk added a commit that referenced this pull request Apr 19, 2022
Labels
missing data Base.missing and related functionality performance Must go faster