Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix performance of broadcast and collect with Union{T, Missing} #30480

Merged
merged 1 commit into from
Jan 2, 2019

Conversation

nalimilan
Copy link
Member

@nalimilan nalimilan commented Dec 21, 2018

Use the same pattern as in collect_to! (which is used when size is known). Follows what was done by #25828.

Fixes #30455.

Before:

julia> using BenchmarkTools

julia> v = fill(2.0, 50000);

julia> v2 = vcat(v, missing);

julia> v3 = vcat(missing, v);

julia> @btime $v .* 3.0;
  46.964 μs (2 allocations: 390.70 KiB)

julia> @btime $v2 .* 3.0;
  1.838 ms (99498 allocations: 2.33 MiB)

julia> @btime $v3 .* 3.0;
  3.569 ms (7 allocations: 439.77 KiB)

julia> @btime collect($(x for x in v if true));
  336.170 μs (17 allocations: 1.00 MiB)

julia> @btime collect($(x for x in v2 if true));
  569.995 μs (21 allocations: 2.29 MiB)

julia> @btime collect($(x for x in v3 if true));
  4.041 ms (20 allocations: 1.13 MiB)

After:

julia> @btime $v .* 3.0;
  46.908 μs (2 allocations: 390.70 KiB)

julia> @btime $v2 .* 3.0;
  1.351 ms (99498 allocations: 2.33 MiB)

julia> @btime $v3 .* 3.0;
  102.446 μs (7 allocations: 439.77 KiB)

julia> @btime collect($(x for x in v if true));
  332.648 μs (17 allocations: 1.00 MiB)

julia> @btime collect($(x for x in v2 if true));
  527.674 μs (21 allocations: 2.29 MiB)

julia> @btime collect($(x for x in v3 if true));
  625.985 μs (19 allocations: 1.13 MiB)

Use the same pattern as in collect_to_with_first! (which is used when size is known).
@nalimilan nalimilan added performance Must go faster missing data Base.missing and related functionality backport pending 1.0 labels Dec 21, 2018
@@ -926,13 +926,12 @@ function copyto_nonleaf!(dest, bc::Broadcasted, iter, state, count)
y === nothing && break
I, state = y
@inbounds val = bc[I]
S = typeof(val)
if S <: T
if val isa T || typeof(val) === T
Copy link
Member Author

@nalimilan nalimilan Dec 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether typeof(val) === T is needed is not clear (#30125), but I've added it for consistency with collect_to!. If that's redundant, we should remove all uses at the same time.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should remove that from everywhere

@nalimilan
Copy link
Member Author

@nanosoldier runbenchmarks("array" || "broadcast" || "union", vs = ":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@nalimilan
Copy link
Member Author

Nanosoldier confirms this improves performance dramatically with Union{T, Missing}. Most regressions are clearly spurious, there's only one regarding array comprehensions which could be real (but that would be surprising since there's no type instability): is it known to be noisy?

@ararslan
Copy link
Member

@nanosoldier runbenchmarks("array" && "comprehension", vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - no performance regressions were detected. A full report can be found here. cc @ararslan

@ararslan
Copy link
Member

Looks like you're all clear.

@mbauman
Copy link
Sponsor Member

mbauman commented Dec 21, 2018

Am I understanding the magic here correctly? Is the key that we should not assign a type that may be a small discriminated union to a variable?

@nalimilan
Copy link
Member Author

Am I understanding the magic here correctly? Is the key that we should not assign a type that may be a small discriminated union to a variable?

That's my conclusion as well, but I have no idea why that's the case.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Dec 21, 2018

<: has almost no optimizations implemented (there aren’t many worth doing) while isa is heavily optimized and usually gets replaced by something very simple

@ararslan
Copy link
Member

Good to merge?

@StefanKarpinski
Copy link
Sponsor Member

Bump. Who's making the decision here?

@mbauman mbauman merged commit 5a8ae94 into master Jan 2, 2019
@mbauman mbauman deleted the nl/isa branch January 2, 2019 21:48
@StefanKarpinski StefanKarpinski added triage This should be discussed on a triage call backport 1.1 and removed backport 1.0 triage This should be discussed on a triage call labels Jan 31, 2019
@StefanKarpinski StefanKarpinski added triage This should be discussed on a triage call backport 1.0 and removed triage This should be discussed on a triage call backport 1.0 labels Jan 31, 2019
@JeffBezanson JeffBezanson removed backport 1.0 triage This should be discussed on a triage call labels Jan 31, 2019
aviatesk added a commit that referenced this pull request Apr 18, 2022
Most of these conditions were introduced in #25828 and #30480 for some
performance reasons atm, but now they seem just unnecessary or even
harmful in terms of inferrability.

There doesn't seem to be any performance difference in the benchmark
used at #25828:
```julia
using BenchmarkTools
x = rand(Int, 100_000);
y = convert(Vector{Union{Int,Missing}}, x);
z = copy(y); z[2] = missing;
```

> master:
```julia
julia> @Btime map(identity, x);
  57.814 μs (3 allocations: 781.31 KiB)

julia> @Btime map(identity, y);
  94.040 μs (3 allocations: 781.31 KiB)

julia> @Btime map(identity, z);
  127.554 μs (5 allocations: 1.62 MiB)

julia> @Btime broadcast(x->x, x);
  59.248 μs (2 allocations: 781.30 KiB)

julia> @Btime broadcast(x->x, y);
  74.693 μs (2 allocations: 781.30 KiB)

julia> @Btime broadcast(x->x, z);
  126.262 μs (4 allocations: 1.62 MiB)
```

> this commit:
```
julia> @Btime map(identity, x);
  58.668 μs (3 allocations: 781.31 KiB)

julia> @Btime map(identity, y);
  94.013 μs (3 allocations: 781.31 KiB)

julia> @Btime map(identity, z);
  126.600 μs (5 allocations: 1.62 MiB)

julia> @Btime broadcast(x->x, x);
  57.531 μs (2 allocations: 781.30 KiB)

julia> @Btime broadcast(x->x, y);
  69.561 μs (2 allocations: 781.30 KiB)

julia> @Btime broadcast(x->x, z);
  125.578 μs (4 allocations: 1.62 MiB)
```
aviatesk added a commit that referenced this pull request Apr 19, 2022
Most of these conditions were introduced in #25828 and #30480 for some
performance reasons atm, but now they seem just unnecessary or even
harmful in terms of inferrability.

There doesn't seem to be any performance difference in the benchmark
used at #25828:
```julia
using BenchmarkTools
x = rand(Int, 100_000);
y = convert(Vector{Union{Int,Missing}}, x);
z = copy(y); z[2] = missing;
```

> master:
```julia
julia> @Btime map(identity, x);
  57.814 μs (3 allocations: 781.31 KiB)

julia> @Btime map(identity, y);
  94.040 μs (3 allocations: 781.31 KiB)

julia> @Btime map(identity, z);
  127.554 μs (5 allocations: 1.62 MiB)

julia> @Btime broadcast(x->x, x);
  59.248 μs (2 allocations: 781.30 KiB)

julia> @Btime broadcast(x->x, y);
  74.693 μs (2 allocations: 781.30 KiB)

julia> @Btime broadcast(x->x, z);
  126.262 μs (4 allocations: 1.62 MiB)
```

> this commit:
```julia
julia> @Btime map(identity, x);
  58.668 μs (3 allocations: 781.31 KiB)

julia> @Btime map(identity, y);
  94.013 μs (3 allocations: 781.31 KiB)

julia> @Btime map(identity, z);
  126.600 μs (5 allocations: 1.62 MiB)

julia> @Btime broadcast(x->x, x);
  57.531 μs (2 allocations: 781.30 KiB)

julia> @Btime broadcast(x->x, y);
  69.561 μs (2 allocations: 781.30 KiB)

julia> @Btime broadcast(x->x, z);
  125.578 μs (4 allocations: 1.62 MiB)
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
missing data Base.missing and related functionality performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants