
Simplify computation of return type in broadcast #39295

Open
wants to merge 1 commit into base: master

Conversation

nalimilan (Member)

Since we rely on inference, we can use _return_type directly instead of going through complex machinery.

As suggested by @mbauman at #39185 (comment).

Since we rely on inference, we can use `_return_type` directly instead of going through complex machinery.
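
For context, here is a minimal sketch of what the change amounts to. It uses internal APIs (`broadcasted`, `instantiate`, `combine_eltypes`, `_broadcast_getindex`, `_return_type`, `promote_typejoin_union`) purely for illustration and is not the exact Base code:

```julia
using Base.Broadcast: broadcasted, instantiate, combine_eltypes, _broadcast_getindex

x = rand(10)
bc = instantiate(broadcasted(exp, x))

# Old path: map each argument to its element type, then infer `exp` over those eltypes.
ElType_old = combine_eltypes(bc.f, bc.args)

# New path (this PR): ask inference for the type of indexing the Broadcasted object itself.
ElType_new = Base.promote_typejoin_union(
    Base._return_type(_broadcast_getindex, Tuple{typeof(bc), Int}))

ElType_old === Float64 && ElType_new === Float64  # expected to agree in this simple case
```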
Comment on lines -420 to +421
g() = (a = 1; Broadcast.combine_eltypes(x -> x + a, (1.0,)))
@test @inferred(g()) === Float64
g() = (a = 1; x -> x + a)
@test @inferred(broadcast(g(), 1.0)) === 2.0
nalimilan (Member, Author)
@pabloferz @Sacha0 Since you worked on these tests (this one and the one below), could you confirm that the new ones cover the same use cases as the old ones? That wasn't completely clear to me.

Member

Regrettably, sufficient time has elapsed since I looked at these tests that I no longer have much memory of them. Sorry Milan! :)

@timholy (Sponsor Member) commented Jan 17, 2021

Does this have an impact on the inference & codegen time? The broadcast infrastructure is already a big piece of the latency for many packages, just curious whether this makes it better or worse.

@nalimilan (Member, Author)

Here's a small benchmark with x = rand(10), each time in a fresh Julia session.

Master:

julia> @time exp.(x);
  0.071312 seconds (207.54 k allocations: 12.900 MiB, 99.55% compilation time)

julia> @time exp.(x);
  0.073889 seconds (207.54 k allocations: 12.900 MiB, 99.48% compilation time)

julia> @time exp.(x);
  0.072427 seconds (207.54 k allocations: 12.900 MiB, 99.54% compilation time)

PR:

julia> @time exp.(x);
  0.075400 seconds (223.03 k allocations: 13.804 MiB, 99.46% compilation time)

julia> @time exp.(x);
  0.071174 seconds (223.03 k allocations: 13.804 MiB, 99.56% compilation time)

julia> @time exp.(x);
  0.077204 seconds (223.03 k allocations: 13.804 MiB, 99.58% compilation time)

So there are a few more allocations, and it might be a bit slower, but it's not super clear. Do you have ideas about other possible benchmarks?

@timholy (Sponsor Member) commented Jan 17, 2021

Maybe one where `f` has multiple arguments? As long as that looks good too, I'm fine with this idea.

@nalimilan (Member, Author)

Here's what I get for slightly more complex cases (still with a fresh session for each pair of commands):
Master:

julia> @time x .+ 1;
  0.062047 seconds (165.10 k allocations: 10.133 MiB, 99.52% compilation time)

julia> @time Float32.(x) .+ x .+ 1;
  0.132495 seconds (281.68 k allocations: 16.544 MiB, 99.32% compilation time)

julia> @time x .+ 1;
  0.063509 seconds (165.10 k allocations: 10.133 MiB, 99.48% compilation time)

julia> @time Float32.(x) .+ x .+ 1;
  0.137296 seconds (281.68 k allocations: 16.544 MiB, 99.36% compilation time)

julia> @time x .+ 1;
  0.061300 seconds (165.10 k allocations: 10.133 MiB, 99.34% compilation time)

julia> @time Float32.(x) .+ x .+ 1;
  0.136219 seconds (281.68 k allocations: 16.544 MiB, 99.40% compilation time)

PR:

julia> @time x .+ 1;
  0.065680 seconds (180.26 k allocations: 11.009 MiB, 99.58% compilation time)

julia> @time Float32.(x) .+ x .+ 1;
  0.134870 seconds (296.66 k allocations: 17.251 MiB, 99.40% compilation time)

julia> @time x .+ 1;
  0.065040 seconds (180.26 k allocations: 11.009 MiB, 99.35% compilation time)

julia> @time Float32.(x) .+ x .+ 1;
  0.141318 seconds (296.66 k allocations: 17.251 MiB, 99.39% compilation time)

julia> @time x .+ 1;
  0.068847 seconds (180.26 k allocations: 11.009 MiB, 99.54% compilation time)

julia> @time Float32.(x) .+ x .+ 1;
  0.135004 seconds (296.66 k allocations: 17.251 MiB, 99.32% compilation time)

So still a slight increase in allocations.

But I've found a more serious problem: the CI failure is due to combine_eltypes being used at

entrytypeC = Base.Broadcast.combine_eltypes(f, (A, Bs...))

and
entrytypeC = Base.Broadcast.combine_eltypes(f, (A, Bs...))

I'm not sure we can actually get rid of these without reinventing most of combine_eltypes. What do you think? BTW, I'm surprised that combine_eltypes is used to determine the type of the result (even when it's not empty), as it relies on inference.

@vtjnash (Sponsor Member) commented Jan 18, 2021

SparseArrays may have some legacy issues with the way it forms the eltype. I think this PR seems reasonable. It should take similar time, since we're about to infer into the methods (for the runtime code path) anyway.

@@ -901,7 +888,8 @@ copy(bc::Broadcasted{<:Union{Nothing,Unknown}}) =
const NonleafHandlingStyles = Union{DefaultArrayStyle,ArrayConflict}

@inline function copy(bc::Broadcasted{Style}) where {Style}
ElType = combine_eltypes(bc.f, bc.args)
ElType = promote_typejoin_union(Base._return_type(_broadcast_getindex,
Tuple{typeof(bc), Int}))
@mbauman (Sponsor Member) commented Jan 20, 2021

I think this needs to be:

Suggested change
Tuple{typeof(bc), Int}))
Tuple{typeof(bc), ndims(bc) == 1 ? eltype(axes(bc)[1]) : CartesianIndex{ndims(bc)}})
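
To illustrate the concern with a minimal (purely illustrative) example: for a multidimensional broadcast, iterating the Broadcasted object goes through CartesianIndex rather than Int, so inferring against `Tuple{typeof(bc), Int}` may not reflect the index type actually used at runtime:

```julia
using Base.Broadcast: broadcasted, instantiate

bc = instantiate(broadcasted(+, rand(2, 3), 1))
eltype(eachindex(bc))  # CartesianIndex{2} for this 2-d broadcast, not Int
```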

Sponsor Member

Or is it Base._return_type(iterate, Base._return_type(eachindex, Tuple{typeof(bc)})) ?

Sponsor Member

Oops, dropped a function. I meant:

index_type(bc) = iterate(eachindex(bc))[1]
Base._return_type(index_type, Tuple{typeof(bc)})

Sponsor Member

Putting that all together:

Suggested change
Tuple{typeof(bc), Int}))
_broadcast_getindex_eltype(bc) = _broadcast_getindex(bc, iterate(eachindex(bc))[1])
ElType = promote_typejoin_union(Base._return_type(_broadcast_getindex_eltype, Tuple{typeof(bc)}))
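
A self-contained sketch of that combined suggestion (the helper name `index_eltype_sketch` is made up here; `broadcasted`, `instantiate`, and `_broadcast_getindex` are internal helpers used only for illustration):

```julia
using Base.Broadcast: broadcasted, instantiate, _broadcast_getindex

# Infer the element type from indexing the Broadcasted object at its first index.
index_eltype_sketch(bc) = _broadcast_getindex(bc, iterate(eachindex(bc))[1])

bc = instantiate(broadcasted(+, rand(2, 3), 1))
ElType = Base.promote_typejoin_union(
    Base._return_type(index_eltype_sketch, Tuple{typeof(bc)}))
# Expected to be Float64 here, provided inference sees through `iterate`.
```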

Sponsor Member

I'm not entirely sure this is better than the existing code, which does pretty much the same calls but bases them on calling eltype instead of inference, which at least has different tradeoffs, for better or worse 🤔
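
For concreteness, a rough sketch of the two flavors being weighed (not the exact Base code; `_broadcast_getindex_eltype` and the other names are internal helpers used only for illustration):

```julia
using Base.Broadcast: broadcasted, instantiate, _broadcast_getindex, _broadcast_getindex_eltype

bc = instantiate(broadcasted(+, rand(2, 3), 1))

# eltype-based: collect the argument element types via eltype, then infer `f` over them
# (roughly what combine_eltypes does).
arg_eltypes = Tuple{map(_broadcast_getindex_eltype, bc.args)...}
ElType_via_eltype = Base.promote_typejoin_union(Base._return_type(bc.f, arg_eltypes))

# inference-based: infer the whole indexing call on the Broadcasted object (the approach above).
ElType_via_inference = Base.promote_typejoin_union(
    Base._return_type(_broadcast_getindex, Tuple{typeof(bc), CartesianIndex{2}}))
```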

@vtjnash (Sponsor Member) commented Apr 19, 2021

Do we want to try to proceed with this PR / design (inferring iterate), or keep the current one (call eltype)?

@mbauman (Sponsor Member) commented Jan 20, 2021

Do you have ideas about other possible benchmarks?

The tests themselves lend themselves fairly nicely to compile-time benchmarking, e.g. `time julia test/broadcast.jl` or some such.
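
As a rough sketch of that kind of measurement (run in a fresh session on master and on this branch and compare wall time; most of it is compilation):

```julia
# Roughly equivalent to `time julia test/broadcast.jl` from a source checkout.
@time Base.runtests("broadcast")
```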

@kshyatt added the broadcast (Applying a function over a collection) and compiler:inference (Type inference) labels on Feb 6, 2021
@StefanKarpinski (Sponsor Member)

Bump?

@vtjnash (Sponsor Member) commented Aug 18, 2021

Note the currently open question of whether this is actually better or worse (#39295 (comment))

@N5N3 (Member) commented Dec 29, 2021

Can we wake this up? My local benchmark shows that this PR reduces the time cost of Base.runtests("broadcast") by about 10%–13%.

@nalimilan (Member, Author)

Can we wake this up? My local benchmark shows that this PR reduces the time cost of Base.runtests("broadcast") by about 10%–13%.

The time it takes to run tests isn't usually a very interesting benchmark as tests are a very atypical coding pattern. Do you have evidence that this PR improves performance (or compile times) on real use cases? This isn't to say that I'm opposed to merging it.

@N5N3 (Member) commented Dec 29, 2021

Well, I have no further evidence; I just followed @mbauman's advice above to benchmark, and found that the time cost and memory usage were reduced after a similar commit. IIRC, we also used the test suite itself to benchmark the codegen improvement from avoiding always inlining. Maybe we need a package whose TTFP is dominated by broadcast?

@vtjnash (Sponsor Member) commented Jan 7, 2022

The PR is currently wrong, though I have a suggestion above to fix it. The remaining question, as before, is whether we want this design change (#39295 (comment)).
