Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LICM for pure functions #29285

Open
SimonDanisch opened this issue Sep 20, 2018 · 8 comments
Open

LICM for pure functions #29285

SimonDanisch opened this issue Sep 20, 2018 · 8 comments
Labels
compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) feature Indicates new feature / enhancement requests

Comments

@SimonDanisch
Copy link
Contributor

SimonDanisch commented Sep 20, 2018

Julia 1.0:

I just realized, that this is a gotcha one easily runs into, especially when using the @. macro:

a = rand(1000, 1000)
c = 1.0
out = similar(a)
julia> @btime $(out) .= $a .- sin.($c);
  5.695 ms (0 allocations: 0 bytes)
julia> @btime $(out) .= $a .- sin($c);
  495.669 μs (0 allocations: 0 bytes)

This likely happens because the compiler can't infer that sin is pure.
I realize, with having access to the call tree in the new lazy broadcast, we could solve this for a predefined set of functions.

First trick could be to just overload broadcasted for known signatures:

Base.Broadcast.broadcasted(::typeof(sin), x::Number) = sin(x)
@btime $out .= $a .+ sin.($c)

this solves the problem for a chosen set of functions.
We could also consider, if we introduce a purity trait to make this easier for multiple argument functions:

broadcasted(f, args...) = broadcasted(IsPure(f), f, args...)
broadcasted(::Pure{true}, f, args...) = f(args...) # should probably not get applied to arrays
broadcasted(::Pure{false}, f, args...) = Broadcasted(f, args...)

I guess this has been discussed before, but I couldn't really find an issue about it...

@mbauman mbauman added broadcast Applying a function over a collection performance Must go faster labels Sep 20, 2018
@mbauman
Copy link
Sponsor Member

mbauman commented Sep 20, 2018

Sure, both of those workarounds could work, but I'd very much rather directly tell the compiler about a reasonable level of purity than push an orthogonal concern into the already-complicated broadcasting API.

I must say, though: it is doing exactly what you requested and performing within 5% of the equivalent for loop. Folks rely on this behavior for non-pure functions like A .= rand.().

@yuyichao
Copy link
Contributor

Dup of #20875

@SimonDanisch
Copy link
Contributor Author

SimonDanisch commented Sep 20, 2018

Yeah I know ;)
I'd totally understand the consent, that we don't ever want to fix it - because it's what the user specifies and this optimization can only ever work for a small subset of predefined functions. So making the user comfortable by solving the problem for a small subset of functions could lead to the opposite effect.

So it might be much better to educate the user - I wouldn't mind if this issue "derails" into how to better inform the user of such gotchas!

@vtjnash vtjnash added compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) and removed broadcast Applying a function over a collection performance Must go faster labels Sep 20, 2018
@vtjnash vtjnash changed the title broadcast performance of function applied to scalars LICM for pure functions Sep 20, 2018
@vtjnash vtjnash added the feature Indicates new feature / enhancement requests label Sep 20, 2018
@SimonDanisch
Copy link
Contributor Author

from the title change, I assume that @vtjnash is already on the right track for my next boiled down example:

using BenchmarkTools
Base.@pure @noinline sopure(x) = x / 2

function test(x, y)
    #c = sopure(y) 
    for i = 1:length(x)
        @inbounds x[i] = sopure(y) # c
    end
    x
end

@btime test($(rand(10)), $(Ref(1.0))[]) # when commenting out c 14ns, with c 5ns

This doesn't hoist, and If I'm not mistaken should be the most straightforward example for LICM?

@SimonDanisch
Copy link
Contributor Author

Now that the issue is pretty clearly outlined, my main question is:
is fixing just a matter of turning on some flags, will this require a new pass in the Julia optimizer, or does it need a refactor to forward the correct information to LLVM?

@SimonDanisch
Copy link
Contributor Author

Would it be possible to enlighten me a bit about the feasibility and some background about this problem? Just a short statement, that I can tell to the people wondering why Julia's compiler has a problems with this ;)

I feel, like I've seen a discussion about this before, but when I search for julia LICM, this issue turns up as the most relevant result :D

@KristofferC
Copy link
Sponsor Member

KristofferC commented Sep 24, 2018

#9942 maybe relevant?

@SimonDanisch
Copy link
Contributor Author

SimonDanisch commented Sep 24, 2018

I guess ;)
So gcc etc can likely hoist compiler intrinsics marked as pure out of the loop, while Julia's trigonometric functions are implemented in pure Julia which makes it harder to detect the pureness...
But an MWE with a "julia pure" function only works for constant propagation but not for LICM. I guess that would need a Julia LICM pass - or forward pureness metadata to LLVM?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) feature Indicates new feature / enhancement requests
Projects
None yet
Development

No branches or pull requests

5 participants