Manually inject derivative information that can't be computed by AD #89
Yup! Multiplication is overloaded on …
Though it might be easier to use the …
Ah, my bad, I should have read the documentation more carefully. I will make another attempt. Thanks a bunch.
No problem. I never really thought about how …
I guess what I am not sure about is how my computed jacobian matrix should go into the Partials. My attempt was along the lines of:

    function f{T <: ForwardDiff.GradientNumber}(x::Vector{T})
        x_val = ForwardDiff.get_value(x)
        y = f(x_val)
        J = f_diff(x_val)
        grad_vec = T[ForwardDiff.gradnum_from_deriv(x[i], y[i], J[:, i]) for i = 1:length(x)]
        return grad_vec
    end

Now, in the …
I think you have to propagate the Jacobian information in accordance with the chain rule, i.e. do a matrix multiplication between your analytical Jacobian and the Jacobian carried by the input numbers.
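For concreteness, my reading of that chain-rule step, in the notation of the snippets nearby: the input GradientNumbers carry an M × K partials matrix (which is what ForwardDiff.get_jacobian(x) returns), your analytical Jacobian (call it J_analytic, an N × M matrix) describes the function itself, and the partials to store in the output numbers are their product:

    J_new = J_analytic * ForwardDiff.get_jacobian(x)    # (N × M) * (M × K) -> N × K

Row i of J_new then becomes the partials tuple of output component i.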
Thank you. This seems to work (hard-coded for 3 components to avoid the splatting-into-tuple penalty). To make it general I guess I need a … Does this look reasonable to you?

    # Overload f
    function f{T <: ForwardDiff.GradientNumber}(x::Vector{T})
        x_val = ForwardDiff.get_value(x)
        f_val, J = f(x_val), f_diff(x_val)
        J_new = J * ForwardDiff.get_jacobian(x)
        return [T(f_val[i], (J_new[1, i], J_new[2, i], J_new[3, i])) for i = 1:length(x)]
    end
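A non-macro way to avoid hard-coding the three components might be to build the tuple with ntuple. This is only a sketch under the same assumptions as the snippet above (old ForwardDiff GradientNumber API, with f_diff returning the analytical Jacobian), not something from the thread, and since the tuple length is not a compile-time constant here it may still pay the penalty mentioned above:

    # Sketch: same overload as above, but with the partials tuple built by
    # ntuple over the number of partials of T instead of hard-coding 3 entries.
    function f{T <: ForwardDiff.GradientNumber}(x::Vector{T})
        x_val = ForwardDiff.get_value(x)
        f_val, J = f(x_val), f_diff(x_val)
        J_new = J * ForwardDiff.get_jacobian(x)
        return [T(f_val[i], ntuple(k -> J_new[k, i], ForwardDiff.npartials(T))) for i = 1:length(x)]
    end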
Yup, looks good! This is pretty exciting. Like you said, generalizing this would require …
I made a macro that implements the jacobian of a function, given the function, its derivative and, for now, the number of components.

    macro implement_jacobian(f, f_grad, n_components)
        Jexp = Expr(:tuple, [:(J[i, $k]) for k = 1:n_components]...)
        return quote
            function $(esc(f)){T <: ForwardDiff.GradientNumber}(x::Vector{T})
                x_val = ForwardDiff.get_value(x)
                f_val, J = $f(x_val), $f_grad(x_val)
                J_new = J * ForwardDiff.get_jacobian(x)
                return [T(f_val[i], $Jexp) for i = 1:length(x)]
            end
        end
    end

Using it:

    foo(x) = 2*x
    function foo_grad(x)
        println("I got called")
        return 2*eye(length(x))
    end

    julia> @implement_jacobian foo foo_grad 5
    foo (generic function with 2 methods)

    julia> ForwardDiff.jacobian(foo, rand(5))
    I got called
    5x5 Array{Float64,2}:
     2.0  0.0  0.0  0.0  0.0
     0.0  2.0  0.0  0.0  0.0
     0.0  0.0  2.0  0.0  0.0
     0.0  0.0  0.0  2.0  0.0
     0.0  0.0  0.0  0.0  2.0

Haven't tested it extensively. I guess the …
This is awesome. Here's a version of the above with some tweaks (using …):

    macro implement_jacobian(f, jacf)
        cachesym = esc(gensym("cache"))
        return quote
            const $(cachesym) = ForwardDiff.ForwardDiffCache()
            function $(esc(f)){G<:ForwardDiff.GradientNumber}(x::Vector{G})
                x_val = ForwardDiff.get_value(x)
                f_val, J = $(f)(x_val), $(jacf)(x_val)
                J_new = J * ForwardDiff.get_jacobian(x)
                result = ForwardDiff.get_workvec!($(cachesym), G, length(x))
                for i in eachindex(result)
                    result[i] = G(f_val[i], getrowpartials(G, J_new, i))
                end
                return result
            end
        end
    end

    @generated function getrowpartials{G<:ForwardDiff.GradientNumber}(::Type{G}, J, i)
        return Expr(:tuple, [:(J[i, $k]) for k = 1:ForwardDiff.npartials(G)]...)
    end

    @implement_jacobian f jacf
Thanks so much. That works really well. I experimented with implementing the analytical jacobian of a matrix multiplication:

    const N = 8
    const A = rand(N, N)

    foo(x) = A * x
    foo_jac(x) = A

    @implement_jacobian foo foo_jac

    foo2(x) = A * x

    jac_overload = ForwardDiff.jacobian(foo)
    jac_normal = ForwardDiff.jacobian(foo2)

    @time for i = 1:10^3 jac_normal(rand(N)) end
    @time for i = 1:10^3 jac_overload(rand(N)) end

    julia> @time for i = 1:10^3 jac_normal(rand(N)) end
      0.545726 seconds (11.69 M allocations: 1.439 GB, 29.32% gc time)

    julia> @time for i = 1:10^3 jac_overload(rand(N)) end
      0.042785 seconds (214.00 k allocations: 71.808 MB, 17.93% gc time)

Maybe I am doing something wrong in the benchmarking, but it seems that it could be useful to overload some of the BLAS operations that have simple analytical derivatives? I guess this is a similar case to the manual optimization of unary functions, except at a higher level.
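One possible caveat with the timings above (just a guess, not verified against the original run): jac_normal and jac_overload are non-const globals and the @time loops run at global scope, so both measurements include some dispatch and allocation overhead that has nothing to do with ForwardDiff. A tighter comparison under the same setup might look like:

    # Sketch: same workload, but timed inside a function so the jacobian
    # closures are passed as arguments instead of being read from globals.
    function bench(jac, n, reps)
        for i = 1:reps
            jac(rand(n))
        end
    end

    bench(jac_normal, N, 1)        # warm-up / compilation
    bench(jac_overload, N, 1)
    @time bench(jac_normal, N, 10^3)
    @time bench(jac_overload, N, 10^3)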
Wow, that's quite a significant improvement. I'll have to play around with an implementation that tests performance in a more general case. I'm planning on doing a ForwardDiff sprint next week to resolve some issues/implement some stuff I've been talking about with @mlubin, and this issue is now definitely on the TODO list. I wonder how …
There's definitely something to gain from overloading BLAS operations. This makes a lot of sense if you're doing vector mode. I don't have a good solution for chunk mode...
@mlubin Why do you think that there would be extra difficulty with chunk mode? Assume a function R^M -> R^N and that we have a chunk length of k. The jacobian from the function will be an N x M matrix (…). The new "partial matrix" is computed with …, and these are then inserted into new gradient numbers and returned.

I made a gist here which implements an overloaded dot. The result is that the overloaded dot operator runs in about 0.6x the time of the normal one.
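The gist itself isn't reproduced here, but an analytically overloaded dot in this style would presumably look something like the sketch below. This is my guess at it under the same old GradientNumber API used above, reusing the getrowpartials helper from the earlier macro comment; it is not the actual gist code.

    # Sketch: analytic overload of dot for GradientNumber vectors.
    # Chain rule: d(x·y)/dθ = y'*(dx/dθ) + x'*(dy/dθ), applied to the
    # partials matrices carried by the inputs (each of size M x K).
    function Base.dot{G <: ForwardDiff.GradientNumber}(x::Vector{G}, y::Vector{G})
        x_val, y_val = ForwardDiff.get_value(x), ForwardDiff.get_value(y)
        val = dot(x_val, y_val)
        new_partials = y_val' * ForwardDiff.get_jacobian(x) + x_val' * ForwardDiff.get_jacobian(y)  # 1 x K
        return G(val, getrowpartials(G, new_partials, 1))
    end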
If it works well, I'm all for it.
The problem is that without preallocating the needed temporary arrays, the cost model becomes quite complex: which version is faster depends on which BLAS function is being overloaded, the size of the input vector, etc. Doing this would make a lot of sense for someone who knows a priori exactly which input sizes and chunk sizes will be used, but doing it so that it is faster in the general case seems hard.
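For the "sizes known a priori" case, taking the temporaries out of that cost model could look roughly like the sketch below. This is only an illustration under the same old API as the rest of the thread; g, g_jac, N_OUT and K_PART are hypothetical names, and K_PART has to match the number of partials the input numbers actually carry.

    # Sketch: sizes fixed up front, so the chain-rule product can reuse one
    # preallocated buffer (A_mul_B! computes J_BUF = g_jac(x_val) * P in place).
    const N_OUT  = 8                            # output dimension of g (assumed known)
    const K_PART = 8                            # partials per input number (assumed known)
    const J_BUF  = Array(Float64, N_OUT, K_PART)

    function g{G <: ForwardDiff.GradientNumber}(x::Vector{G})
        x_val = ForwardDiff.get_value(x)
        f_val = g(x_val)
        A_mul_B!(J_BUF, g_jac(x_val), ForwardDiff.get_jacobian(x))
        return [G(f_val[i], getrowpartials(G, J_BUF, i)) for i = 1:N_OUT]
    end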
I have a function that is computed in a "complicated enough" way for Automatic Differentiation to not be suitable for it. I can, however, get the derivative in some other way.

I want to use this function as part of another function where the rest can be AD'ed, so I need a way to overload the ForwardDiff call to this function and inject the returned derivative.

I have started a bit, but I know I am doing it wrong because I don't know how to correctly propagate my computed derivative to the Partials. Any tips on how to do this? Maybe we could put it in the documentation if there is a good way to currently do it.