
Manually inject derivative information that can't be computed by AD #89

Open

KristofferC opened this issue Jan 27, 2016 · 17 comments
@KristofferC
Collaborator

I have a function that is computed in a "complicated enough" way for automatic differentiation to be unsuitable for it. I can, however, get the derivative in some other way.

I want to use this function as part of another function whose remainder can be differentiated with AD, so I need a way to overload the ForwardDiff call to this function and inject the externally computed derivative.

I have made a start, but I know it is wrong because I don't know how to correctly propagate my computed derivative into the Partials.

using ForwardDiff

# Function that in reality can't be differentiated with AD
function ff(x)
    @assert length(x) == 3
    y = 2*x
    return y
end

# Derivative to above function
function ff_diff(x)
    @assert length(x) == 3
    return 2*eye(3)
end

# Function that I want to use AD on, which calls ff
function g(x)
    a = 2*x
    b = ff(a)
    return 5*b
end

typealias GradNumFloat3 ForwardDiff.GradientNumber{3,Float64,Tuple{Float64,Float64,Float64}}
typealias PartialFloat3 ForwardDiff.Partials{Float64,Tuple{Float64,Float64,Float64}}

# Overload ff for GradientNumber input
function ff(x::Vector{GradNumFloat3})
    println("I got called")
    # Extract the values
    x_val = ForwardDiff.get_value(x)
    # Call f and get the derivative
    y = ff(x_val)
    dydx = ff_diff(x_val)
    # Insert back
    # THIS IS WRONG because it ignores the partials already stored in x
    nums = [GradNumFloat3(y[i], PartialFloat3((dydx[:, i]...))) for i in 1:length(x)]
    return nums
end

ForwardDiff.jacobian(g, rand(3))

# I got called
# 3x3 Array{Float64,2}:
#  10.0   0.0   0.0
#   0.0  10.0   0.0
#   0.0   0.0  10.0

# Above should be 20.0 on diagonal

Any tips on how to do this correctly? If there is a good way to do it currently, maybe we could put it in the documentation.

@KristofferC
Collaborator Author

From f(x + ε) = f(x) + εf'(x), I guess the computed derivative must be inserted by multiplying it with the already computed partials... componentwise?

@jrevels
Member

jrevels commented Jan 27, 2016

From f(x + ε) = f(x) + εf'(x), I guess the computed derivative must be inserted by multiplying it with the already computed partials... componentwise?

Yup! Multiplication is overloaded on Partials just for this purpose. You might've already seen this, but in case you haven't, this page of the docs might be useful to you.
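
To spell that out in the scalar case, here is a minimal sketch with plain numbers only (no ForwardDiff types; the helper names are made up for illustration): if the incoming number already carries a partial p, the injected derivative has to be scaled by p rather than stored as-is.

# Scalar chain rule: for an input arriving as x_val + ε*p (p = dx/dt already
# accumulated), injecting an external derivative f'(x) must produce
# f'(x) * p as the new partial, not f'(x) alone.

f(x)       = 2x       # stand-in for the "black box" value
f_deriv(x) = 2.0      # externally supplied derivative

x_val = 1.5
p     = 3.0           # partial already stored in the incoming number

out_val     = f(x_val)
out_partial = f_deriv(x_val) * p   # chain rule: d/dt f(x(t)) = f'(x) * dx/dt

println((out_val, out_partial))    # (3.0, 6.0)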

@jrevels
Member

jrevels commented Jan 27, 2016

Though it might be easier to use the *num_from_deriv functions, which are vaguely discussed in the Contribution guide.

@KristofferC
Collaborator Author

Ah, my bad, I should have read the documentation more carefully. I will make another attempt. Thanks a bunch.

@jrevels
Member

jrevels commented Jan 27, 2016

No problem. I never really thought about how *num_from_deriv might facilitate external derivative injection in cases where ForwardDiff can't compute derivatives - I just wrote those functions to make contribution easier, so I'm glad you brought this up! I'm interested to see how it goes.

@KristofferC
Collaborator Author

I guess what I am not sure about is how my computed Jacobian matrix should go into the deriv field.

My attempt was along:

function f{T <: ForwardDiff.GradientNumber}(x::Vector{T})
    x_val = ForwardDiff.get_value(x)
    y = f(x_val)
    J = f_diff(x_val)
    grad_vec = T[ForwardDiff.gradnum_from_deriv(x[i], y[i], J[:,i]) for i = 1:length(x)]
    return grad_vec 
end

Now, it seems like the deriv field should hold a scalar, but I have, for example, 9 components in the Jacobian and only 3 gradient numbers, so how will they all fit in there... :P

@jrevels
Member

jrevels commented Jan 27, 2016

I think you have to propagate the Jacobian information in accordance with the chain rule, i.e. do a matrix multiplication between your J and the matrix you can get from ForwardDiff.get_jacobian(x). This should result in a new Jacobian, whose components can be loaded into the GradientNumbers returned inside of grad_vec.
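
For the running example above, the chain rule in matrix form checks out like this (a plain-matrix sketch, no ForwardDiff types, just to confirm the expected 20.0 diagonal):

# g(x) = 5 * ff(2x) with ff(y) = 2y, so J_g = 5I * J_ff * 2I = 20I.

J_ff    = [2.0 0.0 0.0; 0.0 2.0 0.0; 0.0 0.0 2.0]   # Jacobian of ff
J_inner = [2.0 0.0 0.0; 0.0 2.0 0.0; 0.0 0.0 2.0]   # Jacobian of x -> 2x
J_outer = [5.0 0.0 0.0; 0.0 5.0 0.0; 0.0 0.0 5.0]   # Jacobian of b -> 5b

J_g = J_outer * J_ff * J_inner   # 20.0 on the diagonal, as it should be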

@KristofferC
Collaborator Author

Thank you. This seems to work (hard-coded for 3 components to avoid the splatting-into-tuple penalty). To make it general, I guess I need a @stagedfunction to create the expression that inserts the components into the partials tuple?

Does this look reasonable to you?

# Overload f
function f{T <: ForwardDiff.GradientNumber}(x::Vector{T})
    x_val = ForwardDiff.get_value(x)
    f_val, J = f(x_val), f_diff(x_val)
    J_new = J * ForwardDiff.get_jacobian(x)
    return [T(f_val[i], (J_new[1,i], J_new[2,i], J_new[3,i])) for i = 1:length(x)]
end

@jrevels
Member

jrevels commented Jan 28, 2016

Yup, looks good! This is pretty exciting. Like you said, generalizing this would require @generated functions, but I can see how this kind of thing might be exposed via the API to allow people to inject external derivatives in a variety of cases.

@KristofferC
Collaborator Author

I made a macro that implements the Jacobian overload for a function, given the function, its derivative and, for now, the number of components.

macro implement_jacobian(f, f_grad, n_components)
    Jexp = Expr(:tuple, [:(J[i, $k]) for k=1:n_components]...)

    return quote
        function $(esc(f)){T <: ForwardDiff.GradientNumber}(x::Vector{T})
            x_val = ForwardDiff.get_value(x)
            f_val, J = $f(x_val), $f_grad(x_val)
            J_new = J * ForwardDiff.get_jacobian(x)
            return [T(f_val[i], $Jexp) for i = 1:length(x)]
        end
    end
end

Using it:

foo(x) = 2*x

function foo_grad(x)
    println("I got called")
    return 2*eye(length(x))
end

julia> @implement_jacobian foo foo_grad 5
foo (generic function with 2 methods)

julia> ForwardDiff.jacobian(foo, rand(5))
I got called
5x5 Array{Float64,2}:
 2.0  0.0  0.0  0.0  0.0
 0.0  2.0  0.0  0.0  0.0
 0.0  0.0  2.0  0.0  0.0
 0.0  0.0  0.0  2.0  0.0
 0.0  0.0  0.0  0.0  2.0

I haven't tested it extensively. I guess the ForwardDiffCache should probably be used somehow for optimal performance, to store the Jacobian extracted from the partials and the returned vector, but I haven't figured that out yet.

@jrevels
Member

jrevels commented Jan 28, 2016

This is awesome. Here's a version of the above with some tweaks (using ForwardDiffCache/removing the need to specify n_components):

macro implement_jacobian(f, jacf)
    cachesym = esc(gensym("cache"))
    return quote
        const $(cachesym) = ForwardDiff.ForwardDiffCache()
        function $(esc(f)){G<:ForwardDiff.GradientNumber}(x::Vector{G})
            x_val = ForwardDiff.get_value(x)
            f_val, J = $(f)(x_val), $(jacf)(x_val)
            J_new = J * ForwardDiff.get_jacobian(x)
            result = ForwardDiff.get_workvec!($(cachesym), G, length(x))
            for i in eachindex(result)
                result[i] = G(f_val[i], getrowpartials(G, J_new, i))
            end
            return result
        end
    end
end

@generated function getrowpartials{G<:ForwardDiff.GradientNumber}(::Type{G}, J, i)
    return Expr(:tuple, [:(J[i, $k]) for k=1:ForwardDiff.npartials(G)]...)
end

@implement_jacobian f jacf

@KristofferC
Collaborator Author

Thanks so much. That works really well.

I experimented with implementing the analytical Jacobian of a matrix-vector multiplication:

const N = 8
const A = rand(N, N)

foo(x) = A * x
foo_jac(x) = A
@implement_jacobian foo foo_jac

foo2(x) = A * x

jac_overload = ForwardDiff.jacobian(foo)
jac_normal = ForwardDiff.jacobian(foo2)

@time for i = 1:10^3 jac_normal(rand(N)) end

@time for i = 1:10^3 jac_overload(rand(N)) end

julia> @time for i = 1:10^3 jac_normal(rand(N)) end
  0.545726 seconds (11.69 M allocations: 1.439 GB, 29.32% gc time)

julia> @time for i = 1:10^3 jac_overload(rand(N)) end
  0.042785 seconds (214.00 k allocations: 71.808 MB, 17.93% gc time)

Maybe I am doing something wrong in the benchmarking, but it seems that it could be useful to overload some of the BLAS operations that have simple analytical derivatives? I guess this is a similar case to the manual optimization of unary functions, except at a higher level.
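
For reference, a sketch of the kind of operations this could target, with their analytic Jacobians written out (plain Julia, nothing ForwardDiff-specific; the names are made up):

# Linear operations with trivial analytic Jacobians that could in principle
# be hooked up through an overload like @implement_jacobian.
# (A scalar-valued function such as sum would need a slightly different
# wrapper than the vector-valued macro above.)

A = rand(4, 4)
a = rand(4)

matvec(x)      = A * x
matvec_jac(x)  = A                   # d(A*x)/dx = A

sumall(x)      = sum(x)
sumall_jac(x)  = ones(1, length(x))  # d(sum(x))/dx = row of ones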

@jrevels changed the title from "Overloading 'black box' function" to "Manually inject derivative information that can't be computed by AD" on Jan 30, 2016
@jrevels
Member

jrevels commented Jan 30, 2016

Wow, that's quite a significant improvement. I'll have to play around with an implementation that tests performance in a more general case.

I'm planning on doing a ForwardDiff sprint next week to resolve some issues/implement some stuff I've been talking about with @mlubin, and this issue is now definitely on the TODO list.

I wonder how @implement_jacobian could be generalized to the case where the input dimension doesn't equal the chunk size, such that x only carries a few columns of the Jacobian at a time. You'd have to somehow pass slice information through to the point of propagation, i.e. where the matmul occurs, so that it knows which slices of J are relevant for the chunk you're computing. Hmm...

@mlubin
Contributor

mlubin commented Jan 30, 2016

There's definitely something to gain from overloading BLAS operations. This makes a lot of sense if you're doing vector mode. I don't have a good solution for chunk mode...

@KristofferC
Collaborator Author

@mlubin Why do you think that there would be extra difficulty with chunk mode?

Assume a function R^M -> R^N and a chunk length of k.

The Jacobian of the function will be an N x M matrix (J_f).
The Jacobian that get_jacobian(x) gives will be an M x k matrix (J_x).

The new "partial matrix" is computed as J_f * J_x, which will be an N x k matrix.

These are then inserted into new gradient numbers and returned.
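
A quick dimension check of that with plain matrices (M, N and k here are just example sizes, no ForwardDiff types involved):

# R^M -> R^N with chunk length k:
#   J_f       is N x M  (full Jacobian of the overloaded function)
#   J_x       is M x k  (the chunk of input partials from get_jacobian(x))
#   J_f * J_x is N x k, exactly the partials for this chunk.

M, N, k = 12, 8, 6

J_f = rand(N, M)     # stands in for f_diff(x_val)
J_x = rand(M, k)     # stands in for ForwardDiff.get_jacobian(x) in chunk mode

J_new = J_f * J_x
@assert size(J_new) == (N, k)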

I made a gist here which implements an overloaded dot operator (input_length = 12 and chunk_size = 6) and calls it with a vector of gradient numbers: https://gist.github.com/KristofferC/b66b8b4b78e54f71124f

The result is that the overloaded dot operator runs in about two-thirds of the time of the normal one.

julia> @time for i = 1:10^5 ddot_inject(Gs) end
  0.110522 seconds

julia> @time for i = 1:10^5 ddot(Gs) end
  0.164776 seconds

@mlubin
Contributor

mlubin commented Feb 17, 2016

If it works well, I'm all for it.

@KristofferC
Collaborator Author

The problem is that without preallocating the needed temporary arrays, the cost model becomes quite complex: which version is faster depends on which BLAS function is being overloaded, the size of the input vector, and so on.

Doing this would make a lot of sense for someone who knows a priori exactly which input sizes and chunk sizes will be used, but making it faster in the general case seems hard.
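
One possible shape for the preallocation, purely as a sketch of the idea (the cache and helper below are hypothetical, not something ForwardDiff provides): keep temporaries keyed on the sizes involved, so repeated calls with the same input size and chunk size reuse the same buffer for the J_f * J_x product.

# Hypothetical buffer cache keyed on (output_length, chunk_size); repeated
# calls with the same sizes reuse the same temporary matrix.
const JAC_BUFFERS = Dict{Tuple{Int,Int}, Matrix{Float64}}()

get_jac_buffer!(n_out::Int, chunk::Int) =
    get!(() -> zeros(n_out, chunk), JAC_BUFFERS, (n_out, chunk))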
