
Which types work as container of parameters? #178

Closed · lindnemi opened this issue Mar 4, 2020 · 12 comments

@lindnemi commented Mar 4, 2020

Our package provides an API that handles parameters differently depending on whether they are tuples or arrays. If only arrays are passed, everything works fine, but the tuple syntax seems to be incompatible with DiffEqFlux right now. Why is that?

MWE:

using Flux, Optim, DiffEqFlux, OrdinaryDiffEq, Plots

function node!(dx, x, p, t)
    dx .= [1.]
    dx .+= prod(p) # .* x
end

function edge!(dx, x, p, t)
    dx .-= sum(p) # .* x
end

function network!(dx, x, p, t)
    node!(dx, x, p, t)
    edge!(dx, x, p, t)
end

# tuple method: p = (node parameters, edge parameters)
function network!(dx, x, p::T, t) where T <: Tuple
    @assert length(p) == 2
    node!(dx, x, p[1], t)
    edge!(dx, x, p[2], t)
end

x0 = [2.]
tspan = (0.,2.)

p = [3.,3.]
prob = ODEProblem(network!, x0, tspan, p)
sol  = solve(prob, Tsit5())

plot(sol)


p2 = ([3., 3.], [3., 3.])
prob2 = ODEProblem(network!, x0, tspan, p2)
sol2  = solve(prob2, Tsit5())

plot!(sol2)

### DiffEqFlux Stuff starts here ###

function predict_adjoint(p) # Our 1-layer neural network
  Array(concrete_solve(prob, Tsit5(), x0, p, saveat=0:0.1:2, adaptive=false, dt=.1))
end

function loss_adjoint(p)
  prediction = predict_adjoint(p)
  loss = sum(abs2, prediction .- 1.)
  loss, prediction
end

cb = function (p, l, pred) # callback function to observe training
  display(l)
  # using `remake` to re-create our `prob` with current parameters `p`
  display(plot(solve(remake(prob,p=p), Tsit5(),adaptive=false, dt=.1)))
  return false
end

# Display the ODE with the initial parameter values.
cb(p,loss_adjoint(p)...)

res = DiffEqFlux.sciml_train(loss_adjoint, p, BFGS(), cb = cb)

### Problems start here ###

### Tuples of Parameters are not possible as input to sciml_train
res = DiffEqFlux.sciml_train(loss_adjoint, p2, BFGS(), cb = cb)


# MethodError: no method matching optimize(::NLSolversBase.InplaceObjective{Nothing,DiffEqFlux.var"#30#38"{DiffEqFlux.var"#29#37"{typeof(loss_adjoint)}},Nothing,Nothing,Nothing}, ::Tuple{Array{Float64,1},Array{Float64,1}}, ::BFGS{LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Nothing,Nothing,Flat}, ::Optim.Options{Float64,DiffEqFlux.var"#_cb#36"{var"#77#78",Base.Iterators.Cycle{Tuple{DiffEqFlux.NullData}}}})
### Not even if the parameters are converted to a tuple at a later stage

function predict_adjoint(p) # Our 1-layer neural network
  Array(concrete_solve(prob, Tsit5(), x0, ([3.,3.], p), saveat=0:0.1:2, adaptive=false, dt=.1))
end

function loss_adjoint(p)
  prediction = predict_adjoint(p)
  loss = sum(abs2, prediction .- 1.)
  loss, prediction
end

cb = function (p, l, pred) # callback function to observe training
  display(l)
  # using `remake` to re-create our `prob` with current parameters `p`
  display(plot(solve(remake(prob,p=([3.,3.], p)), Tsit5(),adaptive=false, dt=.1)))
  return false
end

res = DiffEqFlux.sciml_train(loss_adjoint, p, BFGS(), cb = cb)

# MethodError: no method matching param(::Tuple{Array{Float64,1},Array{Float64,1}})

It would be nice if one of the two constructions worked.

@ChrisRackauckas (Member) commented

It's because the generated ODE only works with mutation, and tuples are not mutable. See SciML/SciMLSensitivity.jl#113.
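
In the meantime, one possible workaround is to keep the parameters the optimizer sees as one flat Vector and rebuild the tuple inside the RHS. A minimal sketch (`p_flat` and `network_flat!` are illustrative names, not package API):

# Flatten the tuple for the optimizer; the tuple-dispatching network!
# from above is reconstructed inside a mutating wrapper.
p_flat = vcat(p2...) # ([3., 3.], [3., 3.]) -> [3.0, 3.0, 3.0, 3.0]

function network_flat!(dx, x, p, t)
    # rebuild the (node parameters, edge parameters) tuple
    network!(dx, x, (p[1:2], p[3:4]), t)
end

prob_flat = ODEProblem(network_flat!, x0, tspan, p_flat)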

@lindnemi (Author) commented Mar 5, 2020

Thanks for the quick answer! Even though that means we will probably have to change our API...

@ChrisRackauckas (Member) commented

No problem. This will take a bit to solve.

@lindnemi (Author) commented Mar 5, 2020

> No problem. This will take a bit to solve.

Does that mean you are working on a solution to this? So it might be possible at some point in the future?

@ChrisRackauckas (Member) commented

We would also need to solve SciML/DifferentialEquations.jl#566

@lindnemi (Author) commented Mar 24, 2020

I've been trying other types of multi-dimensional parameters:

- N-dimensional arrays work fine
- Arrays of Arrays don't work
- Custom mutable structs don't work

What's the matter with these constructions?

using Flux, Optim, DiffEqFlux, OrdinaryDiffEq, Plots


### Works fine

function f!(dx, x, p, t)
  dx[1] = p[1, 1]
  dx[2] = p[2, 1]
end
p = [1. 5.; 5. 1.]

### MethodError: no method matching zero(::Type{Array{Float64,1}})

function f!(dx, x, p, t)
  dx[1] = p[1][1]
  dx[2] = p[2][1]
end
p = [[1.], [5.]]

### MethodError: no method matching zero(::Type{Any})

function f!(dx, x, p, t)
  dx[1] = p.one
  dx[2] = p.two
end

mutable struct two_p
  one::Float64
  two::Float64
end

p = two_p(1.,5.)

### ODE stuff

x0 = [1., 2.]
tspan = (0., 2.)
prob = ODEProblem(f!, x0, tspan, p)

sol  = solve(prob, Tsit5())

plot(sol)


### DiffEqFlux Stuff starts here ###

function predict_adjoint(p) # Our 1-layer neural network
  Array(concrete_solve(prob, Tsit5(), x0, p))
end

function loss_adjoint(p)
  prediction = predict_adjoint(p)
  loss = sum(abs2, prediction[:, end] .- 1)
  loss, prediction
end

cb = function (p, l, pred) # callback function to observe training
  display(l)
  # using `remake` to re-create our `prob` with current parameters `p`
  display(plot(solve(remake(prob,p=p), Tsit5())))
  return false
end

# Display the ODE with the initial parameter values.
cb(p,loss_adjoint(p)...)

res = DiffEqFlux.sciml_train(loss_adjoint, p, BFGS(), cb = cb)

@ChrisRackauckas (Member) commented

You need a parameter container that the adjoint differential equation can be solved with, or move to a sensealg that does pure AD instead of an adjoint method.
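
For instance, a pure-AD sensitivity choice is selected through the sensealg keyword. A minimal sketch, assuming DiffEqSensitivity is loaded and using its ForwardDiffSensitivity (TrackerAdjoint would be another pure-AD option):

using DiffEqSensitivity

function predict_adjoint(p)
  # Forward-mode AD pushes dual numbers through the in-place solve itself,
  # so no adjoint ODE has to be solved with the parameter container.
  Array(concrete_solve(prob, Tsit5(), x0, p, sensealg=ForwardDiffSensitivity()))
end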

@lindnemi (Author) commented May 21, 2020

I recently learned about ArrayPartitions and tried them on this problem, since they can solve a differential equation.

using RecursiveArrayTools # ArrayPartition is defined here
p = ArrayPartition([1.], [5.])

I get `AssertionError: IndexStyle(value) === IndexLinear()`. Pure AD works.

lindnemi changed the title from "Tuples of parameters not possible" to "Which types work as container of parameters?" on May 21, 2020
@ChrisRackauckas (Member) commented

ReverseDiff.jl doesn't like ArrayPartitions, and there are a few things left to get them fully supported in Zygote.

@lindnemi (Author) commented

Constructing the ArrayPartition in a function wrapper works with ReverseDiff:

p = [1., 5.]
function wrapper!(dx, x, p, t)
  f!(dx, x, ArrayPartition([p[1]], [p[2]]), t)
end
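
Training on the flat vector then proceeds as before. A sketch with illustrative names (prob_wrapped, predict_wrapped, loss_wrapped):

prob_wrapped = ODEProblem(wrapper!, x0, tspan, p)

function predict_wrapped(p)
  Array(concrete_solve(prob_wrapped, Tsit5(), x0, p))
end

function loss_wrapped(p)
  prediction = predict_wrapped(p)
  sum(abs2, prediction[:, end] .- 1), prediction
end

res = DiffEqFlux.sciml_train(loss_wrapped, p, BFGS())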

@ChrisRackauckas (Member) commented

Yeah, ReverseDiff just doesn't allow custom array types to pass along their adjoints, and that's what causes the issue.

@ChrisRackauckas (Member) commented

Documented upstream in the DiffEqSensitivity docstrings.
