Introduce sciml_train #125
Conversation
Needs to get tested and all of that. All of our tests should switch over to this function and style, and the README should make use of it as well.
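For orientation, here is a minimal sketch of what a test or README example calling `sciml_train` might look like. The toy two-parameter fitting problem, the callback signature, and the exact keyword names (`cb`, `maxiters`) are illustrative assumptions, not something this diff pins down:

```julia
using DiffEqFlux, OrdinaryDiffEq, Flux

# Hypothetical two-parameter fitting problem, purely for illustration.
function lotka!(du, u, p, t)
    du[1] =  p[1] * u[1] - u[1] * u[2]
    du[2] = -p[2] * u[2] + u[1] * u[2]
end
u0 = [1.0, 1.0]
prob = ODEProblem(lotka!, u0, (0.0, 10.0), [1.5, 3.0])
data = Array(solve(prob, Tsit5(), saveat = 0.1))

# The loss returns (loss, extras...) so a callback can inspect the prediction.
function loss(p)
    pred = Array(solve(prob, Tsit5(), p = p, saveat = 0.1))
    sum(abs2, data .- pred), pred
end

cb = (p, l, pred) -> (println(l); false)  # returning false keeps iterating

# Assumed call pattern: loss, initial parameters, optimizer, then keywords.
res = DiffEqFlux.sciml_train(loss, [1.2, 2.7], ADAM(0.05), cb = cb, maxiters = 100)
```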
Codecov Report

```
@@            Coverage Diff             @@
##           master     #125      +/-   ##
==========================================
+ Coverage   72.91%   78.87%   +5.95%
==========================================
  Files           2        3       +1
  Lines          48       71      +23
==========================================
+ Hits           35       56      +21
- Misses         13       15       +2
```

Continue to review the full report at Codecov.
@pkofod, is setting up the output with Optim's type a good idea? Also, is there a reason why the initial stepnorm is so sensitive?
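For reference, returning Optim's result type would mean users interact with the output through Optim's usual accessors. A rough sketch against a stand-in objective (the Rosenbrock function here is just an illustration, not part of this PR):

```julia
using Optim

# Stand-in objective, purely for illustration.
rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
res = optimize(rosenbrock, zeros(2), BFGS())

# Accessors a sciml_train result would inherit if it reuses Optim's type:
Optim.minimizer(res)   # best parameters found
Optim.minimum(res)     # objective value at that point
Optim.converged(res)   # whether a convergence criterion was met
Optim.iterations(res)  # iteration count, as in the reports below
```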
From the tests:

TrackerAdjoint with ADAM:
* Status: failure (reached maximum number of iterations)
* Candidate solution
Minimizer: [1.90e+00, 1.89e+00, 8.78e-01, ...]
Minimum: 6.817536e-03
* Found with
Algorithm: ADAM
Initial Point: [2.20e+00, 1.00e+00, 2.00e+00, ...]
* Convergence measures
|x - x'| = NaN ≰ 0.0e+00
|x - x'|/|x'| = NaN ≰ 0.0e+00
|f(x) - f(x')| = NaN ≰ 0.0e+00
|f(x) - f(x')|/|f(x')| = NaN ≰ 0.0e+00
|g(x)| = NaN ≰ 0.0e+00
* Work counters
Seconds run: 3 (vs limit Inf)
Iterations: 100
f(x) calls: 100
∇f(x) calls: 100

TrackerAdjoint with BFGS:
* Status: failure (objective increased between iterations) (line search failed)
* Candidate solution
Minimizer: [1.96e+00, 1.96e+00, 1.70e+00, ...]
Minimum: 2.729345e-08
* Found with
Algorithm: BFGS
Initial Point: [2.20e+00, 1.00e+00, 2.00e+00, ...]
* Convergence measures
|x - x'| = 2.28e-06 ≰ 0.0e+00
|x - x'|/|x'| = 1.16e-06 ≰ 0.0e+00
|f(x) - f(x')| = 2.73e-14 ≰ 0.0e+00
|f(x) - f(x')|/|f(x')| = 1.00e-06 ≰ 0.0e+00
|g(x)| = 1.81e-03 ≰ 1.0e-08
* Work counters
Seconds run: 2 (vs limit Inf)
Iterations: 11
f(x) calls: 79
∇f(x) calls: 79

ForwardDiffSensitivity with ADAM:
* Status: failure (reached maximum number of iterations)
* Candidate solution
Minimizer: [1.75e+00, 1.72e+00, 1.18e+00, ...]
Minimum: 3.778992e-03
* Found with
Algorithm: ADAM
Initial Point: [2.20e+00, 1.00e+00, 2.00e+00, ...]
* Convergence measures
|x - x'| = NaN ≰ 0.0e+00
|x - x'|/|x'| = NaN ≰ 0.0e+00
|f(x) - f(x')| = NaN ≰ 0.0e+00
|f(x) - f(x')|/|f(x')| = NaN ≰ 0.0e+00
|g(x)| = NaN ≰ 0.0e+00
* Work counters
Seconds run: 1 (vs limit Inf)
Iterations: 100
f(x) calls: 100
∇f(x) calls: 100

ForwardDiffSensitivity with BFGS:
* Status: success
* Candidate solution
Minimizer: [1.85e+00, 1.85e+00, 1.22e+00, ...]
Minimum: 5.315246e-22
* Found with
Algorithm: BFGS
Initial Point: [2.20e+00, 1.00e+00, 2.00e+00, ...]
* Convergence measures
|x - x'| = 1.67e-09 ≰ 0.0e+00
|x - x'|/|x'| = 9.02e-10 ≰ 0.0e+00
|f(x) - f(x')| = 5.78e-17 ≰ 0.0e+00
|f(x) - f(x')|/|f(x')| = 1.09e+05 ≰ 0.0e+00
|g(x)| = 5.25e-11 ≤ 1.0e-08
* Work counters
Seconds run: 0 (vs limit Inf)
Iterations: 10
f(x) calls: 35
∇f(x) calls: 35

Adjoints with ADAM:
* Status: failure (reached maximum number of iterations)
* Candidate solution
Minimizer: [1.90e+00, 1.89e+00, 8.78e-01, ...]
Minimum: 6.817536e-03
* Found with
Algorithm: ADAM
Initial Point: [2.20e+00, 1.00e+00, 2.00e+00, ...]
* Convergence measures
|x - x'| = NaN ≰ 0.0e+00
|x - x'|/|x'| = NaN ≰ 0.0e+00
|f(x) - f(x')| = NaN ≰ 0.0e+00
|f(x) - f(x')|/|f(x')| = NaN ≰ 0.0e+00
|g(x)| = NaN ≰ 0.0e+00
* Work counters
Seconds run: 3 (vs limit Inf)
Iterations: 100
f(x) calls: 100
∇f(x) calls: 100

Adjoints with BFGS:
* Status: failure (objective increased between iterations) (line search failed)
* Candidate solution
Minimizer: [1.96e+00, 1.96e+00, 1.70e+00, ...]
Minimum: 2.729345e-08
* Found with
Algorithm: BFGS
Initial Point: [2.20e+00, 1.00e+00, 2.00e+00, ...]
* Convergence measures
|x - x'| = 2.28e-06 ≰ 0.0e+00
|x - x'|/|x'| = 1.16e-06 ≰ 0.0e+00
|f(x) - f(x')| = 2.73e-14 ≰ 0.0e+00
|f(x) - f(x')|/|f(x')| = 1.00e-06 ≰ 0.0e+00
|g(x)| = 1.81e-03 ≰ 1.0e-08
* Work counters
Seconds run: 2 (vs limit Inf)
Iterations: 11
f(x) calls: 79
∇f(x) calls: 79

Conclusion: BFGS consistently reaches a minimum about five orders of magnitude lower than ADAM's, with less effort.
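As a rough sketch of how the two optimizers compared above might be driven through `sciml_train` (the warm-start pattern, the field access on the result, and the `initial_stepnorm` value are assumptions for illustration):

```julia
using DiffEqFlux, Flux, Optim

# `loss` and `p0` stand for any (loss, pred...)-returning objective and an
# initial parameter vector, as in the sketch near the top of this thread.
res_adam = DiffEqFlux.sciml_train(loss, p0, ADAM(0.05), maxiters = 100)

# Polish with BFGS starting from ADAM's result; initial_stepnorm reins in the
# first step (see the stepnorm discussion below).
res_bfgs = DiffEqFlux.sciml_train(loss, res_adam.minimizer, BFGS(initial_stepnorm = 0.01))
```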
```julia
using DiffEqFlux, OrdinaryDiffEq, Optim, Flux, Zygote, BenchmarkTools, Test

u0 = Float32[2.; 0.]
datasize = 30
tspan = (0.0f0, 1.5f0)

# Generate training data from the true ODE
function trueODEfunc(du, u, p, t)
    true_A = [-0.1 2.0; -2.0 -0.1]
    du .= ((u.^3)'true_A)'
end
t = range(tspan[1], tspan[2], length = datasize)
prob = ODEProblem(trueODEfunc, u0, tspan)
ode_data = Array(solve(prob, Tsit5(), saveat = t))

# FastChain-based neural ODE
fastdudt2 = FastChain((x, p) -> x.^3, FastDense(2, 50, tanh), FastDense(50, 2))
p = initial_params(fastdudt2)
fast_n_ode = NeuralODE(fastdudt2, tspan, Tsit5(), saveat = t)
function fast_predict_n_ode(p)
    fast_n_ode(u0, p)
end
function fast_loss_n_ode(p)
    pred = fast_predict_n_ode(p)
    loss = sum(abs2, ode_data .- pred)
    loss, pred
end

# Flux Chain-based neural ODE
dudt2 = Chain((x) -> x.^3, Dense(2, 50, tanh), Dense(50, 2))
n_ode = NeuralODE(dudt2, tspan, Tsit5(), saveat = t)
function predict_n_ode(p)
    n_ode(u0, p)
end
function loss_n_ode(p)
    pred = predict_n_ode(p)
    loss = sum(abs2, ode_data .- pred)
    loss, pred
end

# Check that the fast layers match the Flux layers and their gradients
_p, re = Flux.destructure(dudt2)
@test fastdudt2(ones(2), _p) ≈ dudt2(ones(2))
@test fast_loss_n_ode(p)[1] ≈ loss_n_ode(p)[1]
@test Zygote.gradient((p) -> fast_loss_n_ode(p)[1], p)[1] ≈
      Zygote.gradient((p) -> loss_n_ode(p)[1], p)[1]

@btime Zygote.gradient((p) -> fast_loss_n_ode(p)[1], p)
@btime Zygote.gradient((p) -> fast_loss_n_ode(p)[1], p)
@btime Zygote.gradient((p) -> loss_n_ode(p)[1], p)
@btime Zygote.gradient((p) -> loss_n_ode(p)[1], p)
```

```
27.272 ms (181318 allocations: 16.54 MiB)
27.328 ms (181318 allocations: 16.54 MiB)
262.430 ms (677868 allocations: 32.83 MiB)
260.814 ms (677868 allocations: 32.83 MiB)
```

An order of magnitude performance improvement over using Flux for the neural networks.
implement fast versions of Flux
Wrt this: before the second iteration we have no second-order information (unless it is given), so the Hessian approximation is just the identity, and the first step is a full gradient step.
From what I can tell, the initial stepnorm almost always needs to be restricted. Would it be good to change the default to 0.01?
You could. There have been various suggestions, but most descriptions assume that the initial step is the full gradient. Alternatively, you can specify a preconditioner, supply curvature information through the initial inverse Hessian approximation, or, yeah, set an initial step norm. I never really benchmarked it; I only added the option because we looked at it in the Pumas context :)
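Concretely, the three options mentioned here map onto Optim.jl keyword arguments roughly as follows; the 0.01 value, the scaled-identity inverse Hessian, and the hard-coded diagonal preconditioner are illustrative assumptions:

```julia
using Optim, LinearAlgebra

# (1) Restrict the size of the very first step; the initial identity is
#     rescaled so the first step has roughly this norm.
BFGS(initial_stepnorm = 0.01)

# (2) Supply curvature information through the initial inverse Hessian
#     approximation (given as a function of the initial point).
BFGS(initial_invH = x -> Matrix(0.01I, length(x), length(x)))

# (3) Use a preconditioner instead (supported by e.g. LBFGS); the dimension
#     here is hard-coded for a 2-parameter problem.
LBFGS(P = Diagonal(fill(100.0, 2)))
```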
This is a starter PR for students interested in solving #120