Set dtmin to zero by default #2098

Merged 5 commits on Dec 30, 2023
Conversation

ChrisRackauckas (Member)

This has been a long time coming. Ever since the issue SciML/Sundials.jl#423 I've been wondering "could you actually do that?", since Sundials does seem to default to dtmin=0, so I've been testing it out. We started down this path because the Hairer codes use a positive dtmin, and there's a justification I gave as to why it might make more sense for Runge-Kutta codes than for BDF codes (SciML/Sundials.jl#423).

But... I've come to the realization that dtmin=0 is a good idea, for a few reasons. It's hard to pinpoint the exact codes since I've seen a few thousand models since September, but a few things have stood out to me.

1. We've had a lot of false negatives with dtmin: #1830 (fix final tstop dtmin handling) fixes SciML/DifferentialEquations.jl#924 (error when the last integration step is smaller than dtmin), #1616 (step size after tstop can be too small), and #1879 (fixed dt `step!` may fail due to t being close to tstop). There are just a lot of these cases that can pop up, and it's hard to root them all out individually. @oscardssmith did a whole bunch of work last year in #1762 (use both sides of tspan when determining dtmin) to try and root out the false positives, and I think there are a lot fewer now, but it led to a separate concern I will mention next. This basically means that any dtmin>0 eventually has the downside of false negative exits. It's good to exit early in some cases, but in many cases it's "hyperactive".

2. Setting dtmin to a reasonable default that avoids false positives ends up being too conservative for it to be useful in a lot of scenarios. To see what I mean, here are some stiff ODE solves:

```julia
# Robertson problem (ROBER): a classic stiff chemical kinetics test.
using OrdinaryDiffEq, SciMLBase

function rober!(du, u, p, t)
    y₁, y₂, y₃ = u
    k₁, k₂, k₃ = p
    du[1] = -k₁ * y₁ + k₃ * y₂ * y₃
    du[2] = k₁ * y₁ - k₂ * y₂^2 - k₃ * y₂ * y₃
    du[3] = k₂ * y₂^2
    nothing
end
prob = ODEProblem(rober!, [1.0, 0.0, 0.0], (0.0, 1e5), [0.04, 3e7, 1e4])
sol = solve(prob, Tsit5())

# Air pollution model (POLLU): 20 species, 25 reactions, also stiff.
const k1=.35e0
const k2=.266e2
const k3=.123e5
const k4=.86e-3
const k5=.82e-3
const k6=.15e5
const k7=.13e-3
const k8=.24e5
const k9=.165e5
const k10=.9e4
const k11=.22e-1
const k12=.12e5
const k13=.188e1
const k14=.163e5
const k15=.48e7
const k16=.35e-3
const k17=.175e-1
const k18=.1e9
const k19=.444e12
const k20=.124e4
const k21=.21e1
const k22=.578e1
const k23=.474e-1
const k24=.178e4
const k25=.312e1

function f2(dy,y,p,t)
 r1  = k1 *y[1]
 r2  = k2 *y[2]*y[4]
 r3  = k3 *y[5]*y[2]
 r4  = k4 *y[7]
 r5  = k5 *y[7]
 r6  = k6 *y[7]*y[6]
 r7  = k7 *y[9]
 r8  = k8 *y[9]*y[6]
 r9  = k9 *y[11]*y[2]
 r10 = k10*y[11]*y[1]
 r11 = k11*y[13]
 r12 = k12*y[10]*y[2]
 r13 = k13*y[14]
 r14 = k14*y[1]*y[6]
 r15 = k15*y[3]
 r16 = k16*y[4]
 r17 = k17*y[4]
 r18 = k18*y[16]
 r19 = k19*y[16]
 r20 = k20*y[17]*y[6]
 r21 = k21*y[19]
 r22 = k22*y[19]
 r23 = k23*y[1]*y[4]
 r24 = k24*y[19]*y[1]
 r25 = k25*y[20]

 dy[1]  = -r1-r10-r14-r23-r24+
          r2+r3+r9+r11+r12+r22+r25
 dy[2]  = -r2-r3-r9-r12+r1+r21
 dy[3]  = -r15+r1+r17+r19+r22
 dy[4]  = -r2-r16-r17-r23+r15
 dy[5]  = -r3+r4+r4+r6+r7+r13+r20
 dy[6]  = -r6-r8-r14-r20+r3+r18+r18
 dy[7]  = -r4-r5-r6+r13
 dy[8]  = r4+r5+r6+r7
 dy[9]  = -r7-r8
 dy[10] = -r12+r7+r9
 dy[11] = -r9-r10+r8+r11
 dy[12] = r9
 dy[13] = -r11+r10
 dy[14] = -r13+r12
 dy[15] = r14
 dy[16] = -r18-r19+r16
 dy[17] = -r20
 dy[18] = r20
 dy[19] = -r21-r22-r24+r23+r25
 dy[20] = -r25+r24
end

function fjac(J,y,p,t)
      J .= 0.0
      J[1,1]   = -k1-k10*y[11]-k14*y[6]-k23*y[4]-k24*y[19]
      J[1,11]  = -k10*y[1]+k9*y[2]
      J[1,6]   = -k14*y[1]
      J[1,4]   = -k23*y[1]+k2*y[2]
      J[1,19]  = -k24*y[1]+k22
      J[1,2]   = k2*y[4]+k9*y[11]+k3*y[5]+k12*y[10]
      J[1,13]  = k11
      J[1,20]  = k25
      J[1,5]   = k3*y[2]
      J[1,10]  = k12*y[2]

      J[2,4]   = -k2*y[2]
      J[2,5]   = -k3*y[2]
      J[2,11]  = -k9*y[2]
      J[2,10]  = -k12*y[2]
      J[2,19]  = k21
      J[2,1]   = k1
      J[2,2]   = -k2*y[4]-k3*y[5]-k9*y[11]-k12*y[10]

      J[3,1]   = k1
      J[3,4]   = k17
      J[3,16]  = k19
      J[3,19]  = k22
      J[3,3]   = -k15

      J[4,4]   = -k2*y[2]-k16-k17-k23*y[1]
      J[4,2]   = -k2*y[4]
      J[4,1]   = -k23*y[4]
      J[4,3]   = k15

      J[5,5]   = -k3*y[2]
      J[5,2]   = -k3*y[5]
      J[5,7]   = 2k4+k6*y[6]
      J[5,6]   = k6*y[7]+k20*y[17]
      J[5,9]   = k7
      J[5,14]  = k13
      J[5,17]  = k20*y[6]

      J[6,6]   = -k6*y[7]-k8*y[9]-k14*y[1]-k20*y[17]
      J[6,7]   = -k6*y[6]
      J[6,9]   = -k8*y[6]
      J[6,1]   = -k14*y[6]
      J[6,17]  = -k20*y[6]
      J[6,2]   = k3*y[5]
      J[6,5]   = k3*y[2]
      J[6,16]  = 2k18

      J[7,7]   = -k4-k5-k6*y[6]
      J[7,6]   = -k6*y[7]
      J[7,14]  = k13

      J[8,7]   = k4+k5+k6*y[6]
      J[8,6]   = k6*y[7]
      J[8,9]   = k7

      J[9,9]   = -k7-k8*y[6]
      J[9,6]   = -k8*y[9]

      J[10,10] = -k12*y[2]
      J[10,2]  = -k12*y[10]+k9*y[11]
      J[10,9]  = k7
      J[10,11] = k9*y[2]

      J[11,11] = -k9*y[2]-k10*y[1]
      J[11,2]  = -k9*y[11]
      J[11,1]  = -k10*y[11]
      J[11,9]  = k8*y[6]
      J[11,6]  = k8*y[9]
      J[11,13] = k11

      J[12,11] = k9*y[2]
      J[12,2]  = k9*y[11]

      J[13,13] = -k11
      J[13,11] = k10*y[1]
      J[13,1]  = k10*y[11]

      J[14,14] = -k13
      J[14,10] = k12*y[2]
      J[14,2]  = k12*y[10]

      J[15,1]  = k14*y[6]
      J[15,6]  = k14*y[1]

      J[16,16] = -k18-k19
      J[16,4]  = k16

      J[17,17] = -k20*y[6]
      J[17,6]  = -k20*y[17]

      J[18,17] = k20*y[6]
      J[18,6]  = k20*y[17]

      J[19,19] = -k21-k22-k24*y[1]
      J[19,1]  = -k24*y[19]+k23*y[4]
      J[19,4]  = k23*y[1]
      J[19,20] = k25

      J[20,20] = -k25
      J[20,1]  = k24*y[19]
      J[20,19] = k24*y[1]

      return
end

u0 = zeros(20)
u0[2]  = 0.2
u0[4]  = 0.04
u0[7]  = 0.1
u0[8]  = 0.3
u0[9]  = 0.01
u0[17] = 0.007
prob = ODEProblem(ODEFunction{true, SciMLBase.FullSpecialize}(f2, jac=fjac),u0,(0.0,60.0))

sol = solve(prob,Tsit5())

# 2D Brusselator: a stiff reaction-diffusion PDE semi-discretization.
const N = 32

xyd_brusselator = range(0,stop=1,length=N)
brusselator_f(x, y, t) = (((x-0.3)^2 + (y-0.6)^2) <= 0.1^2) * (t >= 1.1) * 5.
limit(a, N) = a == N+1 ? 1 : a == 0 ? N : a
function brusselator_2d_loop(du, u, p, t)
  A, B, alpha, dx = p
  alpha = alpha/dx^2
  @inbounds for I in CartesianIndices((N, N))
    i, j = Tuple(I)
    x, y = xyd_brusselator[I[1]], xyd_brusselator[I[2]]
    ip1, im1, jp1, jm1 = limit(i+1, N), limit(i-1, N), limit(j+1, N), limit(j-1, N)
    du[i,j,1] = alpha*(u[im1,j,1] + u[ip1,j,1] + u[i,jp1,1] + u[i,jm1,1] - 4u[i,j,1]) +
                B + u[i,j,1]^2*u[i,j,2] - (A + 1)*u[i,j,1] + brusselator_f(x, y, t)
    du[i,j,2] = alpha*(u[im1,j,2] + u[ip1,j,2] + u[i,jp1,2] + u[i,jm1,2] - 4u[i,j,2]) +
                A*u[i,j,1] - u[i,j,1]^2*u[i,j,2]
    end
end
p = (3.4, 1., 10., step(xyd_brusselator))

input = rand(N,N,2)
output = similar(input)
f = ODEFunction{true, SciMLBase.FullSpecialize}(brusselator_2d_loop)
function init_brusselator_2d(xyd)
  N = length(xyd)
  u = zeros(N, N, 2)
  for I in CartesianIndices((N, N))
    x = xyd[I[1]]
    y = xyd[I[2]]
    u[I,1] = 22*(y*(1-y))^(3/2)
    u[I,2] = 27*(x*(1-x))^(3/2)
  end
  u
end

u0 = init_brusselator_2d(xyd_brusselator)
prob = ODEProblem(f,u0,(0.,11.5),p);

sol = solve(prob,Tsit5())
```

The first two stiff ODEs exit with maxiters. The last one actually completes; it used to exit due to `dtmin`, so that was a false negative, and now it's merely a slow way to compute the answer, but it works. So these are exactly the cases where `dtmin` should in theory cause earlier exits (a non-stiff ODE solver on a stiff ODE), and even here it's not the thing causing the exits, because it's small enough to not be "the" factor.
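
For reference, here's a minimal way to check which exit condition actually fired (a hedged sketch: `sol.retcode` and the `ReturnCode` enum are standard SciMLBase API; the values in the comments are what I'd expect for the runs above):

```julia
using OrdinaryDiffEq, SciMLBase

sol = solve(prob, Tsit5())
@show sol.retcode
# ReturnCode.MaxIters for the first two problems; ReturnCode.Success for the
# Brusselator. Before this change, a dtmin-triggered bail-out would instead
# show up as ReturnCode.DtLessThanMin.
```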

3. One of the arguments I have seen for dtmin is that it can reduce the possibility of numerical error. But numerical error with a super small dt isn't compounding; the issue is just that you get `x + dt*y == x`. If you look at what happens with very small steps on exponential growth (exponential growth is an ODE which has exponential error growth, and is thus a hard, not an easy, test of ODE solver error behavior), you can see what I mean:

```julia
using OrdinaryDiffEq, Plots
f = (u,p,t) -> -p.a*u  # a = -100, so du/dt = 100u: exponential growth
prob = ODEProblem(f, [10.0], (0.0, 0.1), (a=-100.0,))
sol1 = solve(prob, Vern7(), dtmin=0);
# Force tiny steps just past the first naturally chosen step:
sol2 = solve(prob, Vern7(), tstops=[sol1.t[2] + eps()], dtmin=0)
sol3 = solve(prob, Vern7(), tstops=[sol1.t[2] + 0.01eps()], dtmin=0)
sol4 = solve(prob, Vern7(), tstops=[sol1.t[2] + 0.001eps()], dtmin=0)

plot(sol1)
plot!(sol2)
plot!(sol3)
plot!(sol4)

t = 0.0:1e-5:0.1
@show maximum((sol1(t) - sol2(t)) ./ sol1(t)) # 8.678693835607108e-7
@show maximum((sol1(t) - sol3(t)) ./ sol1(t)) # 2.680284044683277e-13
@show maximum((sol1(t) - sol4(t)) ./ sol1(t)) # 0.0
```

Small dts forced by tstops are safe. I've played around with many of the examples that have come up, and it's safe. The worst that happens is you take a 1e-18 sized step and don't update `u`, but that's because the `u` update is below floating point resolution. You still resolve the tstop just fine! So I don't think dtmin is enforcing a safety behavior.
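
To make the floating point claim concrete (plain Julia, nothing solver-specific):

```julia
# Near x = 1.0, floats are spaced eps(1.0) ≈ 2.2e-16 apart, so a 1e-18-sized
# update is absorbed exactly: u doesn't change, and no error accumulates.
x  = 1.0
dt = 1e-18
y  = 1.0
@show eps(x)         # 2.220446049250313e-16
@show x + dt*y == x  # true: the increment is below the float spacing
```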

And finally...

4. It can do good things to have dtmin=0 and dt<eps(t)! This one kind of shocked me, but now I kind of get it. For non-stiff ODEs it doesn't really matter, but for stiff ODEs and DAEs you can have some interesting adverse behaviors due to the way nonlinear solvers work. Sometimes rejecting the step and shrinking dt can improve the next step's prediction and make the Jacobian less singular. Another interesting behavior is that sometimes a "real" sized step has too much error because the DAE's nonlinear algebraic equations are changing branches, i.e. there is a discontinuity. Handling that discontinuity with dt=1e-18 makes it effectively something akin to a callback: time almost doesn't move forward and the differential variables don't change, but the algebraic equations can be bumped into a new state, whereas with any larger step the error estimate denies the change. Thus allowing extremely small steps improves robustness to discontinuities, especially on hard problems. This seems to be one of the reasons why CVODE and IDA do well on a few cases where we didn't before. It's not anything to do with the solver; it's just that we gave up early with dtmin, while Sundials would keep trying with smaller-than-reasonable dts, resolve the issue, and move on.

With those 4 points, I think it's time to set dtmin=0.
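
Note that nothing is removed by this PR: if you want the old early-exit behavior, the `dtmin` keyword still works as before. A quick sketch (the floor value 1e-12 is just illustrative, and `prob` is any of the problems above):

```julia
using OrdinaryDiffEq

sol = solve(prob, Tsit5())                 # new default: dtmin = 0
sol = solve(prob, Tsit5(), dtmin = 1e-12)  # opt back into a step size floor
```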