# SDDP.jl example (#98)
Hi, the only thing that is majorly different is that the initial conditions are not the same: you would have to scale `β` in order to have a population size of 1000, but this wasn't the problem. I also tried putting the upper bound in the variable declaration rather than as a constraint, `@variable(sp, 0 ≤ υ_cumulative ≤ υ_total, SDDP.State, initial_value = 0)`, but that didn't fix things either; I'm not sure whether this is the correct approach in SDDP.jl anyway. There's an additional script below:

```julia
using SDDP, JuMP, Ipopt

tmax = 40.0
δt = 0.1
nsteps = Int(tmax / δt)
u0 = [0.99, 0.01] # S, I
β = 0.5        # infectivity rate
γ = 0.25       # recovery rate
υ_max = 0.5    # maximum intervention
υ_total = 10.0 # maximum cost

model = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = Ipopt.Optimizer,
) do sp, t
    @variable(sp, 0 ≤ S, SDDP.State, initial_value = u0[1])
    @variable(sp, 0 ≤ I, SDDP.State, initial_value = u0[2])
    @variable(sp, 0 ≤ C, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ_cumulative, SDDP.State, initial_value = 0)
    # @variable(sp, 0 ≤ υ_cumulative ≤ υ_total, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ ≤ υ_max)
    # constraints on control
    @constraint(sp, υ_cumulative.out == υ_cumulative.in + (δt * υ))
    @constraint(sp, υ_cumulative.in ≤ υ_total)
    # state updating rules
    @NLconstraint(sp, S.out == S.in - ((1 - exp((1 - υ) * β * S.in * I.in * δt)) * S.in))
    @NLconstraint(sp, I.out == I.in + ((1 - exp((1 - υ) * β * S.in * I.in * δt)) * S.in) - ((1 - exp(-γ * δt)) * I.in))
    @NLconstraint(sp, C.out == C.in + ((1 - exp((1 - υ) * β * S.in * I.in * δt)) * S.in))
    @stageobjective(sp, C.out)
end

SDDP.train(model; iteration_limit = 10)
```

I'd ping this to the Julia discourse in the Mathematical Optimization thread, and if that doesn't work, try filing an issue in case it really is a corner case.

---
I noticed that you had some errors in the transitions (you need to use the negative of the rate in the `(1 - exp(-r * t))` bits, and you also included `S.in` in the force of infection, where it should just be `I.in`). I also added a warm start to the control variable. Fixes below; I still get an error though.

```julia
using SDDP, JuMP, Ipopt

tmax = 40.0
δt = 0.1
nsteps = Int(tmax / δt)
u0 = [0.99, 0.01] # S, I
β = 0.5        # infectivity rate
γ = 0.25       # recovery rate
υ_max = 0.5    # maximum intervention
υ_total = 10.0 # maximum cost

model = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = Ipopt.Optimizer,
) do sp, t
    @variable(sp, 0 ≤ S, SDDP.State, initial_value = u0[1])
    @variable(sp, 0 ≤ I, SDDP.State, initial_value = u0[2])
    @variable(sp, 0 ≤ C, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ_cumulative, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ ≤ υ_max, start = 0)
    # constraints on control
    @constraint(sp, υ_cumulative.out == υ_cumulative.in + (δt * υ))
    @constraint(sp, υ_cumulative.in ≤ υ_total)
    # expressions to simplify the state updates
    @NLexpression(sp, infection, (1 - exp(-(1 - υ) * β * I.in * δt)) * S.in)
    @NLexpression(sp, recovery, (1 - exp(-γ * δt)) * I.in)
    # state updating rules
    @NLconstraint(sp, S.out == S.in - infection)
    @NLconstraint(sp, I.out == I.in + infection - recovery)
    @NLconstraint(sp, C.out == C.in + infection)
    @stageobjective(sp, C.out)
end

SDDP.train(model; iteration_limit = 10)
```

---
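As a sanity check on those corrected transition fractions, the same discrete-time updates can be run outside any solver. Here is a minimal Python sketch (not part of the thread's Julia code; it assumes the parameter values above, with no intervention, and uses the `(1 - exp(-rate * δt))` factors as per-step transition fractions):

```python
import math

def simulate(tmax=40.0, dt=0.1, beta=0.5, gamma=0.25, upsilon=0.0):
    # Each step, a fraction (1 - exp(-rate * dt)) of the source compartment
    # moves: S -> I for infection, I -> R for recovery; C accumulates infections.
    S, I, C = 0.99, 0.01, 0.0
    for _ in range(int(tmax / dt)):
        infection = (1 - math.exp(-(1 - upsilon) * beta * I * dt)) * S
        recovery = (1 - math.exp(-gamma * dt)) * I
        S, I, C = S - infection, I + infection - recovery, C + infection
    return S, I, C

S, I, C = simulate()
print(round(S, 3), round(I, 3), round(C, 3))
```

Because `S` only ever decreases by `infection` and `C` only ever increases by it, `S + C` stays at its initial value, which is a quick way to check the bookkeeping.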
The above breaks at step 202. If you reduce the simulation time to e.g. `tmax = 20.0` (i.e. 200 steps), then the model can get through an iteration loop. Any idea how you get a vector of `υ`? I'm not sure why there is a problem at this stage.

```julia
using SDDP, JuMP, Ipopt

tmax = 20.0
δt = 0.1
nsteps = Int(tmax / δt)
u0 = [0.99, 0.01] # S, I
β = 0.5        # infectivity rate
γ = 0.25       # recovery rate
υ_max = 0.5    # maximum intervention
υ_total = 10.0 # maximum cost

model = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = Ipopt.Optimizer,
) do sp, t
    @variable(sp, 0 ≤ S, SDDP.State, initial_value = u0[1])
    @variable(sp, 0 ≤ I, SDDP.State, initial_value = u0[2])
    @variable(sp, 0 ≤ C, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ_cumulative, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ ≤ υ_max, start = 0)
    # constraints on control
    @constraint(sp, υ_cumulative.out == υ_cumulative.in + (δt * υ))
    @constraint(sp, υ_cumulative.in ≤ υ_total)
    # expressions to simplify the state updates
    @NLexpression(sp, infection, (1 - exp(-(1 - υ) * β * I.in * δt)) * S.in)
    @NLexpression(sp, recovery, (1 - exp(-γ * δt)) * I.in)
    # state updating rules
    @NLconstraint(sp, S.out == S.in - infection)
    @NLconstraint(sp, I.out == I.in + infection - recovery)
    @NLconstraint(sp, C.out == C.in + infection)
    @stageobjective(sp, C.out)
end

SDDP.train(model; iteration_limit = 10)
rule = SDDP.DecisionRule(model; node = 1)
solution = SDDP.evaluate(rule; incoming_state = Dict(:C => 0.0))
```

---
In the deterministic case, we can just use JuMP, as explained in this tutorial, which may help to debug whether it's a JuMP issue or an SDDP.jl issue (though I see where you're going with this :)).

---
Yeah, that's a good idea; maybe it would be nice in any case to have a JuMP -> deterministic SDDP -> stochastic SDDP workflow anyway. I'll play around with the vanilla JuMP model and see if it sheds any light on what's going wrong with the SDDP implementation.

---
@sdwfrost I didn't notice your above comment. To get the state trajectory after running your last code block (to time 20), we can do this:

```julia
using Plots

sims = SDDP.simulate(model, 1, [:S, :I])
traj = [[sims[1][i][:S].in, sims[1][i][:I].in] for i in 1:200]
traj = transpose(hcat(traj...))
plot(traj)
```

---
Hi @slwu89! I added a deterministic JuMP.jl example here. There were a few issues, so I had to play a bit with the number of steps and `δt`, but it now works well. The analogous model in SDDP.jl doesn't work with the same parameter values and solver. I can run it for a few iterations. For some reason, the algorithm in SDDP.jl tries a solution where `υ` is high at the beginning, and hence it reaches a point where it can't find a feasible solution before it has gone through all `nsteps`. (This is why it was failing around step 200 originally.) Here's what I have so far:

```julia
using SDDP, JuMP, Ipopt, Plots

tmax = 100.0
δt = 1.0
nsteps = Int(tmax / δt)
u0 = [0.99, 0.01] # S, I
β = 0.5        # infectivity rate
γ = 0.25       # recovery rate
υ_max = 0.5    # maximum intervention
υ_total = 10.0 # maximum cost
opt = Ipopt.Optimizer

model = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = opt,
) do sp, t
    @variable(sp, 0 ≤ S, SDDP.State, initial_value = u0[1])
    @variable(sp, 0 ≤ I, SDDP.State, initial_value = u0[2])
    @variable(sp, 0 ≤ C, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ_cumulative, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ ≤ υ_max, start = 0)
    # constraints on control
    @constraint(sp, υ_cumulative.out == υ_cumulative.in + (δt * υ))
    @constraint(sp, υ_cumulative.out ≤ υ_total)
    # expressions to simplify the state updates
    @NLexpression(sp, infection, (1 - exp(-(1 - υ) * β * I.in * δt)) * S.in)
    @NLexpression(sp, recovery, (1 - exp(-γ * δt)) * I.in)
    # state updating rules
    @NLconstraint(sp, S.out == S.in - infection)
    @NLconstraint(sp, I.out == I.in + infection - recovery)
    @NLconstraint(sp, C.out == C.in + infection)
    @stageobjective(sp, C.out)
end

# The solver works when we convert to a JuMP model
# (I don't know how to extract the values from this, however)
det = SDDP.deterministic_equivalent(model, Ipopt.Optimizer)
set_silent(det)
JuMP.optimize!(det)
JuMP.termination_status(det)

# The stability report is fine
SDDP.numerical_stability_report(model)

# The solver will fail at iteration 7
SDDP.train(model; iteration_limit = 6)

sims = SDDP.simulate(model, 1, [:S, :I, :υ])
traj = [[sims[1][i][:S].in, sims[1][i][:I].in, sims[1][i][:υ]] for i in 1:nsteps]
traj = transpose(hcat(traj...))
plot(traj)
```

I've tried some other NLP solvers, such as MadNLP, but they don't seem to overcome the 'early lockdown' scenario.

---
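The point at which an 'early lockdown' policy runs out of room can be checked with quick budget arithmetic. A short Python sketch (not part of the thread's Julia code; it assumes maximum control is applied from the first stage):

```python
# If the policy applies the maximum control υ_max at every stage, the
# cumulative budget υ_total is used up after υ_total / (υ_max * δt) stages.
upsilon_max, upsilon_total = 0.5, 10.0

for dt in (1.0, 0.1):
    steps = upsilon_total / (upsilon_max * dt)
    print(f"δt = {dt}: budget exhausted after {steps:.0f} stages")
```

With `δt = 1.0` this gives 20 stages, and with the earlier `δt = 0.1` it gives 200 stages, which lines up with the original run breaking at step 202.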
@sdwfrost I found out what the problem was. The way I went about it was by reading the files that SDDP spits out when it dies (with the naming convention `subproblem_XX.mof.json`) back into JuMP and then reading off the LaTeX equations to find out why the subproblem at that time point wasn't solvable:

```julia
sp_moi = JuMP.read_from_file("./subproblem_XX.mof.json")
JuMP.latex_formulation(sp_moi)
```

Based on this, it seems the constraint was the problem. I've put what I have so far in my fork: https://github.com/slwu89/sir-julia/blob/slwu89/sddp/tutorials/sddp/sddp.jl

Note that currently I only have a single. The strange-looking policy resulting from training is below:

---
Hi @slwu89, I plugged your fix into the model, and it's nice that it works. I think the problem is with the objective. A 'wait and see' approach, where you minimize the treatment until the final step, waits too long:

```julia
if t < nsteps
    @stageobjective(sp, υ_cumulative.out)
else
    @stageobjective(sp, C.out)
end
```

I'll play around with more models to see if I can overcome the 'short-sighted' nature of the current approach.

---
Good points @sdwfrost, I'll play around with the objective too. This is a rather interesting topic, in terms of how best to set up local objectives that are able to get a policy close to the global ones that JuMP and InfiniteOpt are able to see.

---
The issue of sparse rewards comes up a lot in reinforcement learning, where it causes problems because an algorithm finds it hard to work out which bits of the policy contributed to the outcome. Using a different objective is like reward shaping. I really wanted to put together an example using ReinforcementLearning.jl but got stuck trying to think about how to overcome the local/global problem.

---
Well, it's an interesting problem; it may be useful to have the example even if we aren't able to solve it (the ideal objective function) here, if it is still an open research question (at least with respect to optimizing for minimum infections in SIR)! I tried penalizing the objective function by adding a "control whiplash" term to avoid rapid changes in the control:

```julia
@variable(sp, z)
# absolute value penalty: z ≥ |υ_cumulative.out - υ_cumulative.in|
@constraint(sp, υ_cumulative.out - υ_cumulative.in ≤ z)
@constraint(sp, -(υ_cumulative.out - υ_cumulative.in) ≤ z)
# quadratic penalty (alternative)
@NLconstraint(sp, (υ_cumulative.out - υ_cumulative.in)^2 == z)
```

In general I got smoother control policies, but still far from the optimal one found by JuMP/InfiniteOpt when we can do global optimization.

---
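The absolute-value penalty above is the standard epigraph trick: the two linear constraints together force `z ≥ |Δυ|`, so minimizing `z` recovers the absolute value without a non-smooth `abs` term. A quick numeric check of that identity in Python (a sketch, independent of the Julia model):

```python
# z must satisfy d <= z and -d <= z, so the smallest feasible z is max(d, -d),
# which equals |d|; this is why minimizing z penalizes |Δυ|.
def smallest_feasible_z(d):
    return max(d, -d)

for d in (-0.3, 0.0, 0.25):
    assert smallest_feasible_z(d) == abs(d)
print("epigraph reformulation matches abs")
```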
You can get something closer if you try to minimize a linear combination of the cumulative cost and the cumulative infections (i.e. trying to keep costs down and infections down, with a weighting factor):

```julia
@stageobjective(sp, υ_cumulative.out + 40 * C.out)
```

A smoother penalty may not help, as the optimal policy includes two sharp changes.

---
Good point. I was trying out that penalty to promote smoothness, to cut down on the many spiky changes in policy I found in some of the other training runs, but yes, it won't be able to handle the abrupt single-lockdown policy very well.

---
Hi @slwu89, do you want to commit a version of this model using the weighted sum as an objective? As you say, it's an interesting problem. I'm happy to work it up into a jmd if you're busy.

---
Sure! I'll get a PR ready for you in a day or two. There's a lot of development going on in SDDP.jl; I get the feeling that we'll update this specific tutorial as new features get added to the package. It's exciting stuff.

---
@sdwfrost I was hoping to get a stochastic example working for the PR, but I don't think it's possible right now. In case things change in JuMP that make this possible (which the JuMP dev team is certainly thinking about, see jump-dev/MathOptInterface.jl#846), I'm going to post where I got to.

Because SDDP does not directly support random variables parameterized by state variables, we need to be able to feed an exogenous source of noise into each node in the policy graph that we can transform into what we need. The most straightforward way to go about this would be to simulate an Euler-Maruyama discretization of an SDE. Each step we first draw 2 realizations of normal random variates (as in https://odow.github.io/SDDP.jl/stable/guides/add_multidimensional_noise/) and then add them to the constraint matrix to multiply the nonlinear expressions for the diffusion terms by those variates (following https://odow.github.io/SDDP.jl/stable/guides/add_noise_in_the_constraint_matrix/). The issue is that `set_normalized_coefficient` does not work with the nonlinear expressions:

```julia
using Distributions

model_stochastic = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = opt,
) do sp, t
    @variable(sp, 0 ≤ S, SDDP.State, initial_value = u0[1])
    @variable(sp, 0 ≤ I, SDDP.State, initial_value = u0[2])
    @variable(sp, 0 ≤ C, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ_cumulative, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ ≤ υ_max)
    # constraints on control
    @constraint(sp, υ_cumulative.out == υ_cumulative.in + (δt * υ))
    @constraint(sp, υ_cumulative.in + (δt * υ) ≤ υ_total)
    # drift terms
    @NLexpression(sp, f_infection, (1 - υ) * β * I.in * S.in)
    @NLexpression(sp, f_recovery, γ * I.in)
    # diffusion terms
    @NLexpression(sp, g_infection, sqrt(f_infection))
    @NLexpression(sp, g_recovery, sqrt(f_recovery))
    # state updating rules (unit coefficients so they can be overwritten below)
    @NLconstraint(sp, S_update, S.out == S.in - 1 * f_infection - 1 * g_infection)
    @NLconstraint(sp, I_update, I.out == I.in + 1 * f_infection - 1 * f_recovery + 1 * g_infection - 1 * g_recovery)
    @NLconstraint(sp, C_update, C.out == C.in + 1 * f_infection + 1 * g_infection)
    # sample the normal random variates
    Ω1 = sort(rand(Distributions.Normal(0, sqrt(δt)), 100))
    Ω2 = sort(rand(Distributions.Normal(0, sqrt(δt)), 100))
    Ω = [(inf = ω1, rec = ω2) for ω1 in Ω1 for ω2 in Ω2]
    SDDP.parameterize(sp, Ω) do ω
        # drift terms
        JuMP.set_normalized_coefficient(S_update, f_infection, δt) # if this doesn't work, set f_infection to a variable and not an expression
        JuMP.set_normalized_coefficient(I_update, f_infection, δt)
        JuMP.set_normalized_coefficient(I_update, f_recovery, δt)
        JuMP.set_normalized_coefficient(C_update, f_infection, δt)
        # diffusion terms
        JuMP.set_normalized_coefficient(S_update, g_infection, ω.inf)
        JuMP.set_normalized_coefficient(I_update, g_infection, ω.inf)
        JuMP.set_normalized_coefficient(I_update, g_recovery, ω.rec)
        JuMP.set_normalized_coefficient(C_update, g_infection, ω.inf)
    end
    # linear weighting of objectives
    @stageobjective(sp, υ_cumulative.out + 40 * C.out)
end
```

---
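The Euler-Maruyama scheme described above can be sketched outside the optimizer to see what the two noise streams do. A minimal Python version (not part of the thread's Julia code; it assumes the thread's parameter values, no control, and clips states at zero so the `sqrt` arguments stay valid):

```python
import math
import random

# Euler-Maruyama step for the SIR SDE: drift f, diffusion sqrt(f), and one
# independent N(0, dt) variate per transition (infection and recovery).
random.seed(1)
beta, gamma, dt = 0.5, 0.25, 0.1
S, I = 0.99, 0.01
for _ in range(int(20.0 / dt)):
    f_inf = max(beta * I * S, 0.0)
    f_rec = max(gamma * I, 0.0)
    w_inf = random.gauss(0.0, math.sqrt(dt))
    w_rec = random.gauss(0.0, math.sqrt(dt))
    infection = f_inf * dt + math.sqrt(f_inf) * w_inf
    recovery = f_rec * dt + math.sqrt(f_rec) * w_rec
    S = max(S - infection, 0.0)  # clip to keep the state non-negative
    I = max(I + infection - recovery, 0.0)
print(round(S, 3), round(I, 3))
```

Note the clipping is a simulation convenience; in the optimization model the same role is played by the non-negativity bounds on the state variables.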
Thanks Sean! Here's hoping that the JuMP team add nonlinear expressions to `set_normalized_coefficient`!

---
I was about to merge the PR in MathOptInterface when I noticed a linked issue with "SDDP.jl" in the title that piqued my interest. This is cool stuff! You can work around the limitation by defining the drift and diffusion terms as decision variables pinned down by `@NLconstraint`s, instead of `@NLexpression`s:

```julia
model_stochastic = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = opt,
) do sp, t
    @variable(sp, 0 ≤ S, SDDP.State, initial_value = u0[1])
    @variable(sp, 0 ≤ I, SDDP.State, initial_value = u0[2])
    @variable(sp, 0 ≤ C, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ_cumulative, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ ≤ υ_max)
    # constraints on control
    @constraint(sp, υ_cumulative.out == υ_cumulative.in + (δt * υ))
    @constraint(sp, υ_cumulative.in + (δt * υ) ≤ υ_total)
    # drift terms
    @variable(sp, f_infection)
    @NLconstraint(sp, f_infection == (1 - υ) * β * I.in * S.in)
    @variable(sp, f_recovery)
    @NLconstraint(sp, f_recovery == γ * I.in)
    # diffusion terms
    @variable(sp, g_infection)
    @NLconstraint(sp, g_infection == sqrt(f_infection))
    @variable(sp, g_recovery)
    @NLconstraint(sp, g_recovery == sqrt(f_recovery))
    # state updating rules (now linear, with explicit unit coefficients)
    @constraint(sp, S_update, S.out == S.in - 1 * f_infection - 1 * g_infection)
    @constraint(sp, I_update, I.out == I.in + 1 * f_infection - 1 * f_recovery + 1 * g_infection - 1 * g_recovery)
    @constraint(sp, C_update, C.out == C.in + 1 * f_infection + 1 * g_infection)
    # sample the normal random variates
    Ω1 = sort(rand(Distributions.Normal(0, sqrt(δt)), 100))
    Ω2 = sort(rand(Distributions.Normal(0, sqrt(δt)), 100))
    Ω = [(inf = ω1, rec = ω2) for ω1 in Ω1 for ω2 in Ω2]
    SDDP.parameterize(sp, Ω) do ω
        # drift terms
        JuMP.set_normalized_coefficient(S_update, f_infection, δt)
        JuMP.set_normalized_coefficient(I_update, f_infection, δt)
        JuMP.set_normalized_coefficient(I_update, f_recovery, δt)
        JuMP.set_normalized_coefficient(C_update, f_infection, δt)
        # diffusion terms
        JuMP.set_normalized_coefficient(S_update, g_infection, ω.inf)
        JuMP.set_normalized_coefficient(I_update, g_infection, ω.inf)
        JuMP.set_normalized_coefficient(I_update, g_recovery, ω.rec)
        JuMP.set_normalized_coefficient(C_update, g_infection, ω.inf)
    end
    # linear weighting of objectives
    @stageobjective(sp, υ_cumulative.out + 40 * C.out)
end
```

---
A core assumption of SDDP.jl is relatively complete recourse; that is, there is always a feasible solution for any choice of the incoming state variables and the random variables. I assume that the issue is:

```julia
@constraint(sp, υ_cumulative.in + (δt * υ) ≤ υ_total)
```

Use a penalized constraint like this (for some appropriate penalty cost):

```julia
@variable(sp, penalty >= 0)
@constraint(sp, υ_cumulative.in + (δt * υ) ≤ υ_total + penalty)
@stageobjective(sp, υ_cumulative.out + 40 * C.out + 1_000 * penalty)
```

---
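The penalty variable is what restores relatively complete recourse: whatever the incoming cumulative control, there is always a feasible choice of `penalty`. A quick Python check of that recourse (a sketch with the thread's parameter values, not part of the Julia model):

```python
# For any incoming cumulative control, a feasible penalty always exists:
# penalty = max(0, incoming + dt * u - total) satisfies the soft constraint.
dt, u_max, total = 1.0, 0.5, 10.0

def feasible_penalty(incoming, u):
    return max(0.0, incoming + dt * u - total)

for incoming in (0.0, 9.9, 10.0, 12.5):
    p = feasible_penalty(incoming, u_max)
    assert incoming + dt * u_max <= total + p + 1e-12
print("soft constraint feasible for all tested incoming states")
```

The objective coefficient on `penalty` then controls how strongly the solver is pushed back toward the original hard constraint.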
Thanks! This may well be a problem down the line (and I had to play a bit with the deterministic version to overcome this constraint issue), but this is a model problem, as I get an infeasible solution on the first slice.

---
It's potentially the non-differentiability of `sqrt` at zero.

What if you tweak the formulation to the following?

```julia
model_stochastic = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = opt,
) do sp, t
    @variables(sp, begin
        0 <= S, SDDP.State, (initial_value = u0[1])
        0 <= I, SDDP.State, (initial_value = u0[2])
        0 <= C, SDDP.State, (initial_value = 0)
        0 <= υ_cumulative <= υ_total, SDDP.State, (initial_value = 0)
        0 <= υ <= υ_max
        ω_inf
        ω_rec
        penalty >= 0
        infection >= 0
        recovery >= 0
    end)
    @NLconstraints(sp, begin
        infection == δt * ((1 - υ) * β * I.in * S.in) + ω_inf * sqrt((1 - υ) * β * I.in * S.in)
        recovery == δt * (γ * I.in) + ω_rec * sqrt(γ * I.in)
    end)
    @constraints(sp, begin
        υ_cumulative.out == υ_cumulative.in + δt * υ - penalty
        S.out == S.in - infection
        I.out == I.in + infection - recovery
        C.out == C.in + infection
    end)
    D = Distributions.Normal(0, sqrt(δt))
    Ω1, Ω2 = sort(rand(D, 100)), sort(rand(D, 100))
    Ω = [(inf = ω1, rec = ω2) for ω1 in Ω1 for ω2 in Ω2]
    SDDP.parameterize(sp, Ω) do ω
        JuMP.fix(ω_inf, ω.inf)
        JuMP.fix(ω_rec, ω.rec)
    end
    @stageobjective(sp, υ_cumulative.out + 40 * C.out + 100 * penalty)
end
```

---
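The non-differentiability concern is easy to see numerically: the derivative of `sqrt` grows without bound as its argument approaches zero, which is exactly the regime the rates start in when `I` is small. A short Python illustration (independent of the Julia model):

```python
import math

# d/dx sqrt(x) = 1 / (2 * sqrt(x)): the gradient an NLP solver such as
# Ipopt sees blows up as the rate x shrinks toward zero.
derivs = [1.0 / (2.0 * math.sqrt(x)) for x in (1e-2, 1e-6, 1e-10)]
print(derivs)  # grows by roughly 100x at each step
```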
Thanks for jumping in here and giving us some advice @odow! I just tried the most recent model formulation you posted, after calling

---
Hi @sdwfrost, yeah, I'm on Julia 1.9 and have updated Ipopt/JuMP/SDDP to the latest versions.

---
This one is running for me (although at 80 minutes for the first iteration):

```julia
model = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = Ipopt.Optimizer,
) do sp, t
    @variables(sp, begin
        0 <= S, SDDP.State, (initial_value = u0[1])
        0 <= I, SDDP.State, (initial_value = u0[2])
        0 <= C, SDDP.State, (initial_value = 0)
        0 <= υ_cumulative <= υ_total, SDDP.State, (initial_value = 0)
        0 <= υ <= υ_max
        ω_inf
        ω_rec
        penalty >= 0
        infection >= 0
        recovery >= 0
    end)
    @NLconstraints(sp, begin
        infection == δt * ((1 - υ) * β * I.in * S.in) + ω_inf * sqrt((1 - υ) * β * I.in * S.in)
        recovery == δt * (γ * I.in) + ω_rec * sqrt(γ * I.in)
    end)
    @constraints(sp, begin
        υ_cumulative.out == υ_cumulative.in + δt * υ - penalty
        S.out == S.in - infection
        I.out == I.in + infection - recovery
        C.out == C.in + infection
    end)
    D = Distributions.Normal(0, sqrt(δt))
    Ω1, Ω2 = sort(rand(D, 100)), sort(rand(D, 100))
    Ω = [(inf = ω1, rec = ω2) for ω1 in Ω1 for ω2 in Ω2]
    SDDP.parameterize(sp, Ω) do ω
        JuMP.fix(ω_inf, ω.inf)
        JuMP.fix(ω_rec, ω.rec)
    end
    @stageobjective(sp, υ_cumulative.out + 40 * C.out + 100 * penalty)
end
```

---
Thanks, that is where I am at too.

---
Now that I take a closer look, the issue is that you have 100 × 100 samples for the Monte Carlo. And assuming `nsteps = 100` stages, we need to solve approximately 1_000_000 nonlinear programs per iteration! This could be okay, except that JuMP currently rebuilds the model from scratch every time: jump-dev/JuMP.jl#1185. Problems like this are one reason that we're in the process of rewriting JuMP's nonlinear support. I'd simplify the approximation with something like fewer noise samples per stage.

---
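The solve count behind those slow iterations is simple arithmetic. A quick Python sketch (assumed values: 100 stages, i.e. `tmax = 100` with `δt = 1`, and one subproblem solve per stage-and-noise pair, roughly how the backward pass scales):

```python
# Each stage samples 100 x 100 noise pairs, and the backward pass solves
# roughly one NLP per (stage, noise) pair.
samples_per_dim = 100
stages = 100  # tmax / δt with the thread's parameters

noise_points_per_stage = samples_per_dim ** 2
solves_per_backward_pass = stages * noise_points_per_stage
print(solves_per_backward_pass)  # → 1000000
```

Cutting the per-dimension sample count to 10 would reduce this by a factor of 100, which is the kind of simplification suggested above.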
Thanks for the helpful advice @odow! Ha, I see now that the backward pass will generate a huge number of solves. I'll try to play around with some balance of fewer samples from the noise for each stage and the time step size that gives reasonable solutions.

---
Hi @sdwfrost, I was interested to see if I could get something similar to your InfiniteOpt.jl example running using SDDP.jl. I got this far, but the solvers keep spitting out numerical issues. I'm not familiar with the paper that your InfiniteOpt example was based on; do you see anything obviously wrong with my model here?