Starting states for infinite horizon #445
A few points here:

Is there a proof that it reaches the optimal steady-state distribution? I think I can construct a counter-example:

```julia
using SDDP, GLPK

graph = SDDP.LinearGraph(2)
SDDP.add_edge(graph, 2 => 1, 0.99)
model = SDDP.PolicyGraph(
    graph,
    upper_bound = 100,
    optimizer = GLPK.Optimizer,
) do sp, t
    @variable(sp, x >= 0, SDDP.State, initial_value = 10)
    @variable(sp, 0 <= u <= (t == 1 ? 10 : 1))
    @constraint(sp, x.out == x.in - u)
    @stageobjective(sp, (t == 1 ? 1 : 10) * u)
end
```

The initial first-stage solution is going to be either:

Case 1: the previous starting point
Case 2: the initial state
You mean using
I think that a sufficient condition for the model to be able to reach a steady state is that there is a positive probability that you can transition from one state to any other state (similar to an irreducible / ergodic Markov chain), but the necessary conditions are probably less restrictive. Your example has an absorbing state, which is quite extreme, if we are thinking about modelling infinite horizon. For my infinite horizon, I had

Indeed, I meant:

```julia
if terminated_due_to_cycle
    # Get the last node in the scenario.
    final_node_index = scenario_path[end][1]
    # We terminated due to a cycle. Here is the list of possible starting
    # states for that node:
    starting_states = options.starting_states[final_node_index]
    # We also need the incoming state variable to the final node, which is
    # the outgoing state value of the last node:
    incoming_state_value = sampled_states[end]
    # If this incoming state value is more than δ away from another state,
    # add it to the list.
    if distance(starting_states, incoming_state_value) >
       options.cycle_discretization_delta
        push!(starting_states, incoming_state_value)
    end
end
```

Since
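For reference, here is a minimal sketch of what a `distance` check like the one above might compute, assuming states are stored as `Dict{Symbol,Float64}`; the signature and the Euclidean metric are assumptions for illustration, not necessarily SDDP.jl's exact implementation:

```julia
# Hypothetical sketch: smallest Euclidean distance between a candidate
# state and any stored starting state. Returns Inf for an empty list so
# the first candidate is always pushed.
function distance(
    starting_states::Vector{Dict{Symbol,Float64}},
    state::Dict{Symbol,Float64},
)
    isempty(starting_states) && return Inf
    return minimum(
        sqrt(sum((s[k] - state[k])^2 for k in keys(state))) for
        s in starting_states
    )
end
```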
Do you have a way of looking at a model and checking if it meets the necessary conditions? That seems pretty hard, even if you have domain knowledge of the problem.
Each iteration is only one time around the loop.
When training or simulating? If you're simulating, just do one hugely long simulation, then chop it up into years after the fact. If you're training, we won't do that because you can't prove convergence. Did you try the defaults where we do forward passes of different lengths?
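As a sketch of the "one long simulation, chopped up afterwards" approach (`max_depth` and `terminate_on_dummy_leaf` are keyword arguments of `SDDP.InSampleMonteCarlo`; the 20-year, 52-week split here is illustrative):

```julia
# Run one long replication of 20 * 52 stages, then slice it into years.
simulations = SDDP.simulate(
    model,
    1;  # a single long replication
    sampling_scheme = SDDP.InSampleMonteCarlo(
        max_depth = 20 * 52,
        terminate_on_dummy_leaf = false,
    ),
)
long_run = simulations[1]
years = [long_run[(52 * (y - 1) + 1):(52 * y)] for y in 1:20]
```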
Thanks for clarifying. I re-read the bit about the starting states, and I now understand what it does, but it seems that the list will only grow if you happen to select the initial state, and if the new point is close to a previously generated point the list will shrink. Do you have a proof of convergence for this? Is it sufficient to simply occasionally restart the method from the initial state vector (e.g. perform 100 iterations carrying the final states forward to the next iteration, and then for iteration 101 reset to the starting levels)?

The reason I don't like the

I'm training with the

(For my simulations, I perform a single simulation that is sliced up.)
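For context on the grow/shrink behaviour being discussed, a hedged sketch of the consuming side of the mechanism: at the start of a forward pass, one stored starting state is spliced out at random and used as the incoming state. Variable names follow the snippet quoted earlier; this is illustrative, not SDDP.jl's exact code.

```julia
# Hypothetical sketch: if the node has stored starting states, remove one
# at random (shrinking the list) and start the forward pass from it;
# otherwise fall back to the model's initial state.
starting_states = options.starting_states[1]
incoming_state_value = if isempty(starting_states)
    initial_state  # the user-supplied initial_value of each state variable
else
    splice!(starting_states, rand(1:length(starting_states)))
end
```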
We didn't include the algorithm in the paper, so we never proved it. But: assume the state space is closed and convex. Given some discretization distance δ, there exists a finite number of points that can be added to the list of starting states. And if the value function is Lipschitz, then there is some epsilon-error in the final value function. Since we keep adding the initial state back, we will sample the route from the initial state an infinite number of times, and so on.
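A hedged sketch of the two quantitative steps in that argument, additionally assuming the state space X ⊂ R^n is bounded with diameter D (boundedness is what makes the count finite):

```latex
% Packing bound: points pairwise more than \delta apart in a set of
% diameter D are finite in number, e.g.
|S| \;\le\; \left(1 + \frac{2D}{\delta}\right)^{\!n}.
% Lipschitz bound: if V is L-Lipschitz and \hat{x} is the stored state
% nearest to x, then
|V(x) - V(\hat{x})| \;\le\; L \, \|x - \hat{x}\| \;\le\; L\delta.
```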
You can also use this: `SDDP.jl/src/plugins/sampling_schemes.jl`, lines 34 to 36 at commit `092eeb3`.
For example, to extend the length by 52 weeks every 100 iterations, do:

```julia
rollout_limit = i -> 52 * ceil(Int, i / 100)
```
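Presumably (this usage is an assumption based on the lines linked above), the function is passed to the sampling scheme along these lines:

```julia
# Sketch: grow the forward-pass length by 52 stages every 100 iterations.
SDDP.train(
    model;
    sampling_scheme = SDDP.InSampleMonteCarlo(
        rollout_limit = i -> 52 * ceil(Int, i / 100),
    ),
)
```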
You could probably also write a plugin that swaps out different sampling schemes at different iterations. It's not too much work. Here's an example of a trivial plug-in: `SDDP.jl/src/plugins/sampling_schemes.jl`, lines 393 to 430 at commit `092eeb3`.
You could also write a wrapper around the `Historical` one to allow you to return the `terminated_on_cycle` flag.
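A hedged sketch of such a wrapper, assuming (as in the snippets elsewhere in this thread) that `sample_scenario` is an internal function returning a `(scenario_path, terminated_due_to_cycle)` tuple; this is not part of the public API, and the type and method names here are illustrative:

```julia
# Hypothetical wrapper: delegate sampling to an inner Historical scheme,
# but report the terminated-on-cycle flag as true.
struct CyclicHistorical{S} <: SDDP.AbstractSamplingScheme
    inner::S
end

function SDDP.sample_scenario(
    graph::SDDP.PolicyGraph,
    s::CyclicHistorical;
    kwargs...,
)
    scenario_path, _ = SDDP.sample_scenario(graph, s.inner; kwargs...)
    return scenario_path, true
end
```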
Thanks for those suggestions; they sound like they'll achieve what I need. I'll probably come back for help when I can't figure out how to actually implement them.
Totally un-tested, but maybe something like this:

```julia
import SDDP: AbstractSamplingScheme, PolicyGraph, sample_scenario

mutable struct SamplingSchemeSwapper{F} <: AbstractSamplingScheme
    f::F
    counter::Int
    SamplingSchemeSwapper(f::Function) = new{typeof(f)}(f, 0)
end

function sample_scenario(g::PolicyGraph, s::SamplingSchemeSwapper; kwargs...)
    s.counter += 1
    return sample_scenario(g, s.f(s.counter); kwargs...)
end

sampling_scheme = SamplingSchemeSwapper() do iteration
    return SDDP.InSampleMonteCarlo(terminate_on_cycle = iteration < 100)
end
```
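And then, presumably (assuming `model` is an existing `SDDP.PolicyGraph`), it would be passed to training like any other sampling scheme:

```julia
SDDP.train(model; iteration_limit = 200, sampling_scheme = sampling_scheme)
```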
Did you make any progress on this?
Sorry; I needed to engage my brain a bit before embarking on this, and that hasn't happened yet. I'm closing this, since you've given me a few ways to do it; I just need to go through them and work out the right approach for my needs. Thanks.
I'm still working on this... What is the expected behaviour of `terminate_on_cycle`? It appears to terminate after a cycle is detected and the node (node 1) is sampled a second time. This results in twice as many cuts being present at node 1 as at other nodes. (I have a linear network with 52 nodes and an additional arc from node 1 to node 52. The behaviour I'm describing is for the `InSampleMonteCarlo` sampling scheme.) It also makes it unclear (to me) whether the correct initial state is being stored for node 1 for the subsequent forward pass. Thanks.
I think that makes sense. You need a cut for
Typically (without the cycle) there would be no cuts stored in week 52. With the cycle we get n cuts stored in stage 52 and an extra n cuts in stage 1. I'm not sure that makes sense. In my view, the cycle should only directly affect the end-of-horizon value function, not week 1's value function.

I made this simple example to try to make sense of what is happening:

```julia
using JuMP, SDDP, Random, GLPK

include("sddp_modifications.jl")

graph = SDDP.LinearGraph(4)
SDDP.add_edge(graph, 4 => 1, 0.5)

function subproblem_builder(subproblem::Model, node::Int)
    # State variables
    @variable(subproblem, volume, SDDP.State, initial_value = 4.0)
    # Random variables
    @variable(subproblem, inflow)
    Ω = [1.0]
    P = [1.0]
    SDDP.parameterize(subproblem, Ω, P) do ω
        return JuMP.fix(inflow, ω)
    end
    # Transition function and constraints
    if node == 1
        @constraint(subproblem, volume.out == volume.in + inflow - 4.0)
    else
        @constraint(subproblem, volume.out == volume.in + inflow)
    end
    # Stage-objective
    @stageobjective(subproblem, volume.in)
    return subproblem
end

model = SDDP.PolicyGraph(
    subproblem_builder,
    graph;
    sense = :Min,
    lower_bound = 0.0,
    optimizer = GLPK.Optimizer,
)

SDDP.train(
    model;
    iteration_limit = 10,
    sampling_scheme = SDDP.InSampleMonteCarlo(terminate_on_cycle = true),
)
SDDP.train(
    model;
    iteration_limit = 10,
    sampling_scheme = SDDP.InSampleMonteCarlo(max_depth = 4),
)
```

There are four stages, and each stage has an inflow of 1, but in the first stage 4 units of volume are removed. The objective is the sum of the incoming volumes. The cycle should have `volume.in = [4, 1, 2, 3]` across the four stages, and just repeat this. However, here is the result from training with `terminate_on_cycle = true`:

```
------------------------------------------------------------------------------
                      SDDP.jl (c) Oscar Dowson, 2017-21

Problem
  Nodes           : 4
  State variables : 1
  Scenarios       : Inf
  Solver          : serial mode

Numerical stability report
  Non-zero Matrix range     [1e+00, 1e+00]
  Non-zero Objective range  [1e+00, 1e+00]
  Non-zero Bounds range     [0e+00, 0e+00]
  Non-zero RHS range        [4e+00, 4e+00]
No problems detected

 Iteration    Simulation       Bound         Time (s)    Proc. ID   # Solves
         1    1.000000e+01   1.200000e+01   1.999855e-03          1          9
         2    1.000000e+01   1.600000e+01   3.999949e-03          1         18
         3    1.400000e+01   1.800000e+01   1.499987e-02          1         29
         4    1.400000e+01   1.900000e+01   1.799989e-02          1         40
         5    2.000000e+00   1.900000e+01   2.099991e-02          1         51
         6   -2.000000e+00   1.900000e+01   2.300000e-02          1         60
         7   -1.000000e+00   1.900000e+01   2.499986e-02          1         71
         8    1.000000e+01   1.950000e+01   2.799988e-02          1         80
         9    1.000000e+01   1.975000e+01   2.900004e-02          1         89
        10   -2.000000e+00   1.975000e+01   3.099990e-02          1         98

Terminating training with status: iteration_limit
```

I would have expected the simulation to be 10 (= 4 + 1 + 2 + 3) every time. Have I missed something?
Yeah, I'm not sure. I haven't played with this much, so something probably needs fixing. Perhaps we should return: `SDDP.jl/src/plugins/sampling_schemes.jl`, line 272 at commit `5687376`.
I dug through the code a bit and managed to fix the issue. My fix is slightly different to what you suggested (which didn't quite work): I return the full sequence of nodes in the cycle, but do not add the final stage objective to the total:

```julia
# Cumulate the stage_objective, skipping the repeated cycle node.
if !terminated_due_to_cycle || depth < length(scenario_path)
    cumulative_value += subproblem_results.stage_objective
end
```

However, the wrong incoming state was being appended, so I changed it to:

```julia
# We also need the incoming state variable to the final node, which is
# the outgoing state value of the second-to-last node:
incoming_state_value = sampled_states[end-1]
```

These changes have resulted in the expected output:

```
 Iteration    Simulation       Bound         Time (s)    Proc. ID   # Solves
         1    1.000000e+01   1.150000e+01   3.000021e-03          1         11
         2    1.000000e+01   1.575000e+01   3.999949e-03          1         22
         3    1.000000e+01   1.787500e+01   6.999969e-03          1         33
         4    1.000000e+01   1.893750e+01   1.399994e-02          1         44
         5    1.000000e+01   1.946875e+01   1.600003e-02          1         55
         6    1.000000e+01   1.973438e+01   1.799989e-02          1         66
         7    1.000000e+01   1.986719e+01   1.999998e-02          1         77
         8    1.000000e+01   1.993359e+01   2.199984e-02          1         88
         9    1.000000e+01   1.996680e+01   2.399993e-02          1         99
        10    1.000000e+01   1.998340e+01   2.600002e-02          1        110

Terminating training with status: iteration_limit
------------------------------------------------------------------------------
```

These changes have been made to the forward pass function. However, I think the proper fix would be to not evaluate the repeated node in the forward pass; when I tried this, though, I didn't have a reference to the final node for which I needed to set the starting state.
Actually, ignore my last fix. This is the real fix:

```julia
# First up, sample a scenario. Note that if a cycle is detected, this will
# return the cycle node as well.
TimerOutputs.@timeit SDDP_TIMER "sample_scenario" begin
    scenario_path, terminated_due_to_cycle =
        sample_scenario(model, options.sampling_scheme)
end
if terminated_due_to_cycle
    final_node = scenario_path[end]
    scenario_path = scenario_path[1:end-1]
end
```

and then, where the starting states are updated:

```julia
# Get the last node in the scenario.
final_node_index = final_node[1]
# We terminated due to a cycle. Here is the list of possible starting
# states for that node:
starting_states = options.starting_states[final_node_index]
```

This fixes both the starting state issue and the extra cuts that were being inserted at node 1.
@adow031 worked around this by adding a new sampling scheme and a new forward pass: EPOC-NZ/JADE.jl#19.
I've got an infinite horizon model with `terminate_on_cycle=true`, and I've been looking at this section of the code. I think that the best way to deal with steady state is simply to start the next iteration using the ending state values from the current iteration, but this seems to be doing something a bit more complicated (and perhaps unnecessary, since the ending state values should reach a steady-state distribution over time, meaning you'll be sampling starting levels with the right distribution).

Also, since each iteration adds at most one item to the `starting_states` vector and the `splice!` removes one value from the vector, it's not really clear that the list would ever be more than two items (the ending state, and the initial state); this would over-train using the initial state (unless I misunderstand the code).

Also, I would like to be able to set the `terminate_on_cycle=true` flag (which is needed for the ending states to be stored) when I'm running a historical simulation; is this possible?

Thanks,
Tony.