Added features for experiment reproducibility and many other improvements: #203
Conversation
elseif symbol == :FoundSolution # last portion required to get the full closed path
    dist = model.adhocInfo[1]
    n = size(dist)[1]
    max_dist = Float32(Base.maximum(dist))
    if isbound(model.variables["a_"*string(n-1)])
        last = assignedValue(model.variables["a_"*string(n-1)])
        first = assignedValue(model.variables["a_1"])

        dist_to_first_node = lh.current_state.dist[last, first] * max_dist
        print("final_dist : ", dist_to_first_node, " // ")
        lh.reward.value += -ρ*dist_to_first_node
    end
This corresponds to the last path portion between the last assigned variable and the start variable.
if isbound(model.variables["a_"*string(current)])
    a_i = assignedValue(model.variables["a_"*string(current)])
    v_i = assignedValue(model.variables["v_"*string(current)])
    last_dist = lh.current_state.dist[v_i, a_i] * max_dist
This corresponds to the distance between the previous node and the node that has just been selected by the heuristic one step before. (Recall that the reward is always given one step after, just before making a new decision).
@@ -9,6 +9,7 @@ manually change the mode again if he wants.
"""
function Flux.testmode!(lh::LearnedHeuristic, mode = true)
    Flux.testmode!(lh.agent, mode)
    lh.agent.policy.explorer.is_training = !mode
Fixing the RL agent's exploration rate to zero during evaluation.
It seems to me that by default the explorer is not called when trainMode is false.
However, this should not be a problem.
""" | ||
function last_episode_total_reward(t::AbstractTrajectory) | ||
last_index = length(t[:terminal]) | ||
last_index == 0 && return 0 |
Return 0 in case the trajectory is empty. This was needed in order to evaluate the model before any training step without triggering a BoundsError while retrieving last_episode_total_reward from the trajectory.
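For illustration, here is a hedged, self-contained sketch of that guard (the function name, the plain vectors, and the backward summation are assumptions standing in for the trajectory traits; this is not necessarily SeaPearl's exact implementation):

```julia
# Sum the rewards of the last episode by walking backwards from the end of the
# trajectory until the previous episode boundary; an empty trajectory yields 0.
function last_episode_total_reward_sketch(rewards::Vector{Float64}, terminals::Vector{Bool})
    last_index = length(terminals)
    last_index == 0 && return 0.0    # empty trajectory: evaluation before any training step
    total = rewards[last_index]
    i = last_index - 1
    while i >= 1 && !terminals[i]    # stop at the previous terminal flag
        total += rewards[i]
        i -= 1
    end
    return total
end
```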
if !isnothing(seed)
    Random.seed!(seed)
end
We want evaluation instances to be the same across experiments, not the same inside an experiment (which would be the opposite of what we are trying to achieve); hence the rng attribute has to be defined outside the fill_with_generator! function.
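A hedged, self-contained sketch of that intent (the helper below and the rand call are illustrative stand-ins, not SeaPearl's API):

```julia
using Random

# The rng is created once per experiment from the evaluation seed, then reused
# for every generated instance.
function make_instances(seed_eval::Int, n_instances::Int)
    rng = MersenneTwister(seed_eval)
    # rand(rng, ...) stands in for fill_with_generator!(model, gen; rng = rng)
    return [rand(rng, 1:100, 5) for _ in 1:n_instances]
end

inst = make_instances(42, 10)
@assert inst == make_instances(42, 10)  # identical across experiments sharing the seed
@assert inst[1] != inst[2]              # but the instances differ inside one experiment
```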
model = eval.instances[i]
reset_model!(model)
if eval.metrics[i,j].nbEpisodes == 0
    dt = @elapsed search!(model, strategy, variableHeuristic, heuristic)
    eval.metrics[i,j](model,dt)
A search is computed for BasicHeuristics only once.
else
    dt, numberOfNodes, numberOfSolutions = repeatlast!(eval.metrics[i,j])
For every following evaluation step, we only repeat the metrics and avoid computing already existing results.
NN models can now be generated completely deterministically. One step toward perfect experiment reproducibility.
The first evaluation is no longer impacted by the compilation time required while running the
For me there is no major problem with merging this pull request.
Maybe we should just postpone the addition of tsptwRealData.jl, since it doesn't seem essential and it doesn't seem to me to be fully functional.
Apart from that, there are plenty of useful improvements, and the only remarks are about comments or minor points.
src/datagen/latin.jl
Fill a CPModel with the variables and constraints generated. We fill it directly instead of
creating temporary files for efficiency purposes.

A seed must be specified by the user to generate a specific instance. As long as Random.seed!(seed) is called at the beginning of the function, every random-based operation will be deterministic. Caution: this is not the seed that must be specified in order to generate the same set of evaluation instances across experiments; in that case, the user must call Random.seed! only once, at the beginning of the experiment.
It seems to me that this comment is outdated, since the seed has been replaced by a random generator.
It seems to me that by default the explorer is not called when trainMode is false.

I thought that was the case too, but it is not. Before this modification, the explorer's rate was decreasing even during evaluation, which was highly problematic.
We discussed this with the creator of the ReinforcementLearning.jl package here.
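As a rough illustration of why the flag matters, here is a hedged, self-contained toy explorer (not ReinforcementLearning.jl's actual implementation; all names and fields are made up): each call in training mode advances the decay schedule, so calling it during evaluation without is_training = false keeps shrinking the exploration rate.

```julia
# Toy ϵ-greedy explorer whose decay schedule only advances while is_training is true.
mutable struct ToyEpsilonGreedy
    ϵ_init::Float64
    ϵ_stable::Float64
    decay_steps::Int
    step::Int
    is_training::Bool
end

current_ϵ(e::ToyEpsilonGreedy) =
    e.ϵ_init + clamp(e.step / e.decay_steps, 0.0, 1.0) * (e.ϵ_stable - e.ϵ_init)

function (e::ToyEpsilonGreedy)(q_values::Vector{Float64})
    if e.is_training
        e.step += 1                                    # decay only progresses in training
        rand() < current_ϵ(e) && return rand(1:length(q_values))
    end
    return argmax(q_values)                            # evaluation: purely greedy
end
```

Flipping is_training to false freezes the schedule and removes the random moves, which is what the new line added to Flux.testmode! does on the real agent's explorer.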
src/datagen/knapsack.jl
@@ -16,11 +16,13 @@ It is possible to give `Inf` as the `gen.correlation` to have a strict equality
`gen.correlation` must be strictly positive.
This method is from the following paper:
https://www.researchgate.net/publication/2548374_Core_Problems_in_Knapsack_Algorithms

A seed must be specified by the user to generate a specific instance. As long as Random.seed!(seed) is called at the beginning of the function, every random-based operation will be deterministic. Caution: this is not the seed that must be specified in order to generate the same set of evaluation instances across experiments; in that case, the user must call Random.seed! only once, at the beginning of the experiment.
It seems to me that this comment is outdated, since the seed has been replaced by a random generator.
src/datagen/nqueens.jl
@@ -15,14 +15,13 @@ end
Fill a CPModel with the variables and constraints generated. We fill it directly instead of
creating temporary files for efficiency purposes.

A seed must be specified by the user to generate a specific instance. As long as Random.seed!(seed) is called at the beginning of the function, every random-based operation will be deterministic. Caution: this is not the seed that must be specified in order to generate the same set of evaluation instances across experiments; in that case, the user must call Random.seed! only once, at the beginning of the experiment.
It seems to me that this comment is outdated, since the seed has been replaced by a random generator.
src/datagen/nqueens.jl
This generator creates graphs for the NQueens problem.

"""
function fill_with_generator!(cpmodel::CPModel, gen::NQueensGenerator; seed=nothing)
function fill_with_generator!(cpmodel::CPModel, gen::NQueensGenerator; rng::Union{Nothing,AbstractRNG} = nothing)
Since there is no randomness in the generation of NQueens problems, is it necessary to add a random generator parameter? I don't think that removing it would be a problem, including for the consistency of the API, since this parameter is optional in any fill_with_generator! function.
That's right, I removed it.
if !isempty(model.statistics.nodevisitedpersolution) #infeasible case
    push!(metrics.meanNodeVisitedUntilfirstSolFound, model.statistics.nodevisitedpersolution[1])
end
index = findall(!isnothing, model.statistics.solutions) #return the index of the first solution
Same remark as before.
if !isempty(model.statistics.nodevisitedpersolution) #infeasible case
    push!(metrics.meanNodeVisitedUntilfirstSolFound, model.statistics.nodevisitedpersolution[1])
end
index = findall(!isnothing, model.statistics.solutions) #return the index of the first solution
Same remark as before.
if isa(metrics, BasicMetrics{O,<:LearnedHeuristic})
    metrics.totalReward = rollmean(metrics.totalReward, windowspan)
function repeatlast!(metrics::BasicMetrics{<:AbstractTakeObjective, <:BasicHeuristic})
What is the purpose of this function?
OK, I saw that it was to avoid repeating the evaluations for fixed deterministic heuristics. Maybe a comment should be added, since the purpose of the function is not obvious.
This function is really confusing, but it was an easy way to virtually repeat evaluation metrics without having to do a week of refactoring.
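A hypothetical sketch of the idea (not SeaPearl's actual repeatlast! code; the three vectors stand in for fields of BasicMetrics): since a deterministic BasicHeuristic always yields the same search, later evaluation steps can just duplicate the entries recorded the first time.

```julia
# Duplicate the last recorded evaluation entry instead of re-running the search.
function repeatlast_sketch!(times::Vector{Float64}, nodes::Vector{Int}, solutions::Vector{Int})
    push!(times, times[end])
    push!(nodes, nodes[end])
    push!(solutions, solutions[end])
    return times[end], nodes[end], solutions[end]
end
```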
""" | ||
function SameInstancesEvaluator(valueSelectionArray::Array{H, 1}, generator::AbstractModelGenerator; seed=nothing, evalFreq::Int64 = 50, nbInstances::Int64 = 10, evalTimeOut::Union{Nothing,Int64} = nothing) where H<: ValueSelection | ||
|
||
Constructor for SameInstancesEvaluator. In order to generate nbInstances times the same evaluation instance, a seed has to be specified. Otherwise, the instance will be generated randomly. |
Same thing as for the fill_with_generator! remarks: it would be nice to update the comment to reflect the fact that the seed has been replaced by a random generator.
src/datagen/tsptwRealData.jl
#x_pos = zeros(gen.n_city)
#y_pos = zeros(gen.n_city)
#grid_size = 0
#cpmodel.adhocInfo = dist, timeWindows, hcat(x_pos, y_pos), grid_size
It seems to me that in order to use the tsptw-specific rewards, you have to fill in adhocInfo.
Since this new generator does not seem essential to me, maybe just remove it from the pull request until it is functional.
That's right, I moved the file tsptwRealData.jl and its dependencies to the branch Tsptw-Real-Data.
This is not an exhaustive review; there are a lot of changes and I don't understand all the details of the RL part, but I tried to roughly understand the changes and check whether they made sense to me.
This reward is the exact reward implemented by Quentin Cappart in
his recent paper: Combining RL & CP for Combinatorial Optimization, https://arxiv.org/pdf/2006.01610.pdf.
It could be nice to explain in a few words how the smart reward works and why it's interesting to use it. If not, maybe add "section 2.2" after the paper's link to make this information easier to find for the user.
last = assignedValue(model.variables["a_"*string(n-1)])
first = assignedValue(model.variables["a_1"])

dist_to_first_node = lh.current_state.dist[last, first] * max_dist
I guess it's the correct behavior, but I don't understand the * max_dist.
This is supposed to be copied from the original implementation of the reward provided by @qcappart.
However, as I can't find it anymore in the original repo, I removed the factor * max_dist in the computation of last_dist and dist_to_first_node.
src/RL/utils/totalreward.jl
#if t[:terminal][last_index] #TODO understand why they wrote this

#if t[:terminal][last_index] Do we need to consider cases where the last state is not a terminal state ?
Has this case been resolved?
Yes @marco-novaes98, this case is handled at line 10.
and every computation like fixPoints and backtracking has been done.
Change the current reward at the DecisionPhase. This is called right before making the next decision, so you know you have the very last state before the new decision, and every computation like fixPoints and backtracking has been done.

This computes the reward: ρ*(1 + tour_upper_bound - last_dist), where ρ is a constant, tour_upper_bound an upper bound of the tour and last_dist the distance between the previous node and the target node decided by the previous decision (the reward is attributed just before takng a new decision)
small typo in "takng"
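For concreteness, a hedged restatement of that DecisionPhase reward; every numeric value below is a placeholder, only the formula itself comes from the docstring:

```julia
ρ = 1.0f0 / 40              # scaling constant (placeholder value)
tour_upper_bound = 200.0f0  # an upper bound on the total tour length (placeholder)
last_dist = 3.5f0           # distance added by the previous decision (placeholder)
reward = ρ * (1 + tour_upper_bound - last_dist)
```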
index = findall(!isnothing, model.statistics.solutions) #return the list of index of real solution in model.statistics.solutions
push!(metrics.meanNodeVisitedUntilfirstSolFound, !isempty(index) ? model.statistics.nodevisitedpersolution[index[1]] : nothing)
Referring to the 4 findall of this function, can we have more than one real solution?
- If so, I don't understand why we only consider the first solution in the push! right after.
- If not, it might be better to use findfirst to make the code clearer and a bit faster (and also update the comment).
In the general case, we can have more than one solution in model.statistics.solutions.

If so, I don't understand why we only consider the first solution in the push! right after.

This is because we are looking for the number of nodes until the first solution is found:
push!(metrics.meanNodeVisitedUntilfirstSolFound, !isempty(index) ? model.statistics.nodevisitedpersolution[index[1]] : nothing)

If not, it might be better to use findfirst to make the code clearer and a bit faster (and also update the comment).

Your point is correct, findfirst is the best practice.
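A small, self-contained illustration of the suggested change (toy vectors, not SeaPearl's actual data structures):

```julia
# findall returns every index holding a real solution; only the first is needed here,
# so findfirst expresses the intent directly and stops at the first match.
solutions = [nothing, Dict("x" => 1), nothing, Dict("x" => 2)]
nodes_per_solution = [3, 7, 9, 12]

index_all   = findall(!isnothing, solutions)     # current approach
index_first = findfirst(!isnothing, solutions)   # suggested approach

@assert nodes_per_solution[index_all[1]] == nodes_per_solution[index_first]
```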
…/SeaPearl.jl into tom/feature/new_pipeline
Fix some issues raised by @marco-novaes98. This PR can finally be merged.
I added several interesting features to be able to exactly reproduce each experiment given a couple of seeds (seedEval, seed):
- seedEval is used to generate a specific random stream that generates the evaluation instances.
- seed is used to handle every other random stream (random heuristic, weight initialisation, ...).

I plotted some metrics for 2 different experiments on a TSPTW problem with 20 nodes. It is absolutely normal that we see no improvement of the learned heuristics, as the hyperparameters were not set for this purpose. As we can see, the performance of the nearest heuristic is the same in both experiments, which is good evidence that the evaluation instances are consistent given the same seedEval.

This work is still in progress, as I am facing some unexpected behaviors from generated CP instances. Given the exact same model, the fix-point yields different results in terms of value order for a given variable. As a result, random heuristic choices are not consistent across experiments, as shown in the previous plots. The following screenshots show that 2 identical assignments yield 2 different domain indexings.
Features added:
Minor changes/bugs fixed:
- TsptwGeneratorFromRealData generator based on @kimriouxparadis's work.

What will come next / things to fix:
- fill_with_generator functions for every generator.
- Make TsptwGeneratorFromRealData work with SmartReward, which is not the case for now, as TsptwGeneratorFromRealData does not fill model.adhocInfo with the same attributes as TsptwGenerator --> POSTPONED.