
Added features for experiment reproducibility and many other improvements: #203

Merged · 46 commits merged into master from tom/feature/new_pipeline on May 5, 2022

Conversation

@3rdCore (Collaborator) commented Feb 23, 2022

I added several features to make each experiment exactly reproducible given a pair of seeds (seedEval, seed):

  • seedEval is used to create a dedicated random stream for generating evaluation instances.
  • seed handles every other random stream (random heuristic, weight initialisation, ...); a minimal sketch of the intended split is shown below.
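
A hedged sketch of how the two streams could be set up (illustrative, not necessarily the PR's exact code):

```julia
using Random

# two independent random streams, created once at the start of an experiment
rng_eval  = MersenneTwister(seedEval)  # only used to generate evaluation instances
rng_other = MersenneTwister(seed)      # random heuristic, weight initialisation, RL explorer, ...
```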

I plotted some metrics for 2 different runs of a TSPTW experiment with 20 nodes. It is expected that the learned heuristic shows no improvement, as hyperparameters were not tuned for this purpose. The performance of the nearest heuristic is identical in both experiments, which is good evidence that evaluation instances are consistent given the same seedEval:

[Screenshots: metric plots for Experiment 1 and Experiment 2]

This work is still in progress, as I am facing some unexpected behaviors from generated CP instances. Given the exact same model, the fix-point yields different results in terms of value ordering for a given variable. As a result, random heuristic choices are not consistent across experiments, as shown in the previous plots. The following screenshots show that 2 identical assignments yield 2 different domain indexings.

[Screenshots: domain indexing in Experiment 1 and Experiment 2]

Features added:

  • added seed for evaluation instance generation
  • added seed for random ValueHeuristic
  • added seed for RL explorer

Minor changes / bugs fixed:

  • evalFreq can no longer be smaller than 1 (in that case no evaluations were run)
  • An evaluation is now done before running the first training episode
  • added AccumulatedRewardBeforeReset to retrieve one episode's total reward
  • deterministic and random heuristics are no longer re-evaluated during training, as their behavior is not supposed to change.
  • SmartReward has been completely rewritten. It is now the exact same reward used by @qcappart in his 2020 paper.
  • Added the TsptwGeneratorFromRealData generator based on @kimriouxparadis's work.

What comes next / things to fix:

  • During the first evaluation, execution time includes precompilation time, which can bias the metric. This needs to be fixed.
  • testsets for every new feature
  • add seed for RL model weight initialization
  • add seed for training instance generation
  • refactor all fill_with_generator! functions for every generator
  • make sure that TsptwGeneratorFromRealData works with SmartReward, which is currently not the case as TsptwGeneratorFromRealData does not fill model.adhocInfo with the same attributes as TsptwGenerator --> POSTPONED

@3rdCore added the bug (Something isn't working) label on Feb 23, 2022
Comment on lines 27 to 38
elseif symbol == :FoundSolution #last portion required to get the full closed path
dist = model.adhocInfo[1]
n = size(dist)[1]
max_dist = Float32(Base.maximum(dist))
if isbound(model.variables["a_"*string(n-1)])
last = assignedValue(model.variables["a_"*string(n-1)])
first = assignedValue(model.variables["a_1"])

dist_to_first_node = lh.current_state.dist[last, first] * max_dist
print("final_dist : ", dist_to_first_node, " // ")
lh.reward.value += -ρ*dist_to_first_node
end
Collaborator Author (@3rdCore):

This corresponds to the last path portion between the last assigned variable and the start variable.

if isbound(model.variables["a_"*string(current)])
a_i = assignedValue(model.variables["a_"*string(current)])
v_i = assignedValue(model.variables["v_"*string(current)])
last_dist = lh.current_state.dist[v_i, a_i] * max_dist
Collaborator Author (@3rdCore):

This corresponds to the distance between the previous node and the node that has just been selected by the heuristic one step before. (Recall that the reward is always given one step after, just before making a new decision).

@@ -9,6 +9,7 @@ manually change the mode again if he wants.
"""
function Flux.testmode!(lh::LearnedHeuristic, mode = true)
Flux.testmode!(lh.agent, mode)
lh.agent.policy.explorer.is_training = !mode
Collaborator Author (@3rdCore):

Fixing the RL agent's explorer rate to zero during evaluation.
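
A hedged usage sketch, assuming a LearnedHeuristic lh whose agent carries an explorer with the is_training flag shown in the diff above; the evaluate! call is only illustrative:

```julia
Flux.testmode!(lh, true)    # freeze the agent; explorer.is_training is set to false
evaluate!(evaluator, lh)    # hypothetical evaluation call
Flux.testmode!(lh, false)   # back to training mode; the explorer resumes its decay
```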

Collaborator:

It seems to me that by default the explorer is not called when trainMode is false.

However, this should not be a problem.

"""
function last_episode_total_reward(t::AbstractTrajectory)
last_index = length(t[:terminal])
last_index == 0 && return 0
Collaborator Author (@3rdCore):

Return 0 in case the trajectory is empty. This was needed to be able to evaluate the model before any training step without triggering a BoundsError when retrieving last_episode_total_reward from the trajectory.
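
A hedged sketch of how the guard could fit into the rest of the function, assuming the trajectory also exposes a :reward trace (the committed implementation may differ):

```julia
function last_episode_total_reward(t::AbstractTrajectory)
    last_index = length(t[:terminal])
    last_index == 0 && return 0   # empty trajectory: evaluation requested before any training step

    # walk back to the start of the last episode, then sum its rewards
    first_index = last_index
    while first_index > 1 && !t[:terminal][first_index - 1]
        first_index -= 1
    end
    return sum(t[:reward][first_index:last_index])
end
```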

Comment on lines -26 to -28
if !isnothing(seed)
Random.seed!(seed)
end
Collaborator Author (@3rdCore):

We want evaluation instances to be the same across experiments, not identical within an experiment (which would be the opposite of what we are trying to achieve), hence the rng has to be created outside the fill_with_generator! function.
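
A hedged sketch of that idea; nbInstances, fill_with_generator! and the rng keyword come from this PR, while the surrounding loop and the empty CPModel() construction are only illustrative:

```julia
using Random

rng = MersenneTwister(seedEval)                    # seeded once per experiment
instances = Vector{CPModel}(undef, nbInstances)    # evaluation instances to fill
for i in 1:nbInstances
    cpmodel = CPModel()                            # hypothetical construction of an empty model
    # instances differ from each other, but the whole set is reproducible across experiments
    fill_with_generator!(cpmodel, generator; rng = rng)
    instances[i] = cpmodel
end
```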

Comment on lines +73 to +77
model = eval.instances[i]
reset_model!(model)
if eval.metrics[i,j].nbEpisodes == 0
dt = @elapsed search!(model, strategy, variableHeuristic, heuristic)
eval.metrics[i,j](model,dt)
Collaborator Author (@3rdCore):

A search is only run once for BasicHeuristics.

Comment on lines +79 to +80
else
dt,numberOfNodes, numberOfSolutions = repeatlast!(eval.metrics[i,j])
Collaborator Author (@3rdCore):

For every subsequent evaluation step, we simply repeat the stored metrics and avoid recomputing already existing results.
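
Putting the two quoted fragments together, the body of the evaluation loop looks roughly like the sketch below (the enclosing loops over instances i and heuristics j are assumed):

```julia
model = eval.instances[i]
reset_model!(model)
if eval.metrics[i, j].nbEpisodes == 0
    # first evaluation of this (instance, heuristic) pair: run the search and record the metrics
    dt = @elapsed search!(model, strategy, variableHeuristic, heuristic)
    eval.metrics[i, j](model, dt)
else
    # fixed BasicHeuristic already evaluated: re-use the stored results
    dt, numberOfNodes, numberOfSolutions = repeatlast!(eval.metrics[i, j])
end
```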

@3rdCore (Collaborator, Author) commented Mar 22, 2022

NN models can now be generated completely deterministically. One step toward perfect experiment reproducibility.
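
A minimal sketch of the idea (not the PR's actual code), assuming the network is built with Flux's default initialisers, which draw from the global RNG:

```julia
using Flux, Random

Random.seed!(seed)   # the experiment-level seed mentioned above
# the weights below are now reproducible run after run
approximator = Chain(Dense(10, 32, relu), Dense(32, 4))
```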

@3rdCore (Collaborator, Author) commented Mar 22, 2022

The first evaluation is no longer impacted by the compilation time incurred when the evaluation code block runs for the first time. Testsets were added for that feature.
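
One common way to achieve this (a hedged sketch, not necessarily what the PR does) is to trigger compilation with a throwaway run before the first timed search:

```julia
# warm-up on a copy of the instance so that @elapsed no longer measures compilation
warmup_model = deepcopy(eval.instances[1])
search!(warmup_model, strategy, variableHeuristic, heuristic)   # compiles search! and its callees

# later timings measure pure runtime
dt = @elapsed search!(model, strategy, variableHeuristic, heuristic)
```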

@gostreap (Collaborator) left a comment:

For me, there is no major problem with merging this pull request.

Maybe we should just postpone the addition of tsptwRealData.jl, since it doesn't seem essential and it doesn't seem to me to be fully functional.

Apart from that, there are plenty of useful improvements and the only remarks are about comments or minor points.

Fill a CPModel with the variables and constraints generated. We fill it directly instead of
creating temporary files for efficiency purpose.

A seed must be specified by the user to generate a specific instance. As long as Random.seed!(seed) is called at the beginning of the function, every random-based operation will be deterministic. Caution: this is not the seed that must be specified in order to generate the same set of evaluation instances across experiments; in that case, the user must call Random.seed! only once, at the beginning of the experiment.
Collaborator:

It seems to me that this comment is outdated, since the seed has been replaced by a random generator.

Collaborator Author (@3rdCore):

> It seems to me that by default the explorer is not called when trainMode is false.

I thought that was the case too, but it is not. Before this modification, the explorer rate was decreasing even during evaluation, which was highly problematic.

Collaborator Author (@3rdCore):

We discussed this with the creator of the ReinforcementLearning.jl package here.

@@ -16,11 +16,13 @@ It is possible to give `Inf` as the `gen.correlation` to have a strict equality
`gen.correlation` must be strictly positive.
This method is from the following paper:
https://www.researchgate.net/publication/2548374_Core_Problems_in_Knapsack_Algorithms

A seed must be specified by the user to generate a specific instance. As long as Random.seed!(seed) is called at the beginning of the function, every random-based operation will be deterministic. Caution: this is not the seed that must be specified in order to generate the same set of evaluation instances across experiments; in that case, the user must call Random.seed! only once, at the beginning of the experiment.
Collaborator:

It seems to me that this comment is outdated, since the seed has been replaced by a random generator.

@@ -15,14 +15,13 @@ end
Fill a CPModel with the variables and constraints generated. We fill it directly instead of
creating temporary files for efficiency purpose.

A seed must be specified by the user to generate a specific instance. As long as Random.seed!(seed) is called at the beginning of the function, every random-based operation will be deterministic. Caution: this is not the seed that must be specified in order to generate the same set of evaluation instances across experiments; in that case, the user must call Random.seed! only once, at the beginning of the experiment.
Collaborator:

It seems to me that this comment is outdated, since the seed has been replaced by a random generator.

This generator creates graphs for the NQueens problem.

"""
function fill_with_generator!(cpmodel::CPModel, gen::NQueensGenerator; seed=nothing)
function fill_with_generator!(cpmodel::CPModel, gen::NQueensGenerator; rng::Union{Nothing,AbstractRNG} = nothing)
Collaborator:

Since there is no randomness in the generation of NQueens problems, is it necessary to add a random generator parameter? I don't think removing it would be a problem, even for the consistency of the API, since this parameter is optional in every fill_with_generator! function.

Collaborator Author (@3rdCore):

That's right, I removed it.

if ! isempty(model.statistics.nodevisitedpersolution) #infeasible case
push!(metrics.meanNodeVisitedUntilfirstSolFound,model.statistics.nodevisitedpersolution[1])
end
index = findall(!isnothing, model.statistics.solutions) #return the index of the first solution
Collaborator:

Same remark as before.

if ! isempty(model.statistics.nodevisitedpersolution) #infeasible case
push!(metrics.meanNodeVisitedUntilfirstSolFound,model.statistics.nodevisitedpersolution[1])
end
index = findall(!isnothing, model.statistics.solutions) #return the index of the first solution
Collaborator:

Same remark as before.


if isa(metrics,BasicMetrics{O,<:LearnedHeuristic})
metrics.totalReward = rollmean(metrics.totalReward,windowspan)
function repeatlast!(metrics::BasicMetrics{<:AbstractTakeObjective, <:BasicHeuristic})
Collaborator:

What is the purpose of this function?

Collaborator:

OK, I saw that it is there to avoid repeating the evaluations for fixed deterministic heuristics; maybe a comment should be added since the purpose of the function is not obvious.

Collaborator Author (@3rdCore):

This function is admittedly confusing, but it was an easy way to virtually repeat evaluation metrics without a week of refactoring.
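
For illustration, a hedged sketch of what such a helper could look like. Only meanNodeVisitedUntilfirstSolFound and nbEpisodes appear in the quoted diffs; the other field names are hypothetical:

```julia
function repeatlast!(metrics::BasicMetrics{<:AbstractTakeObjective, <:BasicHeuristic})
    # duplicate the last recorded entry of each per-evaluation vector so the fixed
    # heuristic shows up at every evaluation step without being searched again
    push!(metrics.timeneeded,  metrics.timeneeded[end])      # hypothetical field
    push!(metrics.nodevisited, metrics.nodevisited[end])     # hypothetical field
    push!(metrics.nbsolutions, metrics.nbsolutions[end])     # hypothetical field
    push!(metrics.meanNodeVisitedUntilfirstSolFound, metrics.meanNodeVisitedUntilfirstSolFound[end])
    metrics.nbEpisodes += 1
    return metrics.timeneeded[end], metrics.nodevisited[end], metrics.nbsolutions[end]
end
```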

"""
function SameInstancesEvaluator(valueSelectionArray::Array{H, 1}, generator::AbstractModelGenerator; seed=nothing, evalFreq::Int64 = 50, nbInstances::Int64 = 10, evalTimeOut::Union{Nothing,Int64} = nothing) where H<: ValueSelection

Constructor for SameInstancesEvaluator. In order to generate nbInstances times the same evaluation instance, a seed has to be specified. Otherwise, the instance will be generated randomly.
Collaborator:

Same thing as for the fill_with_generator! remarks. It would be nice to update the comment to reflect the fact that the seed has been replaced by a random generator.

#x_pos = zeros(gen.n_city)
#y_pos = zeros(gen.n_city)
#grid_size = 0
#cpmodel.adhocInfo = dist, timeWindows, hcat(x_pos, y_pos), grid_size
Collaborator:

It seems to me that in order to use the tsptw specific rewards, you have to fill in adhocInfo.

Since this new generator does not seem essential to me, maybe just remove it from the pull request until it is functional.

Collaborator Author (@3rdCore):

That's right, I moved the file tsptwRealData.jl and its dependencies to the branch Tsptw-Real-Data.

@marco-novaes98 (Contributor) left a comment:

This is not an exhaustive review; there are a lot of changes and I don't understand all the details of the RL part, but I tried to roughly understand the changes and check whether they made sense to me.

Comment on lines +5 to +6
This reward is the exact reward implemented by Quentin Cappart in
his recent paper: Combining RL & CP for Combinatorial Optimization, https://arxiv.org/pdf/2006.01610.pdf.
Contributor:

It would be nice to explain in a few words how the smart reward works and why it is interesting to use it. If not, maybe add "section 2.2" after the paper's link to make this information easier for the user to find.

last = assignedValue(model.variables["a_"*string(n-1)])
first = assignedValue(model.variables["a_1"])

dist_to_first_node = lh.current_state.dist[last, first] * max_dist
Contributor:

I guess this is the intended behavior, but I don't understand the * max_dist factor.

Collaborator Author (@3rdCore):

This is supposed to be copied from the original implementation of the reward provided by @qcappart :

https://github.com/qcappart/hybrid-cp-rl-solver/blob/master/src/problem/tsptw/environment/environment.py

However, as I can't find it anymore in the original repo, I removed the factor * max_dist from the computation of last_dist and dist_to_first_node.


#if t[:terminal][last_index] #TODO understand why they wrote this

#if t[:terminal][last_index] Do we need to consider cases where the last state is not a terminal state ?
Contributor:

Has this case been resolved?

Collaborator Author (@3rdCore):

Yes @marco-novaes98, this case is handled in line 10.

and every computation like fixPoints and backtracking has been done.
Change the current reward at the DecisionPhase. This is called right before making the next decision, so you know you have the very last state before the new decision and every computation like fixPoints and backtracking has been done.

This computes the reward: ρ*(1 + tour_upper_bound - last_dist), where ρ is a constant, tour_upper_bound an upper bound on the tour length, and last_dist the distance between the previous node and the target node decided by the previous decision (the reward is attributed just before takng a new decision)
Contributor:

small typo in "takng"

Comment on lines 54 to 55
index = findall(!isnothing, model.statistics.solutions) #return the list of index of real solution in model.statistics.solutions
push!(metrics.meanNodeVisitedUntilfirstSolFound, !isempty(index) ? model.statistics.nodevisitedpersolution[index[1]] : nothing)
Contributor:

Referring to the 4 findall calls in this function, can we have more than one real solution?

  • If so, I don't understand why we only consider the first solution in the push! right after.
  • If not, it might be better to use findfirst to make the code clearer and a bit faster (and also update the comment)

@3rdCore (Collaborator, Author) commented May 4, 2022

In the general case, we can have more than one solution in model.statistics.solutions.

> If so, I don't understand why we only consider the first solution in the push! right after.

This is because we are looking for the number of nodes visited until the first solution is found:
push!(metrics.meanNodeVisitedUntilfirstSolFound, !isempty(index) ? model.statistics.nodevisitedpersolution[index[1]] : nothing)

> If not, it might be better to use findfirst to make the code clearer and a bit faster (and also update the comment)

Your point is correct, findfirst is the best practice.
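
For illustration, a hedged sketch of the findfirst variant, reusing the field names from the quoted snippet (not necessarily the final committed code):

```julia
# index of the first real (non-nothing) solution, or `nothing` if the search found none
idx = findfirst(!isnothing, model.statistics.solutions)
push!(metrics.meanNodeVisitedUntilfirstSolFound,
      isnothing(idx) ? nothing : model.statistics.nodevisitedpersolution[idx])
```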

@3rdCore changed the title from "[In Progress] Added features for experiment reproducibility and many other improvements" to "Added features for experiment reproducibility and many other improvements" on May 4, 2022
@3rdCore (Collaborator, Author) commented May 4, 2022

Fix some issues raised by @marco-novaes98.

This PR can finally be merged.

@gostreap added the generator (Everything related to instance generation. Usually found in the `datagen` folder.) label on May 5, 2022
@3rdCore merged commit 267ba43 into master on May 5, 2022
@3rdCore deleted the tom/feature/new_pipeline branch on May 5, 2022 at 18:24
Labels: bug · generator · reinforcement learning