
Added features for experiment reproducibility and many other improvements: #203

Merged · 46 commits merged into master from tom/feature/new_pipeline on May 5, 2022

Conversation

@3rdCore (Collaborator) commented Feb 23, 2022

I added several features to make each experiment exactly reproducible given a pair of seeds (seedEval, seed):

  • seedEval is used to create a dedicated random stream for generating evaluation instances.
  • seed handles every other random stream (random heuristic, weight initialisation, ...); a minimal sketch of the intended split is shown below.
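
A hedged sketch of how the two streams could be set up (illustrative, not necessarily the PR's exact code):

```julia
using Random

# two independent random streams, created once at the start of an experiment
rng_eval  = MersenneTwister(seedEval)  # only used to generate evaluation instances
rng_other = MersenneTwister(seed)      # random heuristic, weight initialisation, RL explorer, ...
```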

I plotted some metrics for 2 different runs of a TSPTW experiment with 20 nodes. It is expected that the learned heuristic shows no improvement, as hyperparameters were not tuned for this purpose. The performance of the nearest heuristic is identical in both experiments, which is good evidence that evaluation instances are consistent given the same seedEval:

[Screenshots: metric plots for Experiment 1 and Experiment 2]

This work is still in progress, as I am facing some unexpected behaviors from generated CP instances. Given the exact same model, the fix-point yields different results in terms of value ordering for a given variable. As a result, random heuristic choices are not consistent across experiments, as shown in the previous plots. The following screenshots show that 2 identical assignments yield 2 different domain indexings.

[Screenshots: domain indexing in Experiment 1 and Experiment 2]

Features added:

  • added seed for evaluation instance generation
  • added seed for random ValueHeuristic
  • added seed for RL explorer

Minor changes / bugs fixed:

  • evalFreq can no longer be smaller than 1 (in that case no evaluations were run)
  • An evaluation is now done before running the first training episode
  • added AccumulatedRewardBeforeReset to retrieve one episode's total reward
  • deterministic and random heuristics are no longer re-evaluated during training, as their behavior is not supposed to change.
  • SmartReward has been completely rewritten. It is now the exact same reward used by @qcappart in his 2020 paper.
  • Added the TsptwGeneratorFromRealData generator based on @kimriouxparadis's work.

What comes next / things to fix:

  • During the first evaluation, execution time includes precompilation time, which can bias the metric. This needs to be fixed.
  • testsets for every new feature
  • add seed for RL model weight initialization
  • add seed for training instance generation
  • refactor all fill_with_generator! functions for every generator
  • make sure that TsptwGeneratorFromRealData works with SmartReward, which is currently not the case as TsptwGeneratorFromRealData does not fill model.adhocInfo with the same attributes as TsptwGenerator --> POSTPONED

@3rdCore added the bug (Something isn't working) label on Feb 23, 2022
Comment on lines 27 to 38
elseif symbol == :FoundSolution #last portion required to get the full closed path
dist = model.adhocInfo[1]
n = size(dist)[1]
max_dist = Float32(Base.maximum(dist))
if isbound(model.variables["a_"*string(n-1)])
last = assignedValue(model.variables["a_"*string(n-1)])
first = assignedValue(model.variables["a_1"])

dist_to_first_node = lh.current_state.dist[last, first] * max_dist
print("final_dist : ", dist_to_first_node, " // ")
lh.reward.value += -ρ*dist_to_first_node
end
Collaborator Author (@3rdCore):

This corresponds to the last path portion between the last assigned variable and the start variable.

if isbound(model.variables["a_"*string(current)])
a_i = assignedValue(model.variables["a_"*string(current)])
v_i = assignedValue(model.variables["v_"*string(current)])
last_dist = lh.current_state.dist[v_i, a_i] * max_dist
Collaborator Author (@3rdCore):

This corresponds to the distance between the previous node and the node that has just been selected by the heuristic one step before. (Recall that the reward is always given one step after, just before making a new decision).

@@ -9,6 +9,7 @@ manually change the mode again if he wants.
"""
function Flux.testmode!(lh::LearnedHeuristic, mode = true)
Flux.testmode!(lh.agent, mode)
lh.agent.policy.explorer.is_training = !mode
Collaborator Author (@3rdCore):

Fixing the RL agent's explorer rate to zero during evaluation.
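
A hedged usage sketch, assuming a LearnedHeuristic lh whose agent carries an explorer with the is_training flag shown in the diff above; the evaluate! call is only illustrative:

```julia
Flux.testmode!(lh, true)    # freeze the agent; explorer.is_training is set to false
evaluate!(evaluator, lh)    # hypothetical evaluation call
Flux.testmode!(lh, false)   # back to training mode; the explorer resumes its decay
```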

Collaborator:

It seems to me that by default the explorer is not called when trainMode is false.

However, this should not be a problem.

"""
function last_episode_total_reward(t::AbstractTrajectory)
last_index = length(t[:terminal])
last_index == 0 && return 0
Collaborator Author (@3rdCore):

Return 0 in case the trajectory is empty. This was needed to be able to evaluate the model before any training step without triggering a BoundsError when retrieving last_episode_total_reward from the trajectory.
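
A hedged sketch of how the guard could fit into the rest of the function, assuming the trajectory also exposes a :reward trace (the committed implementation may differ):

```julia
function last_episode_total_reward(t::AbstractTrajectory)
    last_index = length(t[:terminal])
    last_index == 0 && return 0   # empty trajectory: evaluation requested before any training step

    # walk back to the start of the last episode, then sum its rewards
    first_index = last_index
    while first_index > 1 && !t[:terminal][first_index - 1]
        first_index -= 1
    end
    return sum(t[:reward][first_index:last_index])
end
```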

Comment on lines -26 to -28
if !isnothing(seed)
Random.seed!(seed)
end
Collaborator Author (@3rdCore):

We want evaluation instances to be the same across experiments, not identical within an experiment (which would be the opposite of what we are trying to achieve), hence the rng has to be created outside the fill_with_generator! function.
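
A hedged sketch of that idea; nbInstances, fill_with_generator! and the rng keyword come from this PR, while the surrounding loop and the empty CPModel() construction are only illustrative:

```julia
using Random

rng = MersenneTwister(seedEval)                    # seeded once per experiment
instances = Vector{CPModel}(undef, nbInstances)    # evaluation instances to fill
for i in 1:nbInstances
    cpmodel = CPModel()                            # hypothetical construction of an empty model
    # instances differ from each other, but the whole set is reproducible across experiments
    fill_with_generator!(cpmodel, generator; rng = rng)
    instances[i] = cpmodel
end
```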

Comment on lines +73 to +77
model = eval.instances[i]
reset_model!(model)
if eval.metrics[i,j].nbEpisodes == 0
dt = @elapsed search!(model, strategy, variableHeuristic, heuristic)
eval.metrics[i,j](model,dt)
Collaborator Author (@3rdCore):

A search is only run once for BasicHeuristics.

Comment on lines +79 to +80
else
dt,numberOfNodes, numberOfSolutions = repeatlast!(eval.metrics[i,j])
Collaborator Author (@3rdCore):

For every subsequent evaluation step, we simply repeat the stored metrics and avoid recomputing already existing results.
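
Putting the two quoted fragments together, the body of the evaluation loop looks roughly like the sketch below (the enclosing loops over instances i and heuristics j are assumed):

```julia
model = eval.instances[i]
reset_model!(model)
if eval.metrics[i, j].nbEpisodes == 0
    # first evaluation of this (instance, heuristic) pair: run the search and record the metrics
    dt = @elapsed search!(model, strategy, variableHeuristic, heuristic)
    eval.metrics[i, j](model, dt)
else
    # fixed BasicHeuristic already evaluated: re-use the stored results
    dt, numberOfNodes, numberOfSolutions = repeatlast!(eval.metrics[i, j])
end
```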

@3rdCore (Collaborator, Author) commented Mar 22, 2022

NN models can now be generated completely deterministically. One step toward perfect experiment reproducibility.
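
A minimal sketch of the idea (not the PR's actual code), assuming the network is built with Flux's default initialisers, which draw from the global RNG:

```julia
using Flux, Random

Random.seed!(seed)   # the experiment-level seed mentioned above
# the weights below are now reproducible run after run
approximator = Chain(Dense(10, 32, relu), Dense(32, 4))
```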

@3rdCore (Collaborator, Author) commented Mar 22, 2022

The first evaluation is no longer impacted by the compilation time incurred when the evaluation code block runs for the first time. Testsets were added for that feature.
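
One common way to achieve this (a hedged sketch, not necessarily what the PR does) is to trigger compilation with a throwaway run before the first timed search:

```julia
# warm-up on a copy of the instance so that @elapsed no longer measures compilation
warmup_model = deepcopy(eval.instances[1])
search!(warmup_model, strategy, variableHeuristic, heuristic)   # compiles search! and its callees

# later timings measure pure runtime
dt = @elapsed search!(model, strategy, variableHeuristic, heuristic)
```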

@gostreap (Collaborator) left a comment:

For me, there is no major problem with merging this pull request.

Maybe we should just postpone the addition of tsptwRealData.jl, since it doesn't seem essential and it doesn't seem to me to be fully functional.

Apart from that, there are plenty of useful improvements and the only remarks are about comments or minor points.

Fill a CPModel with the variables and constraints generated. We fill it directly instead of
creating temporary files for efficiency purpose.

A seed must be specified by the user to generate a specific instance. As long as Random.seed!(seed) is called at the beginning of the function, every random-based operation will be deterministic. Caution: this is not the seed that must be specified in order to generate the same set of evaluation instances across experiments; in that case, the user must call Random.seed! only once, at the beginning of the experiment.
Collaborator:

It seems to me that this comment is outdated, since the seed has been replaced by a random generator.

Collaborator Author (@3rdCore):

> It seems to me that by default the explorer is not called when trainMode is false.

I thought that was the case too, but it is not. Before this modification, the explorer rate was decreasing even during evaluation, which was highly problematic.

Collaborator Author (@3rdCore):

We discussed this with the creator of the ReinforcementLearning.jl package here.

@@ -16,11 +16,13 @@ It is possible to give `Inf` as the `gen.correlation` to have a strict equality
`gen.correlation` must be strictly positive.
This method is from the following paper:
https://www.researchgate.net/publication/2548374_Core_Problems_in_Knapsack_Algorithms

A seed must be specified by the user to generate a specific instance. As long as Random.seed!(seed) is called at the beginning of the function, every random-based operation will be deterministic. Caution: this is not the seed that must be specified in order to generate the same set of evaluation instances across experiments; in that case, the user must call Random.seed! only once, at the beginning of the experiment.
Collaborator:

It seems to me that this comment is outdated, since the seed has been replaced by a random generator.

@@ -15,14 +15,13 @@ end
Fill a CPModel with the variables and constraints generated. We fill it directly instead of
creating temporary files for efficiency purpose.

A seed must be specified by the user to generate a specific instance. As long as Random.seed!(seed) is called at the beginning of the function, every random-based operation will be deterministic. Caution: this is not the seed that must be specified in order to generate the same set of evaluation instances across experiments; in that case, the user must call Random.seed! only once, at the beginning of the experiment.
Collaborator:

It seems to me that this comment is outdated, since the seed has been replaced by a random generator.

This generator creates graphs for the NQueens problem.

"""
function fill_with_generator!(cpmodel::CPModel, gen::NQueensGenerator; seed=nothing)
function fill_with_generator!(cpmodel::CPModel, gen::NQueensGenerator; rng::Union{Nothing,AbstractRNG} = nothing)
Collaborator:

Since there is no randomness in the generation of NQueens problems, is it necessary to add a random generator parameter? I don't think removing it would be a problem, even for the consistency of the API, since this parameter is optional in every fill_with_generator! function.

Collaborator Author (@3rdCore):

That's right, I removed it.

if ! isempty(model.statistics.nodevisitedpersolution) #infeasible case
push!(metrics.meanNodeVisitedUntilfirstSolFound,model.statistics.nodevisitedpersolution[1])
end
index = findall(!isnothing, model.statistics.solutions) #return the index of the first solution
Collaborator:

Same remark as before.

if ! isempty(model.statistics.nodevisitedpersolution) #infeasible case
push!(metrics.meanNodeVisitedUntilfirstSolFound,model.statistics.nodevisitedpersolution[1])
end
index = findall(!isnothing, model.statistics.solutions) #return the index of the first solution
Collaborator:

Same remark as before.


if isa(metrics,BasicMetrics{O,<:LearnedHeuristic})
metrics.totalReward = rollmean(metrics.totalReward,windowspan)
function repeatlast!(metrics::BasicMetrics{<:AbstractTakeObjective, <:BasicHeuristic})
Collaborator:

What is the purpose of this function?

Collaborator:

OK, I saw that it is there to avoid repeating the evaluations for fixed deterministic heuristics; maybe a comment should be added since the purpose of the function is not obvious.

Collaborator Author (@3rdCore):

This function is admittedly confusing, but it was an easy way to virtually repeat evaluation metrics without a week of refactoring.
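
For illustration, a hedged sketch of what such a helper could look like. Only meanNodeVisitedUntilfirstSolFound and nbEpisodes appear in the quoted diffs; the other field names are hypothetical:

```julia
function repeatlast!(metrics::BasicMetrics{<:AbstractTakeObjective, <:BasicHeuristic})
    # duplicate the last recorded entry of each per-evaluation vector so the fixed
    # heuristic shows up at every evaluation step without being searched again
    push!(metrics.timeneeded,  metrics.timeneeded[end])      # hypothetical field
    push!(metrics.nodevisited, metrics.nodevisited[end])     # hypothetical field
    push!(metrics.nbsolutions, metrics.nbsolutions[end])     # hypothetical field
    push!(metrics.meanNodeVisitedUntilfirstSolFound, metrics.meanNodeVisitedUntilfirstSolFound[end])
    metrics.nbEpisodes += 1
    return metrics.timeneeded[end], metrics.nodevisited[end], metrics.nbsolutions[end]
end
```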

"""
function SameInstancesEvaluator(valueSelectionArray::Array{H, 1}, generator::AbstractModelGenerator; seed=nothing, evalFreq::Int64 = 50, nbInstances::Int64 = 10, evalTimeOut::Union{Nothing,Int64} = nothing) where H<: ValueSelection

Constructor for SameInstancesEvaluator. In order to generate nbInstances times the same evaluation instance, a seed has to be specified. Otherwise, the instance will be generated randomly.
Collaborator:

Same thing as for the fill_with_generator! remarks. It would be nice to update the comment to reflect the fact that the seed has been replaced by a random generator.

#x_pos = zeros(gen.n_city)
#y_pos = zeros(gen.n_city)
#grid_size = 0
#cpmodel.adhocInfo = dist, timeWindows, hcat(x_pos, y_pos), grid_size
Collaborator:

It seems to me that in order to use the tsptw specific rewards, you have to fill in adhocInfo.

Since this new generator does not seem essential to me, maybe just remove it from the pull request until it is functional.

Collaborator Author (@3rdCore):

That's right, I moved the file tsptwRealData.jl and its dependencies to the branch Tsptw-Real-Data.

@marco-novaes98 (Contributor) left a comment:

This is not an exhaustive review; there are a lot of changes and I don't understand all the details of the RL part, but I tried to roughly understand the changes and check whether they made sense to me.

Comment on lines +5 to +6
This reward is the exact reward implemented by Quentin Cappart in
his recent paper: Combining RL & CP for Combinatorial Optimization, https://arxiv.org/pdf/2006.01610.pdf.
Contributor:

It would be nice to explain in a few words how the smart reward works and why it is interesting to use it. If not, maybe add "section 2.2" after the paper's link to make this information easier for the user to find.

last = assignedValue(model.variables["a_"*string(n-1)])
first = assignedValue(model.variables["a_1"])

dist_to_first_node = lh.current_state.dist[last, first] * max_dist
Contributor:

I guess this is the intended behavior, but I don't understand the * max_dist factor.

Collaborator Author (@3rdCore):

This is supposed to be copied from the original implementation of the reward provided by @qcappart :

https://github.com/qcappart/hybrid-cp-rl-solver/blob/master/src/problem/tsptw/environment/environment.py

However, as I can't find it anymore in the original repo, I removed the factor * max_dist from the computation of last_dist and dist_to_first_node.


#if t[:terminal][last_index] #TODO understand why they wrote this

#if t[:terminal][last_index] Do we need to consider cases where the last state is not a terminal state ?
Contributor:

Has this case been resolved?

Collaborator Author (@3rdCore):

Yes @marco-novaes98, this case is handled in line 10.

and every computation like fixPoints and backtracking has been done.
Change the current reward at the DecisionPhase. This is called right before making the next decision, so you know you have the very last state before the new decision and every computation like fixPoints and backtracking has been done.

This computes the reward: ρ*(1 + tour_upper_bound - last_dist), where ρ is a constant, tour_upper_bound an upper bound on the tour length, and last_dist the distance between the previous node and the target node decided by the previous decision (the reward is attributed just before takng a new decision)
Contributor:

small typo in "takng"

Comment on lines 54 to 55
index = findall(!isnothing, model.statistics.solutions) #return the list of index of real solution in model.statistics.solutions
push!(metrics.meanNodeVisitedUntilfirstSolFound, !isempty(index) ? model.statistics.nodevisitedpersolution[index[1]] : nothing)
Contributor:

Referring to the 4 findall calls in this function, can we have more than one real solution?

  • If so, I don't understand why we only consider the first solution in the push! right after.
  • If not, it might be better to use findfirst to make the code clearer and a bit faster (and also update the comment)

@3rdCore (Collaborator, Author) commented May 4, 2022

In the general case, we can have more than one solution in model.statistics.solutions.

> If so, I don't understand why we only consider the first solution in the push! right after.

This is because we are looking for the number of nodes visited until the first solution is found:
push!(metrics.meanNodeVisitedUntilfirstSolFound, !isempty(index) ? model.statistics.nodevisitedpersolution[index[1]] : nothing)

> If not, it might be better to use findfirst to make the code clearer and a bit faster (and also update the comment)

Your point is correct, findfirst is the best practice.
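
For illustration, a hedged sketch of the findfirst variant, reusing the field names from the quoted snippet (not necessarily the final committed code):

```julia
# index of the first real (non-nothing) solution, or `nothing` if the search found none
idx = findfirst(!isnothing, model.statistics.solutions)
push!(metrics.meanNodeVisitedUntilfirstSolFound,
      isnothing(idx) ? nothing : model.statistics.nodevisitedpersolution[idx])
```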

@3rdCore changed the title from "[In Progress] Added features for experiment reproducibility and many other improvements" to "Added features for experiment reproducibility and many other improvements" on May 4, 2022
@3rdCore (Collaborator, Author) commented May 4, 2022

Fix some issues raised by @marco-novaes98.

This PR can finally be merged.

@gostreap added the generator (Everything related to instance generation. Usually found in the `datagen` folder.) label on May 5, 2022
@3rdCore merged commit 267ba43 into master on May 5, 2022
@3rdCore deleted the tom/feature/new_pipeline branch on May 5, 2022 at 18:24
Labels: bug · generator · reinforcement learning