Summary:
They explain why rigorous experiments are crucial for research progress, point out several issues with Deep RL experiments, and illustrate their points with extensive experiments.
They demonstrate the effect of:
- Random seeds: evaluating on 5 seeds is not enough; more are needed.
- Hyper-parameters: huge effect, hence the need for grid search.
- Environments: some algorithms only work well in particular environments.
- Codebase: different implementations of the same algorithm lead to different results.
- Evaluation metrics: the max return is not meaningful; there is no single best metric.
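The seed and metric points can be illustrated with a small sketch. The per-seed returns below are made-up numbers (not from the article), chosen so that one algorithm "wins" on max return thanks to a single lucky seed while losing on the mean; the bootstrap confidence interval makes the uncertainty visible:

```python
import random
import statistics

# Hypothetical final returns over 10 seeds for two algorithms (illustrative numbers only).
returns_a = [210, 195, 260, 188, 240, 205, 198, 230, 215, 192]
returns_b = [180, 175, 310, 170, 185, 178, 182, 176, 179, 181]

def bootstrap_ci(data, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    means = sorted(
        statistics.mean(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    )
    return (means[int(alpha / 2 * n_resamples)],
            means[int((1 - alpha / 2) * n_resamples)])

# B has the higher max (one lucky seed hit 310), but A has the higher mean:
print("max  A:", max(returns_a), " B:", max(returns_b))
print("mean A:", statistics.mean(returns_a), " B:", statistics.mean(returns_b))
print("95% CI A:", bootstrap_ci(returns_a))
print("95% CI B:", bootstrap_ci(returns_b))
```

Reporting only the max would rank B above A here; the mean with a confidence interval tells the opposite story, which is exactly the kind of reporting pitfall the article warns about.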
Final thoughts:
Great article because what they show is striking. I need to follow their recommendations, i.e.:
- open-source code, make it reproducible
- grid-search
- use many random seeds (> 50)
- be careful with evaluation metrics