
Add official evaluation script #15

Open · katja-hofmann opened this issue Mar 27, 2017 · 1 comment

@katja-hofmann (Member)

Input: trained agent (git repo + git commit id + JSON config file that details agent class + parameters)
Output: performance in terms of score@100k and score@500k, where score is the actual Malmo Pig Chase game score, and @100k means that the model was saved after 100,000 interaction steps.

Evaluation procedure: the evaluation script should run the trained model for 500 episodes and report the mean + stderr of the scores achieved during evaluation.
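A minimal sketch of that procedure, assuming a hypothetical agent/environment interface (`env.reset()`, `env.step()`, and `agent.act()` are placeholders here, not the actual malmopy API):

```python
import math

def evaluate(env, agent, n_episodes=500):
    """Run the trained agent for n_episodes and return the mean and
    standard error of the per-episode game score."""
    scores = []
    for _ in range(n_episodes):
        obs = env.reset()
        done, episode_score = False, 0.0
        while not done:
            action = agent.act(obs)               # hypothetical agent interface
            obs, reward, done = env.step(action)  # hypothetical env interface
            episode_score += reward               # accumulate the game score
        scores.append(episode_score)

    mean = sum(scores) / len(scores)
    # sample variance, then standard error of the mean
    var = sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)
    stderr = math.sqrt(var) / math.sqrt(len(scores))
    return mean, stderr
```

Note that the scores are aggregated per episode, which matters for the discussion below.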

@Haishion

Hi, you are storing the score for each step rather than for each episode, and then computing the mean and stderr. May I ask why? As far as I can see, the two are not equivalent. For example, (1) moving 1 step to the exit results in a score of 5, while (2) moving 10 steps to catch the pig results in a score of 15. The mean per-step score is higher in (1) than in (2), even though (2) achieves the higher episode score.
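For concreteness, a small numeric check of this discrepancy (only the totals and step counts are taken from the two examples above; everything else is illustrative):

```python
# Episode (1): 1 step to the exit, total score 5.
# Episode (2): 10 steps to catch the pig, total score 15.
steps  = [1, 10]
scores = [5.0, 15.0]

per_step_means = [s / n for s, n in zip(scores, steps)]
print(per_step_means)             # [5.0, 1.5] -> episode (1) looks better per step

print(sum(scores) / len(scores))  # 10.0 -> per-episode mean; (2) has the higher episode score
```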
