
Add official evaluation script #15

Open · katja-hofmann opened this issue Mar 27, 2017 · 1 comment

@katja-hofmann (Member)

Input: trained agent (git repo + git commit id + JSON config file that details agent class + parameters)
Output: performance in terms of score@100k and score@500k, where score is the actual Malmo Pig Chase game score, and @100k means that the model was saved after 100,000 interaction steps.

Evaluation procedure: the evaluation script should run the trained model for 500 episodes and report the mean + stderr of the scores achieved during evaluation.
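A minimal sketch of that procedure, assuming a hypothetical agent/environment interface (`env.reset()`, `env.step()`, and `agent.act()` are placeholders here, not the actual malmopy API):

```python
import math

def evaluate(env, agent, n_episodes=500):
    """Run the trained agent for n_episodes and return the mean and
    standard error of the per-episode game score."""
    scores = []
    for _ in range(n_episodes):
        obs = env.reset()
        done, episode_score = False, 0.0
        while not done:
            action = agent.act(obs)               # hypothetical agent interface
            obs, reward, done = env.step(action)  # hypothetical env interface
            episode_score += reward               # accumulate the game score
        scores.append(episode_score)

    mean = sum(scores) / len(scores)
    # sample variance, then standard error of the mean
    var = sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)
    stderr = math.sqrt(var) / math.sqrt(len(scores))
    return mean, stderr
```

Note that the scores are aggregated per episode, which matters for the discussion below.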

@Haishion

Hi, you are storing the score for each step rather than for each episode, and then computing the mean and stderr. May I ask why? As far as I can see, the two are not equivalent. For example, (1) moving 1 step to the exit results in a score of 5, while (2) moving 10 steps to catch the pig results in a score of 15. The mean per-step score is higher in (1) than in (2), even though (2) achieves the higher episode score.
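For concreteness, a small numeric check of this discrepancy (only the totals and step counts are taken from the two examples above; everything else is illustrative):

```python
# Episode (1): 1 step to the exit, total score 5.
# Episode (2): 10 steps to catch the pig, total score 15.
steps  = [1, 10]
scores = [5.0, 15.0]

per_step_means = [s / n for s, n in zip(scores, steps)]
print(per_step_means)             # [5.0, 1.5] -> episode (1) looks better per step

print(sum(scores) / len(scores))  # 10.0 -> per-episode mean; (2) has the higher episode score
```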
