-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RLlib] Example script custom_metrics_and_callbacks.py
should work for batch_mode=complete_episodes
.
#22684
[RLlib] Example script custom_metrics_and_callbacks.py
should work for batch_mode=complete_episodes
.
#22684
Conversation
…ch_mode':'complete_episodes'.
…ed during debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please rebase master into your branch, a lot of strange numpy random number generator errors.
thanks.
][-1], ( | ||
"ERROR: `on_episode_end()` should only be called " "after episode is done!" | ||
) | ||
# Check if there are multiple episodes in a batch, i.e. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks a ton for noticing this.
any suggestion on how can we assert this for "complete_episode" mode as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trainer class
I think in case of complete_episodes
we do not have to care about, if episodes are really done
because they are always done (complete). But to have a callback that can run with both batch_mode
s we need a prior safeguard for this. In case of truncate_episodes
we have to be careful if the episode is indeed done.
Custom usage
This is different, if a user implements the callback in his own code and calls this callback before an episode is done. I don't know if (1) this case is rare and would happen only between a rollout and a batch completion or (2) if it can happen literally at any point of sampling. If we want to ensure that the below assertion does not error out, we might have a look at the batches
attribute of the PolicyCollector
as this gets emptied when the MultiAgentBatch
is created by the simple_list_collector
. I have also not yet figured out, if this attribute (batches
) is only emptied, when the episode is done
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, even in "complete_episodes" mode, there could be more than one episodes in the final train batch (but not here in this callback).
- We should probably rename
multiple_episodes_in_batch
intocollect_complete_episodes
(bool). - I think we should rather use
worker.policy_config["batch_mode"]
here in this if block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @simonsays1980 , thanks for this fix! Let me know, if you want to do a follow-up PR with the 2 above suggested changes. I think this would help clarify these batch generating rules for the different settings even more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @sven1977, yes I can do a follow-up PR. Should I do this within the same branch or better create a new one?
a820140
to
9d0f2eb
Compare
@gjoliver I rebased. Can we rerun the tests somehow? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, the tests look reasonable.
@sven1977, can you help merge? :)
custom_metrics_and_callbacks.py
should work for batch_mode=complete_episodes
.
Why are these changes needed?
To enable a better traceability and to give users a universal guideline that works with different
batch_mode
s when producing custom metrics.So far the
batch_mode="complete_episodes"
produces an error in thecustom_metrics_and_callbacks.py
. The reason for this is:batch_mode="complete_episodes"
theSimpleListCollector
will be called to build aMultiAgentBatch
build()
method of thePolicyCollector
and empty thebatches
attribute.batches
from thePolicyCollector
are empty now,episode.batch_builder.policy_collectors["default_policy"].batches[-1]
does not exist in theassert
expression.For most of the end-users running their algorithms with
batch_mode="complete_episodes"
trying to implement their own metrics by following this example will possibly not able to trace this error back and to find a solution for their own code.Related issue number
#22683
Checks
scripts/format.sh
to lint the changes in this PR.