-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RLlib] Make actions sent by RLlib to the env immutable. #24262
[RLlib] Make actions sent by RLlib to the env immutable. #24262
Conversation
As said the errors before were due to the fact that actions from different action spaces could of non-numpy type. In this case in has to be accounted for the structure. This PR should now work for any action structure, as |
rllib/evaluation/sampler.py
Outdated
@@ -1250,6 +1250,20 @@ def _process_policy_eval_results( | |||
episode._set_last_action(agent_id, action) | |||
|
|||
assert agent_id not in actions_to_send[env_id] | |||
# Flag actions as immutable to notify the user when trying to change it | |||
# and to avoid hardly traceable errors. | |||
def make_action_immutable(obj): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks awesome. what do you think we move this into rllib/utils/numpy.py?
I will try to reuse it for some connectors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks awesome. what do you think we move this into rllib/utils/numpy.py? I will try to reuse it for some connectors.
@gjoliver Thanks! Yes I can do this. Regarding the connectors the dataclass
decorator could also become interesting as it makes classes immutable by the __hash__()
function - I was thinking about using it here, but it was an overload
… to tree.traverse as the former function does not include the containing object, but only the contained ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, awesome tech :)
Awesome PR @simonsays1980 ! Thanks for going the extra mile here after our concerns about the first "deepcopy" solution. |
#24262 broke linting. This fixes this.
Why are these changes needed?
As actions could be mutated by users in the environment
step()
function and produces hardly traceable errors a better solution is to warn users not to mutate the actions (directly). This PR gives a solution by setting thenumpy
flagWRITEABLE
toFalse
in the_env_runner()
function whn calling the sampler.The last PR I sent throwed many errors in tests due to the fact that actions can be of different type depending on the action space. This PR includes now immutability for any type of action using
tree.map_structure()
and theMappingProxyType
from Python to makedict
s immutable.Related issue number
#23890
Checks
scripts/format.sh
to lint the changes in this PR.