[rllib] ExternalMultiAgentEnv #4200
Conversation
Known problems so far:
- `self.cur_reward` in `_ExternalEnvEpisode` for some reason has to know the IDs of all possible agents (serving multi-agent example).
- The `ExternalMultiAgentEnv` thread gets started multiple times, so `ppo.train()` deadlocks in the example.
Hey, thanks for getting started on this. One idea: is it possible to add multi-agent support to the existing external env adapter in a backwards-compatible way? For example, adding optional arguments to specify an agent ID. That way, there could be less code duplication between the two envs.
Introduce a `multiagent` flag in `_ExternalEnvToBaseEnv` to reduce code duplication.
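A minimal sketch of that flag-based approach, assuming a simplified adapter (the helper `_to_multi_agent_dict` and the dummy agent key are illustrative, not the actual RLlib code):

```python
# Simplified sketch of the flag idea, not the actual RLlib implementation.
class _ExternalEnvToBaseEnv:
    def __init__(self, external_env, multiagent=False):
        self.external_env = external_env
        # One flag switches between scalar and per-agent-dict handling,
        # so a single adapter serves both ExternalEnv and
        # ExternalMultiAgentEnv.
        self.multiagent = multiagent

    def _to_multi_agent_dict(self, value):
        # Multi-agent mode: values already arrive as dicts keyed by agent ID.
        if self.multiagent:
            return value
        # Single-agent mode: wrap the value under a fixed dummy agent ID.
        return {"single_agent": value}
```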
```python
        return episode_id

    @PublicAPI
    def get_action(self, episode_id, observation):
```
One design decision here is whether you should be getting the actions of multiple agents at once (as opposed to one at a time).
I think it may be easier to enforce that the observations of all agents acting in the "episode step" must be provided at once in this call. Otherwise, it becomes unclear which agent actions go in which episode timestep.
While I think it would be easier to enforce observations of all agents in one call to `get_action(...)`, this would not allow the same "level" of asynchrony as offered by `MultiAgentEnv`, right?
Hmm, I was thinking you could allow a subset to be passed in. Similar to how `step()` in `MultiAgentEnv` returns a subset of the agents in the env, `get_action()` could take a subset of the agents as well. So the level of asynchrony would match.
That makes sense, thanks for the clarification. As the logic of `ExternalMultiAgentEnv` would then be analogous to `MultiAgentEnv`, I'd opt for that. Currently, this is working as expected: one can pass in a subset of agent observations and get that subset of agent actions back.
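For illustration, a sketch of that subset behavior from inside a serving env's `run()` method; the class name, agent IDs, spaces, and reward values are made up, and the import path assumes the module added in this PR:

```python
import gym
import numpy as np

from ray.rllib.env.external_multi_agent_env import ExternalMultiAgentEnv


class SubsetServingEnv(ExternalMultiAgentEnv):
    def __init__(self):
        # Shared spaces for all agents (illustrative).
        ExternalMultiAgentEnv.__init__(
            self,
            action_space=gym.spaces.Discrete(2),
            observation_space=gym.spaces.Box(-1.0, 1.0, shape=(4,)))

    def run(self):
        episode_id = self.start_episode()
        # Only "agent_0" and "agent_1" act at this step; other agents can
        # query actions in a later call, mirroring MultiAgentEnv.step().
        obs_dict = {
            "agent_0": np.zeros(4, dtype=np.float32),
            "agent_1": np.zeros(4, dtype=np.float32),
        }
        action_dict = self.get_action(episode_id, obs_dict)
        # action_dict contains entries only for the agents passed in above.
        self.log_returns(episode_id, {aid: 1.0 for aid in action_dict})
        self.end_episode(episode_id, obs_dict)
```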
```python
@PublicAPI
class ExternalMultiAgentEnv(threading.Thread):
```
Should this extend ExternalEnv?
The only differences I see between ExternalEnv and ExternalMultiAgentEnv are argument types in the method signatures (e.g. action vs. action_dict), so this should be ok?
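If it does extend `ExternalEnv`, the overrides would mostly swap scalar arguments for dicts keyed by agent ID. A rough sketch, assuming `ExternalEnv`'s internal `_get()` episode lookup (names are not necessarily the final implementation):

```python
from ray.rllib.env.external_env import ExternalEnv
from ray.rllib.utils.annotations import PublicAPI


class ExternalMultiAgentEnv(ExternalEnv):
    """Multi-agent variant of ExternalEnv: observations, actions, and
    rewards become dicts keyed by agent ID."""

    @PublicAPI
    def get_action(self, episode_id, observation_dict):
        # Same bookkeeping as ExternalEnv.get_action(), except the episode
        # records a dict of observations and returns a dict of actions.
        episode = self._get(episode_id)
        return episode.wait_for_action(observation_dict)
```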
@ctombumila37 any update? I can try to help out here unless you've got changes not yet pushed.
Append `_dict` to variable names.
@ctombumila37 I saw the new updates, is this ready to review? Any other issues you've found?
Yes, I hit no other issues. Should I remove my copy-pasta MultiCartpole-Serving example?
This is looking pretty good.
- Let's remove the examples as noted (instead, a unit test in test_external_env will suffice).
- I have some comments on further removing some duplication for the _ExternalEnvEpisode helper class.
```python
            self, action_space, observation_space, max_concurrent)

        # we need to know all agents' spaces
        if isinstance(self.action_space, dict) or isinstance(self.observation_space, dict):
```
I noticed sometimes you pass in None for the spaces here -- should that be allowed?
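(For reference, the dict form of the spaces would look like the sketch below; agent IDs and spaces are illustrative. Passing `None` instead would leave the adapter without any per-agent space information.)

```python
import gym

# Per-agent spaces, keyed by agent ID (illustrative values).
action_space = {
    "agent_0": gym.spaces.Discrete(2),
    "agent_1": gym.spaces.Discrete(2),
}
observation_space = {
    "agent_0": gym.spaces.Box(-1.0, 1.0, shape=(4,)),
    "agent_1": gym.spaces.Box(-1.0, 1.0, shape=(4,)),
}
# Every agent that can ever appear needs an entry here, since per-agent
# policies are built from these spaces.
```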
Remove multiagent cartpole examples.
Remove _ExternalMultiAgentEnvEpisode
This looks great! I think we can merge once the unit test is added.
LGTM. I pushed some lint fixes.
Tests look good, thanks for contributing this!
What do these changes do?
Create a combination of `ExternalEnv` and `MultiAgentEnv`, called `ExternalMultiAgentEnv`.

Related issue number
#4051
Please note that this PR is far from finished. For things that do not work (yet), see the commit messages. I am a novice in ray/rllib, so I would appreciate any help with this :)
To Do
- Merge `_ExternalMultiAgentEnvEpisode` into `_ExternalEnvEpisode`