[rllib] Support prev_state/prev_action in rollout and fix multiagent #4565

vladfi1 · 2019-04-04T23:29:03Z

What do these changes do?

Fixes a few issues with rollout.py:

Multiagent envs were not properly handled.
prev_state and prev_action weren't passed in to the agent.

The code has also been simplified and should be more readable.

Closes #4573

Linter

I've run scripts/format.sh to lint the changes in this PR.

AmplabJenkins · 2019-04-04T23:29:30Z

Can one of the admins verify this patch?

AmplabJenkins · 2019-04-05T00:10:04Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/13577/
Test FAILed.

ericl · 2019-04-05T02:39:48Z

python/ray/rllib/rollout.py

        policy_map = agent.local_evaluator.policy_map
        state_init = {p: m.get_initial_state() for p, m in policy_map.items()}
        use_lstm = {p: len(s) > 0 for p, s in state_init.items()}
+        action_init = {
+            p: m.action_space.sample()


note that in the actual code we feed the all-zeros action initially

Does that make sense? All zeros might not even be in the action space?

Yeah that's a good question.

I think this is fine for now, though it would be better to set it to zeros for consistency (or switch the sampler to stick in a random initial action, but that might be weird).
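To make the concern in this thread concrete, here is a minimal sketch in plain Python. Both `zeros_like_action` and `in_interval` are hypothetical helpers invented for illustration, not part of rollout.py or gym. The first builds an all-zeros action with the same nested structure as a sampled action; the second stands in for a bounded action space to show that zero can fall outside it.

```python
def zeros_like_action(sample):
    """Return an all-zeros action with the same nested structure as `sample`.

    Hypothetical helper: handles dicts, tuples, lists and Python scalars only.
    """
    if isinstance(sample, dict):
        return {k: zeros_like_action(v) for k, v in sample.items()}
    if isinstance(sample, (tuple, list)):
        return type(sample)(zeros_like_action(v) for v in sample)
    # Scalars: int -> 0, float -> 0.0
    return type(sample)(0)


def in_interval(action, low, high):
    """Stand-in for a bounded 1-D action space (assumed bounds, inclusive)."""
    return low <= action <= high


sampled = {"move": (3, 0.5), "fire": 1}
initial = zeros_like_action(sampled)

# With assumed bounds [1.0, 2.0], the all-zeros action is not a valid action.
zero_is_valid = in_interval(zeros_like_action(1.5), 1.0, 2.0)
```

If consistency with the sampler matters more than validity, the zeros variant matches what the thread says the training code feeds; a sampled action is always valid but differs from training.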

ericl · 2019-04-05T02:42:05Z

python/ray/rllib/rollout.py

@@ -124,37 +140,46 @@ def rollout(agent, env_name, num_steps, out=None, no_render=True):
    while steps < (num_steps or steps + 1):
        if out is not None:
            rollout = []
-        state = env.reset()
+        obs = env.reset()
+        multi_obs = obs if multiagent else {0: obs}


could use https://github.com/ray-project/ray/blob/master/python/ray/rllib/env/base_env.py#L196 instead of 0
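A minimal sketch of that suggestion, assuming the constant linked above is named `_DUMMY_AGENT_ID` (the name used later in this PR's commits). The value `"agent0"` below is a placeholder for illustration, and `as_multi_obs` is a hypothetical helper, not a function in rollout.py.

```python
_DUMMY_AGENT_ID = "agent0"  # placeholder value; the real constant lives in base_env.py


def as_multi_obs(obs, multiagent):
    """Present observations as an {agent_id: obs} dict in both cases.

    Multiagent envs already return such a dict; a single-agent observation
    is wrapped under the dummy agent id instead of the literal key 0.
    """
    return obs if multiagent else {_DUMMY_AGENT_ID: obs}


single = as_multi_obs([0.1, 0.2], multiagent=False)
multi = as_multi_obs({"a": 1, "b": 2}, multiagent=True)
```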

ericl · 2019-04-05T02:43:42Z

python/ray/rllib/rollout.py

@@ -124,37 +140,46 @@ def rollout(agent, env_name, num_steps, out=None, no_render=True):
    while steps < (num_steps or steps + 1):
        if out is not None:
            rollout = []
-        state = env.reset()
+        obs = env.reset()


should the mapping cache be reset as well?

Yeah I think that's appropriate.

cclauss · 2019-04-05T07:57:39Z

python/ray/rllib/rollout.py

            policy_agent_mapping = agent.config["multiagent"][
                "policy_mapping_fn"]
-            mapping_cache = {}
+        else:
+            policy_agent_mapping = lambda _: DEFAULT_POLICY_ID


flake8: E731 do not assign a lambda expression, use a def

Reset the mapping cache at the start of each episode.
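The two review points above (E731 and the per-episode cache reset) could look roughly like this. It is a sketch under assumptions: `make_policy_agent_mapping` and `run_episodes` are hypothetical helpers, and the value of `DEFAULT_POLICY_ID` is a placeholder rather than the one RLlib defines.

```python
DEFAULT_POLICY_ID = "default_policy"  # placeholder value for illustration


def make_policy_agent_mapping(multiagent_config=None):
    """Return the configured mapping fn, or a default one for single-agent runs."""
    if multiagent_config:
        return multiagent_config["policy_mapping_fn"]

    # A named def instead of `policy_agent_mapping = lambda _: ...` avoids E731.
    def policy_agent_mapping(unused_agent_id):
        return DEFAULT_POLICY_ID

    return policy_agent_mapping


def run_episodes(agent_ids_per_episode, policy_agent_mapping):
    """Record which policy each agent was mapped to, episode by episode."""
    assignments = []
    for agent_ids in agent_ids_per_episode:
        mapping_cache = {}  # reset at the start of each episode
        for agent_id in agent_ids:
            if agent_id not in mapping_cache:
                mapping_cache[agent_id] = policy_agent_mapping(agent_id)
        assignments.append(dict(mapping_cache))
    return assignments


mapping = make_policy_agent_mapping()
result = run_episodes([["a", "b"], ["a"]], mapping)
```

Resetting the cache per episode matters when the mapping function is not a pure function of the agent id, since a stale entry would otherwise pin an agent to the policy chosen in an earlier episode.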

AmplabJenkins · 2019-04-06T11:58:39Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/13598/
Test FAILed.

AmplabJenkins · 2019-04-06T12:49:43Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/13599/
Test FAILed.

AmplabJenkins · 2019-04-07T01:15:56Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/13605/
Test FAILed.

ericl · 2019-04-07T03:05:09Z

Seems this causes rllib/tests/test_rollout.sh to raise an error:

Traceback (most recent call last):
  File "../rollout.py", line 212, in <module>
    run(args, parser)
  File "../rollout.py", line 105, in run
    rollout(agent, args.env, num_steps, args.out, args.no_render)
  File "../rollout.py", line 183, in rollout
    next_obs, reward, done, _ = env.step(action)
  File "/home/eric/.local/lib/python3.5/site-packages/gym/wrappers/time_limit.py", line 31, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/eric/.local/lib/python3.5/site-packages/gym/envs/atari/atari_env.py", line 68, in step
    action = self._action_set[a]
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

waldroje · 2019-04-09T02:33:10Z

I believe step() also needs to be contingent on multiagent, i.e.

    if multiagent:
        next_obs, reward, done, _ = env.step(action)
    else:
        next_obs, reward, done, _ = env.step(action[_DUMMY_AGENT_ID])

Also, line 172 updates

    agent_states[policy_id] = p_state

but the call to agent.compute_action above reads agent_states[agent_id]. At least in the cartpole_lstm example, policy_id != agent_id, so obs and prev_states are not being carried forward.

vladfi1 · 2019-04-09T07:17:41Z

Single-agent envs should work now (test_rollout.py passes).

AmplabJenkins · 2019-04-09T09:25:27Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/13652/
Test FAILed.

vladfi1 · 2019-04-09T09:59:04Z

Test failure doesn't seem related to this PR:

  File "/ray/python/ray/experimental/sgd/test_sgd.py", line 48, in <module>
    all_reduce_alg=args.all_reduce_alg)
  File "/ray/python/ray/experimental/sgd/sgd.py", line 110, in __init__
    shard_shapes = ray.get(self.workers[0].shard_shapes.remote())
  File "/ray/python/ray/worker.py", line 2306, in get
    raise value
ray.exceptions.RayTaskError: ray_worker (pid=118, host=308e9c11b868)
NameError: global name 'FileNotFoundError' is not defined

cclauss · 2019-04-09T10:01:52Z

FileNotFoundError was added in Python 3. In Python 2 use OSError instead.
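One way to write such a check so that it runs on both Python 2 and Python 3 is to catch the older exception types and inspect `errno`. This is a generic sketch, not the change made to sgd.py; `read_or_none` is a hypothetical helper.

```python
import errno
import os
import tempfile


def read_or_none(path):
    """Return the file's contents, or None if the file does not exist."""
    try:
        with open(path) as f:
            return f.read()
    except (IOError, OSError) as e:
        # On Python 3, FileNotFoundError is an OSError with errno == ENOENT.
        # On Python 2, open() raises IOError with the same errno.
        if e.errno == errno.ENOENT:
            return None
        raise


missing_path = os.path.join(tempfile.mkdtemp(), "does_not_exist.txt")
contents = read_or_none(missing_path)
```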

waldroje · 2019-04-09T18:41:56Z

The latest version still updates

    agent_states[policy_id] = p_state

yet passes

    agent_states[agent_id]

to agent.compute_action. I don't believe that is correct, as it results in the initial state simply being passed repeatedly unless agent_id == policy_id, which was not the case in the cartpole_lstm example I ran.
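A toy reproduction of the behavior described above, under stated assumptions: `counting_policy` is a hypothetical recurrent policy whose state is just a step counter, and the ids `"agent0"` and `"default_policy"` are placeholders. The state is always read by agent id; only the key used for writing changes.

```python
def counting_policy(state):
    """Hypothetical recurrent policy: returns (action, next_state)."""
    next_state = state + 1
    return next_state, next_state


def states_seen(write_by_agent_id, steps=3):
    """Return the input state the policy receives at each step."""
    agent_id, policy_id = "agent0", "default_policy"
    agent_states = {agent_id: 0}  # initial state, keyed by agent id
    seen = []
    for _ in range(steps):
        state_in = agent_states[agent_id]  # read by agent id
        seen.append(state_in)
        _, state_out = counting_policy(state_in)
        write_key = agent_id if write_by_agent_id else policy_id
        agent_states[write_key] = state_out
    return seen


buggy = states_seen(write_by_agent_id=False)  # writes under policy_id
fixed = states_seen(write_by_agent_id=True)   # writes under agent_id
```

With mismatched keys the policy sees the initial state at every step (`[0, 0, 0]`); writing under the agent id carries the state forward (`[0, 1, 2]`).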

vladfi1 · 2019-04-09T18:58:53Z

@waldroje You are correct, good catch (if only python had types...). Fixed now.

AmplabJenkins · 2019-04-09T21:36:25Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/13667/
Test FAILed.

ericl · 2019-04-10T07:01:15Z

Lint unrelated.

ericl · 2019-04-10T07:01:36Z

Thanks!

vladfi1 added 3 commits April 4, 2019 23:26

Cleaner and more correct treatment of agent states in rollout.py

8b9be61

support lstm_use_prev_action_reward in rollout.py

dac943b

Linter.

35b735a

ericl self-assigned this Apr 4, 2019

ericl reviewed Apr 5, 2019

ericl changed the title ~~Rollout fix~~ [rllib] Support prev_state/prev_action in rollout and fix multiagent Apr 5, 2019

cclauss reviewed Apr 5, 2019

vladfi1 added 3 commits April 6, 2019 11:13

appease flake8

ae0bc2c

Use _DUMMY_AGENT_ID instead of 0.

578fb14

All agents have a policy_agent_mapping.

8d05633

Reset the mapping cache at the start of each episode.

Update rollout.py

50adcc9

ericl mentioned this pull request Apr 7, 2019

Prev_action, prev_reward not passed to rollout #4573

Closed

ericl approved these changes Apr 7, 2019

vladfi1 added 2 commits April 9, 2019 08:12

Fix rollout.py for single-agent envs.

73f0167

Merge branch 'rollout-fix' of github.com.:vladfi1/ray into rollout-fix

486ad06

Use agent_id, not policy_id.

5fd9ef5

ericl merged commit 74fd3d7 into ray-project:master Apr 10, 2019
