
[RLlib] Examples folder: All training_iteration translations. #23712

Merged: 6 commits into ray-project:master on Apr 5, 2022

Conversation

sven1977 (Contributor) commented on Apr 5, 2022

All scripts in rllib/examples that use execution_plan are translated to the new training_iteration API (a sketch of the general pattern follows the list below):

  • random_parametric_agent.py
  • two_trainer_workflow.py
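
For context, the translation typically has the following shape. This is an illustrative sketch only, not code from this PR: the SketchTrainer class, the import paths, and the hand-rolled sampling loop are assumptions; train_one_step is the helper touched in the diff further down, and a real Trainer subclass would also need its usual config/policy setup.

import ray
from ray.rllib.agents.trainer import Trainer
from ray.rllib.execution.train_ops import train_one_step
from ray.rllib.policy.sample_batch import SampleBatch


class SketchTrainer(Trainer):
    """Illustrative only; default config / policy class setup is omitted."""

    def training_iteration(self) -> dict:
        # Where the old API registered an execution_plan() of dataflow ops,
        # the new API writes one iteration of sampling + learning directly.

        # 1) Collect one batch of experiences from the rollout workers.
        if self.workers.remote_workers():
            samples = ray.get(
                [w.sample.remote() for w in self.workers.remote_workers()]
            )
        else:
            samples = [self.workers.local_worker().sample()]
        train_batch = SampleBatch.concat_samples(samples)

        # 2) Update the policy on that batch and return the learner stats.
        #    RLlib merges rollout/episode metrics into the result afterwards.
        return train_one_step(self, train_batch)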

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

sven1977 assigned gjoliver and avnishn and unassigned gjoliver on Apr 5, 2022

avnishn (Member) left a comment:

Looks good to me, have some questions.

@@ -922,8 +922,9 @@ def learn_on_batch(self, samples: SampleBatchType) -> Dict:
summarize(samples)
)
)

info_out = {}

Member:
Why make this change?

sven1977 (Contributor, Author):
This was a bug. There are some cases where info_out would not be defined later in the code. I wanted to make sure it's at least an empty dict :)

gjoliver (Member):
Yeah, it's a good idea to define info_out here. But now that we are doing this, we should make sure we always and only do info_out.update(...) in the subsequent code, like on line 945.
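
A tiny, hypothetical sketch of that pattern (the helper name and the stats layout are made up): define the dict once up front, then only ever merge into it, so every code path below can safely contribute and the final return is always valid.

def collect_learner_info(per_policy_stats: dict) -> dict:
    # Always defined, even if no branch below produces any stats.
    info_out = {}
    for policy_id, stats in per_policy_stats.items():
        # Only ever merge into the existing dict; never rebind it on some paths.
        info_out.update({policy_id: stats})
    return info_out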

@@ -36,7 +36,7 @@


 @ExperimentalAPI
-def train_one_step(trainer, train_batch) -> Dict:
+def train_one_step(trainer, train_batch, policies_to_train=None) -> Dict:

Member:
Where is this change implemented in the code base?

sven1977 (Contributor, Author):
Yeah, actually, let me add a nice docstring for this function. ...

dqn_train_results = {}
dqn_train_batch = self.local_replay_buffer.replay()
if dqn_train_batch is not None:
    dqn_train_results = train_one_step(self, dqn_train_batch, ["dqn_policy"])

Member:
Oh it's used here.
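
As a usage sketch of the new policies_to_train argument (hypothetical: only the DQN line appears verbatim in this diff; the PPO counterpart and its batch variable are assumed from the discussion further down), each call updates only the policy it is responsible for:

ppo_train_results = train_one_step(self, ppo_train_batch, ["ppo_policy"])
dqn_train_results = train_one_step(self, dqn_train_batch, ["dqn_policy"])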

sven1977 merged commit 434265e into ray-project:master on Apr 5, 2022

gjoliver (Member) left a comment:

Nice PR. I know it's already merged, but I have a few comments ...


-# Create a new Trainer using the Policy and config defined above and a new
-# execution plan.
+# Backward compatibility, just in case users want to use the erroneous old name.
+RandomParametriclPolicy = RandomParametricPolicy

gjoliver (Member):
is this really necessary?? :) :)
this is an example script ... folks probably shouldn't import an example script and use it as a library?

-# Return training metrics.
-return StandardMetricsReporting(rollouts, workers, config)
+# Return (empty) training metrics.
+return {}

gjoliver (Member):
Why not collect rollout-related metrics here?

sven1977 (Contributor, Author):
Because it's done automatically by RLlib after this, we always just return the learner stats here.
But yes, we should start thinking about a way to customize this bit of the iteration.


-return StandardMetricsReporting(train_op, workers, config)
+# Combine results for PPO and DQN into one results dict.
+results = dict(ppo_train_results, **dqn_train_results)

gjoliver (Member):
This small example is actually really cool!! Two Trainers train two policies for different agents in the same multi-agent env. Nice.

It doesn't feel exactly right here, though. Shouldn't we be combining the result dicts more carefully? For example, not overwriting PPO steps with DQN steps, but summing them up? I wonder how Concurrently does it.

sven1977 (Contributor, Author):
No, they are not overridden. The PPO stats and DQN stats will reside under their policy ID keys.

results = {
    "ppo_policy": [some stats],
    "dqn_policy": [some other stats]
}
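
A quick illustrative check of the merge semantics being discussed (the stats values below are made up):

ppo_train_results = {"ppo_policy": {"learner_stats": {"policy_loss": 0.12}}}
dqn_train_results = {"dqn_policy": {"learner_stats": {"td_error": 0.34}}}

# dict(a, **b) copies a, then lays b's keys on top of it.
results = dict(ppo_train_results, **dqn_train_results)
assert set(results) == {"ppo_policy", "dqn_policy"}  # disjoint keys; nothing is lost

# Only if both dicts carried the same top-level key (e.g. a shared step counter)
# would the DQN value silently replace the PPO one; such keys would have to be
# summed explicitly instead.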
