[rllib] Fix error messages and example for dataset writer #23419

Merged: 4 commits merged into ray-project:master from krfricke's rllib/dataset-format branch on Mar 28, 2022

Conversation

krfricke (Contributor):

Why are these changes needed?

Currently, the error message and example refer to a field named "type", but the field is actually called "format".
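
For reference, a minimal sketch of a config using the corrected key (the keys shown mirror the example diff below; this is illustrative, not a complete RLlib config):

# Illustrative sketch only: the dataset output format is selected with
# "format", not "type", matching the corrected example in this PR.
config = {
    "output": "dataset",
    "output_config": {
        "format": "json",  # was incorrectly documented as "type"
        "path": "/tmp/out",
    },
    "batch_mode": "complete_episodes",
}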

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@krfricke (Contributor, Author) commented on Mar 23, 2022:

Btw @gjoliver, it seems like final batches are not written because max_num_samples_per_file is never reached.

Thus, in the example, "max_num_samples_per_file": 1 (or similar) has to be passed as well. Note also that len(self.samples) is usually the number of batches, not the number of samples.

Is this something we can fix? I.e., is there a destructor/cleanup method we can use for this?
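
For illustration, here is a rough sketch of the write-on-threshold behavior described above; the names (samples, max_num_samples_per_file, _write_file) are assumptions for this sketch, not the actual DatasetWriter implementation:

class ThresholdWriter:
    """Hypothetical writer that only flushes full shards (illustration only)."""

    def __init__(self, max_num_samples_per_file: int):
        self.samples = []  # accumulated batches (note: batches, not timesteps)
        self.max_num_samples_per_file = max_num_samples_per_file

    def write(self, batch) -> None:
        self.samples.append(batch)
        # A file is only produced once the threshold is reached, so anything
        # accumulated after the last flush never gets written when training ends.
        if len(self.samples) >= self.max_num_samples_per_file:
            self._write_file(self.samples)
            self.samples = []

    def _write_file(self, samples) -> None:
        ...  # write one shard to disk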

@@ -44,7 +44,7 @@ def __init__(
         output_config: Dict = ioctx.output_config
         assert (
             "format" in output_config
-        ), "output_config.type must be specified when using Dataset output."
+        ), "output_config.format must be specified when using Dataset output."
gjoliver (Member):
ah thanks for catching this. went through iterations ...

@@ -1,7 +1,7 @@
 # To generate training data, first run:
 # $ ./train.py --run=PPO --env=CartPole-v0 \
 #     --stop='{"timesteps_total": 50000}' \
-#     --config='{"output": "dataset", "output_config": {"type": "json", "path": "/tmp/out"}, "batch_mode": "complete_episodes"}'
+#     --config='{"output": "dataset", "output_config": {"format": "json", "path": "/tmp/out", "max_num_samples_per_file": 1}, "batch_mode": "complete_episodes"}'
gjoliver (Member):
just to double check, you want to write 1 sample per file??

krfricke (Contributor, Author):
Well, 1 sample = 1 episode here, so each usually has a few hundred timesteps. When I ran this I ended up with 39 files.

gjoliver (Member):
Yeah, but still, why write 1 episode per file? :)
The default half-cheetah JSON dataset is a single file of maybe a few thousand episodes.
Is there any efficiency consideration behind this?
Anyway, this is just an example, so do whatever you see fit. I'm just curious what made you make this change.

krfricke (Contributor, Author):
The main reason is that currently the remaining samples are not written at all - they are only written once at least max_num_samples_per_file samples have accumulated during training. I.e., we don't flush the remaining results when training is over, and I didn't see a great way to make that happen.

gjoliver (Member):
oh this is pretty bad. please add a Note or TODO in dataset_writer.py
this is another case where we are using a workaround to fix underlying problems. and per eric's message, we shouldn't do this long-term.
let's intercept a signal somewhere and flush the last shard before things go down.

krfricke (Contributor, Author):
Ah yeah, that's what I mentioned in my first comment on this PR. OK, I'll add a TODO in the file. I've tried to flush on __del__, but that doesn't seem to work with actors, so there's no quick fix. Probably we need to call some kind of destructor/cleanup method manually.
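
One possible direction, sketched very roughly here (not RLlib code; it builds on the hypothetical ThresholdWriter sketch above): expose an explicit flush() and, as a best-effort fallback, register it with atexit, since __del__ is not reliably invoked for Ray actors.

import atexit

class FlushingWriter(ThresholdWriter):
    """Adds an explicit flush hook; purely illustrative."""

    def __init__(self, max_num_samples_per_file: int):
        super().__init__(max_num_samples_per_file)
        # atexit fires on normal interpreter shutdown of the worker process,
        # but it is still not guaranteed when an actor is hard-killed.
        atexit.register(self.flush)

    def flush(self) -> None:
        if self.samples:
            self._write_file(self.samples)
            self.samples = []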

gjoliver (Member):
ah, it all comes back. I didn't understand what you meant at the time :)
cool, interesting topic ...

@gjoliver (Member) left a review comment:
feel free to merge btw.

@krfricke merged commit 262d612 into ray-project:master on Mar 28, 2022
@krfricke deleted the rllib/dataset-format branch on March 28, 2022 at 18:53