Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "[wingman -> rllib] IMPALA MultiDiscrete changes (#3967)" #4332

Merged
merged 1 commit into from
Mar 12, 2019

Conversation

ericl
Copy link
Contributor

@ericl ericl commented Mar 12, 2019

What do these changes do?

There seems to be a bug here, isolated to the changes in vtrace.py. Reverting that file and patching up multi_from_logits to call from_logits fixes the regression. Unfortunately I don't have time to investigate this further so we should revert this change and re-land it once the bug is found.

Interestingly, vtrace.py still passes its original unit tests.

The regression is fairly straightforward to reproduce: on any GPU machine run:

atari-impala:
    env:
        grid_search:
            - BreakoutNoFrameskip-v4
    run: IMPALA
    config:
        sample_batch_size: 50
        train_batch_size: 500
        num_workers: 32
        num_envs_per_worker: 5
        clip_rewards: True
        lr_schedule: [
            [0, 0.0005],
            [20000000, 0.000000000001],
        ]

Within a few minutes it gets to 1m+ timesteps. Before the regression, you can expect 10+ reward at this point. After the regression, the reward will top out at 1-2.

Related issue number

Atari performance regression originally reported in #4329

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/36/
Test PASSed.

@ericl
Copy link
Contributor Author

ericl commented Mar 12, 2019

First time build. Skipping changelog.
[Ray-Perf-Integration-PRB] $ /bin/bash /tmp/hudson4942004597160488522.sh
PR 4332 does not contain ray-core, early exiting
Test PASSed.

@robertnishihara is this misdetection due to a revert or is it some other bug?

@pcmoritz
Copy link
Contributor

@ericl This is the new performance testing jenkins, which only kicks in if the PR string contains ray-core afaik (@devin-petersohn), i.e. if something performance-related has changed. ray-core should probably be changed to core.

@ericl
Copy link
Contributor Author

ericl commented Mar 12, 2019

Ah ok, didn't realize it had a different name.

log_rhos = get_log_rhos(target_action_log_probs,
behaviour_action_log_probs)

log_rhos = target_action_log_probs - behaviour_action_log_probs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One potential source of the bug is if this guy has a wrong shape.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the shapes here and it looked reasonable (at least they match).

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/12765/
Test FAILed.

@ericl ericl merged commit 3c41cb9 into ray-project:master Mar 12, 2019
stefanpantic added a commit to wingman-ai/ray that referenced this pull request Mar 12, 2019
@stefanpantic stefanpantic mentioned this pull request Mar 12, 2019
@stefanpantic
Copy link
Contributor

@ericl Hi Eric, I think I fixed the issue, I've opened a pull request here

ericl pushed a commit that referenced this pull request Mar 13, 2019
* Revert "Revert "[wingman -> rllib] IMPALA MultiDiscrete changes (#3967)" (#4332)"

This reverts commit 3c41cb9.

* Fix a bug with log rhos for vtrace

* Reformat

* lint
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants