Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Tune] [Train] Only convert a BaseTrainer to Trainable once in the Tuner #30355

Merged
merged 6 commits into from
Nov 17, 2022

Conversation

justinvyu
Copy link
Contributor

Signed-off-by: Justin Yu [email protected]

Why are these changes needed?

When using Tune with Train (Tuner(trainer)), the trainer is currently being converted to a Tune Trainable multiple times. This is possibly expensive, since BaseTrainer.as_trainable will put all of its config (which could contain a large checkpointed model in resume_from_checkpoint) into the object store from a call to tune.with_parameters. This PR makes it so that the conversion only happens once.

Related issue number

Closes #30321

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@justinvyu justinvyu added tune Tune-related issues train Ray Train Related Issue labels Nov 16, 2022
@justinvyu justinvyu requested a review from Yard1 November 16, 2022 21:26
@Yard1
Copy link
Member

Yard1 commented Nov 16, 2022

Thanks, this is great. To be extra safe, I'd propose to consider using properties to link _trainable and _converted_trainable together:

    @property
    def _trainable(self):
        return self.__trainable

    @property
    def _converted_trainable(self):
        return self.__converted_trainable

    @_trainable.setter
    def _trainable(self, value):
        self.__trainable = value
        self.__converted_trainable = self._convert_trainable(self.__trainable)

(and change the keys in constants to account for extra underscores)

My worry (even if that's a very unlikely situation) is that we can end up with a situation where _trainable has been changed but the cached _converted_trainable is still being used.

Let me know what you think!

@justinvyu
Copy link
Contributor Author

This makes sense, I'll go ahead and make the change!

@justinvyu
Copy link
Contributor Author

One thing that I had to change was:

  • Python automatically adds the class name to double underscore attributes when accessing the __dict__. So, the __trainable attribute would have the key "_TunerInternal__trainable" in __getstate__. Then, the key to pop the attribute would be "_TunerInternal__trainable", which is a bit confusing.
  • I changed the properties to self.trainable and self.converted_trainable instead, with _trainable and _converted_trainable as the private variables.

Copy link
Member

@Yard1 Yard1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@amogkam amogkam merged commit 161de0b into ray-project:master Nov 17, 2022
WeichenXu123 pushed a commit to WeichenXu123/ray that referenced this pull request Dec 19, 2022
…e Tuner (ray-project#30355)

When using Tune with Train (Tuner(trainer)), the trainer is currently being converted to a Tune Trainable multiple times. This is possibly expensive, since BaseTrainer.as_trainable will put all of its config (which could contain a large checkpointed model in resume_from_checkpoint) into the object store from a call to tune.with_parameters. This PR makes it so that the conversion only happens once.

Signed-off-by: Justin Yu <[email protected]>
Signed-off-by: Weichen Xu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
train Ray Train Related Issue tune Tune-related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Tune + Train] AIR Trainers currently get converted to Trainables multiple times unnecessarily
3 participants