[Tune] [Train] Only convert a `BaseTrainer` to `Trainable` once in the Tuner #30355

justinvyu · 2022-11-16T21:26:10Z

Signed-off-by: Justin Yu [email protected]

Why are these changes needed?

When using Tune with Train (Tuner(trainer)), the trainer is currently converted to a Tune Trainable multiple times. Each conversion can be expensive: BaseTrainer.as_trainable calls tune.with_parameters, which puts the trainer's entire config (possibly including a large checkpointed model in resume_from_checkpoint) into the object store. This PR makes the conversion happen only once.
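The idea of converting once and reusing the result can be sketched as follows. This is a minimal illustration, not the Ray implementation: `FakeTrainer` and `CachingTuner` are hypothetical stand-ins, and the counter only simulates the cost of the real conversion.

```python
class FakeTrainer:
    """Hypothetical stand-in for a trainer; counts how often it is converted."""

    def __init__(self):
        self.conversions = 0

    def as_trainable(self):
        # In the real system this is the expensive step (config is put into
        # the object store). Here we only count the calls.
        self.conversions += 1
        return object()


class CachingTuner:
    """Hypothetical tuner that converts its trainer exactly once."""

    def __init__(self, trainer):
        self._trainer = trainer
        # Convert up front and keep the result for all later uses.
        self._converted = trainer.as_trainable()

    def converted_trainable(self):
        # Always return the cached object instead of converting again.
        return self._converted


trainer = FakeTrainer()
tuner = CachingTuner(trainer)
first = tuner.converted_trainable()
second = tuner.converted_trainable()
```

Both lookups return the same object, and the conversion counter stays at one regardless of how many times the converted trainable is requested.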

Related issue number

Closes #30321

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Signed-off-by: Justin Yu <[email protected]>


Yard1 · 2022-11-16T22:07:50Z

Thanks, this is great. To be extra safe, I'd suggest using properties to link _trainable and _converted_trainable together:

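    @property
    def _trainable(self):
        return self.__trainable

    @_trainable.setter
    def _trainable(self, value):
        # Re-convert whenever the trainable is replaced, so the cached
        # converted trainable can never go stale.
        self.__trainable = value
        self.__converted_trainable = self._convert_trainable(self.__trainable)

    @property
    def _converted_trainable(self):
        # Read-only: only updated through the _trainable setter.
        return self.__converted_trainable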

(and change the keys in constants to account for extra underscores)

My worry (even if it's very unlikely) is that _trainable could be changed while the stale cached _converted_trainable is still being used.

Let me know what you think!
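The stale-cache concern above can be illustrated with a small sketch. `PlainCache` and its string-based `_convert` are hypothetical and only mimic the shape of the problem: two plain attributes with nothing keeping them in sync.

```python
class PlainCache:
    """Hypothetical class that caches a converted value in a plain attribute."""

    def __init__(self, trainable):
        self._trainable = trainable
        self._converted_trainable = self._convert(trainable)

    @staticmethod
    def _convert(trainable):
        # Placeholder for the real (expensive) conversion.
        return f"converted({trainable})"


cache = PlainCache("trainer_a")

# Reassigning the source attribute does not touch the cached conversion,
# so the two silently drift apart.
cache._trainable = "trainer_b"
stale = cache._converted_trainable
```

After the reassignment, `stale` still reflects `trainer_a`, which is the situation a property setter is meant to rule out.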

justinvyu · 2022-11-16T22:21:47Z

This makes sense, I'll go ahead and make the change!

Signed-off-by: Justin Yu <[email protected]>

justinvyu · 2022-11-16T23:15:08Z

One thing that I had to change was:

Python name-mangles double-underscore attributes by prefixing them with the class name, so the __trainable attribute is stored in __dict__ (and therefore seen in __getstate__) under the key "_TunerInternal__trainable". The key used to pop the attribute would then also have to be "_TunerInternal__trainable", which is a bit confusing.
To avoid this, I changed the properties to self.trainable and self.converted_trainable instead, with _trainable and _converted_trainable as the private variables.
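A short sketch of both points, under stated assumptions: the first class is a toy that merely shares the name `TunerInternal` to reproduce the mangled key, and `TunerInternalSketch`, its `_convert_trainable` body, and the choice to pop the cached value in `__getstate__` are illustrative guesses, not the code merged in this PR.

```python
class TunerInternal:
    """Toy class (not the Ray one) showing double-underscore name mangling."""

    def __init__(self, trainable):
        self.__trainable = trainable  # stored as _TunerInternal__trainable


mangled_keys = list(TunerInternal("trainer").__dict__)


class TunerInternalSketch:
    """Hypothetical layout: public properties over single-underscore attributes."""

    def __init__(self, trainable):
        self.trainable = trainable  # goes through the setter below

    @property
    def trainable(self):
        return self._trainable

    @trainable.setter
    def trainable(self, value):
        # Keep the cached conversion in sync with the trainable.
        self._trainable = value
        self._converted_trainable = self._convert_trainable(value)

    @property
    def converted_trainable(self):
        return self._converted_trainable

    def _convert_trainable(self, trainable):
        # Placeholder for the real conversion.
        return ("converted", trainable)

    def __getstate__(self):
        # Assumption: the cached conversion is dropped from the pickled state.
        # With single underscores, the key is just the attribute name.
        state = self.__dict__.copy()
        state.pop("_converted_trainable", None)
        return state


sketch = TunerInternalSketch("trainer_a")
sketch.trainable = "trainer_b"
state_keys = sorted(sketch.__getstate__())
```

With single-underscore attributes, the keys in `__dict__` match the names written in the class body, so any constants that list keys to pop stay readable.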

Yard1

LGTM, thanks!

…e Tuner (ray-project#30355) When using Tune with Train (Tuner(trainer)), the trainer is currently being converted to a Tune Trainable multiple times. This is possibly expensive, since BaseTrainer.as_trainable will put all of its config (which could contain a large checkpointed model in resume_from_checkpoint) into the object store from a call to tune.with_parameters. This PR makes it so that the conversion only happens once. Signed-off-by: Justin Yu <[email protected]> Signed-off-by: Weichen Xu <[email protected]>

justinvyu added 5 commits November 10, 2022 09:41

Only convert the trainable once (if passing in BaseTrainer)

ef6ce50

Signed-off-by: Justin Yu <[email protected]>

Fix convert_trainable return type

34060f7

Signed-off-by: Justin Yu <[email protected]>

Always convert _trainable rather than taking in an argument

8dc02b8

Signed-off-by: Justin Yu <[email protected]>

Add a clarifying comment for get_converted_trainable

fc201ef

Signed-off-by: Justin Yu <[email protected]>

Merge branch 'master' of https://github.com/ray-project/ray into conv…

b120e27

…ert_trainable_caching

justinvyu added tune Tune-related issues train Ray Train Related Issue labels Nov 16, 2022

justinvyu assigned amogkam and Yard1 Nov 16, 2022

justinvyu requested a review from Yard1 November 16, 2022 21:26

Tie together trainable and converted_trainable with property setter

bbc45d3

Signed-off-by: Justin Yu <[email protected]>

Yard1 approved these changes Nov 16, 2022


amogkam merged commit 161de0b into ray-project:master Nov 17, 2022
