[train+data] Remove `preprocessor` argument from trainers #43146

justinvyu · 2024-02-13T22:30:32Z

Why are these changes needed?

This PR fully removes the hard-deprecated `preprocessor` argument from the trainers. It is a follow-up to #38640.

Related issue number

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Signed-off-by: Justin Yu <[email protected]>

justinvyu · 2024-02-13T23:39:49Z

python/ray/train/base_trainer.py

+            # Evaluate datasets if they are wrapped in a factory.
+            trainer.datasets = {
+                k: d() if callable(d) else d for k, d in self.datasets.items()
+            }
+
            trainer.setup()
-            trainer.preprocess_datasets()
            trainer.training_loop()


`preprocess_datasets` used to perform this "evaluation" of the dataset factory functions, plus any dataset preprocessing configured through `Trainer.preprocessor`. That interface is no longer needed, so I removed it and pulled the evaluation logic out of it.
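For readers skimming the diff, here is a minimal standalone sketch of the factory-evaluation step kept in `base_trainer.py`. It is not the Ray implementation: `resolve_datasets` is a hypothetical helper name, and plain lists stand in for Ray datasets. It only illustrates the `d() if callable(d) else d` pattern from the diff above.

```python
from typing import Any, Callable, Dict, Union

# Assumption: plain Python lists stand in for real dataset objects here.
DatasetOrFactory = Union[Any, Callable[[], Any]]


def resolve_datasets(datasets: Dict[str, DatasetOrFactory]) -> Dict[str, Any]:
    """Hypothetical helper: call zero-argument factories, pass other values through."""
    return {k: d() if callable(d) else d for k, d in datasets.items()}


datasets = {
    "train": lambda: [1, 2, 3],  # wrapped in a factory, built only when called
    "valid": [4, 5],             # already a concrete dataset
}

resolved = resolve_datasets(datasets)
print(resolved)  # {'train': [1, 2, 3], 'valid': [4, 5]}
```

The `callable` check assumes that dataset objects themselves are not callable; a callable dataset type would be invoked by mistake under this pattern.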

matthewdeng

love cleaning up code 👌

bveeramani

LGTM

justinvyu added 5 commits February 13, 2024 14:06

remove preprocessor arg + preprocess_datasets method in base trainer

8400a62

Signed-off-by: Justin Yu <[email protected]>

remove preprocessor arg in remaining places

5608774

Signed-off-by: Justin Yu <[email protected]>

remove legacy preprocessing method

80a612b

Signed-off-by: Justin Yu <[email protected]>

remove unused preprocessor imports

4543c3a

Signed-off-by: Justin Yu <[email protected]>

update dp trainer docstring

d4971c7

Signed-off-by: Justin Yu <[email protected]>

justinvyu requested review from raulchen, matthewdeng and woshiyyya February 13, 2024 22:30

justinvyu assigned matthewdeng and woshiyyya Feb 13, 2024

fix failing test

8592001

Signed-off-by: Justin Yu <[email protected]>

justinvyu commented Feb 13, 2024

matthewdeng approved these changes Feb 13, 2024

python/ray/train/lightgbm/lightgbm_trainer.py (outdated, resolved)

justinvyu added 3 commits February 13, 2024 22:36

merge the common setup in gbdt trainer

26c67bc

Signed-off-by: Justin Yu <[email protected]>

fix lint

4264674

Signed-off-by: Justin Yu <[email protected]>

Merge branch 'master' of https://github.com/ray-project/ray into remo…

2c97b99

…ve_preprocessor Signed-off-by: Justin Yu <[email protected]>

c21 assigned bveeramani Feb 14, 2024

raulchen approved these changes Feb 14, 2024

bveeramani approved these changes Feb 14, 2024

justinvyu merged commit a79f18d into ray-project:master Feb 14, 2024
9 checks passed

justinvyu deleted the remove_preprocessor branch February 14, 2024 22:22

justinvyu mentioned this pull request Feb 16, 2024

[train] Fix regression where large Trainer attributes get serialized along with actor class #43234

Merged

8 tasks
