[AIR] Add Scaling Config validation #23889

Merged · 20 commits · Apr 19, 2022

Conversation

@Yard1 (Member) commented Apr 13, 2022

Why are these changes needed?

Adds a ScalingConfigDataClass.validate_config classmethod that provides a generic way of validating ScalingConfigs by permitting only certain keys.
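
Roughly, the validation boils down to comparing the keys the user actually set against an allow-list and raising on anything else. Below is a minimal, self-contained sketch of that pattern; the field names and the allowed_keys parameter are illustrative, not the exact Ray AIR API.

from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class ScalingConfigDataClass:
    # Illustrative fields only; the real dataclass defines more.
    num_workers: int = 1
    use_gpu: bool = False
    trainer_resources: Optional[Dict] = None

    @classmethod
    def validate_config(cls, config: Dict, allowed_keys: List[str]) -> None:
        # Reject any key that the caller did not explicitly allow.
        bad_keys = [key for key in config if key not in allowed_keys]
        if bad_keys:
            raise ValueError(
                f"Unsupported scaling config keys: {bad_keys}. "
                f"Allowed keys: {allowed_keys}"
            )

# Example: a trainer that only supports setting the worker count and GPU usage.
ScalingConfigDataClass.validate_config(
    {"num_workers": 2, "use_gpu": True}, allowed_keys=["num_workers", "use_gpu"]
)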

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Comment on lines 134 to 137
scaling_config_arg_name: Name of the ScalingConfig argument to be used
in the exception message.
exc_obj_name: Name of the object calling this method to be used
in the exception message.
Contributor

As mentioned in the previous PR, let's remove these arguments. The enclosing method can re-raise the exception so that the affected key names are in the stack trace, but we should avoid passing strings just for raising error messages.

@Yard1 (Member, Author) Apr 13, 2022

I wonder if we can return a list of bad keys instead, so that the enclosing method can raise an exception with a nice message. I think a single exception would be better from the user perspective than a long stack trace.

@krfricke (Contributor) Apr 14, 2022

I would be open to raising a custom exception here that stores the keys as an attribute, or a hint on how to resolve things.

ray.air.exceptions.ConfigError or so
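
For illustration, such an exception might simply subclass ValueError and carry the offending keys plus an optional resolution hint. This is a hypothetical sketch: ray.air.exceptions.ConfigError is only proposed in this thread, not an existing class.

class ConfigError(ValueError):
    """Hypothetical error that records the offending config keys and a resolution hint."""

    def __init__(self, bad_keys, hint=None):
        self.bad_keys = list(bad_keys)
        self.hint = hint
        message = f"Unsupported scaling config keys: {self.bad_keys}."
        if hint:
            message = f"{message} {hint}"
        super().__init__(message)

# The enclosing method can then raise one readable error instead of a long stack trace:
# raise ConfigError(["trainer_resources"], hint="This trainer only supports num_workers and use_gpu.")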

@Yard1 requested a review from krfricke on April 13, 2022 16:56
@Yard1 (Member, Author) commented Apr 13, 2022

@krfricke updated, PTAL

(Resolved review comments on python/ray/ml/config.py)
@amogkam (Contributor) left a comment

Thanks @Yard1, this looks great so far! I think we should also do some more refactoring to make the allowed keys explicit for each Trainer.

(Resolved review comments on python/ray/ml/config.py)
@ericl (Contributor) left a comment

+1 on @amogkam's suggestions. I think you want this end state, right?

class Trainer:
    _scaling_config_supported_keys = []  # Not scalable by default

    @classmethod
    def validate_config(cls, scaling_config):
        ...  # default validator

class XGBoostTrainer(Trainer):
    _scaling_config_supported_keys = ["num_workers", "use_gpu"]

I'd also like to see this PR show the whole integration path with Trainer, rather than adding a utility function without showing how it works with other classes. This will make it easier to review the interfaces rather than the implementation.
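
A minimal sketch of the default validator in that end state, assuming the scaling config arrives as a plain dict (the method name and error text are illustrative, not the code merged in this PR):

class Trainer:
    _scaling_config_supported_keys = []  # Not scalable by default

    @classmethod
    def _validate_scaling_config(cls, scaling_config: dict) -> dict:
        # Reject keys the concrete trainer does not declare as supported.
        unsupported = [
            key for key in scaling_config
            if key not in cls._scaling_config_supported_keys
        ]
        if unsupported:
            raise ValueError(
                f"{cls.__name__} does not support scaling config keys: {unsupported}"
            )
        return scaling_config

class XGBoostTrainer(Trainer):
    _scaling_config_supported_keys = ["num_workers", "use_gpu"]

# XGBoostTrainer._validate_scaling_config({"num_workers": 4}) passes, while
# Trainer._validate_scaling_config({"num_workers": 4}) raises ValueError.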

@ericl added the @author-action-required label ("The PR author is responsible for the next step. Remove tag to send back to the reviewer.") on Apr 13, 2022
(Resolved review comments on python/ray/ml/config.py)
@krfricke (Contributor) left a comment

I've updated the PR to address the comments from the last feedback round:

  1. The validation method has been split into a dict part and a dataclass part.
  2. It has been moved to a utility function and generalized to arbitrary dataclass and dict objects.
  3. The Trainer now has a private method that returns a validated scaling config dataclass.
  4. The code is now used in the DP/GBDT trainer classes.

I'll also update the sklearn trainer to utilize this. PTAL
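
As a rough sketch of what the generic utility in points 1 and 2 above could look like (the function names and the way defaults are compared are illustrative, not the actual code in this PR):

import dataclasses
from typing import Container

def ensure_only_allowed_dict_keys_set(data: dict, allowed_keys: Container[str]) -> None:
    # Dict part: every provided key must be in the allowed set.
    bad_keys = [key for key in data if key not in allowed_keys]
    if bad_keys:
        raise ValueError(f"Key(s) {bad_keys} are not allowed. Allowed keys: {allowed_keys}")

def ensure_only_allowed_dataclass_fields_set(instance, allowed_fields: Container[str]) -> None:
    # Dataclass part: every field that differs from its default must be allowed.
    for field in dataclasses.fields(instance):
        default = field.default
        if default is dataclasses.MISSING and field.default_factory is not dataclasses.MISSING:
            default = field.default_factory()
        if getattr(instance, field.name) != default and field.name not in allowed_fields:
            raise ValueError(
                f"Field '{field.name}' was set but is not among the allowed fields {allowed_fields}"
            )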

@Yard1 changed the title from "[AIR] Add ScalingConfigDataClass.validate_config" to "[AIR] Add Scaling Config validation" on Apr 18, 2022
@Yard1 (Member, Author) left a comment

Looks great, thanks @krfricke!

(Resolved review comment on python/ray/ml/trainer.py)
Comment on lines +261 to +263
scaling_config_dataclass = self._validate_and_get_scaling_config_data_class(
self.scaling_config
)
Contributor

Should this check be in Trainer itself?

@Yard1 (Member, Author)

I think it's fine; we already call it in Trainer as well (in as_trainable). This is more of a way to get the dataclass inside the training loop.

@Yard1 requested reviews from amogkam and krfricke on April 19, 2022 08:29
@amogkam (Contributor) left a comment

LGTM! Thanks for making the changes!

@amogkam (Contributor) commented Apr 19, 2022

Needs an approval from @krfricke before we can merge.

@Yard1 requested a review from matthewdeng on April 19, 2022 19:41
@amogkam merged commit 1fc6db3 into ray-project:master on Apr 19, 2022
@Yard1 deleted the scaling_config_validate_config branch on April 19, 2022 20:05
amogkam pushed a commit that referenced this pull request Apr 19, 2022
Implements `SklearnTrainer` and `SklearnPredictor`. Full parallelism with joblib + support for GPU-enabled estimators like cuML.

The interface has been modified slightly by the addition of several arguments, which were required for full functionality.

I haven't tested cuML yet, will do it later.

Depends on #23889

Co-authored-by: Kai Fricke <[email protected]>
Labels: @author-action-required
7 participants