[air] Accessors for preprocessor in Predictor class #26600
Conversation
python/ray/train/batch_predictor.py
Outdated
return self._override_preprocessor

checkpoint_data = self._checkpoint.to_dict()
preprocessor = checkpoint_data.get(PREPROCESSOR_KEY)
Just FYI, this implementation currently won't work for GBDT predictors, as they use directory checkpoints rather than dict checkpoints.
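To illustrate the distinction, here is a minimal sketch, not Ray's implementation: the stand-in checkpoint classes, the PREPROCESSOR_KEY value, and the "preprocessor.pkl" file convention are all assumptions. It shows why a to_dict()-based lookup only covers one checkpoint flavor and how a directory-backed lookup would have to differ:

```python
import os
import pickle
import tempfile

PREPROCESSOR_KEY = "preprocessor"  # key assumed for dict-style checkpoints

class DictCheckpoint:
    """Minimal stand-in for a dict-backed Checkpoint."""
    def __init__(self, data):
        self._data = data

    def to_dict(self):
        return self._data

class DirCheckpoint:
    """Minimal stand-in for a directory-backed Checkpoint (as GBDT trainers use)."""
    def __init__(self, path):
        self._path = path

    def to_directory(self):
        return self._path

def get_preprocessor(checkpoint):
    # Dict checkpoints: the preprocessor lives under PREPROCESSOR_KEY.
    if hasattr(checkpoint, "to_dict"):
        return checkpoint.to_dict().get(PREPROCESSOR_KEY)
    # Directory checkpoints need a file-based convention instead; a
    # hypothetical "preprocessor.pkl" file inside the directory is assumed.
    path = os.path.join(checkpoint.to_directory(), "preprocessor.pkl")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return None
```

The dict branch mirrors the snippet above; the directory branch is the part that the dict-only implementation silently misses.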
What's the user journey that we are trying to solve by introducing these APIs?
It seems that by doing this we are making assumptions about the underlying checkpoint structure (namely, that there is a PREPROCESSOR_KEY in the checkpoint). These accessors will only work for checkpoints output by our Trainers, not for the general case, so BatchPredictor will no longer work with any predictor and any checkpoint. I'm wondering if there is an alternative solution here that resolves the user problem.
If it's just the ability to override the preprocessor, then we currently have other mechanisms to do this, and it might just be a matter of documenting them better.
(moved to slack)
@matthewdeng this is the tricky issue
Added Checkpoint.get_preprocessor() to unify this logic, though it's not optimized in this PR.
Signed-off-by: Matthew Deng <[email protected]>
@@ -42,7 +42,8 @@ def from_checkpoint(cls, checkpoint: Checkpoint) -> "LightGBMPredictor":
         ``LightGBMTrainer`` run.
         """
-        bst, preprocessor = load_checkpoint(checkpoint)
+        bst, _ = load_checkpoint(checkpoint)
Leaving this line here for now, as load_checkpoint will be removed with #26651.
# Ensure the proper conversion functions are called.
convert_to_pandas_mock.assert_called_once()
convert_from_pandas_mock.assert_called_once()
Removing these mocks since they cause the prediction logic to no longer be tested. A second test that only checks the call logic didn't seem valuable enough to create.
Hmm, I wrote the mocks after our previous discussions about not writing integration tests for everything.
If we make sure the conversion functions are called, and since the conversion functions are themselves tested, then we don't need to write tests for the predictor against all possible input and output types. That was the original intention, but the mocks may not have been the best way to achieve it. Do you have any other suggestions for not losing this test coverage?
Ah gotcha. I think in that case I can extend the mocks to return concrete values and validate the inputs/outputs.
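A sketch of that idea, with stand-in conversion helpers and a deliberately simplified predict (the real Ray function names and wiring differ): the mocks wrap the real conversions with side_effect, so call-count and argument assertions still work while concrete values flow through the prediction path and can be validated.

```python
from unittest import mock

import pandas as pd

# Stand-ins for the conversion helpers; the real names and
# signatures in Ray may differ.
def convert_to_pandas(batch):
    return pd.DataFrame(batch)

def convert_from_pandas(df):
    return df.to_dict(orient="list")

def predict(batch, to_pandas=convert_to_pandas, from_pandas=convert_from_pandas):
    """Simplified predict path: convert in, add a dummy score, convert out."""
    df = to_pandas(batch)
    df["score"] = 1.0
    return from_pandas(df)

# Wrap the real conversions in Mocks: call assertions remain possible,
# but concrete values still pass through and can be checked end to end.
to_mock = mock.Mock(side_effect=convert_to_pandas)
from_mock = mock.Mock(side_effect=convert_from_pandas)
result = predict({"x": [1, 2]}, to_pandas=to_mock, from_pandas=from_mock)

# The call-logic assertions from the original test still hold...
to_mock.assert_called_once_with({"x": [1, 2]})
from_mock.assert_called_once()
```

...and the output can additionally be compared against a concrete expected value, so the prediction logic itself stays covered.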
As discussed offline, what does the developer API look like?
With framework-specific checkpoints, we should make the base Checkpoint class a DeveloperAPI, right? This may be out of scope for this PR, so we can do this in a follow-up as well!
Actually, never mind: Checkpoint will still be exposed to Tune users who use trainables directly and not any of the Trainers.
The base class should remain a PublicAPI since all the methods are still publicly accessible. Furthermore, you are always allowed to load a checkpoint generically using the base class.
Agreed; for Checkpoint, let's keep it as PublicAPI since it will still be exposed to users who are not using any of the Trainers.
For this case, the pattern that we have been following is that the class is DeveloperAPI, but individual methods can be denoted as PublicAPI. This is what we do for
Signed-off-by: Rohan138 <[email protected]>
Signed-off-by: Stefan van der Kleij <[email protected]>
Why are these changes needed?
There is currently no way to get or set a preprocessor on a BatchPredictor / Predictor once it has been created. This makes it very difficult to figure out how to pass a custom preprocessor to a predictor.
Changes
- Added Checkpoint.get_preprocessor() to unify Preprocessor retrieval logic.
- Added Predictor.get_preprocessor() and Predictor.set_preprocessor().
- Added BatchPredictor.get_preprocessor() and BatchPredictor.set_preprocessor().
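A minimal sketch of the resulting user journey, using stand-in classes (the real Predictor and Preprocessor live in Ray; only the accessor names mirror this PR, everything else here is an assumption):

```python
class Preprocessor:
    """Stand-in preprocessor that scales numeric inputs by a factor."""
    def __init__(self, factor):
        self.factor = factor

    def transform(self, batch):
        return [x * self.factor for x in batch]

class Predictor:
    """Stand-in predictor illustrating the accessor pattern added here."""
    def __init__(self, preprocessor=None):
        self._preprocessor = preprocessor

    def get_preprocessor(self):
        return self._preprocessor

    def set_preprocessor(self, preprocessor):
        self._preprocessor = preprocessor

    def predict(self, batch):
        # Apply the preprocessor (if any) before the dummy "model".
        if self._preprocessor is not None:
            batch = self._preprocessor.transform(batch)
        return [x + 1 for x in batch]

# The journey the PR enables: swap in a custom preprocessor after
# the predictor has already been created.
predictor = Predictor()
predictor.set_preprocessor(Preprocessor(factor=10))
```

This is the gap described above: previously the preprocessor was fixed at creation time, with no supported way to read or replace it afterwards.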
.Related issue number
Closes #26528
TODO: