[AIR][GPU Batch Prediction] Add basic support for GPU batch prediction #26251
Conversation
output = self.model(model_input)

def untensorize(torch_tensor):
-    numpy_array = torch_tensor.cpu().detach().numpy()
+    numpy_array = torch_tensor.detach().cpu().numpy()
This is a minor nit improvement: by detaching first, we don't construct an autograd edge for the CPU copy and then detach it. The end result is the same.
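A minimal illustration of the ordering difference (the tensor here is a CPU stand-in for the predictor's model output, which in the real code lives on the GPU):

```python
import torch

# Stand-in for a model output that is still attached to the autograd graph.
t = torch.ones(2, requires_grad=True) * 2  # non-leaf tensor with a grad_fn

# Old order: the device copy is made while the tensor is still attached to
# the autograd graph, so autograd records an edge for the copy before it is
# detached (this matters when the source tensor actually lives on a GPU).
a = t.cpu().detach()

# New order: detach first, so the copy starts from a tensor with no autograd
# history and no edge is ever constructed.
b = t.detach().cpu()

assert torch.equal(a, b)  # same values either way, as the comment notes
```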
self.use_gpu = use_gpu
if use_gpu:
    # Ensure input tensor and model live on GPU for GPU inference
    self.model.to(torch.device("cuda"))
Model init and config are moved to the constructor rather than being redone on every predict() call.
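A condensed sketch of that pattern (class and method shapes are illustrative, not the exact Ray source):

```python
import torch
import torch.nn as nn


class SimpleTorchPredictor:
    """Sketch only: one-time GPU setup in the constructor."""

    def __init__(self, model: nn.Module, use_gpu: bool = False):
        self.model = model
        self.model.eval()
        self.use_gpu = use_gpu
        if use_gpu:
            # One-time placement, instead of moving the model on every call.
            self.model.to(torch.device("cuda"))

    def predict(self, batch: torch.Tensor) -> torch.Tensor:
        if self.use_gpu:
            # Input must be co-located with the model for GPU inference.
            batch = batch.to(torch.device("cuda"))
        with torch.no_grad():
            return self.model(batch)
```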
if use_gpu:
    with tf.device("GPU:0"):
        self.model = self.model_definition()
else:
    self.model = self.model_definition()
Model init and config are moved to the constructor rather than being redone on every predict() call.
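The TensorFlow side follows the same shape; a simplified sketch (hypothetical class, assuming model_definition is a zero-argument callable returning a tf.keras.Model):

```python
import tensorflow as tf


class SimpleTFPredictor:
    """Sketch only: build the Keras model once, pinned to GPU:0 when requested."""

    def __init__(self, model_definition, use_gpu: bool = False):
        self.use_gpu = use_gpu
        if use_gpu:
            # Variables created under this scope are placed on the first GPU.
            with tf.device("GPU:0"):
                self.model = model_definition()
        else:
            self.model = model_definition()

    def predict(self, batch):
        if self.use_gpu:
            with tf.device("GPU:0"):
                return self.model(batch, training=False)
        return self.model(batch, training=False)
```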
My main comment is around edge-case coverage: if a GPU is available and `use_gpu=False`, warn the user.
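A hedged sketch of the kind of check being asked for (warning text and placement are illustrative):

```python
import warnings

import torch


def warn_if_gpu_unused(use_gpu: bool) -> None:
    # Edge case from the review: a CUDA device is visible but the user left
    # use_gpu=False, so inference would silently run on CPU.
    if not use_gpu and torch.cuda.is_available():
        warnings.warn(
            "A CUDA device is available but use_gpu=False; "
            "pass use_gpu=True to run inference on the GPU."
        )
```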
Thanks @jiaodong! Overall LGTM, but let's make sure to follow up on automatically setting `use_gpu` with `PredictorDeployment`!
.buildkite/pipeline.gpu.large.yml
@@ -39,6 +39,8 @@
   - DATA_PROCESSING_TESTING=1 TRAIN_TESTING=1 TUNE_TESTING=1 ./ci/env/install-dependencies.sh
   - pip install -Ur ./python/requirements_ml_docker.txt
   - bazel test --config=ci $(./ci/run/bazel_export_options) --build_tests_only --test_tag_filters=gpu python/ray/air/...
+  - pip install -U horovod
Ah, instead of manually installing here you can set the `INSTALL_HOROVOD=1` env var for `install-dependencies.sh`.
Oh nice, let me change it.
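For reference, one plausible shape of the updated pipeline step after that change (a sketch; the final hunk isn't shown in this thread):

```yaml
# Sketch: drop the manual `pip install -U horovod` step and let
# install-dependencies.sh handle it via the env var instead.
- DATA_PROCESSING_TESTING=1 TRAIN_TESTING=1 TUNE_TESTING=1 INSTALL_HOROVOD=1 ./ci/env/install-dependencies.sh
- pip install -Ur ./python/requirements_ml_docker.txt
- bazel test --config=ci $(./ci/run/bazel_export_options) --build_tests_only --test_tag_filters=gpu python/ray/air/...
```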
"//python/ray/air:__pkg__", | ||
"//python/ray/air:__subpackages__", | ||
"//python/ray/train:__pkg__", | ||
"//python/ray/train:__subpackages__", |
For my own understanding, what does this do?
I noticed that Ray Train tests are actually using modules from the AIR folder, such as `test_horovod`. So I set up both the ML and AIR Bazel targets to be visible to each other, so tests can directly import and use them if needed.
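For context, this is standard Bazel target visibility; a minimal sketch of how a package might use the list above (target name and srcs are hypothetical):

```python
# BUILD file sketch: the visibility list lets both the air and train
# packages (and all of their subpackages) depend on targets defined here.
py_library(
    name = "air_test_utils",
    srcs = ["tests/conftest.py"],
    visibility = [
        "//python/ray/air:__pkg__",
        "//python/ray/air:__subpackages__",
        "//python/ray/train:__pkg__",
        "//python/ray/train:__subpackages__",
    ],
)
```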
@jiaodong actually looks like the lint failure is related. This PR removes …
Why are these changes needed?
This PR adds GPU support for the PyTorch and TensorFlow predictors, as well as automatically setting the `use_gpu` flag in `BatchPredictor`.

Notable changes:

- Added a `use_gpu` flag in the constructors of `TorchPredictor` and `TensorflowPredictor` (note this is slightly different from our latest design doc, which puts the flag at the `predict()` call)
- Added a `use_gpu` flag to `SklearnPredictor` so its interface is compatible with `BatchPredictor`
- Code to move both the model weights and the input tensor to the default visible GPU at index 0 if the flag is set
- Parametrized existing predictor tests to use GPU for both CPU & GPU coverage
- Changed BUILD CI tests with an added `gpu` tag (I'm not 100% sure if that's the right way though)

Follow-ups:

#26249 is created in case our host has multiple GPU devices. It's a bit out of scope for this PR, but for GPU batch inference ideally we should be able to evenly use all GPU devices on a host whose CPU & DRAM are busy with pre-fetching + data movement to the GPU. We might approximate this by scheduling the same number of Predictor instances on the host, but that's worth verifying once benchmarks are set up.
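As a rough sketch of that follow-up idea (not part of this PR), each predictor instance could pick its device round-robin from a worker index; the function and names here are illustrative:

```python
import torch


def pick_device(worker_rank: int) -> torch.device:
    """Sketch for the multi-GPU follow-up (#26249): spread predictor
    instances evenly over all visible GPUs instead of always using cuda:0."""
    n = torch.cuda.device_count()
    if n == 0:
        return torch.device("cpu")
    return torch.device(f"cuda:{worker_rank % n}")


# e.g. four predictor actors on a 2-GPU host -> cuda:0, cuda:1, cuda:0, cuda:1
devices = [pick_device(rank) for rank in range(4)]
```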
Related issue number
Checks

- I've run `scripts/format.sh` to lint the changes in this PR.