[RLlib] Make sure SlateQ works with GPU. #22738
Conversation
@@ -41,14 +41,15 @@ def build_slateq_model_and_distribution(
     Returns:
         Tuple consisting of 1) Q-model and 2) an action distribution class.
     """
+    device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
Actually, the TorchPolicy will take care of all this.
I think we only have to make sure all tensors in the loss function are on the right device. ...
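For reference, the idiom in the diff above can be sketched standalone as follows; note that, per the review comment, TorchPolicy already handles device placement inside RLlib, so the model builder itself should not need this:

```python
import torch

# Standard PyTorch device-selection idiom (illustrative sketch only; the
# review above argues this is redundant inside RLlib's TorchPolicy).
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

# Tensors created with an explicit device= land where we expect them.
t = torch.zeros(3, device=device)
```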
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I understand everything now.
It's the target_model that was the issue. We just need to use the correct target_model out of policy.target_models for things to work.
Thanks.
Ah, yeah, sorry, I should have thought about this. Yes, you can always do:
correct_target_model_to_use = policy.target_models[model]
...
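A minimal sketch of the pattern being suggested (the class and function names below are illustrative stand-ins, not RLlib's actual internals): each model copy keys into a dict of matching target models, so the loss always uses the target network paired with the model it was handed:

```python
import torch
import torch.nn as nn

class ToyPolicy:
    """Illustrative stand-in for a policy holding per-copy target models."""
    def __init__(self, num_copies: int = 2):
        # One model copy per device tower; here all copies live on CPU.
        self.models = [nn.Linear(4, 2) for _ in range(num_copies)]
        # Each copy is paired with its own target network, keyed by the copy.
        self.target_models = {m: nn.Linear(4, 2) for m in self.models}

def toy_loss(policy, model, obs):
    # Look up the target model matching *this* model copy, as the review
    # comment suggests, instead of assuming a single global target model.
    target_model = policy.target_models[model]
    q = model(obs)
    target_q = target_model(obs)
    return torch.mean((q - target_q.detach()) ** 2)

policy = ToyPolicy()
obs = torch.randn(8, 4)
losses = [toy_loss(policy, m, obs) for m in policy.models]
```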
@@ -154,7 +155,7 @@ def build_slateq_losses(
     clicked = torch.sum(click_indicator, dim=1)
     mask_clicked_slates = clicked > 0
-    clicked_indices = torch.arange(batch_size)
+    clicked_indices = torch.arange(batch_size).to(policy.device)
Here, this seems (almost) correct. Even better:
clicked_indices = torch.arange(batch_size).to(clicked.device)  # <- some tensor that we know is already on one of the GPUs.
yep. this was actually how I did it originally. :)
updated.
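The resulting pattern can be sketched like this (the helper name and tensor shapes are illustrative, not the exact RLlib code):

```python
import torch

def clicked_slate_indices(click_indicator: torch.Tensor) -> torch.Tensor:
    """Return batch indices of slates that received at least one click."""
    batch_size = click_indicator.shape[0]
    clicked = torch.sum(click_indicator, dim=1)
    # Build the index tensor directly on the device the batch already
    # lives on, as suggested above, rather than a policy-level attribute.
    indices = torch.arange(batch_size, device=clicked.device)
    return indices[clicked > 0]

# Slates 0 and 2 were clicked; slate 1 was not.
idx = clicked_slate_indices(torch.tensor([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]))
# idx -> tensor([0, 2])
```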
@@ -320,7 +321,10 @@ def score_documents(
         torch.multiply(user_obs.unsqueeze(1), torch.stack(doc_obs, dim=1)), dim=2
     )
     # Compile a constant no-click score tensor.
-    score_no_click = torch.full(size=[user_obs.shape[0], 1], fill_value=no_click_score)
+    # Make sure it lives on the same device as scores_per_candidate.
+    score_no_click = torch.full(
looks great.
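A self-contained sketch of the resulting pattern (the function below abbreviates the real score_documents signature and is illustrative only):

```python
import torch

def score_with_no_click(user_obs, doc_obs, no_click_score=0.0):
    # Dot-product score of the user embedding against each candidate doc.
    scores_per_candidate = torch.sum(
        torch.multiply(user_obs.unsqueeze(1), torch.stack(doc_obs, dim=1)), dim=2
    )
    # Compile a constant no-click score tensor, passing device= so it
    # lives on the same device as scores_per_candidate.
    score_no_click = torch.full(
        size=[user_obs.shape[0], 1],
        fill_value=no_click_score,
        device=scores_per_candidate.device,
    )
    return torch.cat([scores_per_candidate, score_no_click], dim=1)

# Batch of 2 users, 4 candidate documents, 3-dim embeddings.
out = score_with_no_click(torch.ones(2, 3), [torch.ones(2, 3)] * 4)
# out has shape [2, 5]: 4 candidate scores plus the no-click column.
```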
     # [1, AxS] Useful for torch.take_along_dim()
     policy.slates_indices = policy.slates.reshape(-1).unsqueeze(0).to(policy.device)

     setup_mixins(policy)
setup_late_mixins()
??
ah, reverted. I was trying to move policy.slates_indices to the correct device during late_setup,
but I am doing this the correct way now.
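To illustrate why the flattened [1, AxS] index tensor is useful with torch.take_along_dim (the shapes and score values below are made up for the sketch):

```python
import torch

# A = 3 candidate slates of size S = 2, enumerating document indices.
A, S = 3, 2
slates = torch.tensor([[0, 1], [1, 2], [0, 2]])        # [A, S]
# Flatten to [1, A*S] so a single take_along_dim call gathers the
# per-document scores for every slate at once.
slates_indices = slates.reshape(-1).unsqueeze(0)       # [1, A*S]

scores = torch.tensor([[0.1, 0.5, 0.4]])               # [batch=1, num_docs]
per_doc = torch.take_along_dim(scores, slates_indices, dim=1)  # [1, A*S]
# Re-group by slate and sum to score each whole slate.
slate_scores = per_doc.reshape(-1, A, S).sum(dim=2)    # [1, A]
```

In the actual fix, slates_indices is additionally moved to policy.device at creation time, so the gather runs on the same device as the scores.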
Looks great! Thanks for this important fix @gjoliver !
Why are these changes needed?
Create models and variables on proper device so SlateQ works with GPU training.
Related issue number
Checks
- I've run scripts/format.sh to lint the changes in this PR.