Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib] add_time_dimension in TF loses the shape of the input layer when the batch size isn't specified #23897

Closed
dolphingarlic opened this issue Apr 13, 2022 · 2 comments · Fixed by #24006
Assignees
Labels
bug Something that is supposed to be working; but isn't P2 Important issue, but not time-critical repro-script-confirmed rllib RLlib related issues

Comments

@dolphingarlic
Copy link

What happened + What you expected to happen

I'm trying to add a time dimension to an input layer for use in an LSTM layer. The batch size of this input layer isn't specified because it changes with time, so the shape is (?, 64). The expected shape of the layer after adding the time dimension is (?, ?, 64), but the actual shape that is returned is (?, ?, ?). This causes an error when I pass the new layer into an LSTM layer.

This bug doesn't happen in Ray 1.10.0, but it does in Ray 1.11.0. It seems to be caused by this change: 9e6b871 (to rllib/policy/rnn_sequencing.py)

Versions / Dependencies

This bug happens in Ray 1.11.0, but not in 1.10.0. This happens on all OSes that I've tried (Ubuntu, MacOS, and Windows). I'm using Python 3.7.10.

Reproduction script

Run the following in a Python REPL:

from ray.rllib.models.utils import try_import_tf
from ray.rllib.policy.rnn_sequencing import add_time_dimension

tf1, tf, tfv = try_import_tf()

LSTM_CELL_SIZE = 64
merge_feat = tf.keras.layers.Input(shape=(LSTM_CELL_SIZE,), name='merge_feat', dtype=tf.int32)
seq_in = tf.keras.layers.Input(shape=(), name='seq_in', dtype=tf.int32)
max_seq_len = tf.shape(merge_feat)[0] // tf.shape(seq_in)[0]
lstm_in = add_time_dimension(merge_feat, max_seq_len=max_seq_len, framework='tf')
lstm_in

In Ray 1.11.0, the output is

<tf.Tensor 'Reshape:0' shape=(?, ?, ?) dtype=int32>

While in Ray 1.10.0, the (correct) output is

<tf.Tensor 'Reshape:0' shape=(?, ?, 64) dtype=int32>

Issue Severity

High: It blocks me from completing my task.

@dolphingarlic dolphingarlic added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Apr 13, 2022
@sven1977 sven1977 added rllib RLlib related issues repro-script-confirmed P2 Important issue, but not time-critical and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Apr 19, 2022
@sven1977 sven1977 self-assigned this Apr 19, 2022
@sven1977
Copy link
Contributor

Hey @dolphingarlic , thanks for filing this issue. I can confirm your reproduction script. ...

@sven1977
Copy link
Contributor

See the above fix-it PR, which should be merged soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something that is supposed to be working; but isn't P2 Important issue, but not time-critical repro-script-confirmed rllib RLlib related issues
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants