[AIR] TensorflowPredictor doesn't create model weights #25125

Closed
bveeramani opened this issue May 24, 2022 · 0 comments · Fixed by #25136
Labels: bug (Something that is supposed to be working; but isn't), P1 (Issue that should be fixed within a few weeks)

@bveeramani
Member

What happened + What you expected to happen

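I tried to classify an image with `BatchPredictor` and `TensorflowPredictor`, but the program failed with a `ValueError` when the predictor called `model.set_weights`. I expected the prediction to succeed.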

❯ python module.py
2022-05-23 21:59:07.729578: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-05-23 21:59:10,010 INFO services.py:1478 -- View the Ray dashboard at http://127.0.0.1:8265
(raylet) E0523 21:59:11.602878000 4562591232 fork_posix.cc:76]                  Other threads are currently calling into gRPC, skipping fork() handlers
2022-05-23 21:59:12,088 WARNING read_api.py:252 -- The number of blocks in this dataset (4) limits its parallelism to 4 concurrent tasks. This is much less than the number of available CPU slots in the cluster. Use `.repartition(n)` to increase the number of dataset blocks.
[dataset]: Run `pip install tqdm` to enable progress reporting.
(BlockWorker pid=7430) 2022-05-23 21:59:16.252786: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
(BlockWorker pid=7430) To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "module.py", line 40, in <module>
    batch_predictor.predict(dataset)
  File "/private/tmp/.venv/lib/python3.8/site-packages/ray/ml/batch_predictor.py", line 93, in predict
    return data.map_batches(
  File "/private/tmp/.venv/lib/python3.8/site-packages/ray/data/dataset.py", line 332, in map_batches
    return Dataset(plan, self._epoch, self._lazy)
  File "/private/tmp/.venv/lib/python3.8/site-packages/ray/data/dataset.py", line 140, in __init__
    self._plan.execute(allow_clear_input_blocks=False)
  File "/private/tmp/.venv/lib/python3.8/site-packages/ray/data/impl/plan.py", line 257, in execute
    blocks, stage_info = stage(blocks, clear_input_blocks)
  File "/private/tmp/.venv/lib/python3.8/site-packages/ray/data/impl/plan.py", line 436, in __call__
    blocks = compute._apply(
  File "/private/tmp/.venv/lib/python3.8/site-packages/ray/data/impl/compute.py", line 266, in _apply
    new_metadata = ray.get(new_metadata)
  File "/private/tmp/.venv/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/private/tmp/.venv/lib/python3.8/site-packages/ray/worker.py", line 1843, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::BlockWorker.map_block_nosplit() (pid=7430, ip=127.0.0.1, repr=<ray.data.impl.compute.BlockWorker object at 0x194a80490>)
  File "/private/tmp/.venv/lib/python3.8/site-packages/ray/data/impl/compute.py", line 185, in map_block_nosplit
    return _map_block_nosplit(block, fn, input_files)
  File "/private/tmp/.venv/lib/python3.8/site-packages/ray/data/impl/compute.py", line 341, in _map_block_nosplit
    for new_block in fn(block):
  File "/private/tmp/.venv/lib/python3.8/site-packages/ray/data/dataset.py", line 308, in transform
    applied = fn(view)
  File "/private/tmp/.venv/lib/python3.8/site-packages/ray/data/impl/compute.py", line 300, in _fn
    return ray.data._cached_fn(item)
  File "/private/tmp/.venv/lib/python3.8/site-packages/ray/ml/batch_predictor.py", line 83, in __call__
    return self.predictor.predict(batch, **predict_kwargs)
  File "/private/tmp/.venv/lib/python3.8/site-packages/ray/ml/predictors/integrations/tensorflow/tensorflow_predictor.py", line 170, in predict
    model.set_weights(self.model_weights)
  File "/private/tmp/.venv/lib/python3.8/site-packages/keras/engine/base_layer.py", line 1614, in set_weights
    params = self.weights
  File "/private/tmp/.venv/lib/python3.8/site-packages/keras/engine/training.py", line 2829, in weights
    return self._dedup_weights(self._undeduplicated_weights)
  File "/private/tmp/.venv/lib/python3.8/site-packages/keras/engine/training.py", line 2834, in _undeduplicated_weights
    self._assert_weights_created()
  File "/private/tmp/.venv/lib/python3.8/site-packages/keras/engine/sequential.py", line 472, in _assert_weights_created
    super(functional.Functional, self)._assert_weights_created()  # pylint: disable=bad-super-call
  File "/private/tmp/.venv/lib/python3.8/site-packages/keras/engine/training.py", line 3027, in _assert_weights_created
    raise ValueError(f'Weights for model {self.name} have not yet been '
ValueError: Weights for model sequential_3 have not yet been created. Weights are created when the Model is first called on inputs or `build()` is called with an `input_shape`.

This error only occurs if you patch #25124 first!
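
For context, the Keras behaviour named in the error message can be reproduced without Ray. The sketch below is only an illustration under assumptions: the two-layer `Dense` model, the `sample` array, and the variable names are made up for this example and are not part of the report. It shows that a `Sequential` model with no declared input shape rejects `set_weights` until it has been called on inputs once.

```python
import numpy as np
from tensorflow.keras import layers, models


def build_model():
    # Hypothetical stand-in model: no input shape is declared anywhere,
    # so Keras defers weight creation until the first call or build().
    return models.Sequential([layers.Dense(4, activation="relu"), layers.Dense(2)])


# Hypothetical input batch: 3 rows with 8 features each.
sample = np.ones((3, 8), dtype=np.float32)

# "Trained" model: one forward pass creates the weights we then save.
trained = build_model()
trained(sample)
saved_weights = trained.get_weights()

# Fresh model from the same definition: its weights do not exist yet,
# so loading the saved weights is expected to raise ValueError.
fresh = build_model()
try:
    fresh.set_weights(saved_weights)
    failed_before_build = False
except ValueError:
    failed_before_build = True

# After a single forward pass the weights exist and can be overwritten.
fresh(sample)
fresh.set_weights(saved_weights)
outputs_match = np.allclose(np.asarray(fresh(sample)), np.asarray(trained(sample)))
```

Running a forward pass (or calling `build()` with an input shape) before `set_weights` is one possible way around the error. Whether #25136 takes this approach is not stated in this issue.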

Versions / Dependencies

Ray: 4444150
Python: 3.8.12
OS: macOS

Reproduction script

import ray
from ray.ml.predictors.integrations.tensorflow import TensorflowPredictor
import tensorflow as tf
from tensorflow.keras import models, layers
from ray.ml.batch_predictor import BatchPredictor
from ray.ml.checkpoint import Checkpoint

batch_size = 4
height = 32
width = 32
num_channels = 3
num_classes = 10

def build_model():
    model = models.Sequential()

    model.add(layers.Lambda(lambda tensor: tf.squeeze(tensor, axis=1)))

    model.add(layers.Conv2D(6, (5, 5), activation='relu', input_shape=(height, width, num_channels)))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(16, (5, 5), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(120, activation='relu'))
    model.add(layers.Dense(84, activation='relu'))
    model.add(layers.Dense(num_classes))
    return model

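# Build the model explicitly so that get_weights() returns the created weights.
model = build_model()
model.build(input_shape=(0, 1, 32, 32, 3))
checkpoint = Checkpoint.from_dict({"model": model.get_weights()})

# build_model is passed as the model definition. According to the traceback,
# the predictor then calls model.set_weights(...) on a model whose weights
# have not been created yet.
batch_predictor = BatchPredictor.from_checkpoint(
    checkpoint=checkpoint,
    predictor_cls=TensorflowPredictor,
    model_definition=build_model,
)

# Four tensors of shape (1, height, width, num_channels).
dataset = ray.data.range_tensor(batch_size, shape=(1, height, width, num_channels))
batch_predictor.predict(dataset)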
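
A possible user-side workaround, sketched here under assumptions and not verified against Ray itself, is to declare the input as the first entry of the `Sequential` model. In the script above the `Lambda` layer is added first, so the `input_shape` given to the later `Conv2D` layer does not appear to trigger weight creation. The shortened layer stack and the names `source` and `target` below are illustrative only.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

height, width, num_channels, num_classes = 32, 32, 3, 10


def build_model():
    model = models.Sequential()
    # Declaring the input up front lets Keras create each layer's weights
    # as soon as the layer is added, without a forward pass.
    model.add(tf.keras.Input(shape=(1, height, width, num_channels)))
    model.add(layers.Lambda(lambda tensor: tf.squeeze(tensor, axis=1)))
    model.add(layers.Conv2D(6, (5, 5), activation="relu"))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(num_classes))
    return model


# Two independent models from the same definition; both already have weights.
source = build_model()
target = build_model()

# Copying weights works right away because target's weights already exist.
target.set_weights(source.get_weights())
```

With a model definition like this, a predictor that calls `set_weights` on a freshly constructed model should no longer hit the "Weights ... have not yet been created" check, although this has only been reasoned about for plain Keras here.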

Issue Severity

High: It blocks me from completing my task.

@bveeramani added the bug, triage, and air labels May 24, 2022
@bveeramani changed the title from "[AIR] TorchPredictor doesn't create model weights" to "[AIR] TensorflowPredicotr doesn't create model weights" May 24, 2022
@bveeramani changed the title from "[AIR] TensorflowPredicotr doesn't create model weights" to "[AIR] TensorflowPredictor doesn't create model weights" May 24, 2022
@bveeramani bveeramani self-assigned this May 24, 2022
@bveeramani bveeramani added this to the Ray AIR milestone May 24, 2022
amogkam added a commit that referenced this issue May 26, 2022
`TensorflowPredictor.predict` doesn't work right now. For more information, see #25125.

Co-authored-by: Amog Kamsetty <[email protected]>
@hora-anyscale added the P1 label and removed the triage label Jun 15, 2022