I returned a list[list[dict[str, str]]] from my UDF. I expected Ray Data to implicitly convert my output to an ndarray, but I got an error instead.
If I explicitly cast my output to an array with create_ragged_ndarray, I don't get an error.
ray.exceptions.RayTaskError(ValueError): ray::MapBatches(HuggingFacePredictor)() (pid=67876, ip=127.0.0.1)
File "/Users/balaji/Documents/GitHub/ray/python/ray/data/_internal/execution/operators/actor_pool_map_operator.py", line 386, in submit
yield from _map_task(fn, ctx, *blocks)
File "/Users/balaji/Documents/GitHub/ray/python/ray/data/_internal/execution/operators/map_operator.py", line 389, in _map_task
for b_out in fn(iter(blocks), ctx):
File "/Users/balaji/Documents/GitHub/ray/python/ray/data/_internal/execution/legacy_compat.py", line 311, in do_map
yield from block_fn(blocks, ctx, *fn_args, **fn_kwargs)
File "/Users/balaji/Documents/GitHub/ray/python/ray/data/_internal/planner/map_batches.py", line 109, in fn
yield from process_next_batch(batch)
File "/Users/balaji/Documents/GitHub/ray/python/ray/data/_internal/planner/map_batches.py", line 97, in process_next_batch
raise e from None
File "/Users/balaji/Documents/GitHub/ray/python/ray/data/_internal/planner/map_batches.py", line 78, in process_next_batch
output_buffer.add_batch(b)
File "/Users/balaji/Documents/GitHub/ray/python/ray/data/_internal/output_buffer.py", line 50, in add_batch
self._buffer.add_batch(batch)
File "/Users/balaji/Documents/GitHub/ray/python/ray/data/_internal/delegating_block_builder.py", line 51, in add_batch
block = BlockAccessor.batch_to_block(batch)
File "/Users/balaji/Documents/GitHub/ray/python/ray/data/block.py", line 436, in batch_to_block
return ArrowBlockAccessor.numpy_to_block(
File "/Users/balaji/Documents/GitHub/ray/python/ray/data/_internal/arrow_block.py", line 184, in numpy_to_block
col = ArrowTensorArray.from_numpy(col)
File "/Users/balaji/Documents/GitHub/ray/python/ray/air/util/tensor_extensions/arrow.py", line 312, in from_numpy
return ArrowVariableShapedTensorArray.from_numpy(arr)
File "/Users/balaji/Documents/GitHub/ray/python/ray/air/util/tensor_extensions/arrow.py", line 721, in from_numpy
raise ValueError(
ValueError: ArrowVariableShapedTensorArray only supports heterogeneous-shaped tensor collections, not arbitrarily nested ragged tensors. Got arrays: [('dtype=object', 'shape=(1,)'), ('dtype=object', 'shape=(1,)')]
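The shapes reported in the error are consistent with the nested list having been converted into a 2-D object array, so that each row becomes its own `shape=(1,)` object array. The NumPy-only sketch below illustrates that difference. It is not Ray's code: `batch_output` is made-up sample data, `to_object_column` is a hypothetical helper, and the assumption that `create_ragged_ndarray` builds a comparable 1-D object array is not confirmed by this report.

```python
import numpy as np

# Made-up UDF output: one list of dicts per input row (list[list[dict[str, str]]]).
batch_output = [[{"label": "POSITIVE"}], [{"label": "NEGATIVE"}]]

# Plain NumPy conversion: the equal-length inner lists become a second axis,
# so every row is itself an object-dtype array of shape (1,).
implicit = np.array(batch_output)


def to_object_column(values):
    """Hypothetical helper: store each row as one opaque Python object."""
    column = np.empty(len(values), dtype=object)
    for i, value in enumerate(values):
        column[i] = value  # element-wise assignment keeps the inner list intact
    return column


explicit = to_object_column(batch_output)

print(implicit.dtype, implicit.shape, implicit[0].shape)
print(explicit.dtype, explicit.shape, type(explicit[0]).__name__)
```

In the first case the column is a two-dimensional array of dicts; in the second it is a one-dimensional array whose elements are ordinary Python lists, which is the kind of input the explicit cast in the workaround would plausibly produce.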
Issue Severity
Medium: It is a significant difficulty but I can work around it.
bveeramani added the bug and triage labels on May 15, 2023.
bveeramani added the P1 label and removed the triage label on May 15, 2023.
bveeramani changed the title from "[Data] Ray Data doesn't cast" to "[Data] Ray Data doesn't cast some list outputs to ndarrays" on May 15, 2023.
Versions / Dependencies
Ray: 21e9d38