It seems that a dataset with groupby followed by map_groups does not correctly produce Arrow blocks, so the subsequent call, which receives a raw Python dictionary, fails with an invalid block type.
(_map_block_nosplit pid=11452) Traceback (most recent call last):
(_map_block_nosplit pid=11452) File "python/ray/_raylet.pyx", line 830, in ray._raylet.execute_task
(_map_block_nosplit pid=11452) File "python/ray/_raylet.pyx", line 834, in ray._raylet.execute_task
(_map_block_nosplit pid=11452) File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/_internal/compute.py", line 484, in _map_block_nosplit
(_map_block_nosplit pid=11452) for new_block in block_fn(blocks, *fn_args, **fn_kwargs):
(_map_block_nosplit pid=11452) File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/dataset.py", line 582, in transform
(_map_block_nosplit pid=11452) yield from process_next_batch(batch)
(_map_block_nosplit pid=11452) File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/dataset.py", line 570, in process_next_batch
(_map_block_nosplit pid=11452) batch = batch_fn(batch, *fn_args, **fn_kwargs)
(_map_block_nosplit pid=11452) File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/grouped_dataset.py", line 330, in group_fn
(_map_block_nosplit pid=11452) block_accessor = BlockAccessor.for_block(batch)
(_map_block_nosplit pid=11452) File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/block.py", line 399, in for_block
(_map_block_nosplit pid=11452) raise TypeError("Not a block type: {} ({})".format(block, type(block)))
(_map_block_nosplit pid=11452) TypeError: Not a block type: {'group': array([1, 1]), 'value': array([1, 2])} (<class 'dict'>)
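The last frames of the traceback suggest a type dispatch that recognizes only a fixed set of block types, so a plain dict of arrays falls through to the TypeError. Below is a minimal, library-free sketch of that failure mode. `for_block_sketch` and `ACCEPTED_BLOCK_TYPES` are hypothetical stand-ins, not Ray's actual `BlockAccessor.for_block`, and plain lists stand in for NumPy arrays; only the error message format is taken from the traceback.

```python
# Hypothetical stand-in for a block-type dispatch; NOT Ray's implementation.
# Only the error message format is copied from the traceback.

ACCEPTED_BLOCK_TYPES = (list, bytes)  # assumption, for illustration only


def for_block_sketch(block):
    """Return a label for a recognized block type, else raise TypeError."""
    if isinstance(block, ACCEPTED_BLOCK_TYPES):
        return type(block).__name__
    raise TypeError("Not a block type: {} ({})".format(block, type(block)))


# A dict of columns, shaped like what a "numpy"-format UDF might return.
batch = {"group": [1, 1], "value": [1, 2]}

try:
    for_block_sketch(batch)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

In this sketch the dict is rejected because it is not in the accepted set, which mirrors the shape of the reported error rather than the exact set of types Ray accepts.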
Medium: It is a significant difficulty but I can work around it.
jiaodong added the bug (something that is supposed to be working, but isn't), P1 (issue that should be fixed within a few weeks), data (Ray Data-related issues), and air labels on Nov 8, 2022.
Changing it to batch_format="pyarrow" works for me. Here is the legacy docstring:
batch_format: Specify "default" to use the default block format
    (promotes Arrow to pandas), "pandas" to select
    ``pandas.DataFrame`` as the batch format,
    or "pyarrow" to select ``pyarrow.Table``.
As a result, however, the input batch to the UDF becomes a pyarrow.lib.Table rather than an ndarray / Dict[str, ndarray].
…30172)
This fixes the issue found in #30102, where a user calls ds.groupby("key").map_groups(fn, batch_format="numpy"). To handle that call, map_groups needs to convert correctly between blocks and batches.
Signed-off-by: Weichen Xu <[email protected]>
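The fix described in the commit message amounts to converting in both directions around the user function: block to batch before the call, and batch back to block after it. Below is a rough, library-free sketch of that round trip. It assumes a row-oriented block (a list of dicts) and a column-oriented batch (a dict of lists standing in for Dict[str, ndarray]). All helper names are hypothetical, and this is not the code from the referenced pull request.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Block = List[Row]             # row-oriented stand-in for a block (assumption)
Batch = Dict[str, List[Any]]  # column-oriented stand-in for Dict[str, ndarray]


def block_to_batch(block: Block) -> Batch:
    """Pivot rows into columns; assumes every row has the same keys."""
    if not block:
        return {}
    return {key: [row[key] for row in block] for key in block[0]}


def batch_to_block(batch: Batch) -> Block:
    """Pivot columns back into rows; assumes equal-length columns."""
    if not batch:
        return []
    length = len(next(iter(batch.values())))
    return [{key: col[i] for key, col in batch.items()} for i in range(length)]


def apply_group_fn(block: Block, fn: Callable[[Batch], Batch]) -> Block:
    """Call fn on a batch view of one group and return a block again."""
    return batch_to_block(fn(block_to_batch(block)))


def double_values(batch: Batch) -> Batch:
    # Example UDF: returns a plain dict of columns, as in the reported case.
    return {"group": batch["group"], "value": [v * 2 for v in batch["value"]]}


group_block = [{"group": 1, "value": 1}, {"group": 1, "value": 2}]
result = apply_group_fn(group_block, double_values)
print(result)
```

The point of the sketch is that the dict returned by the user function never reaches block-handling code directly: it is always converted back first, so downstream code sees only block types.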
Versions / Dependencies
master
Reproduction script