What happened + What you expected to happen
Tried to transform a dataset with MultiHotEncoder, but got a TypeError:
Traceback (most recent call last):
  File "/Users/balaji/Documents/GitHub/ray/temp.py", line 7, in <module>
    encoder.fit_transform(dataset)
  File "/Users/balaji/Documents/GitHub/ray/python/ray/data/preprocessor.py", line 120, in fit_transform
    self.fit(dataset)
  File "/Users/balaji/Documents/GitHub/ray/python/ray/data/preprocessor.py", line 105, in fit
    return self._fit(dataset)
  File "/Users/balaji/Documents/GitHub/ray/python/ray/data/preprocessors/encoder.py", line 318, in _fit
    self.stats_ = _get_unique_value_indices(
  File "/Users/balaji/Documents/GitHub/ray/python/ray/data/preprocessors/encoder.py", line 550, in _get_unique_value_indices
    value_counts = dataset.map_batches(get_pd_value_counts, batch_format="pandas")
  File "/Users/balaji/Documents/GitHub/ray/python/ray/data/dataset.py", line 659, in map_batches
    return Dataset(plan, self._epoch, self._lazy)
  File "/Users/balaji/Documents/GitHub/ray/python/ray/data/dataset.py", line 223, in __init__
    self._plan.execute(allow_clear_input_blocks=False)
  File "/Users/balaji/Documents/GitHub/ray/python/ray/data/_internal/plan.py", line 321, in execute
    blocks, stage_info = stage(
  File "/Users/balaji/Documents/GitHub/ray/python/ray/data/_internal/plan.py", line 688, in __call__
    blocks = compute._apply(
  File "/Users/balaji/Documents/GitHub/ray/python/ray/data/_internal/compute.py", line 154, in _apply
    raise e from None
  File "/Users/balaji/Documents/GitHub/ray/python/ray/data/_internal/compute.py", line 138, in _apply
    results = map_bar.fetch_until_complete(refs)
  File "/Users/balaji/Documents/GitHub/ray/python/ray/data/_internal/progress_bar.py", line 75, in fetch_until_complete
    for ref, result in zip(done, ray.get(done)):
  File "/Users/balaji/Documents/GitHub/ray/python/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/Users/balaji/Documents/GitHub/ray/python/ray/_private/worker.py", line 2347, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TypeError): ray::_map_block_split() (pid=82040, ip=127.0.0.1)
  File "/Users/balaji/Documents/GitHub/ray/python/ray/data/_internal/compute.py", line 459, in _map_block_split
    for new_block in block_fn(blocks, *fn_args, **fn_kwargs):
  File "/Users/balaji/Documents/GitHub/ray/python/ray/data/dataset.py", line 637, in transform
    yield from process_next_batch(batch)
  File "/Users/balaji/Documents/GitHub/ray/python/ray/data/dataset.py", line 601, in process_next_batch
    batch = batch_fn(batch, *fn_args, **fn_kwargs)
  File "/Users/balaji/Documents/GitHub/ray/python/ray/data/preprocessors/encoder.py", line 543, in get_pd_value_counts
    result[col] = get_pd_value_counts_per_column(df[col])
  File "/Users/balaji/Documents/GitHub/ray/python/ray/data/preprocessors/encoder.py", line 536, in get_pd_value_counts_per_column
    return Counter(col.value_counts(dropna=False).to_dict())
  File "/Users/balaji/Documents/GitHub/ray/.venv/lib/python3.10/site-packages/pandas/core/series.py", line 1895, in to_dict
    return into_c((k, maybe_box_native(v)) for k, v in self.items())
TypeError: unhashable type: 'numpy.ndarray'
Reproduction script
import ray
from ray.data.preprocessors import MultiHotEncoder
dataset = ray.data.from_items([{"column": ["spam", "ham", "eggs"]}])
print(dataset)
encoder = MultiHotEncoder(columns=["column"])
encoder.fit_transform(dataset)
Issue Severity
Medium: It is a significant difficulty but I can work around it.
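The last frames of the traceback suggest that each list-valued cell reaches pandas as a numpy.ndarray, and an ndarray cannot be used as a dictionary key, so building a dict of value counts fails. The sketch below illustrates that with plain NumPy and shows one possible way to count the individual list elements instead. It is an illustration only, not the fix adopted in Ray; `is_hashable` and `count_list_elements` are hypothetical helper names that do not exist in Ray.

```python
from collections import Counter

import numpy as np


def is_hashable(value) -> bool:
    """Return True if `value` can be used as a dict or Counter key."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


def count_list_elements(rows) -> Counter:
    """Count individual elements across list-like rows.

    Hypothetical helper for illustration; not part of Ray's API.
    """
    counts = Counter()
    for row in rows:
        # tolist() converts the array into plain Python values, which are hashable.
        counts.update(np.asarray(row).tolist())
    return counts


# Rows shaped like the "column" values in the reproduction script.
rows = [np.array(["spam", "ham", "eggs"]), np.array(["spam"])]

print(is_hashable(rows[0]))       # an ndarray is not hashable
print(count_list_elements(rows))  # per-element counts across all rows
```

Counting elements rather than whole rows avoids hashing the arrays themselves, which seems closer to what a multi-hot encoding needs, although the actual fix may take a different approach.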
bveeramani added the bug (Something that is supposed to be working; but isn't), P1 (Issue that should be fixed within a few weeks), air, and data (Ray Data-related issues) labels on Dec 28, 2022
Versions / Dependencies
ray: f40ac95
pyarrow: 10.0.1