You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I enable verbose stats logs,
then I create a Ray Dataset with the following execution plan:
Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[Map(read_dataset)->Map(map_fn)]
Then, I produce data by iterating over the rows:
for row in dataset.iter_rows():
I get this error:
for row in dataset.iter_rows():
File "/usr/local/lib/python3.10/dist-packages/ray/data/iterator.py", line 244, in _wrapped_iterator
for batch in batch_iterable:
File "/usr/local/lib/python3.10/dist-packages/ray/data/iterator.py", line 161, in _create_iterator
block_iterator, stats, blocks_owned_by_consumer = self._to_block_iterator()
File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/iterator/iterator_impl.py", line 33, in _to_block_iterator
block_iterator, stats, executor = ds._plan.execute_to_iterator()
File "/usr/local/lib/python3.10/dist-packages/ray/data/exceptions.py", line 57, in handle_trace
raise e
File "/usr/local/lib/python3.10/dist-packages/ray/data/exceptions.py", line 49, in handle_trace
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/plan.py", line 425, in execute_to_iterator
block_iter = itertools.chain([next(gen)], gen)
File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/execution/legacy_compat.py", line 31, in execute_to_legacy_block_iterator
for bundle in bundle_iter:
File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/execution/interfaces/executor.py", line 37, in __next__
return self.get_next()
File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/execution/streaming_executor.py", line 152, in get_next
self._outer.shutdown(isinstance(e, StopIteration))
File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/execution/streaming_executor.py", line 186, in shutdown
stats_summary_string = self._final_stats.to_summary().to_string(
File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/stats.py", line 944, in to_string
out += "\n" + self.runtime_metrics()
File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/stats.py", line 979, in runtime_metrics
out += fmt_line(summ.base_name, op_total_time)
File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/stats.py", line 969, in fmt_line
return f"* {name}: {fmt(time)} ({time / total_wall_time * 100:.3f}%)\n"
ZeroDivisionError: division by zero
It turns out that there was an error in the user defined functionread_dataset, which is not mentioned in the logs.
What I expected
I enabled verbose stats logs because I suspected I might be doing something wrong. Instead of helping me, the verbose logs masked my error and misled me to look for problems in Ray code instead.
I only identified my error once I randomly disabled the verbose logs and saw the real error in the logs.
2024-07-22 11:21:51,627 INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[Map(read_dataset)]
Running 0: 0%| | 0/10 [00:00<?, ?it/s]2024-07-22 11:21:52,331 ERROR streaming_executor_state.py:455 -- An exception was raised from a task of operator "Map(read_dataset)". Dataset execution will now abort. To ignore this exception and continue, set DataContext.max_errored_blocks.
2024-07-22 11:21:52,332 ERROR exceptions.py:73 -- Exception occurred in Ray Data or Ray Core internal code. If you continue to see this error, please open an issue on the Ray project GitHub page with the full stack trace below: https://github.com/ray-project/ray/issues/new/choose
ray.data.exceptions.SystemException
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.10/dist-packages/ray/data/iterator.py", line 244, in _wrapped_iterator
for batch in batch_iterable:
File "/usr/local/lib/python3.10/dist-packages/ray/data/iterator.py", line 161, in _create_iterator
block_iterator, stats, blocks_owned_by_consumer = self._to_block_iterator()
File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/iterator/iterator_impl.py", line 33, in _to_block_iterator
block_iterator, stats, executor = ds._plan.execute_to_iterator()
File "/usr/local/lib/python3.10/dist-packages/ray/data/exceptions.py", line 86, in handle_trace
raise e.with_traceback(None) from SystemException()
ZeroDivisionError: division by zero
Instead, repeating the example but setting: DataContext.get_current().verbose_stats_logs = False prints:
2024-07-22 11:22:22,950 INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[Map(read_dataset)]
Running 0: 0%| | 0/10 [00:00<?, ?it/s]2024-07-22 11:22:22,959 ERROR streaming_executor_state.py:455 -- An exception was raised from a task of operator "Map(read_dataset)". Dataset execution will now abort. To ignore this exception and continue, set DataContext.max_errored_blocks.
2024-07-22 11:22:22,964 ERROR exceptions.py:63 -- Exception occurred in user code, with the abbreviated stack trace below. By default, the Ray Data internal stack trace is omitted from stdout, and only written to the Ray Data log files at /tmp/ray/session_2024-07-22_11-21-46_834791_3983/logs/ray-data. To output the full stack trace to stdout, set `DataContext.log_internal_stack_trace_to_stdout` to True.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.10/dist-packages/ray/data/iterator.py", line 244, in _wrapped_iterator
for batch in batch_iterable:
File "/usr/local/lib/python3.10/dist-packages/ray/data/iterator.py", line 161, in _create_iterator
block_iterator, stats, blocks_owned_by_consumer = self._to_block_iterator()
File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/iterator/iterator_impl.py", line 33, in _to_block_iterator
block_iterator, stats, executor = ds._plan.execute_to_iterator()
File "/usr/local/lib/python3.10/dist-packages/ray/data/exceptions.py", line 84, in handle_trace
raise e.with_traceback(None)
ray.exceptions.RayTaskError(UserCodeException): ray::Map(read_dataset)() (pid=4241, ip=10.244.37.29)
File "<stdin>", line 2, in read_dataset
ValueError
The above exception was the direct cause of the following exception:
ray::Map(read_dataset)() (pid=4241, ip=10.244.37.29)
File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/execution/operators/map_operator.py", line 438, in _map_task
for b_out in map_transformer.apply_transform(iter(blocks), ctx):
File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/execution/operators/map_transformer.py", line 392, in __call__
for data in iter:
File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/execution/operators/map_transformer.py", line 134, in _udf_timed_iter
output = next(input)
File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/execution/operators/map_transformer.py", line 216, in __call__
yield from self._row_fn(input, ctx)
File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 255, in transform_fn
out_row = fn(row)
File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 130, in fn
_handle_debugger_exception(e)
File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 146, in _handle_debugger_exception
raise UserCodeException() from e
ray.exceptions.UserCodeException
Issue Severity
Medium: It is a significant difficulty but I can work around it.
The text was updated successfully, but these errors were encountered:
Oblynx
added
bug
Something that is supposed to be working; but isn't
triage
Needs triage (eg: priority, bug/not-bug, and owning component)
labels
Jul 22, 2024
What happened + What you expected to happen
I enable verbose stats logs,
then I create a Ray Dataset with the following execution plan:
Then, I produce data by iterating over the rows:
I get this error:
It turns out that there was an error in the user defined function
read_dataset
, which is not mentioned in the logs.What I expected
I enabled verbose stats logs because I suspected I might be doing something wrong. Instead of helping me, the verbose logs masked my error and misled me to look for problems in Ray code instead.
I only identified my error once I randomly disabled the verbose logs and saw the real error in the logs.
Versions / Dependencies
5.15.0-1064-azure #73-Ubuntu SMP Tue Apr 30 14:24:24 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Reproduction script
This prints:
Instead, repeating the example but setting:
DataContext.get_current().verbose_stats_logs = False
prints:Issue Severity
Medium: It is a significant difficulty but I can work around it.
The text was updated successfully, but these errors were encountered: