amogkam opened this issue on Jan 6, 2023 · 2 comments · Fixed by #31513

Labels: bug (Something that is supposed to be working; but isn't), data (Ray Data-related issues), P1 (Issue that should be fixed within a few weeks), triage (Needs triage (eg: priority, bug/not-bug, and owning component))
```python
self = <ray.data.dataset_pipeline.DatasetPipeline.repeat.<locals>.RepeatIterator object at 0x16a801a30>

    def __next__(self) -> Dataset[T]:
        # Still going through the original pipeline.
        if self._original_iter:
            try:
                make_ds = next(self._original_iter)
                self._results.append(make_ds)

                def gen():
                    res = make_ds()
                    res._set_epoch(0)
                    return res

                return gen
            except StopIteration:
                self._original_iter = None
                # Calculate the cursor limit.
                if times:
                    self._max_i = len(self._results) * (times - 1)
                else:
                    self._max_i = float("inf")
        # Going through a repeat of the pipeline.
        if self._i < self._max_i:
>           make_ds = self._results[self._i % len(self._results)]
E           ZeroDivisionError: integer division or modulo by zero
```
amogkam added the bug and triage labels on Jan 6, 2023
jianoaix changed the title from "[Data] Iterating through DatasetPipeline fails with ZeroDivisionError" to "[Datasets] Iterating through DatasetPipeline fails with ZeroDivisionError" on Jan 7, 2023
The issue seems to be that peek() (used in pipe.schema()) advanced the base_iterable of the pipeline, which was then used to create another pipeline (with pipe.map_batches()).
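The diagnosis above can be illustrated with a minimal pure-Python analogue (no Ray involved; the names here are illustrative stand-ins, not Ray's actual API). Peeking pulls from the shared base iterator, so a pipeline built afterwards sees an empty iterator, and the repeat cursor's modulo by `len(self._results)` divides by zero, matching the traceback:

```python
# A minimal pure-Python analogue of the bug (not Ray's actual API):
# peek() and the pipeline share one base iterator, so peeking
# consumes the first (and here, only) dataset factory.

def make_base_iter():
    # Hypothetical stand-in for the pipeline's base iterable of
    # dataset factories (the real pipeline also yields callables).
    return iter([lambda: "dataset-0"])

base_iter = make_base_iter()

# Peeking (as pipe.schema() did) pulls from the shared iterator...
peeked = next(base_iter)

# ...so a pipeline built afterwards sees an empty iterator.
results = [make_ds for make_ds in base_iter]

# The repeat cursor then evaluates `i % len(results)` with
# len(results) == 0, reproducing the ZeroDivisionError above.
try:
    _ = results[0 % len(results)]
    err_name = None
except ZeroDivisionError as exc:
    err_name = type(exc).__name__

print(len(results), err_name)  # 0 ZeroDivisionError
```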
Signed-off-by: amogkam [email protected]

Closes #31505.

When peeking a DatasetPipeline (via .schema(), for example), the first dataset in the base iterator is consumed. Then, when chaining new operations onto the pipeline, such as map_batches, the dataset that was peeked is lost.

In this PR, we change the implementation of peek so that it does not consume the base iterable; instead, it creates a new iterable consisting of just the first dataset.
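A non-consuming peek of this kind can be sketched in plain Python with itertools.chain. This is an illustrative approximation only, not Ray's actual implementation; the function name and structure are assumptions:

```python
import itertools

def peek(base_iter):
    # Pull the first element once, hand it out as its own
    # one-element iterable, and rebuild the base iterable with
    # that element re-attached so nothing is lost downstream.
    first = next(base_iter)
    peek_iter = iter([first])
    restored = itertools.chain([first], base_iter)
    return peek_iter, restored

base = iter(["ds0", "ds1", "ds2"])
peek_iter, base = peek(base)

peeked = next(peek_iter)   # what a schema()-style peek would inspect
remaining = list(base)     # what chained operations will still see

print(peeked)     # ds0
print(remaining)  # ['ds0', 'ds1', 'ds2'] -- the peeked dataset is kept
```

Because the restored iterable re-attaches the first element, downstream consumers (a subsequent map_batches, for example) see the full original sequence.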
What happened + What you expected to happen
I expect the code snippet not to fail. Instead, it fails with the error shown above.
Versions / Dependencies
master
Reproduction script
Issue Severity
High: It blocks me from completing my task.