Why ray.data.read_images cannot combine_chunks #34563

Open
yanxiaod123 opened this issue Apr 19, 2023 · 0 comments
Labels
bug: Something that is supposed to be working; but isn't
data: Ray Data-related issues
P2: Important issue, but not time-critical


What happened + What you expected to happen

I ran the test code below and found that execution takes the if branch at https://github.com/ray-project/ray/blob/master/python/ray/data/_internal/arrow_ops/transform_pyarrow.py#L283, so the chunks are not combined. If I have many chunks, what should I do? I ask because #34352 indicates that having many chunks makes processing slow.

Versions / Dependencies

ray==3.0.0.dev0

Reproduction script

import time
import ray
from ray.data import Dataset

dataset: Dataset = ray.data.read_images(paths="", size=(224, 224), parallelism=10)
start_time = time.time()
for data in dataset.iter_batches(batch_size=16, batch_format="numpy"):
    pass
print('process time is ', time.time() - start_time)

Issue Severity

High: It blocks me from completing my task.

yanxiaod123 added the bug and triage labels on Apr 19, 2023
c21 added the P1 and data labels and removed the triage label on Apr 21, 2023
anyscalesam added the P2 label and removed the P1 label on Nov 2, 2023