When working with large arrays, calling with_format (e.g. with "numpy") on an iterable dataset and then applying map causes a significant slowdown when iterating.
    import time

    import numpy as np
    from datasets import Array3D, Dataset, Features

    # Two columns of variable-length (N, 10, 10) float32 arrays.
    features = Features(
        **{
            "array0": Array3D((None, 10, 10), dtype="float32"),
            "array1": Array3D((None, 10, 10), dtype="float32"),
        }
    )
    # 50 examples per column, alternating between 2000 and 1000 along the first axis.
    dataset = Dataset.from_dict(
        {
            f"array{i}": [np.zeros((x, 10, 10), dtype=np.float32) for x in [2000, 1000] * 25]
            for i in range(2)
        },
        features=features,
    )
Then
    # Formatting followed by an identity map.
    ds = dataset.to_iterable_dataset()
    ds = ds.with_format("numpy").map(lambda x: x)
    t0 = time.time()
    for ex in ds:
        pass
    t1 = time.time()
    print(t1 - t0)
takes 27 s, whereas
    ds = dataset.to_iterable_dataset()
    ds = ds.with_format("numpy")
    # Note: this reassignment discards the numpy-formatted dataset created above.
    ds = dataset.to_iterable_dataset()
    t0 = time.time()
    for ex in ds:
        pass
    t1 = time.time()
    print(t1 - t0)
takes ~1 s.
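To make the two measurements easier to compare, the timing loop can be factored into a small helper. This is only a sketch: time_iteration is a hypothetical name, not part of the datasets library, and the stand-in data at the bottom exists only so the snippet runs on its own.

```python
import time
from typing import Iterable, Tuple


def time_iteration(iterable: Iterable) -> Tuple[int, float]:
    """Iterate once over `iterable`; return (number of items, elapsed seconds)."""
    start = time.perf_counter()
    count = 0
    for _ in iterable:
        count += 1
    return count, time.perf_counter() - start


# Stand-in data so the sketch runs without the datasets library.
count, elapsed = time_iteration(range(1000))
print(f"{count} examples in {elapsed:.4f} s")

# With the dataset from the report, the calls would look like:
# time_iteration(dataset.to_iterable_dataset().with_format("numpy").map(lambda x: x))
# time_iteration(dataset.to_iterable_dataset().with_format("numpy"))
```

Returning the item count alongside the elapsed time makes it easy to confirm that both variants iterated over the same number of examples.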
Map should not introduce a slowdown when formatting is enabled.
Environment info: 3.0.2
The snippet below easily eats up 32 GB of RAM. Leaving it running for a while bricked a laptop with 16 GB.
    from datasets import load_dataset

    # Enable numpy formatting, then apply an identity map.
    dataset = load_dataset("Voxel51/OxfordFlowers102", data_dir="data").with_format("numpy")
    processed_dataset = dataset.map(lambda x: x)
Similar problems occur when using a real transform function in .map().