We observed that high-dimensional datasets are much slower to read when they are virtual (versioned) datasets:
```
In [12]: shape = (19, 36, 26, 1)

In [14]: a = np.random.rand(*shape)
    ...: with TempDirCtx() as d:
    ...:     with h5py.File(d / 'foo.h5', 'w') as f:
    ...:         vf = VersionedHDF5File(f)
    ...:         with vf.stage_version('v0') as sv:
    ...:             sv.create_dataset('bar', data=a, chunks=a.shape)
    ...:     with h5py.File(d / 'foo.h5', 'r') as f:
    ...:         vf = VersionedHDF5File(f)
    ...:         cv = vf[vf.current_version]
    ...:         bar = cv['bar']
    ...:         %time _ = [bar[:] for _ in range(1000)]
    ...:
CPU times: user 2.95 s, sys: 61.8 ms, total: 3.01 s
Wall time: 3.01 s
```
```
In [15]: a = np.random.rand(*shape)
    ...: with TempDirCtx() as d:
    ...:     with h5py.File(d / 'foo.h5', 'w') as f:
    ...:         f.create_dataset('bar', data=a, chunks=a.shape)
    ...:     with h5py.File(d / 'foo.h5', 'r') as f:
    ...:         bar = f['bar']
    ...:         %time _ = [bar[:] for _ in range(1000)]
    ...:
CPU times: user 37.3 ms, sys: 60.2 ms, total: 97.5 ms
Wall time: 97.2 ms
```
A little bit of profiling points to `H5S__hyper_project_intersection` being an expensive function.
Is it possible to speed up this function?
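For what it's worth, the overhead may be reproducible with plain h5py virtual datasets, with no versioned-hdf5 involved at all. Below is a minimal sketch along those lines; the file names (`raw.h5`, `virt.h5`) and the 100-iteration read loop are illustrative, and it assumes the cost lives in HDF5's virtual-dataset read path rather than in versioned-hdf5 itself:

```python
import tempfile
import time
from pathlib import Path

import h5py
import numpy as np

shape = (19, 36, 26, 1)
a = np.random.rand(*shape)

with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)

    # Raw data file that the virtual dataset will map onto.
    with h5py.File(d / 'raw.h5', 'w') as f:
        f.create_dataset('bar', data=a, chunks=shape)

    # Virtual dataset over the raw data -- roughly the kind of mapping
    # versioned-hdf5 builds for each staged version.
    layout = h5py.VirtualLayout(shape=shape, dtype=a.dtype)
    layout[...] = h5py.VirtualSource(str(d / 'raw.h5'), 'bar', shape=shape)
    with h5py.File(d / 'virt.h5', 'w') as f:
        f.create_virtual_dataset('bar', layout)

    # Repeated full reads; each read goes through the virtual-dataset
    # selection machinery, including H5S__hyper_project_intersection.
    with h5py.File(d / 'virt.h5', 'r') as f:
        bar = f['bar']
        t0 = time.perf_counter()
        out = [bar[:] for _ in range(100)]
        print(f'100 virtual reads: {time.perf_counter() - t0:.3f} s')

# Sanity check: the virtual dataset round-trips the original data.
assert np.array_equal(out[0], a)
```

If this sketch shows the same gap versus a non-virtual dataset, it would point at upstream HDF5 rather than the versioned-hdf5 layer.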