delete_versions recreates the raw dataset via a temporary _tmp_raw_data dataset. To populate _tmp_raw_data it first builds a temporary numpy array in memory that is large enough to hold all data across all versions (in other words, it needs a lot of memory). This can fail:
File "/usr/local/python/python3/std/lib64/python3.10/site-packages/versioned_hdf5/replay.py", line 515, in delete_versions
_delete_dataset(f, name, versions_to_delete)
File "/usr/local/python/python3/std/lib64/python3.10/site-packages/versioned_hdf5/replay.py", line 439, in _delete_dataset
raw_data_chunks_map = _recreate_raw_data(f, name, versions_to_delete)
File "/usr/local/python/python3/std/lib64/python3.10/site-packages/versioned_hdf5/replay.py", line 218, in _recreate_raw_data
n = np.full(new_raw_data.shape, _get_np_fillvalue(raw_data), dtype=new_raw_data.dtype)
File "/usr/local/python/python3/std/lib64/python3.10/site-packages/numpy/core/numeric.py", line 343, in full
a = empty(shape, dtype, order)
MemoryError: Out of memory:
Unable to allocate 1.16 GiB for an array with shape (124610, 1250) and data type float64
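For reference, the allocation size follows directly from the shape: a float64 array of shape (124610, 1250) needs 124610 × 1250 × 8 bytes ≈ 1.16 GiB, and because the temporary array has to hold all data across all versions at once, peak memory grows with the total size of the dataset rather than with a single chunk.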
We tried to run delete_versions again, but now it failed because the _tmp_raw_data dataset is already present.
File "/usr/local/python/python3/std/lib64/python3.10/site-packages/versioned_hdf5/replay.py", line 515, in delete_versions
_delete_dataset(f, name, versions_to_delete)
File "/usr/local/python/python3/std/lib64/python3.10/site-packages/versioned_hdf5/replay.py", line 439, in _delete_dataset
raw_data_chunks_map = _recreate_raw_data(f, name, versions_to_delete)
File "/usr/local/python/python3/std/lib64/python3.10/site-packages/versioned_hdf5/replay.py", line 208, in _recreate_raw_data
new_raw_data = f['_version_data'][name].create_dataset(
File "/usr/local/python/python3/std/lib64/python3.10/site-packages/h5py/_hl/group.py", line 183, in create_dataset
dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
File "/usr/local/python/python3/std/lib64/python3.10/site-packages/h5py/_hl/dataset.py", line 165, in make_new_dset
dset_id = h5d.create(parent.id, name, tid, sid, dcpl=dcpl, dapl=dapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5d.pyx", line 135, in h5py.h5d.create
ValueError: Unable to synchronously create dataset (name already exists)
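As a workaround until this is fixed, the stale temporary dataset can presumably be removed by hand before retrying. A minimal sketch, assuming the temporary dataset lives at _version_data/<name>/_tmp_raw_data, which is where the traceback above creates it:

```python
import h5py

# Hypothetical recovery sketch: drop the stale temporary dataset so that a
# second delete_versions call can start from a clean slate.
with h5py.File("data.h5", "r+") as f:
    group = f["_version_data"]["<name of the dataset that failed>"]
    if "_tmp_raw_data" in group:
        del group["_tmp_raw_data"]
```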
I think there are a couple of things to fix here:
- we should use less peak memory in delete_versions / _recreate_raw_data: instead of building a temporary numpy array, can we write chunks directly to the raw dataset? (see the sketch after this list)
- we should gracefully handle a _tmp_raw_data dataset left behind by a previous failed delete_versions call
- we should clean up _tmp_raw_data on failure, probably by wrapping the body of _recreate_raw_data in a try/except block
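Something along these lines might address all three points. This is a rough sketch only, not the actual _recreate_raw_data implementation: chunks_to_keep, chunk_size and fillvalue are assumed inputs, chunking is assumed to be along the first axis, and the chunk-map bookkeeping the real code needs is omitted.

```python
def _recreate_raw_data_sketch(f, name, chunks_to_keep, chunk_size, fillvalue):
    """Sketch: rebuild the raw dataset chunk by chunk instead of in memory.

    chunks_to_keep is assumed to be a list of slices into raw_data (along the
    first axis) for the chunks that survive the deletion.
    """
    group = f["_version_data"][name]
    raw_data = group["raw_data"]

    # Point 2: tolerate a _tmp_raw_data left behind by a previous failed call.
    if "_tmp_raw_data" in group:
        del group["_tmp_raw_data"]

    new_shape = (len(chunks_to_keep) * chunk_size,) + raw_data.shape[1:]
    new_raw_data = group.create_dataset(
        "_tmp_raw_data",
        shape=new_shape,
        maxshape=(None,) + raw_data.shape[1:],
        chunks=raw_data.chunks,
        dtype=raw_data.dtype,
        compression=raw_data.compression,
        compression_opts=raw_data.compression_opts,
        # The HDF5 fill value covers any unwritten padding, so there is no
        # need to materialize a full-size np.full(...) array up front.
        fillvalue=fillvalue,
    )

    # Point 3: make sure the temporary dataset never outlives a failure.
    try:
        # Point 1: copy one chunk at a time; peak memory is now one chunk,
        # not the whole raw dataset.
        for i, old_slice in enumerate(chunks_to_keep):
            data = raw_data[old_slice]
            start = i * chunk_size
            new_raw_data[start:start + data.shape[0]] = data
    except Exception:
        del group["_tmp_raw_data"]
        raise

    return new_raw_data
```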