
delete_versions: use less memory and gracefully handle failure (PyInf#11013) #273

Closed
ArvidJB opened this issue Sep 21, 2023 · 0 comments · Fixed by #277
ArvidJB commented Sep 21, 2023

delete_versions recreates the raw dataset via a temporary _tmp_raw_data dataset. To populate _tmp_raw_data it allocates a temporary numpy array large enough to hold the data across all versions, so peak memory usage scales with the full size of the dataset. This can fail:

  File "/usr/local/python/python3/std/lib64/python3.10/site-packages/versioned_hdf5/replay.py", line 515, in delete_versions
    _delete_dataset(f, name, versions_to_delete)
  File "/usr/local/python/python3/std/lib64/python3.10/site-packages/versioned_hdf5/replay.py", line 439, in _delete_dataset
    raw_data_chunks_map = _recreate_raw_data(f, name, versions_to_delete)
  File "/usr/local/python/python3/std/lib64/python3.10/site-packages/versioned_hdf5/replay.py", line 218, in _recreate_raw_data
    n = np.full(new_raw_data.shape, _get_np_fillvalue(raw_data), dtype=new_raw_data.dtype)
  File "/usr/local/python/python3/std/lib64/python3.10/site-packages/numpy/core/numeric.py", line 343, in full
    a = empty(shape, dtype, order)
MemoryError: Out of memory:
  Unable to allocate 1.16 GiB for an array with shape (124610, 1250) and data type float64
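(That allocation matches the full recreated raw dataset: 124610 × 1250 float64 values at 8 bytes each is about 1.16 GiB.)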

We tried to run delete_versions again, but it failed a second time because the _tmp_raw_data dataset left behind by the first attempt already exists:

  File "/usr/local/python/python3/std/lib64/python3.10/site-packages/versioned_hdf5/replay.py", line 515, in delete_versions
    _delete_dataset(f, name, versions_to_delete)
  File "/usr/local/python/python3/std/lib64/python3.10/site-packages/versioned_hdf5/replay.py", line 439, in _delete_dataset
    raw_data_chunks_map = _recreate_raw_data(f, name, versions_to_delete)
  File "/usr/local/python/python3/std/lib64/python3.10/site-packages/versioned_hdf5/replay.py", line 208, in _recreate_raw_data
    new_raw_data = f['_version_data'][name].create_dataset(
  File "/usr/local/python/python3/std/lib64/python3.10/site-packages/h5py/_hl/group.py", line 183, in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
  File "/usr/local/python/python3/std/lib64/python3.10/site-packages/h5py/_hl/dataset.py", line 165, in make_new_dset
    dset_id = h5d.create(parent.id, name, tid, sid, dcpl=dcpl, dapl=dapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 135, in h5py.h5d.create
ValueError: Unable to synchronously create dataset (name already exists)

I think there are a few things to fix here:

  1. delete_versions / _recreate_raw_data should use less peak memory. Instead of materializing a temporary numpy array for the whole dataset, can we write chunks directly to the raw dataset (see the sketch below this list)?
  2. delete_versions should gracefully handle a _tmp_raw_data dataset left behind by a previous failed call.
  3. _tmp_raw_data should be cleaned up on failure, probably by adding a try/except block around the code in _recreate_raw_data.
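A rough sketch of what all three fixes could look like, assuming h5py. This is not the actual replay.py code: chunks_to_keep (slices of the old raw dataset, one chunk each along axis 0) and the copy loop are hypothetical stand-ins for the chunk bookkeeping that _recreate_raw_data really performs.

  import h5py

  def _recreate_raw_data_sketch(f, name, chunks_to_keep):
      group = f['_version_data'][name]
      raw_data = group['raw_data']

      # (2) Tolerate a _tmp_raw_data left behind by a previous failed call.
      if '_tmp_raw_data' in group:
          del group['_tmp_raw_data']

      chunk0 = raw_data.chunks[0]
      new_shape = (len(chunks_to_keep) * chunk0,) + raw_data.shape[1:]
      tmp = group.create_dataset(
          '_tmp_raw_data',
          shape=new_shape,
          maxshape=(None,) + raw_data.shape[1:],
          chunks=raw_data.chunks,
          dtype=raw_data.dtype,
          fillvalue=raw_data.fillvalue,
      )
      try:
          # (1) Copy one chunk at a time: peak memory is a single chunk,
          # not a full-dataset numpy array.
          for new_idx, old_slice in enumerate(chunks_to_keep):
              tmp[new_idx * chunk0 : (new_idx + 1) * chunk0] = raw_data[old_slice]
      except Exception:
          # (3) Don't leave a half-written _tmp_raw_data behind on failure.
          del group['_tmp_raw_data']
          raise

The important property is that the only in-memory buffer is one chunk's worth of data, and the temporary dataset never outlives a failed call.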