Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update error/warning message to give correct instructions #287

Closed
wants to merge 1 commit into from

Conversation

ArvidJB
Copy link
Collaborator

@ArvidJB ArvidJB commented Oct 30, 2023

Fix error message since you cannot directly pass a file name to VersionedHDF5File.

Internal ticket: PyInf#11354

Fix error message since you cannot directly pass a file name
to `VersionedHDF5File`.

Internal ticket: PyInf#11354
ArvidJB added a commit to ArvidJB/versioned-hdf5 that referenced this pull request Nov 7, 2023
When hashing `dtype=object` arrays we need to include the
length of the individual strings, otherwise arrays with
identical contents after concatenating all elements
hash to the same value:
```
>>> import h5py
... import versioned_hdf5
... import numpy as np
...
... path = "temp.h5"
...
... with h5py.File(path, mode="w") as f:
...     file = versioned_hdf5.VersionedHDF5File(f)
...     with file.stage_version("r0") as group:
...         group.create_dataset(
...             "values", shape=(2,), dtype=h5py.string_dtype(encoding='ascii'), data=np.array([b"a", b"bc"], dtype=object),
...             maxshape=(None,), chunks=(10000,), compression="gzip", shuffle=False,
...         )
...     with file.stage_version("r1") as group:
...         group.create_dataset(
...             "values", shape=(2,), dtype=h5py.string_dtype(encoding='ascii'), data=np.array([b"ab", b"c"], dtype=object),
...             maxshape=(None,), chunks=(10000,), compression="gzip", shuffle=False,
...         )
...
... with h5py.File(path, mode="r") as f:
...     file = versioned_hdf5.VersionedHDF5File(f)
...     for i in range(2):
...         print(file[f"r{i}"]["values"][:])
...
[b'a' b'bc']
[b'a' b'bc']
```

By including the length of each string as part of the hash the hashes
will now be unique.

This also includes the changes in deshaw#287
and supersedes that PR.

Fixes deshaw#288
peytondmurray pushed a commit that referenced this pull request Nov 9, 2023
* Include string length when hashing

When hashing `dtype=object` arrays we need to include the
length of the individual strings, otherwise arrays with
identical contents after concatenating all elements
hash to the same value:
```
>>> import h5py
... import versioned_hdf5
... import numpy as np
...
... path = "temp.h5"
...
... with h5py.File(path, mode="w") as f:
...     file = versioned_hdf5.VersionedHDF5File(f)
...     with file.stage_version("r0") as group:
...         group.create_dataset(
...             "values", shape=(2,), dtype=h5py.string_dtype(encoding='ascii'), data=np.array([b"a", b"bc"], dtype=object),
...             maxshape=(None,), chunks=(10000,), compression="gzip", shuffle=False,
...         )
...     with file.stage_version("r1") as group:
...         group.create_dataset(
...             "values", shape=(2,), dtype=h5py.string_dtype(encoding='ascii'), data=np.array([b"ab", b"c"], dtype=object),
...             maxshape=(None,), chunks=(10000,), compression="gzip", shuffle=False,
...         )
...
... with h5py.File(path, mode="r") as f:
...     file = versioned_hdf5.VersionedHDF5File(f)
...     for i in range(2):
...         print(file[f"r{i}"]["values"][:])
...
[b'a' b'bc']
[b'a' b'bc']
```

By including the length of each string as part of the hash the hashes
will now be unique.

This also includes the changes in #287
and supersedes that PR.

Fixes #288

* Update comment
@ArvidJB
Copy link
Collaborator Author

ArvidJB commented Nov 10, 2023

Merged as part of #288 .

@ArvidJB ArvidJB closed this Nov 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant