Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modification to Fix conversion of Zarr string dataset to HDF5 #1175

Draft
wants to merge 1 commit into
base: fix/zarr_str_export
Choose a base branch
from

Conversation

rly
Copy link
Contributor

@rly rly commented Aug 20, 2024

Proposed changes to #1171.

It would be nice if HDF5IO and the ObjectMapper can accept any Zarr array just like it accepts any Numpy array, instead of accepting only Zarr arrays that have been written with hdmf-zarr.

@rly rly requested a review from oruebel August 20, 2024 08:27
@rly rly marked this pull request as draft August 20, 2024 08:27
Copy link

codecov bot commented Aug 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.91%. Comparing base (2b90d21) to head (bb3de8b).

Additional details and impacted files
@@                   Coverage Diff                   @@
##           fix/zarr_str_export    #1175      +/-   ##
=======================================================
+ Coverage                88.83%   88.91%   +0.08%     
=======================================================
  Files                       45       45              
  Lines                     9841     9846       +5     
  Branches                  2798     2797       -1     
=======================================================
+ Hits                      8742     8755      +13     
+ Misses                     779      775       -4     
+ Partials                   320      316       -4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

# NOTE: Numpy < 2.0 has only fixed-length strings.
# Numpy 2.0 introduces variable-length strings (dtype=np.dtypes.StringDType()).
# HDMF does not yet do any special handling of numpy arrays with variable-length strings.
if isinstance(value, np.ndarray) or _is_zarr_array(value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The basic approach seems fine. A couple of thoughts:

  • I think instead of the if isinstance(value, np.ndarray) or _is_zarr_array(value) it would be useful to split this case into if and elif cases to treat numpy and zarr separately here. I think that would be more readable.
  • I'd move the _is_zarr_array method ot the utils or data_utils
  • Alternatively to this approach we could also use hasattr(value, filters) to distinguish been Zarr and numpy arrays, because numpy arrays don't have a filters attribute which is used in Zarr to store all the filters, including the encoder for strings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants