Use local copy of NWB file to avoid use of special characters in folder names #61

Merged: 21 commits, Jan 4, 2023
28 changes: 19 additions & 9 deletions CHANGELOG.md
@@ -3,19 +3,29 @@
## 0.1.7 (Upcoming)

### Bugs
* Use path relative to the current Zarr file in the definition of links and references to avoid breaking
links when moving Zarr files @oruebel [#46](https://github.com/hdmf-dev/hdmf-zarr/pull/46)
* Fix bugs in requirements defined in setup.py @oruebel [#46](https://github.com/hdmf-dev/hdmf-zarr/pull/46)
* Fix bug regarding Sphinx external links @mavaylon1 [#53](https://github.com/hdmf-dev/hdmf-zarr/pull/53)
* Updated gallery tests to use test_gallery.py and necessary package dependcies @mavaylon1 [#53](https://github.com/hdmf-dev/hdmf-zarr/pull/53)
* Update dateset used in conversion tutorial, which caused warnings @oruebel [#56](https://github.com/hdmf-dev/hdmf-zarr/pull/56)
* Updated the storage of links/references to use paths relative to the current Zarr file to avoid breaking
links/reference when moving Zarr files @oruebel [#46](https://github.com/hdmf-dev/hdmf-zarr/pull/46)
* Fixed bugs in requirements defined in setup.py @oruebel [#46](https://github.com/hdmf-dev/hdmf-zarr/pull/46)
* Fixed bug regarding Sphinx external links @mavaylon1 [#53](https://github.com/hdmf-dev/hdmf-zarr/pull/53)
* Updated gallery tests to use test_gallery.py and necessary package dependencies
@mavaylon1 [#53](https://github.com/hdmf-dev/hdmf-zarr/pull/53)
* Updated dataset used in the conversion tutorial, which caused warnings
@oruebel [#56](https://github.com/hdmf-dev/hdmf-zarr/pull/56)

### Docs
* Add tutorial illustrating how to create a new NWB file with NWBZarrIO @oruebel [#46](https://github.com/hdmf-dev/hdmf-zarr/pull/46)
* Add docs for describing the mapping of HDMF schema to Zarr storage @oruebel [#48](https://github.com/hdmf-dev/hdmf-zarr/pull/48)
* Added tutorial illustrating how to create a new NWB file with NWBZarrIO
@oruebel [#46](https://github.com/hdmf-dev/hdmf-zarr/pull/46)
* Added docs for describing the mapping of HDMF schema to Zarr storage
@oruebel [#48](https://github.com/hdmf-dev/hdmf-zarr/pull/48)
* Added ``docs/gallery/resources`` for storing local files used by the tutorial galleries
@oruebel [#61](https://github.com/hdmf-dev/hdmf-zarr/pull/61)
* Removed dependency on ``dandi`` library for data download in the conversion tutorial by storing the NWB files as
local resources @oruebel [#61](https://github.com/hdmf-dev/hdmf-zarr/pull/61)

## 0.1.0

### New features

- Created new optional Zarr-based I/O backend for writing files using Zarr's `zarr.store.DirectoryStore` backend, including support for iterative write, chunking, compression, simple and compound data types, links, object references, namespace and spec I/O.
* Created new optional Zarr-based I/O backend for writing files using Zarr's `zarr.store.DirectoryStore` backend,
including support for iterative write, chunking, compression, simple and compound data types, links, object
references, namespace and spec I/O.
51 changes: 27 additions & 24 deletions docs/gallery/plot_convert_nwb_hdf5.py
@@ -5,37 +5,45 @@
This tutorial illustrates how to convert data between HDF5 and Zarr using
a Neurodata Without Borders (NWB) file from the DANDI data archive as an example.
In this tutorial we will convert our example file from HDF5 to Zarr and then
back again to HDF5.
back again to HDF5. The NWB standard is defined using :hdmf-docs:`HDMF <>` and uses the
:py:class:`~hdmf.backends.hdf5.h5tools.HDF5IO` HDF5 backend from HDMF for storage.
"""


###############################################################################
# Setup
# -----
#
# We first **download a small NWB file** from the DANDI neurophysiology data archive as an example.
# The NWB standard is defined using HDMF and uses the :py:class:`~ hdmf.backends.hdf5.h5tools.HDF5IO`
# HDF5 backend from HDMF for storage.
# Here we use a small NWB file from `DANDIset 000009 <https://dandiarchive.org/dandiset/000009/0.220126.1903>`_
# in the DANDI neurophysiology data archive as an example.
# To download the file directly from DANDI we can use:
#
# .. code-block:: python
# :linenos:
#
# from dandi.dandiapi import DandiAPIClient
# dandiset_id = "000009"
# filepath = "sub-anm00239123/sub-anm00239123_ses-20170627T093549_ecephys+ogen.nwb" # ~0.5MB file
# with DandiAPIClient() as client:
# asset = client.get_dandiset(dandiset_id, 'draft').get_asset_by_path(filepath)
# s3_path = asset.get_content_url(follow_redirects=1, strip_query=True)
# filename = os.path.basename(asset.path)
# asset.download(filename)
#
# Here we use a local copy of a small file from this DANDIset as an example:
#

# sphinx_gallery_thumbnail_path = 'figures/gallery_thumbnail_plot_convert_nwb.png'
import os
import shutil
from dandi.dandiapi import DandiAPIClient

dandiset_id = "000009"
filepath = "sub-anm00239123/sub-anm00239123_ses-20170627T093549_ecephys+ogen.nwb" # ~0.5MB file
with DandiAPIClient() as client:
asset = client.get_dandiset(dandiset_id, 'draft').get_asset_by_path(filepath)
s3_path = asset.get_content_url(follow_redirects=1, strip_query=True)
filename = os.path.basename(asset.path)
asset.download(filename)

###############################################################################
# Next we define the names of the files to generate as part of this tutorial and clean up any
# data from previous executions of this tutorial.

zarr_filename = "test_zarr_" + filename + ".zarr"
hdf_filename = "test_hdf5_" + filename
# Input file to convert
basedir = "resources"
filename = os.path.join(basedir, "sub_anm00239123_ses_20170627T093549_ecephys_and_ogen.nwb")
# Zarr file to generate for converting from HDF5 to Zarr
zarr_filename = "test_zarr_" + os.path.basename(filename) + ".zarr"
# HDF5 file to generate for converting from Zarr to HDF5
hdf_filename = "test_hdf5_" + os.path.basename(filename)

# Delete our converted HDF5 and Zarr file from previous runs of this notebook
for fname in [zarr_filename, hdf_filename]:
@@ -60,8 +68,6 @@
with NWBHDF5IO(filename, 'r', load_namespaces=False) as read_io: # Create HDF5 IO object for read
with NWBZarrIO(zarr_filename, mode='w') as export_io: # Create Zarr IO object for write
export_io.export(src_io=read_io, write_args=dict(link_data=False)) # Export from HDF5 to Zarr
export_io.close()
read_io.close()

###############################################################################
# .. note::
@@ -104,8 +110,6 @@
with NWBZarrIO(zarr_filename, mode='r') as read_io: # Create Zarr IO object for read
with NWBHDF5IO(hdf_filename, 'w') as export_io: # Create HDF5 IO object for write
export_io.export(src_io=read_io, write_args=dict(link_data=False)) # Export from Zarr to HDF5
export_io.close()
read_io.close()

###############################################################################
# Read the new HDF5 file back
@@ -116,4 +120,3 @@

with NWBHDF5IO(hdf_filename, 'r') as hr:
hf = hr.read()
hr.close()
2 changes: 0 additions & 2 deletions docs/gallery/plot_nwb_zarrio.py
@@ -134,7 +134,6 @@
absolute_path = os.path.abspath(path)
with NWBZarrIO(path=path, mode="w") as io:
io.write(nwbfile)
io.close()

###############################################################################
# Test opening with the absolute path instead
@@ -145,4 +144,3 @@
# relative ``path`` here instead is fine.
with NWBZarrIO(path=absolute_path, mode="r") as io:
infile = io.read()
io.close()
6 changes: 4 additions & 2 deletions docs/gallery/plot_zarr_dataset_io.py
@@ -94,15 +94,13 @@
zarr_dir = "example_data.zarr"
with ZarrIO(path=zarr_dir, manager=get_manager(), mode='w') as zarr_io:
zarr_io.write(test_table)
zarr_io.close()

###############################################################################
# reading the table from Zarr

zarr_io = ZarrIO(path=zarr_dir, manager=get_manager(), mode='r')
intable = zarr_io.read()
intable.to_dataframe()
zarr_io.close()

###############################################################################
# Check dataset settings used.
@@ -112,3 +110,7 @@
(c.name,
str(c.data.chunks),
str(c.data.compressor)))

###############################################################################
#
zarr_io.close()
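
The settings check above is truncated by the diff context. As a rough sketch (assuming ``intable`` is the table read back above and each column's ``data`` is a Zarr array), the complete loop could look like:

.. code-block:: python

    # Print the chunking and compression settings Zarr chose for each column
    for c in intable.columns:
        print("Column: %s, Chunks: %s, Compressor: %s" %
              (c.name, str(c.data.chunks), str(c.data.compressor)))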
11 changes: 3 additions & 8 deletions docs/gallery/plot_zarr_io.py
@@ -78,16 +78,17 @@
zarr_dir = "example.zarr"
with ZarrIO(path=zarr_dir, manager=get_manager(), mode='w') as zarr_io:
zarr_io.write(users_table)
zarr_io.close()

###############################################################################
# Reading the table from Zarr
# ----------------------------------
zarr_io = ZarrIO(path=zarr_dir, manager=get_manager(), mode='r')
intable = zarr_io.read()
intable.to_dataframe()
zarr_io.close()

###############################################################################
#
zarr_io.close()

###############################################################################
# Converting to/from HDF5 using ``export``
@@ -105,8 +106,6 @@
with ZarrIO(path=zarr_dir, manager=get_manager(), mode='r') as zarr_read_io:
with HDF5IO(path="example.h5", manager=get_manager(), mode='w') as hdf5_export_io:
hdf5_export_io.export(src_io=zarr_read_io, write_args=dict(link_data=False)) # use export!
hdf5_export_io.close()
zarr_read_io.close()

###############################################################################
# .. note::
@@ -121,7 +120,6 @@
intable_from_hdf5 = hdf5_read_io.read()
intable_hdf5_df = intable_from_hdf5.to_dataframe()
intable_hdf5_df # display the table in the gallery output
hdf5_read_io.close()

###############################################################################
# Exporting the HDF5 file to Zarr
@@ -134,8 +132,6 @@
with HDF5IO(path="example.h5", manager=get_manager(), mode='r') as hdf5_read_io:
with ZarrIO(path="example_exp.zarr", manager=get_manager(), mode='w') as zarr_export_io:
zarr_export_io.export(src_io=hdf5_read_io, write_args=dict(link_data=False)) # use export!
zarr_export_io.close()
hdf5_read_io.close()

###############################################################################
# Check that the Zarr file is correct
@@ -144,4 +140,3 @@
intable_from_zarr = zarr_read_io.read()
intable_zarr_df = intable_from_zarr.to_dataframe()
intable_zarr_df # display the table in the gallery output
zarr_read_io.close()
24 changes: 24 additions & 0 deletions docs/gallery/resources/README.rst
@@ -0,0 +1,24 @@
Resources
=========

sub_anm00239123_ses_20170627T093549_ecephys_and_ogen.nwb
--------------------------------------------------------

This NWB file was downloaded from `DANDIset 000009 <https://dandiarchive.org/dandiset/000009/0.220126.1903>`_.
The file was modified to rename the ``ElectrodeGroup`` called ``ADunit: 32`` in ``general/extracellular_ephys/``
to ``ADunit_32``, replacing the ``:`` character in its name. The dataset ``general/extracellular_ephys/electrodes/group_name``
in the electrodes table was updated accordingly to list the new group name. This change avoids issues on Windows
file systems, which do not support ``:`` in folder names. The original asset can be downloaded from DANDI via:

.. code-block:: python
:linenos:

import os
from dandi.dandiapi import DandiAPIClient
dandiset_id = "000009"
filepath = "sub-anm00239123/sub-anm00239123_ses-20170627T093549_ecephys+ogen.nwb" # ~0.5MB file
with DandiAPIClient() as client:
asset = client.get_dandiset(dandiset_id, 'draft').get_asset_by_path(filepath)
s3_path = asset.get_content_url(follow_redirects=1, strip_query=True)
filename = os.path.basename(asset.path)
asset.download(filename)
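
The exact commands used for this modification are not part of the repository. A minimal sketch of the rename, assuming the file is edited in place with ``h5py`` (the group and dataset paths are taken from the description above), could look like:

.. code-block:: python

    import h5py

    path = "sub_anm00239123_ses_20170627T093549_ecephys_and_ogen.nwb"
    with h5py.File(path, "r+") as f:
        # Rename the ElectrodeGroup so its name no longer contains ":"
        f.move("general/extracellular_ephys/ADunit: 32",
               "general/extracellular_ephys/ADunit_32")
        # Update the group names listed in the electrodes table accordingly
        group_name = f["general/extracellular_ephys/electrodes/group_name"]
        names = group_name.asstr()[:]
        group_name[:] = [n.replace("ADunit: 32", "ADunit_32") for n in names]

Object references to the renamed group (e.g., in the ``group`` column of the electrodes table) remain valid after ``move`` because HDF5 references track the object itself rather than its path.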

Binary file not shown.
4 changes: 2 additions & 2 deletions docs/source/conf.py
@@ -77,8 +77,8 @@
intersphinx_mapping = {
'python': ('https://docs.python.org/3.10', None),
'numpy': ('https://numpy.org/doc/stable/', None),
'scipy': ('https://docs.scipy.org/doc/scipy/reference', None),
'matplotlib': ('https://matplotlib.org', None),
'scipy': ('https://docs.scipy.org/doc/scipy/', None),
'matplotlib': ('https://matplotlib.org/stable/', None),
'h5py': ('https://docs.h5py.org/en/latest/', None),
'pandas': ('https://pandas.pydata.org/pandas-docs/stable/', None),
'hdmf': ('https://hdmf.readthedocs.io/en/stable/', None),
1 change: 1 addition & 0 deletions docs/source/overview.rst
@@ -30,3 +30,4 @@ Known Limitations
- Currently the :py:class:`~hdmf_zarr.backend.ZarrIO` backend uses Zarr's :py:class:`~zarr.storage.DirectoryStore` only. Other `Zarr stores <https://zarr.readthedocs.io/en/stable/api/storage.html>`_ could be added but will require proper treatment of links and references for those backends as links are not supported in Zarr (see `zarr-python issues #389 <https://github.com/zarr-developers/zarr-python/issues/389>`_).
- Exporting of HDF5 files with external links is not yet fully implemented/tested (see `hdmf-zarr issue #49 <https://github.com/hdmf-dev/hdmf-zarr/issues/49>`_).
- Object references are currently always resolved on read (as are links) rather than being loaded lazily (see `hdmf-zarr issue #50 <https://github.com/hdmf-dev/hdmf-zarr/issues/50>`_).
- Special characters (e.g., ``:``, ``<``, ``>``, ``"``, ``/``, ``\``, ``|``, ``?``, or ``*``) may not be supported by all file systems (e.g., on Windows) and should therefore not be used in the names of Datasets or Groups, because Zarr needs to create folders on the filesystem for these objects.
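
As a rough illustration of the naming constraint above (using a hypothetical helper, not part of the hdmf-zarr API), object names can be sanitized before the corresponding Groups and Datasets are created so that Zarr never needs to create a folder containing such characters:

.. code-block:: python

    import re

    def sanitize_name(name):
        """Replace characters that some file systems do not allow in folder names."""
        return re.sub(r'[:<>"/\\|?*]', "_", name)

    print(sanitize_name("ADunit: 32"))  # -> "ADunit_ 32"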
1 change: 0 additions & 1 deletion requirements-doc.txt
@@ -4,4 +4,3 @@ sphinx>=4 # improved support for docutils>=0.17
sphinx_rtd_theme>=1 # <1 does not work with docutils>=0.17
sphinx-gallery
sphinx-copybutton
dandi
13 changes: 11 additions & 2 deletions test.py
@@ -1,7 +1,6 @@
#!/usr/bin/env python

# NOTE this script is currently used in CI *only* to test the sphinx gallery examples using python test.py -e

import warnings
import re
import argparse
@@ -67,19 +66,27 @@ def _import_from_file(script):
def run_example_tests():
global TOTAL, FAILURES, ERRORS
logging.info('running example tests')

# get list of example scripts
examples_scripts = list()
for root, dirs, files in os.walk(os.path.join(os.path.dirname(__file__), "docs", "gallery")):
for f in files:
if f.endswith(".py"):
examples_scripts.append(os.path.join(root, f))

TOTAL += len(examples_scripts)
curr_dir = os.getcwd()
for script in examples_scripts:
os.chdir(curr_dir) # Reset the working directory
script_abs = os.path.abspath(script) # Determine the full path of the script
# Set the working dir to be relative to the script to allow the use of relative file paths in the scripts
os.chdir(os.path.dirname(script_abs))
try:
logging.info("Executing %s" % script)
ws = list()
with warnings.catch_warnings(record=True) as tmp:
_import_from_file(script)
# Import/run the example gallery
_import_from_file(script_abs)
for w in tmp: # ignore RunTimeWarnings about importing
if isinstance(w.message, RuntimeWarning) and not warning_re.match(str(w.message)):
ws.append(w)
@@ -89,6 +96,8 @@ def run_example_tests():
print(traceback.format_exc())
FAILURES += 1
ERRORS += 1
# Make sure to reset the working directory at the end
os.chdir(curr_dir)


def main():
9 changes: 8 additions & 1 deletion test_gallery.py
@@ -78,8 +78,13 @@ def run_gallery_tests():
warnings.simplefilter("error")

TOTAL += len(gallery_file_names)
curr_dir = os.getcwd()
for script in gallery_file_names:
logging.info("Executing %s" % script)
os.chdir(curr_dir) # Reset the working directory
script_abs = os.path.abspath(script) # Determine the full path of the script
# Set the working dir to be relative to the script to allow the use of relative file paths in the scripts
os.chdir(os.path.dirname(script_abs))
try:
with warnings.catch_warnings(record=True):
warnings.filterwarnings(
@@ -106,11 +111,13 @@
# against a different version of numpy than the one installed
"ignore", message=_numpy_warning_re, category=RuntimeWarning
)
_import_from_file(script)
_import_from_file(script_abs)
except Exception:
print(traceback.format_exc())
FAILURES += 1
ERRORS += 1
# Make sure to reset the working directory at the end
os.chdir(curr_dir)


def main():