-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add extension best practices #338
base: dev
Are you sure you want to change the base?
Changes from 2 commits
5649098
a750c3f
5c33003
99d2482
4076d70
ef20e9d
6af0fe7
25cdd41
21c5e8a
2e00f7f
f17e1b9
fe72670
9ffe41a
73d5b9b
aa57eae
ba32682
efd03d8
20a3b31
7131b01
d2d503e
f38013c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -16,15 +16,103 @@ If an extension is required, tutorials for the process may be found through the | |||||||||||||||||
It is also encouraged for extensions to contain their own check functions for their own best practices. | ||||||||||||||||||
See the` :ref:`adding_custom_checks` section of the Developer Guide for how to do this. | ||||||||||||||||||
|
||||||||||||||||||
Define new ``neurodata_types`` at the top-level (a.k.a., do not nest type definitions) | ||||||||||||||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||||||||||||||||
|
||||||||||||||||||
Rather than nesting definitions of ``neurodata_types``, all new types should be defined | ||||||||||||||||||
at the top-level of the schema. To include a specific ``neurodata_type`` in another type | ||||||||||||||||||
use the ``neurodata_type_inc`` key instead. For example: | ||||||||||||||||||
|
||||||||||||||||||
Use Existing Neurodata Types | ||||||||||||||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||||||||||||||||
.. tabs:: | ||||||||||||||||||
|
||||||||||||||||||
.. tab:: Do This | ||||||||||||||||||
|
||||||||||||||||||
.. tabs:: | ||||||||||||||||||
|
||||||||||||||||||
.. code-tab:: py Python | ||||||||||||||||||
|
||||||||||||||||||
from pynwb.spec import NWBGroupSpec | ||||||||||||||||||
|
||||||||||||||||||
# Define the first type | ||||||||||||||||||
type1_ext = NWBGroupSpec( | ||||||||||||||||||
name='custom_type1', | ||||||||||||||||||
doc='Example extension type 1', | ||||||||||||||||||
neurodata_type_def='MyNewType1', | ||||||||||||||||||
neurodata_type_inc='LabMetaData', | ||||||||||||||||||
) | ||||||||||||||||||
|
||||||||||||||||||
# Then define the second type and include the first type | ||||||||||||||||||
type2_ext = NWBGroupSpec( | ||||||||||||||||||
name='custom_type2', | ||||||||||||||||||
doc='Example extension type 2', | ||||||||||||||||||
neurodata_type_def='MyNewType2', | ||||||||||||||||||
groups=[NWBGroupSpec(neurodata_type_inc='MyNewType1', | ||||||||||||||||||
doc='Included group of ype MyNewType1')] | ||||||||||||||||||
) | ||||||||||||||||||
|
||||||||||||||||||
.. code-tab:: yaml YAML | ||||||||||||||||||
|
||||||||||||||||||
groups: | ||||||||||||||||||
- neurodata_type_def: MyNewType1 | ||||||||||||||||||
neurodata_type_inc: LabMetaData | ||||||||||||||||||
name: custom_type1 | ||||||||||||||||||
doc: Example extension type 1 | ||||||||||||||||||
- neurodata_type_def: MyNewType2 | ||||||||||||||||||
neurodata_type_inc: NWBContainer | ||||||||||||||||||
name: custom_type2 | ||||||||||||||||||
doc: Example extension type 2 | ||||||||||||||||||
groups: | ||||||||||||||||||
- neurodata_type_inc: MyNewType1 | ||||||||||||||||||
doc: Included group of ype MyNewType1 | ||||||||||||||||||
|
||||||||||||||||||
.. tab:: DON'T do this | ||||||||||||||||||
|
||||||||||||||||||
.. tabs:: | ||||||||||||||||||
|
||||||||||||||||||
.. code-tab:: py Python | ||||||||||||||||||
|
||||||||||||||||||
from pynwb.spec import NWBGroupSpec | ||||||||||||||||||
|
||||||||||||||||||
# Do NOT define a new type via ``neurodata_type_def`` within the | ||||||||||||||||||
# definition of another type. Always define the types separately | ||||||||||||||||||
# and use ``neurodata_type_inc`` to include the type | ||||||||||||||||||
type2_ext = NWBGroupSpec( | ||||||||||||||||||
name='custom_type2', | ||||||||||||||||||
doc='Example extension type 2', | ||||||||||||||||||
neurodata_type_def='MyNewType2', | ||||||||||||||||||
groups=[NWBGroupSpec( | ||||||||||||||||||
name='custom_type1', | ||||||||||||||||||
doc='Example extension type 1', | ||||||||||||||||||
neurodata_type_def='MyNewType1', | ||||||||||||||||||
neurodata_type_inc='LabMetaData', | ||||||||||||||||||
)] | ||||||||||||||||||
) | ||||||||||||||||||
|
||||||||||||||||||
.. code-tab:: yaml YAML | ||||||||||||||||||
|
||||||||||||||||||
groups: | ||||||||||||||||||
- neurodata_type_def: MyNewType2 | ||||||||||||||||||
neurodata_type_inc: NWBContainer | ||||||||||||||||||
name: custom_type2 | ||||||||||||||||||
doc: Example extension type 2 | ||||||||||||||||||
groups: | ||||||||||||||||||
- neurodata_type_def: MyNewType1 | ||||||||||||||||||
neurodata_type_inc: LabMetaData | ||||||||||||||||||
name: custom_type1 | ||||||||||||||||||
doc: Example extension type 1 | ||||||||||||||||||
|
||||||||||||||||||
Build on and Reuse Existing Neurodata Types | ||||||||||||||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||||||||||||||||
|
||||||||||||||||||
When possible, use existing types when creating extensions either by creating new ``neurodata_types`` that inherit from | ||||||||||||||||||
existing ones, or by creating ``neurodata_types`` that contain existing ones. Building on existing types facilitates the | ||||||||||||||||||
reuse of existing functionality and interpretation of the data. If a community extension already exists that has a | ||||||||||||||||||
similar scope, it is preferable to use that extension rather than creating a new one. | ||||||||||||||||||
similar scope, it is preferable to use that extension rather than creating a new one. For example: | ||||||||||||||||||
* Extend ``TimeSeries`` for storing timeseries data. NWB provides main types of ``TimeSeries`` | ||||||||||||||||||
and you should identify the most specific type of ``TimeSeries`` relevant for your use case | ||||||||||||||||||
(e.g., extend ``ElectricalSeries`` to define a new kind of electrical recording). | ||||||||||||||||||
* Extend ``DynamicTable`` to store tabular data/ | ||||||||||||||||||
bendichter marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||
* Extend ``TimeIntervals`` to store specific annotations of intervals in time. | ||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||||
|
||||||||||||||||||
|
||||||||||||||||||
Provide Documentation | ||||||||||||||||||
|
@@ -45,3 +133,67 @@ anybody who receives the data also receives the necessary data to interpret it. | |||||||||||||||||
.. note:: | ||||||||||||||||||
In :pynwb-docs:`PyNWB <>`, the extension is cached automatically. This can be specified explicitly with | ||||||||||||||||||
``io.write(filepath, cache_spec=True)`` | ||||||||||||||||||
|
||||||||||||||||||
|
||||||||||||||||||
Use Attributes for small metadata related to a particular data object (Group or Dataset) | ||||||||||||||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||||||||||||||||
|
||||||||||||||||||
Attributes should be used mainly to store small metadata (usually less than 64 Kbytes) that | ||||||||||||||||||
is associated with a particular Group or Dataset. Typical uses of attributes are, e.g., to | ||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just reading this, it's hard to grasp 'how big' something is to be less than 64 KB Since most metadata are strings, maybe it would be useful to mention a relative scale to that? A standard Python string under 64 KB would correspond to ~1300 characters, from import sys
sys.getsizeof("a")
> 50
64_000 / 50
> 1280 Google also says the average number of characters per word in English is 4.7, so that's ~272 words per string There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Mostly relevant to fields like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I like the idea of relating this to characters. Do you know how big characters are within HDF5? Is it the same as Python strings in memory? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm I'll have to investigate that a bit deeper, but most likely not the same as Python... I seem to remember us having some issues related to their byte string types here on the Inspector, but maybe that was for fields that are technically set as datasets, not attributes Even in Python, if you apply a specific encoding it can change the size quite a bit sys.getsizeof("a".encode("utf-8"))
> 34 which would put the max closer to ~400 words There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well, from https://docs.h5py.org/en/stable/strings.html#storing-strings and https://docs.h5py.org/en/stable/special.html#h5py.string_dtype and manually inspecting some of these attributes with HDFView It seems HDF5 stores with UTF-8 encoding, so about ~400 words on average from above There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
words or characters? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
words using Google's estimate of average characters per word in English There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Or just shy of ~1900 characters if that's easier to communicate |
||||||||||||||||||
define the ``unit`` of measurement of a dataset or to store a short ``description`` of | ||||||||||||||||||
a group or dataset. For larger data, datasets should be used instead. | ||||||||||||||||||
|
||||||||||||||||||
In practice, the main difference is that in PyNWB and MatNWB all Attributes are read into memory when reading the | ||||||||||||||||||
bendichter marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||
NWB file. If you would like to allow users to read a file without reading all of these particular data values, use a | ||||||||||||||||||
Dataset. | ||||||||||||||||||
|
||||||||||||||||||
|
||||||||||||||||||
Link data to relevant metadata | ||||||||||||||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||||||||||||||||
|
||||||||||||||||||
Often metadata relevant to a particular type of data is stored elsewhere, e.g., information | ||||||||||||||||||
about the ``Device`` used. To ensure relevant metadata can be uniquely identified, the data | ||||||||||||||||||
should include links to the relevant metadata. NWB provides a few key mechanisms for linking: | ||||||||||||||||||
|
||||||||||||||||||
* Use ``links`` (defined via ``NWBLinkSpec``) to link to a particular dataset or group | ||||||||||||||||||
* Use ``DynamicTableRegion`` to link to a set of rows in a ``DynamicTable`` | ||||||||||||||||||
* Use a ``dataset`` with an object reference data type to store collections of links | ||||||||||||||||||
to other objects, e.g., the following dtype to define a dataset of links to ``TimeSeries`` | ||||||||||||||||||
|
||||||||||||||||||
.. code-block:: yaml | ||||||||||||||||||
dtype: | ||||||||||||||||||
target_type: TimeSeries | ||||||||||||||||||
reftype: object | ||||||||||||||||||
|
||||||||||||||||||
|
||||||||||||||||||
Best practices for object names | ||||||||||||||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||||||||||||||||
|
||||||||||||||||||
Names for groups, datasets, attributes, or links should typically: | ||||||||||||||||||
|
||||||||||||||||||
* **Use lowercase letters only** | ||||||||||||||||||
CodyCBakerPhD marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||
* **Use ``_`` instead of `` `` to separate parts in names**. E.g., use the name | ||||||||||||||||||
CodyCBakerPhD marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||
``starting_time`` instead of ``starting time`` | ||||||||||||||||||
* **Explicit**. E.g., avoid the use of ambiguous abbreviation in names. | ||||||||||||||||||
|
||||||||||||||||||
|
||||||||||||||||||
Best practices for naming ``neurodata_types`` | ||||||||||||||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||||||||||||||||
|
||||||||||||||||||
For defining new types via ``neurodata_type_def`` use: | ||||||||||||||||||
|
||||||||||||||||||
* **Use camelcase:** notation, i.e., names of types should NOT include spaces, | ||||||||||||||||||
CodyCBakerPhD marked this conversation as resolved.
Show resolved
Hide resolved
bendichter marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||
always start with an uppercase letter, and use a single capitalized letter to | ||||||||||||||||||
separate parts of the name. E.g,. ``neurodata_type_def: LaserMeasurement`` | ||||||||||||||||||
* **Use the postfix ``Series`` when extending a ``TimeSeries`` type.** E.g., when | ||||||||||||||||||
creating a new ``TimeSeries`` for laser measurements then add ``Series`` to | ||||||||||||||||||
the type name, e.g,. ``neurodata_type_def: LaserMeasurementSeries`` | ||||||||||||||||||
* **Use the postfix ``Table`` when extending a ``DynamicTable`` type.** e.g., | ||||||||||||||||||
``neurodata_type_def: LaserSettingsTable`` | ||||||||||||||||||
* **Explicit**. E.g., avoid the use of ambiguous abbreviation in names. | ||||||||||||||||||
|
||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||||
Use the ``ndx-template`` to create new extensions | ||||||||||||||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||||||||||||||||
|
||||||||||||||||||
By using the :nwb_extension_git:`ndx-template` to create new extensions helps ensure | ||||||||||||||||||
that extensions can be easily shared and reused and published via the :ndx-catalog:`NDX Catalog <>`. | ||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these
TimeSeries
/ElectricalSeries
be :py:class: intersphinx references?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto for below on
TimeIntervals
/DynamicTable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest intersphinx to the nwb-schema docs since this is for extensions