add extension best practices #338
base: dev
Conversation
ReadTheDocs does not like the ``|``
* Extend ``TimeSeries`` for storing timeseries data. NWB provides many types of ``TimeSeries``
  and you should identify the most specific type of ``TimeSeries`` relevant for your use case
  (e.g., extend ``ElectricalSeries`` to define a new kind of electrical recording).
Should these ``TimeSeries``/``ElectricalSeries`` be :py:class: intersphinx references?
Ditto for below on ``TimeIntervals``/``DynamicTable``.
I would suggest intersphinx links to the nwb-schema docs, since this is for extensions.
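For context, a minimal sketch of what extending a specific ``TimeSeries`` subtype can look like in an extension spec, using the ``pynwb.spec`` API from the ndx-template workflow (the type name and attribute below are made up for illustration):

from pynwb.spec import NWBAttributeSpec, NWBGroupSpec

# Illustrative only: a new kind of electrical recording that builds on
# ElectricalSeries rather than the generic TimeSeries.
laser_evoked_series = NWBGroupSpec(
    doc="Electrical recordings acquired during laser stimulation (illustrative).",
    neurodata_type_def="LaserEvokedSeries",  # new type defined by this extension
    neurodata_type_inc="ElectricalSeries",   # most specific existing type to extend
    attributes=[
        NWBAttributeSpec(
            name="laser_wavelength_in_nm",
            doc="Wavelength of the stimulation laser.",
            dtype="float",
        ),
    ],
)

The spec would then be registered with an ``NWBNamespaceBuilder`` and exported, roughly as the spec script generated by the template does.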
Attributes should be used mainly to store small metadata (usually less than 64 Kbytes) that
is associated with a particular Group or Dataset. Typical uses of attributes are, e.g., to
Just reading this, it's hard to grasp 'how big' something is to be less than 64 KB
Since most metadata are strings, maybe it would be useful to mention a relative scale to that?
A standard Python string under 64 KB would correspond to ~1300 characters, from:

>>> import sys
>>> sys.getsizeof("a")
50
>>> 64_000 / 50
1280.0

Google also says the average number of characters per word in English is 4.7, so that's ~272 words per string.
Mostly relevant to fields like ``description``, or sometimes ``reference_frame``.
I like the idea of relating this to characters. Do you know how big characters are within HDF5? Is it the same as Python strings in memory?
Hmm I'll have to investigate that a bit deeper, but most likely not the same as Python... I seem to remember us having some issues related to their byte string types here on the Inspector, but maybe that was for fields that are technically set as datasets, not attributes
Even in Python, if you apply a specific encoding it can change the size quite a bit
>>> sys.getsizeof("a".encode("utf-8"))
34

which would put the max closer to ~400 words.
Well, from https://docs.h5py.org/en/stable/strings.html#storing-strings and https://docs.h5py.org/en/stable/special.html#h5py.string_dtype, and from manually inspecting some of these attributes with HDFView,
it seems HDF5 stores strings with UTF-8 encoding, so about ~400 words on average from above.
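If the UTF-8 encoded length is what ends up on disk, a rough sanity check against the 64 KB guideline could look like the sketch below (illustrative only; it ignores HDF5's own storage overhead):

def fits_attribute_guideline(text: str, limit_bytes: int = 64_000) -> bool:
    # Compare the UTF-8 encoded size of the string to the ~64 KB guideline
    # for attribute values; HDF5 bookkeeping overhead is not counted here.
    return len(text.encode("utf-8")) < limit_bytes

description = "Extracellular recordings from a hypothetical area during a task. " * 20
print(len(description.encode("utf-8")), fits_attribute_guideline(description))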
> ~400 words

words or characters?
> words or characters?

words, using Google's estimate of average characters per word in English
Or just shy of ~1900 characters if that's easier to communicate
Co-authored-by: Cody Baker <[email protected]>
* **Use the postfix ``Table`` when extending a ``DynamicTable`` type.** E.g.,
  ``neurodata_type_def: LaserSettingsTable``
* **Explicit**. E.g., avoid the use of ambiguous abbreviations in names.
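As an illustration of the naming convention above, a hypothetical ``LaserSettingsTable`` spec extending ``DynamicTable`` might look like this (again using the ``pynwb.spec`` API; the column is invented for the example):

from pynwb.spec import NWBDatasetSpec, NWBGroupSpec

# Hypothetical spec: the new type extends DynamicTable, so its name ends in "Table".
laser_settings_table = NWBGroupSpec(
    doc="Settings of the stimulation laser, one row per configuration (illustrative).",
    neurodata_type_def="LaserSettingsTable",
    neurodata_type_inc="DynamicTable",
    datasets=[
        NWBDatasetSpec(
            name="power_in_mW",
            doc="Laser power used for each configuration.",
            neurodata_type_inc="VectorData",
            dtype="float",
        ),
    ],
)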
Suggested change:

Limit flexibility: Consider data reuse and tool developers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
One of the aims of NWB is to make reusing data easier. This means that when proposing an extension you need to put yourself in the shoes of someone who will receive an NWB dataset and attempt to analyze it. Additionally, consider developers who will try to write tools that take NWB datasets as inputs. It's worth assessing how much additional code different approaches to your extension will require.
  and you should identify the most specific type of ``TimeSeries`` relevant for your use case
  (e.g., extend ``ElectricalSeries`` to define a new kind of electrical recording).
* Extend ``DynamicTable`` to store tabular data.
* Extend ``TimeIntervals`` to store specific annotations of intervals in time.
Suggested change:

* Extend ``TimeIntervals`` to store specific annotations of intervals in time.

Strive for backward compatible changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NWB is already incorporated in many tools - proposing a change that will make already released NWB datasets non-compliant will cause a lot of confusion and lead to significant costs to update code.
Using the :nwb_extension_git:`ndx-template` to create new extensions helps ensure
that extensions can be easily shared, reused, and published via the :ndx-catalog:`NDX Catalog <>`.
Suggested change:

that extensions can be easily shared, reused, and published via the :ndx-catalog:`NDX Catalog <>`.

Get the community involved
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Try to reach out to colleagues working with the type of data you are trying to add support for. The more eyes you get on your extension, the better it will be.
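For reference, the ndx-template mentioned above is normally instantiated with cookiecutter; a sketch of invoking it from Python, assuming the template is the one published under the nwb-extensions GitHub organization:

from cookiecutter.main import cookiecutter

# Assumption: ndx-template is the cookiecutter template in the nwb-extensions
# organization; running this prompts for the extension name, author, etc.
cookiecutter("gh:nwb-extensions/ndx-template")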
Motivation
Integrate best practices from this PR: NeurodataWithoutBorders/nwb-overview#72