Saving Metadata #1039

matiasandina · 2022-09-13T12:30:03Z

Metadata are important information that accompany every experiment. I don't think this is the place to delve into why, so I will assume we all understand the need to save them.

As a general rule, I have found it difficult to save metadata in Bonsai. Instead, I hard-code variables in my analysis code and refer, with caution, to my handwritten notes (e.g., the channel map needed to go from column-major data to channels x time). Small changes, like adding the frame number of one camera recording, break the transformation and create spurious data.
Below are a few examples of what I would want to save, but I assume the community has other variables of interest.

Reading and writing paths
Start time and end time of experiment
The ID(s) of the animals being recorded
The low and high cutoffs for band pass filtering
The number of channels selected from the RHD2000 node and their identities

Currently, some of this information is saved in the .bonsai file itself, but that creates a few issues. First, to save the sketch as it ran, you have to duplicate it for each new experiment. I understand this might be a design choice, but in my experience it becomes impractical to duplicate sketches many times, especially if you tweak a few values here and there (e.g., HSV thresholding). Second, as of today, .bonsai files are not trivial to parse programmatically to get the information into another machine-readable format (e.g., .yaml, .json, ...).

According to the latest discussions on Gitter, this is a known issue that both the team and users want to improve. People seem to externalize properties and save them (it is unclear to me how, though I am partially doing this for some things using CsvWriter), and @glopesdev proposed using XPath queries to extract the information from the .bonsai file.
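To make the parsing idea concrete, here is a minimal sketch of pulling node properties out of a workflow file and dumping them as JSON. It assumes the .bonsai file is XML in which each operator carries an `xsi:type` attribute and its properties are simple child elements. The `SAMPLE` text is a simplified, hypothetical stand-in for a real file, and `extract_properties` is a made-up helper name. It walks the tree with Python's `xml.etree.ElementTree` rather than running true XPath queries.

```python
import json
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for a .bonsai file. Real files hold more
# nodes and namespaces, so check the structure of your own file first.
SAMPLE = """<WorkflowBuilder xmlns="https://bonsai-rx.org/2018/workflow"
                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xmlns:dsp="clr-namespace:Bonsai.Dsp;assembly=Bonsai.Dsp">
  <Workflow>
    <Nodes>
      <Expression xsi:type="Combinator">
        <Combinator xsi:type="dsp:FrequencyFilter">
          <dsp:SamplingFrequency>30000</dsp:SamplingFrequency>
          <dsp:Cutoff1>300</dsp:Cutoff1>
          <dsp:Cutoff2>3000</dsp:Cutoff2>
        </Combinator>
      </Expression>
    </Nodes>
  </Workflow>
</WorkflowBuilder>"""

# ElementTree expands the xsi prefix into the full namespace URI.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_properties(xml_text):
    """Return one record per typed node that has simple (leaf) properties."""
    root = ET.fromstring(xml_text)
    records = []
    for node in root.iter():
        node_type = node.get(XSI_TYPE)
        if node_type is None:
            continue
        # Keep only children that are leaves with non-empty text.
        props = {
            local_name(child.tag): child.text.strip()
            for child in node
            if len(child) == 0 and child.text and child.text.strip()
        }
        if props:
            records.append({"type": node_type, "properties": props})
    return records


if __name__ == "__main__":
    print(json.dumps(extract_properties(SAMPLE), indent=2))
```

All values come back as strings, so any numeric conversion is left to the analysis code. Nested or externalized properties would need extra handling that this sketch does not attempt.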

I understand that it is difficult to foresee everything users might want to save from an experiment, and I don't think that is the goal the development team should aim for. But because there is flexibility in the sketches, and because metadata changes the data in important ways, it would be desirable to have a record of this flexibility as a file that accompanies the recorded data.

glopesdev · 2022-09-13T13:41:23Z

Maintainer

@matiasandina Thank you for starting this conversation. This is indeed a critical point of any experiment, and something that we should all design well ahead of data collection.

The main difficulty in tackling this directly in Bonsai itself is that Bonsai is a general-purpose programming language that can be used to create and run any kind of program or experiment. It is therefore much harder to know ahead of time what even counts as metadata, since much is unknown about what is running in the workflow. Even parameters that might seem obvious, such as the acquisition sampling rate for ephys or cameras, become complicated if the camera is suddenly being triggered dynamically by some other hardware configured directly or indirectly by Bonsai. Not to mention, of course, the complexities of formalizing and stabilizing metadata for rich experimental behavior tasks.

My feeling here is that this should be attacked on two fronts:

At the level of the language, we can discuss existing (and future) ways to make it easier to record and log parameters of interest. A few of the options you mentioned, such as parsing the .bonsai file or logging parameters in CSV files, are already widely used. It might be worth documenting these use cases so that anyone in the wider community knows better how to use these options.
At the level of individual pipelines, if there are motifs or repeated data acquisition modules that several groups or labs are reusing, packages could be made which "formalize" the system, including logging data and metadata in a structured, well-defined way. It also makes sense to provide examples of this.

Happy to hear other feedback on this.

1 reply

matiasandina Sep 21, 2022
Author

I agree with the points you wrote; this is a complex problem. I am interested in whatever options exist for parsing the .bonsai file, but I am not sure where they are documented. Is there an example or documentation you can point to?

In terms of the language, would it be reasonable to have a "logging" mode of Bonsai that can be toggled on? Something that stores a .log, .yaml, or .json file with the information of the run itself: "On this datetime, experiment.bonsai ran with the following machine-readable parameters. Bonsai was (stopped by user/crashed/self-stopped) at this datetime". I think this would go a long way toward centralizing the parsing efforts in a format that is easy to parse (e.g., YAML) and moving them away from the .bonsai file itself, while creating a snapshot of what happened at data collection. The .bonsai file might change between experiments, sometimes multiple times a day, so the reliable approach is to save whatever was in effect when it was used to collect the data. Having a logger would also reduce the burden on individuals of externalizing properties and creating multiple CSV files to store things that would better go into an experiment.yaml file.
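As a rough illustration of what such a run record could contain, here is a small sketch that writes one JSON sidecar file per run. Nothing here is an existing Bonsai feature: `write_run_record`, the field names, and the `experiment.json` file name are all assumptions chosen for the example, and JSON is used instead of YAML only because it needs no third-party package in Python.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_run_record(out_dir, workflow_name, parameters, stop_reason,
                     started, stopped):
    """Write a machine-readable snapshot of one run next to the data.

    All field names are hypothetical; adapt them to your own conventions.
    """
    record = {
        "workflow": workflow_name,
        "started": started.isoformat(),
        "stopped": stopped.isoformat(),
        "stop_reason": stop_reason,  # e.g. "stopped by user", "crashed"
        "parameters": parameters,    # static parameters as they were at run time
    }
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "experiment.json"
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Demo in a temporary folder so nothing is left behind.
    with tempfile.TemporaryDirectory() as folder:
        start = datetime.now(timezone.utc)
        saved = write_run_record(
            folder,
            "experiment.bonsai",
            {"animal_id": "m01", "low_cutoff_hz": 300, "high_cutoff_hz": 3000},
            "stopped by user",
            start,
            datetime.now(timezone.utc),
        )
        print(saved.read_text(encoding="utf-8"))
```

This only covers static parameters known when the run starts and stops. Dynamically updated values would need their own time-stamped log.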

Right now, both static and dynamic parameters face this issue. If dynamic parameters are hard to capture because of their very nature, saving the static ones may be a good start. I don't have the knowledge to gauge how difficult it would be to build this type of logger.

I think the point you make about workflows is a good one. In my use case, I think many people would have a version of RHD2000 -> SelectChannels -> MatrixWriter. Because the data is saved in ColumnMajor layout, information such as the channel map and the expected number of samples is key to reconstructing the data. What are good ways to promote sharing of workflows that might be reused (and save metadata properly) without limiting Bonsai's inherent flexibility?
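To show why the channel count matters, here is a minimal sketch of turning a flat column-major recording back into channels x time. It assumes each time sample stores all channels contiguously, that samples are unsigned 16-bit integers in the machine's native byte order, and that the channel count is known from the metadata. `split_channels` is a hypothetical helper, not part of Bonsai.

```python
from array import array


def split_channels(raw_bytes, n_channels, typecode="H"):
    """Split interleaved (column-major) samples into one list per channel.

    typecode "H" means unsigned 16-bit integers; this is an assumption, so
    use whatever element type your recording was actually written with.
    """
    samples = array(typecode)
    samples.frombytes(raw_bytes)
    if len(samples) % n_channels != 0:
        # A wrong channel count (e.g. an extra row added to the matrix)
        # often shows up here. If it happens to divide evenly, the data
        # would be silently scrambled instead.
        raise ValueError(
            f"{len(samples)} samples cannot be split into {n_channels} channels"
        )
    # Channel ch owns every n_channels-th value, starting at offset ch.
    return [samples[ch::n_channels].tolist() for ch in range(n_channels)]


if __name__ == "__main__":
    # Two time points of three channels, stored sample by sample.
    raw = array("H", [0, 10, 20, 1, 11, 21]).tobytes()
    for ch, values in enumerate(split_channels(raw, 3)):
        print(ch, values)
```

With NumPy, the same reshaping can be written as `np.fromfile(path, dtype=np.uint16).reshape((n_channels, -1), order="F")`. In both cases the result is only correct if `n_channels` matches what was actually written, which is exactly the piece of metadata that needs to travel with the file.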

It would be a great benefit if Bonsai could generate folder structures and file names that play nicely with standards, for example BIDS (I am not trying to push a standard, this is just an example):

https://bids-standard.github.io/bids-starter-kit/folders_and_files/folders.html
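As a sketch of what "playing nicely" could look like, the helpers below build a BIDS-inspired folder and file name from a few identifiers. They follow only the general `sub-<label>/ses-<label>/<datatype>` pattern. The function names, the `ephys` datatype, and the `task` and suffix values are placeholders for illustration, and nothing here is validated against the BIDS specification.

```python
from pathlib import Path


def session_dir(root, subject, session, datatype):
    """Folder for one recording session, e.g. root/sub-m01/ses-20220913/ephys."""
    return Path(root) / f"sub-{subject}" / f"ses-{session}" / datatype


def recording_name(subject, session, task, suffix, extension):
    """File name that repeats the identifiers, so a file is traceable on its own."""
    return f"sub-{subject}_ses-{session}_task-{task}_{suffix}{extension}"


if __name__ == "__main__":
    # All labels below are made-up placeholders.
    folder = session_dir("data", "m01", "20220913", "ephys")
    print(folder / recording_name("m01", "20220913", "openfield", "ephys", ".bin"))
```

A path built this way could then be handed to the workflow as the output file name, so that the data and its metadata file always land in the same predictable place.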

ablot · 2022-09-21T11:13:13Z

ablot
Sep 21, 2022

To avoid creating multiple copies of my workflows, I use Bruno's simple extension to save the workflows when they start: #1002

It's still not ideal, as it does not save dynamically updated parameters, but it at least keeps the most relevant metadata and has the advantage of being very simple.

2 replies

matiasandina Sep 21, 2022
Author

Sorry, but isn't this the same as copying the .bonsai file? The script does it programmatically, but do you still get multiple copies of it?

ablot Sep 22, 2022

Almost the same. The only difference is that the file that I run is always the same and stays in the same folder. The other is a static archive that I save with the data. That has saved me a lot of confusion (running the file from the wrong folder, or editing one file instead of the other).
