[WIP] [ENH] Provenance BEP028 #439

remiadon · 2020-03-25T10:09:21Z

This PR has been replaced by #487

With this PR, we would like to introduce a new BEP, "BIDS-Provenance", to record provenance information in BIDS, i.e., how the data were generated and processed, going all the way from raw data to results rendering. The proposed model is built as an extension of the W3C PROV model.

Discussions on this BEP were initially started in a Google document with @satra, @jbpoline, @yarikoptic, @remiadon, @cmaumet. We would now like to make this effort an official BIDS Extension Proposal (BEP) and continue building this model with the BIDS community. This effort would be co-moderated by @satra and @cmaumet.

@steering: could you let us know what the next steps are to make this an official BEP? We are happy to answer any questions you may have.

remiadon · 2020-03-27T14:05:51Z

@cmaumet the linkchecker is returning some errors.
I thought you might have more experience with this kind of CI tool.
Can you tell what this is about? I don't think it's a big deal.

sappelhoff · 2020-03-27T14:33:48Z

src/03-modality-agnostic-files.md

+
+Possible places to encode provenance
+
+**Dataset level provenance.** At the dataset level, provenance could be about the dataset itself, or about any entity in the dataset. This provenance may evolve as new data are added, which may include sourcedata, BIDS data, and BIDS derived data. One option is to make use of <code>[https://w3c.github.io/json-ld-syntax/#named-graphs](https://w3c.github.io/json-ld-syntax/#named-graphs)</code>


Why do you want to put your [https://w3c.github.io/json-ld-syntax/#named-graphs] link into a code block? If you are certain that it's necessary, you can use the backtick sign (`` ` ``) once at the beginning and once at the end for inline code: `like so`

Assuming you just want the link DESCRIPTION to be formatted like code, you need to put the backticks inside the square brackets: [`https://w3c.github.io/json-ld-syntax/#named-graphs`](https://w3c.github.io/json-ld-syntax/#named-graphs)

[ + backtick + description + backtick + ] + ( + url + )

see: https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet#code
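As a small illustration of the pattern above, here is a minimal Python sketch that assembles such a link. The helper name `markdown_link` and its `code` flag are made up for this example and are not part of any tool used in this repository.

```python
def markdown_link(description: str, url: str, code: bool = False) -> str:
    """Build a Markdown link; optionally render the description as inline code."""
    if code:
        # The backticks go inside the square brackets, around the description only.
        description = f"`{description}`"
    return f"[{description}]({url})"


url = "https://w3c.github.io/json-ld-syntax/#named-graphs"
plain = markdown_link("named-graphs", url)
coded = markdown_link("named-graphs", url, code=True)
print(plain)
print(coded)
```

The `plain` form is the one a link checker is least likely to trip over, since the URL appears only once, inside the parentheses.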

interestingly, this works in the "preview" of the comment I just posted. But looking at the rendering now, it seems that the link description wrapped by backticks does NOT look like a codeblock ... meh 🤷‍♂️

pic of the preview:

remiadon · 2020-03-27

@sappelhoff OK you are right, fixing this

sappelhoff · 2020-03-27

mh okay, this didn't fix linkchecker. Seems more like a linkchecker issue ... your link works 🤷‍♂️

remiadon · 2020-03-30

discussed with @cmaumet, I'll check by keeping only the link between the brackets, like:

[named-graphs](https://w3c.github.io/json-ld-syntax/#named-graphs)

remiadon · 2020-03-30T14:32:03Z

just referencing the related issues here

remiadon · 2020-03-30T14:36:03Z

and Pull Requests

[ENH] introduce GeneratedBy to "core" BIDS #440

remiadon · 2020-04-06T14:27:35Z

@sappelhoff the linkchecker you are using seems to be dead: when I look into the CircleCI logs I find a pointer to https://github.com/wummel/linkchecker/issues
Going into the issues you can find this one, which indicates that a new repo has been created.

Are you sure the linkchecker is up-to-date?

sappelhoff · 2020-04-06T14:46:41Z

> Are you sure the linkchecker is up-to-date ?

I remember that it was a bit of a pain to get it in back then in #293

but we are also not using the wummel/linkchecker directly, I think. See: https://github.com/yarikoptic/linkchecker

Hopefully @yarikoptic can help you. He implemented the linkchecker for our repo.

yarikoptic · 2020-04-06T17:09:03Z

yes, unfortunately we had to resort to my patched version.
this particular problem is that the original .html indeed doesn't have that anchor:

$> wget -O jsonld.html 'https://w3c.github.io/json-ld-syntax/#named-graphs'
--2020-04-06 13:04:25--  https://w3c.github.io/json-ld-syntax/
Resolving w3c.github.io (w3c.github.io)... 185.199.109.153, 185.199.111.153, 185.199.110.153, ...
Connecting to w3c.github.io (w3c.github.io)|185.199.109.153|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 571708 (558K) [text/html]
Saving to: ‘jsonld.html’

jsonld.html                  100%[===========================================>] 558.31K  --.-KB/s    in 0.1s    

2020-04-06 13:04:25 (4.12 MB/s) - ‘jsonld.html’ saved [571708/571708]

$> grep named-graphs jsonld.html 
        <li class="changed"><a class="sectionRef" href="#named-graphs"></a>,</li>
        This keyword is described in <a class="sectionRef" href="#named-graphs"></a>.</dd>
      See <a class="sectionRef" href="#named-graphs"></a> for more information,
<p>See <a href="#named-graphs" class="sectionRef"></a> for other uses of indexing in JSON-LD.</p>
    contained within the same <a>map</a>, a feature discussed further in <a href="#named-graphs" class="sectionRef"></a>.</p>
      <a href="#named-graphs" class="sectionRef"></a> for more details.</p>
      See <a class="sectionRef" href="#named-graphs"></a> for further discussion on
      See <a class="sectionRef" href="#named-graphs"></a> for further discussion on
      <p>See <a class="sectionRef" href="#named-graphs"></a>.</p>

$> grep named-graphs jsonld.html | grep -v 'href="#named-gr'

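So the static HTML apparently only contains links to `#named-graphs` and no element that defines that anchor. Either the anchor is created by some JavaScript magic or some other goodness, which would not work for linkchecker (it has no JS runtime support AFAIK). Thus either a more conventional permalink needs to be found, or linkchecker has to be disabled altogether.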
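The same check can be sketched in Python with only the standard library. This is an illustration of the idea behind the `grep` calls above, not how linkchecker itself is implemented. The class and function names are hypothetical, and treating `id` attributes (plus `name` on `<a>`) as the only anchor definitions is an assumption.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect fragment targets (id/name) and in-page references (href="#...")."""

    def __init__(self):
        super().__init__()
        self.defined = set()
        self.referenced = set()

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if value is None:
                continue
            if key == "id" or (key == "name" and tag == "a"):
                self.defined.add(value)
            elif key == "href" and value.startswith("#") and len(value) > 1:
                self.referenced.add(value[1:])


def dangling_fragments(html: str) -> set:
    """Return fragments that are linked to but never defined in the static HTML."""
    collector = AnchorCollector()
    collector.feed(html)
    return collector.referenced - collector.defined


# One of the lines from the grep output above, and a variant that defines the anchor.
sample = '<p>See <a class="sectionRef" href="#named-graphs"></a>.</p>'
fixed = sample + '<section id="named-graphs"></section>'
print(dangling_fragments(sample))
print(dangling_fragments(fixed))
```

A checker without a JavaScript runtime only ever sees the first situation for this page, which is consistent with the failure reported in CI.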

yarikoptic · 2020-04-06T17:21:49Z

FWIW


PS FWIW -- asked the origin: w3c/json-ld-syntax#343

This reverts commit fb87411. According to w3c/json-ld-syntax#343 (comment) references should point to final published versions on https://www.w3.org/TR/json-ld11/

yarikoptic · 2020-04-20T19:09:33Z

pushed 4a06044 which brings "correct" URL for named-graphs back.

sappelhoff

some formatting comments

sappelhoff · 2020-04-21T08:50:03Z

src/03-modality-agnostic-files.md

+* Docs to Markdown version 1.0β20
+* Tue Mar 24 2020 09:07:42 GMT-0700 (PDT)
+* Source doc: BIDS Extension Proposal XX (BEP0XX): Provenance
+----->


lines 220 till 235 can be removed (some tool output)

sappelhoff · 2020-04-21T08:51:34Z

src/03-modality-agnostic-files.md

+###  Available under the CC-BY 4.0 International license
+
+Extension moderator/lead: Satra Ghosh &lt;[[email protected]](mailto:[email protected])> Camille Maumet &lt;[email protected]>
+


lines 237 until 243 would need to be removed as well if this is a proposal to be directly integrated into the specification

sappelhoff · 2020-04-21T08:52:51Z

src/03-modality-agnostic-files.md

+This specification is an extension of BIDS, and general principles are shared. The specification should work for many different settings and facilitate the integration with other imaging methods.
+
+To see the original BIDS specification, see this link. This document inherits all components of the original specification (e.g. how to store imaging data, events, stimuli and behavioral data), and should be seen as an extension of it, not a replacement.
+```


same for lines 244 until 250 (can be deleted): This is usually a disclaimer that we use in BEPs. it should not be part of what we put into our specification directly

sappelhoff · 2020-04-21T08:54:06Z

src/03-modality-agnostic-files.md

+
+ii. Provenance records MUST use the [PROV model](https://www.w3.org/TR/prov-o/) ontology and SHOULD be augmented by terms curated in the BIDS specification, the [NIDM](http://nidm.nidash.org/) model, and future enhancements to these models.
+
+iii. If provenance records are included, these records of provenance of a dataset or a file MUST be described using a `[&lt;prefix>_]prov.jsonld` file. Since these [jsonld](https://json-ld.org/) documents are graph objects, they can be aggregated without the need to apply any inheritance principle. 


looks like this needs to be fixed: [<prefix>_]prov.jsonld

sappelhoff · 2020-04-21T08:54:32Z

src/03-modality-agnostic-files.md

+
+Example context: Common
+
+[https://some/url/to/bids_common_context.jsonld]()


Suggested change

-[https://some/url/to/bids_common_context.jsonld]()
+[https://some/url/to/bids_common_context.jsonld](https://some/url/to/bids_common_context.jsonld)

sappelhoff · 2020-04-21T08:54:41Z

src/03-modality-agnostic-files.md

+
+Example context: Provenance
+
+[https://some/url/to/bids_provenance_context.jsonld]()


Suggested change

-[https://some/url/to/bids_provenance_context.jsonld]()
+[https://some/url/to/bids_provenance_context.jsonld](https://some/url/to/bids_provenance_context.jsonld)

cmaumet · 2020-04-21T13:21:27Z

@sappelhoff: Thanks for your review! We'll look into this as soon as possible. Do you have more info on the process to make this an official BEP? Thank you!

sappelhoff · 2020-04-21T13:51:45Z

> @sappelhoff: Thanks for your review! We'll look into this as soon as possible. Do you have more info on the process to make this an official BEP? Thank you!

@cmaumet @remiadon usually the process goes a bit like this:

BEP process

1. open an issue on GitHub announcing your intent and goals (with reasoning, why it's necessary)
2. get people on board, decide for BEP leads (i.e., moderators)
3. get to work on a Google Doc, engaging with the community
4. towards finalizing the BEP, convert from Google Doc to markdown and open a pull request to the specification
5. implement necessary changes for the bids-validator and potentially add an example on bids-examples
6. merge everything and celebrate 🙂

At some point (between step 2 and 3), we make the BEP official by adding a number (e.g. BEP006 ...) and making an entry on our list of active BEPs

In the past, this step of "making a BEP official" was done by @chrisgorgo / a BIDS maintainer. But now we have our @bids-standard/steering group, which will review a BEP and make it official (or request changes). See the "Draft BEP review" point in the BIDS governance.

To me it seems like you are already at the "final" stage of your BEP (at least you are converting to markdown and opening a PR), and the issue is that your BEP has not yet been made official? Perhaps it'd be good if you get the BEP (perhaps even this PR) into a presentable state for the @bids-standard/steering group and then ask for an official review.

This is also interesting for two more BEPs that will soon want to get an "official" status.

NIRS BEP030: Extend BIDS to add near-infrared spectroscopy (NIRS) #438
- @rob-luke @lpollonini et al.
MOTION BEP029: Extend BIDS to add motion capture and movement data #443
- @sjeung @JuliusWelzel @helenacockx et al.

See also:

BEP Lead Guidelines

cmaumet · 2020-04-21T14:03:12Z

Hi @sappelhoff Thanks a lot!

> To me it seems like you are already at the "final" stage of your BEP (at least you are converting to markdown and opening a PR), and the issue is that your BEP has not yet been made official?

Although we did make a pull request, I would not say that our BEP is at the final stage yet. We would like to engage with the community, get feedback and improve it first.

> At some point (between step 2 and 3), we make the BEP official by adding a number (e.g. BEP006 ...) and making an entry on our list of active BEPs
>
> In the past, this step of "making a BEP official" was done by @chrisgorgo / a BIDS maintainer. But now we have our @bids-standard/steering group, which will review a BEP and make it official (or request changes). See the "Draft BEP review" point in the BIDS governance.

Yes, that's exactly what we are after now. It would be nice to be able to describe our effort as an official BEP (with a number, a link from the BIDS website, etc.) as we get more people onboard.

How can we ask @bids-standard/steering to review our BEP and make it official? (Is a ping on GitHub enough?)

Thank you!!

sappelhoff · 2020-04-21T14:10:08Z

> Is a ping on GitHub enough?

probably yes, but to be sure we can ask @franklin-feingold to also put it on the agenda for the next steering group meeting. He also publishes the meeting notes of these steering group meetings regularly on the webpage news section

yarikoptic · 2020-04-21T15:08:50Z

I think it would be nice to

- include in the description of this PR a list of TODO items which are yet to be done (or were already done and thus marked done)
- create at least one demo dataset containing a concrete example of provenance annotation. Since it would most likely be created automatically (i.e. by a tool), it might be worth targeting at least one (or a few) BIDS Apps and enhancing them with the ability to provide such annotation.

cmaumet · 2020-04-21T16:09:39Z

Thanks @yarikoptic! I'd be in favour of creating a space for community discussions around the 'BIDS PROV' BEP (maybe regular calls, a GitHub repo for examples etc.). And to me both of your suggestions would happen in that space.

But first it would be nice to have the green light from BIDS steering to make this an official BEP, no?

(Note: We started looking into creating real life examples with @remiadon, this is work-in-progress and currently under my lab GitHub organization at https://github.com/Inria-Visages/BIDS-prov/tree/master/examples. This could be moved and discussed in the 'BIDS PROV' discussion space).

sappelhoff · 2020-05-08T09:23:39Z

This BEP is now official: BEP028, see: bids-standard/bids-website#123

effigies · 2020-05-27T18:22:23Z

src/03-modality-agnostic-files.md

+sub-01/ 
+    func/
+        sub-01_task-xyz_acq-test1_run-1_bold.nii.gz
+        sub-01_task-xyz_acq-test1_run-1_prov.jsonld


This would appear to apply equally well to sub-01_task-xyz_acq-test1_run-1_events.tsv. I would suggest that prov is not an appropriate suffix, and could either be made into an extension .prov or a double-extension .prov.jsonld.
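To make the concern concrete, here is a minimal Python sketch comparing the two naming schemes. It reflects one reading of the comment above: with `prov` as a suffix, the provenance file name for the `bold` image and for the `events` file would be identical, whereas a `.prov.jsonld` double extension keeps them apart. The function names are hypothetical, and neither scheme is a settled part of the proposal.

```python
from pathlib import PurePosixPath


def prov_as_suffix(data_file: str) -> str:
    """Scheme from the diff: replace the BIDS suffix (e.g. "bold") with "prov"."""
    path = PurePosixPath(data_file)
    stem = path.name.split(".", 1)[0]   # drop ".nii.gz", ".tsv", ...
    entities = stem.rsplit("_", 1)[0]   # drop the trailing suffix
    return str(path.with_name(entities + "_prov.jsonld"))


def prov_as_extension(data_file: str) -> str:
    """Suggested alternative: keep the suffix and swap the extension for ".prov.jsonld"."""
    path = PurePosixPath(data_file)
    stem = path.name.split(".", 1)[0]
    return str(path.with_name(stem + ".prov.jsonld"))


bold = "sub-01/func/sub-01_task-xyz_acq-test1_run-1_bold.nii.gz"
events = "sub-01/func/sub-01_task-xyz_acq-test1_run-1_events.tsv"

# The suffix scheme maps both files to the same provenance file name.
print(prov_as_suffix(bold) == prov_as_suffix(events))
# The extension scheme yields one provenance file per data file.
print(prov_as_extension(bold))
print(prov_as_extension(events))
```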

satra · 2020-05-27

@effigies - I agree. I'm waiting for a PR to this PR to get merged before making any additional changes.

satra · 2020-05-27T23:51:57Z

Closing in favor of #487. @sappelhoff @franklin-feingold - if we can update the BEP028 PR link that would be great.

franklin-feingold · 2020-05-28T00:03:51Z

hey @satra - BEP028 from the website directs to your google doc currently - doesn't appear the PR was linked

bids-standard#487 (and originally bids-standard#439) is a `WIP ENH` to introduce standardized provenance capture/expression for BIDS datasets. This PR just follows the idea of bids-standard#371 (small atomic ENHs), and is based on current state of the specification where we have GeneratedBy to describe how a BIDS derivative dataset came to its existence.

## Rationale

As I had previously stated in many (face-to-face when it was still possible ;)) conversations, in my view, any BIDS dataset is a derivative dataset. Even if it contains "raw" data, it is never given by gods, but is a result of some process (let's call it pipeline for consistency) which produced it out of some other data. That is why there is 1) `sourcedata/` to provide placement for such original (as "raw" in terms of processing, but "raw"er in terms of its relation to actual data acquired by equipment), and 2) `code/` to provide placement for scripts used to produce or "tune" the dataset. Typically "sourcedata" is either a collection of DICOMs or a collection of data in some other formats (e.g. nifti) which is then either converted or just renamed into BIDS layout.

When encountering a new BIDS dataset ATM it requires forensics and/or data archaeology to discover how this BIDS dataset came about, to e.g. possibly figure out the source of the buggy (meta)data it contains. At the level of individual files, some tools already add ad-hoc fields during conversion into side car .json files they produce,

<details>
<summary>e.g. dcm2niix adds ConversionSoftware and ConversionSoftwareVersion</summary>

```shell
(git-annex)lena:~/datalad/dbic/QA[master]git
$> git grep ConversionSoftware | head -n 2
sub-amit/ses-20180508/anat/sub-amit_ses-20180508_acq-MPRAGE_T1w.json:  "ConversionSoftware": "dcm2niix",
sub-amit/ses-20180508/anat/sub-amit_ses-20180508_acq-MPRAGE_T1w.json:  "ConversionSoftwareVersion": "v1.0.20170923 (OpenJPEG build) GCC6.3.0",
```

</details>

ATM I need to add such metadata to datasets produced by heudiconv to make sure that in case of incremental conversions there is no switch in versions of the software.
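A check for such a version switch could look roughly like the Python sketch below. It only relies on the two sidecar fields shown in the `git grep` output above. The function name `conversion_versions` and the throwaway dataset built in the example are illustrative assumptions, not part of heudiconv or dcm2niix.

```python
import json
import tempfile
from pathlib import Path


def conversion_versions(dataset_root):
    """Map each (ConversionSoftware, ConversionSoftwareVersion) pair to the sidecars reporting it."""
    found = {}
    for sidecar in sorted(Path(dataset_root).rglob("*.json")):
        try:
            meta = json.loads(sidecar.read_text())
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # not a readable JSON sidecar, skip it
        if not isinstance(meta, dict) or "ConversionSoftware" not in meta:
            continue
        key = (meta["ConversionSoftware"], meta.get("ConversionSoftwareVersion"))
        found.setdefault(key, []).append(sidecar.relative_to(dataset_root).as_posix())
    return found


# Build a tiny throwaway dataset that mirrors the file from the grep output.
with tempfile.TemporaryDirectory() as root:
    anat = Path(root, "sub-amit", "ses-20180508", "anat")
    anat.mkdir(parents=True)
    (anat / "sub-amit_ses-20180508_acq-MPRAGE_T1w.json").write_text(json.dumps({
        "ConversionSoftware": "dcm2niix",
        "ConversionSoftwareVersion": "v1.0.20170923 (OpenJPEG build) GCC6.3.0",
    }))
    versions = conversion_versions(root)

# More than one key would indicate a switch of converter or version within the dataset.
print(len(versions))
```

Grouping by the software/version pair keeps the result small even for large datasets, and the file lists show where a switch happened.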

Remi Adon added 2 commits March 25, 2020 10:08

[ADD] bids provencance proposal

bbe42c9

lint bids-prov markdown

35fccbb

yarikoptic mentioned this pull request Mar 25, 2020

[ENH] introduce GeneratedBy to "core" BIDS #440

Merged

3 tasks

satra changed the title ~~[WIP] [ENH] bids provencance proposal~~ [WIP] [ENH] bids provenance proposal Mar 25, 2020

remiadon mentioned this pull request Mar 27, 2020

Fork BIDS-specs and create a PR with the BIDS prov spec [1D] bids-standard/BEP028_BIDSprov#8

Closed

remiadon closed this Mar 27, 2020

remiadon reopened this Mar 27, 2020

remiadon marked this pull request as ready for review March 27, 2020 14:14

sappelhoff reviewed Mar 27, 2020

simplified uri in chapter 03

cd31b35

mardkown uri : try removing link in name

66bd405

This was referenced Apr 6, 2020

Updating description in BIDS-prov PR to BIDS-standard [1/2 day] bids-standard/BEP028_BIDSprov#11

Closed

Pass all tests in linkchecker into BIDS-prov PR to BIDS-standard [3h] bids-standard/BEP028_BIDSprov#12

Closed

yarikoptic mentioned this pull request Apr 6, 2020

FYI: anchors aren't in .html thus cannot be easily validated w3c/json-ld-syntax#343

Closed

[RM] link causing link checker to fail

fb87411

remiadon changed the title ~~[WIP] [ENH] bids provenance proposal~~ [WIP] [ENH] Provenance BEP Apr 20, 2020

Revert "[RM] link causing link checker to fail" with tuned up URL

4a06044

This reverts commit fb87411. According to w3c/json-ld-syntax#343 (comment) references should point to final published versions on https://www.w3.org/TR/json-ld11/

sappelhoff reviewed Apr 21, 2020

sappelhoff added the BEP label Apr 21, 2020

sappelhoff mentioned this pull request Apr 30, 2020

Add BEP numbers and links for NIRS and motion capture bids-standard/bids-website#120

Closed

sappelhoff changed the title ~~[WIP] [ENH] Provenance BEP~~ [WIP] [ENH] Provenance BEP028 May 8, 2020

yarikoptic mentioned this pull request May 27, 2020

[ENH] BEP 003: Common Derivatives #265

Merged

5 tasks

effigies reviewed May 27, 2020

satra mentioned this pull request May 27, 2020

BEP028 - Provenance #487

Closed

satra closed this May 27, 2020

cmaumet mentioned this pull request Jul 31, 2020

rewrite README and repo transfer to BIDS-spec bids-standard/BEP028_BIDSprov#21

Closed

3 tasks
