Merge pull request #812 from neuropoly/bep031_sample_entity
[ENH] BEP031 - New entity: sample and samples.tsv file
effigies authored Jul 26, 2021
2 parents 275f771 + f02aff2 commit 1323f23
Showing 4 changed files with 89 additions and 0 deletions.
7 changes: 7 additions & 0 deletions src/02-common-principles.md
@@ -32,6 +32,13 @@ misunderstanding we clarify them here.
context, a session may also indicate a group of related scans,
taken in one or more visits.

1. **Sample** - a sample pertaining to a subject such as tissue, primary cell
or cell-free sample.
The `sample-<label>` key/value pair is used to distinguish between different
samples from the same subject.
The label MUST be unique per subject and is RECOMMENDED to be unique
throughout the dataset.

1. **Data acquisition** - a continuous uninterrupted block of time during which
a brain scanning instrument was acquiring data according to a particular
scanning sequence/protocol.
66 changes: 66 additions & 0 deletions src/03-modality-agnostic-files.md
@@ -255,6 +255,72 @@ to date of birth.
}
```

## Samples file

Template:

```Text
samples.tsv
samples.json
```

The purpose of this file is to describe properties of samples, indicated by the `sample` entity.
This file is REQUIRED if `sample-<label>` is present in any file name within the dataset.
If this file exists, it MUST contain the following three columns:

- `sample_id`: MUST consist of `sample-<label>` values identifying one row
for each sample

- `participant_id`: MUST consist of `sub-<label>`

- `sample_type`: MUST be one of the following values from
[ENCODE Biosample Type](https://www.encodeproject.org/profiles/biosample_type):
`cell line`, `in vitro differentiated cells`, `primary cell`, `cell-free sample`,
`cloning host`, `tissue`, `whole organisms`, `organoid` or `technical sample`

Other optional columns MAY be used to describe the samples.
Each sample MUST be described by one and only one row.
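
The column rules above can be checked mechanically. The following is a minimal sketch, not part of the specification or of any BIDS validator. The function name `check_samples_tsv` is hypothetical, and the sketch assumes that a sample is identified by the pair (`participant_id`, `sample_id`), since labels are only required to be unique per subject.

```python
import csv
import io

# Allowed values copied from the sample_type column description.
SAMPLE_TYPES = {
    "cell line", "in vitro differentiated cells", "primary cell",
    "cell-free sample", "cloning host", "tissue", "whole organisms",
    "organoid", "technical sample",
}
REQUIRED_COLUMNS = ("sample_id", "participant_id", "sample_type")


def check_samples_tsv(text):
    """Return a list of problems found in the content of a samples.tsv file."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing required column: {col}" for col in missing]

    problems = []
    seen = set()
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        sample_id = row["sample_id"] or ""
        participant_id = row["participant_id"] or ""
        sample_type = row["sample_type"] or ""
        if not sample_id.startswith("sample-"):
            problems.append(f"line {line_no}: sample_id must be sample-<label>")
        if not participant_id.startswith("sub-"):
            problems.append(f"line {line_no}: participant_id must be sub-<label>")
        if sample_type not in SAMPLE_TYPES:
            problems.append(f"line {line_no}: unknown sample_type {sample_type!r}")
        # Assumption: a sample is keyed by (participant_id, sample_id).
        key = (participant_id, sample_id)
        if key in seen:
            problems.append(f"line {line_no}: duplicate row for {sample_id}")
        seen.add(key)
    return problems
```

An empty list means that no problem was detected by these particular checks; it does not imply that the file is fully valid.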

Commonly used *optional* columns in `samples.tsv` files are `pathology` and
`derived_from`. We RECOMMEND using these columns and, if you do use them,
following the conventions below:

- `pathology`: string value describing the pathology of the sample or type of control.
When different from `healthy`, pathology SHOULD be specified in `samples.tsv`.
The pathology MAY instead be specified in [Sessions files](06-longitudinal-and-multi-site-studies.md#sessions-file)
in case it changes over time.

- `derived_from`: `sample-<label>` key/value pair from which a sample is derived,
for example a slice of tissue (`sample-02`) derived from a block of tissue (`sample-01`),
as illustrated in the example below.

`samples.tsv` example:

```Text
sample_id participant_id sample_type derived_from
sample-01 sub-01 tissue n/a
sample-02 sub-01 tissue sample-01
sample-03 sub-01 tissue sample-01
sample-04 sub-02 tissue n/a
sample-05 sub-02 tissue n/a
```
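
The `derived_from` column forms links that can be followed from a sample back to the sample it originates from. The sketch below illustrates this with rows that mirror the example table. The helper `derivation_chain` is hypothetical, and the sketch assumes that `sample_id` values are unique throughout the dataset, which the specification only RECOMMENDS.

```python
# Rows mirroring the samples.tsv example, as csv.DictReader would yield them.
ROWS = [
    {"sample_id": "sample-01", "participant_id": "sub-01", "derived_from": "n/a"},
    {"sample_id": "sample-02", "participant_id": "sub-01", "derived_from": "sample-01"},
    {"sample_id": "sample-03", "participant_id": "sub-01", "derived_from": "sample-01"},
    {"sample_id": "sample-04", "participant_id": "sub-02", "derived_from": "n/a"},
    {"sample_id": "sample-05", "participant_id": "sub-02", "derived_from": "n/a"},
]


def derivation_chain(rows, sample_id):
    """Follow derived_from links from sample_id back to its root sample."""
    # Assumption: sample_id is unique across the whole dataset.
    parents = {row["sample_id"]: row["derived_from"] for row in rows}
    if sample_id not in parents:
        raise ValueError(f"unknown sample {sample_id}")
    chain = [sample_id]
    while True:
        parent = parents[chain[-1]]
        if parent == "n/a":
            return chain
        if parent not in parents:
            raise ValueError(f"{chain[-1]} is derived from unknown sample {parent}")
        if parent in chain:
            raise ValueError(f"circular derivation involving {parent}")
        chain.append(parent)
```

For instance, `derivation_chain(ROWS, "sample-02")` walks from the tissue slice to the block it was cut from, while a sample whose `derived_from` is `n/a` yields a chain containing only itself.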

It is RECOMMENDED to accompany each `samples.tsv` file with a sidecar
`samples.json` file to describe the TSV column names and properties of their values
(see also the [section on tabular files](02-common-principles.md#tabular-files)).

`samples.json` example:

```JSON
{
    "sample_type": {
        "Description": "type of sample from ENCODE Biosample Type (https://www.encodeproject.org/profiles/biosample_type)"
    },
    "derived_from": {
        "Description": "sample_id from which the sample is derived"
    }
}
```
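
A sidecar like this can also be generated rather than written by hand. The sketch below uses Python's standard `json` module; the helper name `dump_sidecar` is hypothetical. Serializing from a dictionary avoids hand-editing slips such as a trailing comma, which strict JSON parsers reject.

```python
import json

# Column descriptions taken from the samples.json example.
SIDECAR = {
    "sample_type": {
        "Description": "type of sample from ENCODE Biosample Type "
                       "(https://www.encodeproject.org/profiles/biosample_type)"
    },
    "derived_from": {
        "Description": "sample_id from which the sample is derived"
    },
}


def dump_sidecar(metadata):
    """Serialize column metadata as indented JSON text."""
    return json.dumps(metadata, indent=4)


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

The output of `dump_sidecar(SIDECAR)` can be written to `samples.json`, and `is_valid_json` offers a quick check for a hand-edited file before it is committed.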

## Phenotypic and assessment data

Template:
11 changes: 11 additions & 0 deletions src/schema/entities.yaml
@@ -27,6 +27,17 @@ session:
    (for example, training).
  type: string
  format: label
sample:
  name: Sample
  entity: sample
  description: |
    A sample pertaining to a subject such as tissue, primary cell
    or cell-free sample.
    The `sample-<label>` key/value pair is used to distinguish between different
    samples from the same subject.
    The label MUST be unique per subject and is RECOMMENDED to be unique
    throughout the dataset.
  format: label
task:
  name: Task
  entity: task
5 changes: 5 additions & 0 deletions src/schema/top_level_files.yaml
@@ -24,3 +24,8 @@ participants:
  extensions:
    - .tsv
    - .json
samples:
  required: false
  extensions:
    - .tsv
    - .json
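
The schema entry marks `samples` as `required: false`, while the specification text makes the file REQUIRED whenever `sample-<label>` appears in a file name. A check for that conditional rule could look like the sketch below. The function names, the regular expression, and the example file names are illustrative assumptions, not part of the schema or of any validator.

```python
import re
from pathlib import PurePosixPath

# Matches a sample-<label> entity at the start of a file name or after an
# underscore. Assumption: labels are alphanumeric.
SAMPLE_ENTITY = re.compile(r"(?:^|_)sample-[a-zA-Z0-9]+(?=_|\.|$)")


def uses_sample_entity(paths):
    """Return True if any file name contains a sample-<label> entity."""
    return any(SAMPLE_ENTITY.search(PurePosixPath(p).name) for p in paths)


def samples_file_missing(paths, top_level_files):
    """Return True if samples.tsv is required by the paths but absent."""
    return uses_sample_entity(paths) and "samples.tsv" not in top_level_files
```

With a hypothetical file such as `sub-01/microscopy/sub-01_sample-01_SEM.tif` in the dataset and no top-level `samples.tsv`, `samples_file_missing` returns `True`.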
