docs: Clarifies text for Feature Dataset Presence Matrix, rename schema file (#211)

* Clarifies 'Feature Dataset Presence Matrix' specification, removes version from schema file to reduce maintenance cost of other docs referring to it

* Remove white spaces
pablo-gar authored Feb 22, 2023
1 parent 61b655d commit 902d4a8
Showing 2 changed files with 9 additions and 10 deletions.
@@ -123,7 +123,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"`soma_joinid` is a special `SOMADataFrame` column that is used for join operations. The definition for all other columns can be found at the [Cell Census schema](https://github.com/chanzuckerberg/cell-census/blob/main/docs/cell_census_schema_0.0.1.md#cell-metadata--census_objcensus_dataorganismobs--somadataframe).\n",
+"`soma_joinid` is a special `SOMADataFrame` column that is used for join operations. The definition for all other columns can be found at the [Cell Census schema](https://github.com/chanzuckerberg/cell-census/blob/main/docs/cell_census_schema.md#cell-metadata--census_objcensus_dataorganismobs--somadataframe).\n",
"\n",
"All of these can be used to fetch specific columns or specific rows matching a condition. For the latter we need to know the values we are looking for _a priori_.\n",
"\n",
@@ -716,7 +716,6 @@
]
},
{
-"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -759,7 +758,6 @@
]
},
{
-"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -1151,7 +1149,6 @@
]
},
{
-"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -1186,7 +1183,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.9.16"
+"version": "3.10.9"
},
"vscode": {
"interpreter": {
12 changes: 7 additions & 5 deletions docs/cell_census_schema_0.1.0.md → docs/cell_census_schema.md
@@ -1,8 +1,8 @@
# CELLxGENE Cell Census Schema

-**Version**: 0.1.0.
+**Version**: 0.1.1.

-**Last edited**: Jan, 2023.
+**Last edited**: Feb, 2023.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED" "MAY", and "OPTIONAL" in this document are to be interpreted as described in [BCP 14](https://tools.ietf.org/html/bcp14), [RFC2119](https://www.rfc-editor.org/rfc/rfc2119.txt), and [RFC8174](https://www.rfc-editor.org/rfc/rfc8174.txt) when, and only when, they appear in all capitals, as shown here.

@@ -711,9 +711,9 @@ The following columns MUST be included:

#### Feature dataset presence matrix – `census_obj["census_data"][organism].ms["RNA"]["feature_dataset_presence_matrix"]` – `SOMASparseNDArray`

-In some datasets, there are features not included in the source data. To clarify the difference between features that were not included and features that were not measured, the Cell Census MUST include a presence matrix encoded as a `SOMASparseNDArray`.
+In some datasets, there are features not included in the source data. To clarify the difference between features that were not included and features that were not measured, for each `SOMAExperiment` the Cell Census MUST include a presence matrix encoded as a `SOMASparseNDArray`.

-For all features included in the Cell Census, the dataset presence matrix MUST indicate what features are included in each dataset of the Cell Census. This information MUST be encoded as a boolean matrix, `True` indicates the feature was included in the dataset, `False` otherwise. This is a two-dimensional matrix and it MUST be `N x M` where `N` is the number of datasets and `M` is the number of features. The matrix is indexed by the `soma_joinid` value of `census_obj["census_info"]["datasets"]` and `census_obj["census_data"][organism].ms["RNA"].var`.
+For all features included in the Cell Census, the dataset presence matrix MUST indicate what features are included in each dataset of the Cell Census. This information MUST be encoded as a boolean matrix, `True` indicates the feature was included in the dataset, `False` otherwise. This is a two-dimensional matrix and it MUST be `N x M` where `N` is the number of datasets in the `SOMAExperiment` and `M` is the number of features. The matrix is indexed by the `soma_joinid` value of `census_obj["census_info"]["datasets"]` and `census_obj["census_data"][organism].ms["RNA"].var`.

If the feature has at least one cell with a value greater than zero in the count data matrix X in the dataset of origin, the value MUST be `True`; otherwise, it MUST be `False`.

@@ -834,9 +834,11 @@ Cell metadata MUST be encoded as a `SOMADataFrame` with the following columns:
</table>



## Changelog

+### Version 0.1.1
+* Adds clarifying text for "Feature Dataset Presence Matrix"
+
### Version 0.1.0
* The "Dataset Presence Matrix" was renamed to "Feature Dataset Presence Matrix" and moved from `census_obj["census_data"][organism].ms["RNA"].varp["dataset_presence_matrix"]` to `census_obj["census_data"][organism].ms["RNA"]["feature_dataset_presence_matrix"]`.
* Editorial: changes all double quotes in the schema to ASCII quotes 0x22.
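
For readers of this diff, the presence rule in the schema hunk above (an `N x M` boolean matrix that is `True` when a feature has at least one cell with a value greater than zero in a dataset's count matrix) can be illustrated with a small sketch. This is not code from the commit or from the Cell Census library: the function name `feature_dataset_presence` and the list-of-lists input layout are hypothetical, and a real Census stores the result as a `SOMASparseNDArray` indexed by `soma_joinid` rather than as nested Python lists.

```python
def feature_dataset_presence(counts_by_dataset, n_features):
    """Build an N x M boolean presence matrix (N datasets, M features).

    counts_by_dataset is a hypothetical input: one entry per dataset, each
    entry a list of cells, each cell a list of n_features counts.
    presence[d][f] is True when at least one cell in dataset d has a count
    greater than zero for feature f, and False otherwise.
    """
    presence = []
    for cells in counts_by_dataset:
        row = [any(cell[f] > 0 for cell in cells) for f in range(n_features)]
        presence.append(row)
    return presence


# Two toy datasets measured over three features.
counts_by_dataset = [
    [[0, 3, 0], [1, 0, 0]],  # dataset 0: features 0 and 1 have nonzero counts
    [[0, 0, 5], [0, 0, 2]],  # dataset 1: only feature 2 has nonzero counts
]
presence = feature_dataset_presence(counts_by_dataset, n_features=3)
```

In this toy input, the row for dataset 0 marks features 0 and 1 as present and the row for dataset 1 marks only feature 2, so the matrix has one row per dataset and one column per feature, matching the `N x M` shape the schema requires.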