From 902d4a86ef55cde55264a86421494d9b7094bc3e Mon Sep 17 00:00:00 2001
From: pablo-gar
Date: Tue, 21 Feb 2023 21:15:40 -0800
Subject: [PATCH] docs: Clarifies text for Feature Dataset Presence Matrix, rename schema file (#211)

* Clarifies 'Feature Dataset Presence Matrix' specification, removes version from schema file to reduce maintenance cost of other docs referring to it

* Remove white spaces
---
 .../comp_bio_query_data_and_metadata.ipynb           |  7 ++-----
 ..._census_schema_0.1.0.md => cell_census_schema.md} | 12 +++++++-----
 2 files changed, 9 insertions(+), 10 deletions(-)
 rename docs/{cell_census_schema_0.1.0.md => cell_census_schema.md} (98%)

diff --git a/api/python/notebooks/analysis_demo/comp_bio_query_data_and_metadata.ipynb b/api/python/notebooks/analysis_demo/comp_bio_query_data_and_metadata.ipynb
index 5c306832c..32d315dbe 100644
--- a/api/python/notebooks/analysis_demo/comp_bio_query_data_and_metadata.ipynb
+++ b/api/python/notebooks/analysis_demo/comp_bio_query_data_and_metadata.ipynb
@@ -123,7 +123,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "`soma_joinid` is a special `SOMADataFrame` column that is used for join operations. The definition for all other columns can be found at the [Cell Census schema](https://github.com/chanzuckerberg/cell-census/blob/main/docs/cell_census_schema_0.0.1.md#cell-metadata--census_objcensus_dataorganismobs--somadataframe).\n",
+    "`soma_joinid` is a special `SOMADataFrame` column that is used for join operations. The definition for all other columns can be found at the [Cell Census schema](https://github.com/chanzuckerberg/cell-census/blob/main/docs/cell_census_schema.md#cell-metadata--census_objcensus_dataorganismobs--somadataframe).\n",
     "\n",
     "All of these can be used to fetch specific columns or specific rows matching a condition. For the latter we need to know the values we are looking for _a priori_.\n",
     "\n",
@@ -716,7 +716,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -759,7 +758,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -1151,7 +1149,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -1186,7 +1183,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.16"
+   "version": "3.10.9"
   },
   "vscode": {
    "interpreter": {
diff --git a/docs/cell_census_schema_0.1.0.md b/docs/cell_census_schema.md
similarity index 98%
rename from docs/cell_census_schema_0.1.0.md
rename to docs/cell_census_schema.md
index 4465f0f15..80e27443d 100644
--- a/docs/cell_census_schema_0.1.0.md
+++ b/docs/cell_census_schema.md
@@ -1,8 +1,8 @@
 # CELLxGENE Cell Census Schema
 
-**Version**: 0.1.0.
+**Version**: 0.1.1.
 
-**Last edited**: Jan, 2023.
+**Last edited**: Feb, 2023.
 
 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED" "MAY", and "OPTIONAL" in this document are to be interpreted as described in [BCP 14](https://tools.ietf.org/html/bcp14), [RFC2119](https://www.rfc-editor.org/rfc/rfc2119.txt), and [RFC8174](https://www.rfc-editor.org/rfc/rfc8174.txt) when, and only when, they appear in all capitals, as shown here.
 
@@ -711,9 +711,9 @@ The following columns MUST be included:
 
 #### Feature dataset presence matrix – `census_obj["census_data"][organism].ms["RNA"]["feature_dataset_presence_matrix"]` – `SOMASparseNDArray`
 
-In some datasets, there are features not included in the source data. To clarify the difference between features that were not included and features that were not measured, the Cell Census MUST include a presence matrix encoded as a `SOMASparseNDArray`.
+In some datasets, there are features not included in the source data. To clarify the difference between features that were not included and features that were not measured, for each `SOMAExperiment` the Cell Census MUST include a presence matrix encoded as a `SOMASparseNDArray`.
 
-For all features included in the Cell Census, the dataset presence matrix MUST indicate what features are included in each dataset of the Cell Census. This information MUST be encoded as a boolean matrix, `True` indicates the feature was included in the dataset, `False` otherwise. This is a two-dimensional matrix and it MUST be `N x M` where `N` is the number of datasets and `M` is the number of features. The matrix is indexed by the `soma_joinid` value of `census_obj["census_info"]["datasets"]` and `census_obj["census_data"][organism].ms["RNA"].var`.
+For all features included in the Cell Census, the dataset presence matrix MUST indicate what features are included in each dataset of the Cell Census. This information MUST be encoded as a boolean matrix, `True` indicates the feature was included in the dataset, `False` otherwise. This is a two-dimensional matrix and it MUST be `N x M` where `N` is the number of datasets in the `SOMAExperiment` and `M` is the number of features. The matrix is indexed by the `soma_joinid` value of `census_obj["census_info"]["datasets"]` and `census_obj["census_data"][organism].ms["RNA"].var`.
 
 If the feature has at least one cell with a value greater than zero in the count data matrix X in the dataset of origin, the value MUST be `True`; otherwise, it MUST be `False`.
 
@@ -834,9 +834,11 @@ Cell metadata MUST be encoded as a `SOMADataFrame` with the following columns:
 
 
 
-
 ## Changelog
 
+### Version 0.1.1
+* Adds clarifying text for "Feature Dataset Presence Matrix"
+
 ### Version 0.1.0
 * The "Dataset Presence Matrix" was renamed to "Feature Dataset Presence Matrix" and moved from `census_obj["census_data"][organism].ms["RNA"].varp["dataset_presence_matrix"]` to `census_obj["census_data"][organism].ms["RNA"]["feature_dataset_presence_matrix"]`.
 * Editorial: changes all double quotes in the schema to ASCII quotes 0x22.