docs: Clarifies text for Feature Dataset Presence Matrix, rename schema file (#211)

* Clarifies 'Feature Dataset Presence Matrix' specification, removes version from schema file to reduce maintenance cost of other docs referring to it
* Remove white spaces
chanzuckerberg · Feb 22, 2023 · 902d4a8
1 parent 61b655d
commit 902d4a8

Showing 2 changed files with 9 additions and 10 deletions.
diff --git a/api/python/notebooks/analysis_demo/comp_bio_query_data_and_metadata.ipynb b/api/python/notebooks/analysis_demo/comp_bio_query_data_and_metadata.ipynb
@@ -123,7 +123,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "`soma_joinid` is a special `SOMADataFrame` column that is used for join operations. The definition for all other columns can be found at the [Cell Census schema](https://github.com/chanzuckerberg/cell-census/blob/main/docs/cell_census_schema_0.0.1.md#cell-metadata--census_objcensus_dataorganismobs--somadataframe).\n",
+    "`soma_joinid` is a special `SOMADataFrame` column that is used for join operations. The definition for all other columns can be found at the [Cell Census schema](https://github.com/chanzuckerberg/cell-census/blob/main/docs/cell_census_schema.md#cell-metadata--census_objcensus_dataorganismobs--somadataframe).\n",
     "\n",
     "All of these can be used to fetch specific columns or specific rows matching a condition. For the latter we need to know the values we are looking for _a priori_.\n",
     "\n",
@@ -716,7 +716,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -759,7 +758,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -1151,7 +1149,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -1186,7 +1183,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.16"
+   "version": "3.10.9"
   },
   "vscode": {
    "interpreter": {

diff --git a/docs/cell_census_schema_0.1.0.md b/docs/cell_census_schema.md
@@ -1,8 +1,8 @@
 # CELLxGENE Cell Census Schema 
 
-**Version**: 0.1.0.
+**Version**: 0.1.1.
 
-**Last edited**: Jan, 2023.
+**Last edited**: Feb, 2023.
 
 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED" "MAY", and "OPTIONAL" in this document are to be interpreted as described in [BCP 14](https://tools.ietf.org/html/bcp14), [RFC2119](https://www.rfc-editor.org/rfc/rfc2119.txt), and [RFC8174](https://www.rfc-editor.org/rfc/rfc8174.txt) when, and only when, they appear in all capitals, as shown here.
 
@@ -711,9 +711,9 @@ The following columns MUST be included:
 
 #### Feature dataset presence matrix – `census_obj["census_data"][organism].ms["RNA"]["feature_dataset_presence_matrix"]` – `SOMASparseNDArray`
 
-In some datasets, there are features not included in the source data. To clarify the difference between features that were not included and features that were not measured, the Cell Census MUST include a presence matrix encoded as a `SOMASparseNDArray`.
+In some datasets, there are features not included in the source data. To clarify the difference between features that were not included and features that were not measured, for each `SOMAExperiment` the Cell Census MUST include a presence matrix encoded as a `SOMASparseNDArray`.
 
-For all features included in the Cell Census, the dataset presence matrix MUST indicate what features are included in each dataset of the Cell Census. This information MUST be encoded as a boolean matrix, `True` indicates the feature was included in the dataset, `False` otherwise. This is a two-dimensional matrix and it MUST be `N x M` where `N` is the number of datasets and `M` is the number of features. The matrix is indexed by the `soma_joinid` value of  `census_obj["census_info"]["datasets"]` and `census_obj["census_data"][organism].ms["RNA"].var`.
+For all features included in the Cell Census, the dataset presence matrix MUST indicate what features are included in each dataset of the Cell Census. This information MUST be encoded as a boolean matrix, `True` indicates the feature was included in the dataset, `False` otherwise. This is a two-dimensional matrix and it MUST be `N x M` where `N` is the number of datasets in the `SOMAExperiment` and `M` is the number of features. The matrix is indexed by the `soma_joinid` value of  `census_obj["census_info"]["datasets"]` and `census_obj["census_data"][organism].ms["RNA"].var`.
 
 If the feature has at least one cell with a value greater than zero in the count data matrix X in the dataset of origin, the value MUST be `True`; otherwise, it MUST be `False`.
 
@@ -834,9 +834,11 @@ Cell metadata MUST be encoded as a `SOMADataFrame` with the following columns:
 </table>
 
 
-
 ## Changelog
 
+### Version 0.1.1
+* Adds clarifying text for "Feature Dataset Presence Matrix"
+
 ### Version 0.1.0
 * The "Dataset Presence Matrix" was renamed to "Feature Dataset Presence Matrix" and moved from  `census_obj["census_data"][organism].ms["RNA"].varp["dataset_presence_matrix"]`  to `census_obj["census_data"][organism].ms["RNA"]["feature_dataset_presence_matrix"]`.
 * Editorial: changes all double quotes in the schema to ASCII quotes 0x22.
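
The presence rule in the schema hunk above (a value is `True` when the feature has at least one cell with a count greater than zero in the dataset of origin, `False` otherwise) can be illustrated with a minimal sketch. It uses plain Python rather than the SOMA API; the function names `presence_coords` and `to_dense`, the toy input layout, and the assumption that `soma_joinid` values run contiguously from 0 are all hypothetical choices made for illustration, not part of the schema.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical input layout: dataset soma_joinid -> (feature soma_joinids, X rows).
# Each X row is one cell; column i holds the count for feature_ids[i].
DatasetCounts = Dict[int, Tuple[Sequence[int], Sequence[Sequence[float]]]]


def presence_coords(datasets: DatasetCounts) -> Set[Tuple[int, int]]:
    """Return the (dataset_joinid, feature_joinid) pairs whose value is True.

    Storing only the True coordinates mirrors how a sparse array would
    hold the matrix; every pair not in the set is implicitly False.
    """
    coords: Set[Tuple[int, int]] = set()
    for dataset_id, (feature_ids, x_rows) in datasets.items():
        for col, feature_id in enumerate(feature_ids):
            # True only if at least one cell has a count greater than zero.
            if any(row[col] > 0 for row in x_rows):
                coords.add((dataset_id, feature_id))
    return coords


def to_dense(coords: Set[Tuple[int, int]], n_datasets: int, n_features: int) -> List[List[bool]]:
    """Expand the sparse coordinates into an N x M boolean matrix.

    Assumes joinids are the contiguous integers 0..N-1 and 0..M-1.
    """
    return [[(d, f) in coords for f in range(n_features)] for d in range(n_datasets)]


# Toy data: two datasets, four features overall.
toy: DatasetCounts = {
    0: ([0, 1, 2], [[3, 0, 0], [1, 0, 2]]),  # feature 1 is all zeros here
    1: ([1, 3], [[0, 5], [0, 0]]),           # feature 1 is all zeros again
}

coords = presence_coords(toy)
dense = to_dense(coords, n_datasets=2, n_features=4)
for row in dense:
    print(row)
```

In this toy input, feature 1 appears in both datasets' source data but never has a positive count, so it is `False` in both rows, the same value given to features that a dataset does not contain at all.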