Audit notebook 4 #377

pablo-gar · 2023-04-13T00:32:33Z

Completes work for #360

Removes notebook census_rank_gene_groups.ipynb as it is trivial and doesn't add significant value.

codecov · 2023-04-13T00:40:55Z

Codecov Report

❗ No coverage uploaded for pull request base (main@281ffdf).
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #377   +/-   ##
=======================================
  Coverage        ?   88.33%           
=======================================
  Files           ?       50           
  Lines           ?     2727           
  Branches        ?        0           
=======================================
  Hits            ?     2409           
  Misses          ?      318           
  Partials        ?        0

Flag	Coverage Δ
unittests	`88.33% <ø> (?)`

Flags with carried forward coverage won't be shown.

api/python/notebooks/api_demo/census_compute_over_X.ipynb

api/python/notebooks/api_demo/census_dataset_presence.ipynb

ebezzi

A few comments but LGTM otherwise.

ebezzi · 2023-04-14T11:59:26Z

api/python/notebooks/api_demo/census_dataset_presence.ipynb

    "\n",
-    "*Goal:* look up all datasets that have a feature_id present."
+    "Similarly we can check what datasets measured a gene or set of genes."


This doesn't seem very clear to me.

ebezzi · 2023-04-14T12:00:43Z

api/python/notebooks/api_demo/census_datasets.ipynb

    "\n",
-    "The \"locator\" returned by this API will include a `uri` and additional information that may be necessary to use the URI (eg, the S3 region).\n",
+    "You can download the original H5AD file for any given dataset. This is the same H5AD you can download from the CELLxGENE Portal, and may contain additional data-submitter provided information which was not included in the Census.\n",


from CELLxGENE Discover? Otherwise portal should be lowercase IMHO.

atolopko-czi

I found one unfinished sentence and there may be cells that are lacking output (though I may be confused by the complicated diff of the notebook changes). Other than that, just a number of minor corrections and suggestions.

atolopko-czi · 2023-04-14T15:07:53Z

api/python/notebooks/analysis_demo/comp_bio_explore_and_load_lung_data.ipynb

    "\n",
    "## Learning about the lung data in the Census\n",
    "\n",
-    "First we open the Census, if you are not familiar with the basics of Census API you should take a look at notebook \"Learning about the CELLxGENE Census\" at `comp_bio_census_info.ipynb`.\n"
+    "First we open the Census, if you are not familiar with the basics of the Census API you should take a look at notebook [Learning about the CZ CELLxGENE Census](https://cellxgene-census.readthedocs.io/en/latest/notebooks/analysis_demo/comp_bio_census_info.html)\n"


Suggested change

"First we open the Census, if you are not familiar with the basics of the Census API you should take a look at notebook [Learning about the CZ CELLxGENE Census](https://cellxgene-census.readthedocs.io/en/latest/notebooks/analysis_demo/comp_bio_census_info.html)\n"

"First, we will open the Census. If you are not familiar with the basics of the Census API you should take a look at notebook [Learning about the CZ CELLxGENE Census](https://cellxgene-census.readthedocs.io/en/latest/notebooks/analysis_demo/comp_bio_census_info.html)\n"

atolopko-czi · 2023-04-14T15:08:19Z

api/python/notebooks/analysis_demo/comp_bio_explore_and_load_lung_data.ipynb

@@ -2190,7 +2185,7 @@
    "  - Mostly data from cells (\\~80%) rather than nucleus (\\~20%)\n",
    "- A total of **~12k** genes were measured across all cells.\n",
    "\n",
-    "## Fetching a sample of all human lung data from the Census.\n",
+    "##  Fetching all single-cell human lung data from the Census.\n",


Suggested change

"## Fetching all single-cell human lung data from the Census.\n",

"## Fetching all single-cell human lung data from the Census.\n",

atolopko-czi · 2023-04-14T15:08:49Z

api/python/notebooks/analysis_demo/comp_bio_explore_and_load_lung_data.ipynb

@@ -2304,7 +2297,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "### QC metrics on gene expression of all Lung data\n",
+    "## Calculating QC metrics of the lung data.\n",


Suggested change

"## Calculating QC metrics of the lung data.\n",

"## Calculating QC metrics of the lung data\n",

atolopko-czi · 2023-04-14T15:10:17Z

api/python/notebooks/api_demo/census_compute_over_X.ipynb

    "\n",
-    "This notebook computes a variety of per-gene and per-cell statistics for a user-defined query.\n",
+    "*NOTE*: when query results are small, it may be easier to use the `SOMAExperiment` Query class to extract an AnnData, and then just compute over that. This tutorial shows means of incrementally processing larger-than-core (RAM) data, where incremental (online) algorithms are used.\n",


Suggested change

"*NOTE*: when query results are small, it may be easier to use the `SOMAExperiment` Query class to extract an AnnData, and then just compute over that. This tutorial shows means of incrementally processing larger-than-core (RAM) data, where incremental (online) algorithms are used.\n",

"*NOTE*: when query results are small enough to fit in memory, it may be easier to use the `SOMAExperiment` Query class to extract an AnnData, and then just compute over that. This tutorial shows means of incrementally processing larger-than-core (RAM) data, where incremental (online) algorithms are used.\n",

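For context on the note above, here is a minimal, standard-library sketch of an incremental (online) per-gene computation. It assumes X arrives as chunks of non-zero `(cell, gene, value)` triples, which simplifies what the notebook actually iterates over, and `per_gene_mean_var` is a hypothetical helper name. Only two lists of length `n_genes` are held in memory, so the full matrix never has to fit in RAM:

```python
from typing import Iterable, List, Tuple

# One chunk = the non-zero entries of a slice of X as (cell, gene, value) triples.
# Assumption of this sketch: any (cell, gene) pair not listed is an implicit zero.
Chunk = List[Tuple[int, int, float]]


def per_gene_mean_var(
    chunks: Iterable[Chunk], n_cells: int, n_genes: int
) -> Tuple[List[float], List[float]]:
    """Per-gene mean and population variance (ddof=0), computed chunk by chunk."""
    sums = [0.0] * n_genes
    sq_sums = [0.0] * n_genes
    for chunk in chunks:
        for _cell, gene, value in chunk:
            sums[gene] += value
            sq_sums[gene] += value * value
    # Implicit zeros add nothing to the sums but still count toward n_cells.
    means = [s / n_cells for s in sums]
    # Clamp tiny negative results caused by floating-point cancellation.
    variances = [max(sq / n_cells - m * m, 0.0) for sq, m in zip(sq_sums, means)]
    return means, variances


# Toy input: 3 cells x 2 genes; cell 2 has no non-zero entries at all.
chunks = [[(0, 0, 1.0), (0, 1, 2.0)], [(1, 0, 3.0)]]
means, variances = per_gene_mean_var(chunks, n_cells=3, n_genes=2)
```

Accumulating raw sums of squares is the simplest online scheme; Welford's algorithm is numerically safer when values are large, at the cost of extra bookkeeping to fold in the implicit zeros.
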
atolopko-czi · 2023-04-14T15:12:54Z

api/python/notebooks/api_demo/census_dataset_presence.ipynb

@@ -4,11 +4,27 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "# Census datasets presence\n",
+    "# Genes measured in each cell (presence matrix)\n",


Suggested change

"# Genes measured in each cell (presence matrix)\n",

"# Genes measured in each cell (dataset presence matrix)\n",

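The dataset presence matrix has one row per dataset and one column per gene, with a true entry where that dataset measured the gene. Below is a toy, standard-library sketch of the lookup the notebook describes (which datasets measured a gene or set of genes). It stores each row as a set of gene indices, and the dataset ids, gene names, and `datasets_measuring` helper are all hypothetical:

```python
from typing import Dict, List, Set

# Row-wise toy presence matrix: dataset id -> indices of the genes it measured.
presence: Dict[str, Set[int]] = {
    "dataset_a": {0, 1, 2},
    "dataset_b": {1, 2},
    "dataset_c": {0, 2},
}
# Toy gene name -> column index mapping.
gene_index: Dict[str, int] = {"GENE_A": 0, "GENE_B": 1, "GENE_C": 2}


def datasets_measuring(
    genes: List[str],
    presence: Dict[str, Set[int]],
    gene_index: Dict[str, int],
) -> List[str]:
    """Return the ids of datasets that measured every gene in `genes`."""
    wanted = {gene_index[g] for g in genes}
    # `wanted <= measured` is a subset test: all requested genes are present.
    return sorted(d for d, measured in presence.items() if wanted <= measured)


both = datasets_measuring(["GENE_A", "GENE_C"], presence, gene_index)
only_b = datasets_measuring(["GENE_B"], presence, gene_index)
```
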
atolopko-czi · 2023-04-14T15:24:54Z

api/python/notebooks/api_demo/census_datasets.ipynb

+    "## Fetching the datasets table\n",
+    "\n",
+    "\n",
+    "Each Census contains a top-level dataframe itemizing the datasets contained therein. You can read this into a `pandas.DataFrane`."


Suggested change

"Each Census contains a top-level dataframe itemizing the datasets contained therein. You can read this into a `pandas.DataFrane`."

"Each Census contains a top-level dataframe itemizing the datasets contained therein. You can read this into a `pandas.DataFrame`."

atolopko-czi · 2023-04-14T15:31:23Z

api/python/notebooks/api_demo/census_datasets.ipynb

@@ -535,13 +542,24 @@
    }
   ],
   "source": [
+    "# Option 1: Get location\n",


Worth also showing a `requests.get` call or a shell command using `curl`? Minimally, add a comment like `# Download the object using a command of your choice, like curl`.
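Picking up this suggestion, here is a minimal standard-library sketch of a download step that could follow getting the location. It assumes the locator's `uri` is something `urllib` can open (an `https://` or `file://` URL). An `s3://` URI would need an S3-aware client instead. `download_to_file` is a hypothetical helper, and the self-check at the end uses a local `file://` URL as a stand-in for a real locator:

```python
import pathlib
import tempfile
import urllib.request


def download_to_file(uri: str, to_path: str, chunk_size: int = 1 << 20) -> int:
    """Stream `uri` into `to_path` in fixed-size chunks; return bytes written.

    Handles the URL schemes urllib supports (e.g. https://, file://).
    An s3:// URI is not one of them and needs an S3-aware client.
    """
    written = 0
    with urllib.request.urlopen(uri) as response, open(to_path, "wb") as out:
        while True:
            block = response.read(chunk_size)
            if not block:  # empty bytes means end of stream
                break
            out.write(block)
            written += len(block)
    return written


# Self-check: a local file:// URL stands in for a real locator URI.
with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / "source.h5ad"
    src.write_bytes(b"not really an h5ad file")
    dest = pathlib.Path(tmp) / "copy.h5ad"
    n_bytes = download_to_file(src.as_uri(), str(dest))
    copied = dest.read_bytes()
```

For an HTTPS URL, the rough shell equivalent is `curl -o dataset.h5ad "$URI"`.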

atolopko-czi · 2023-04-14T15:31:51Z

api/python/notebooks/api_demo/census_datasets.ipynb

+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Option 2: Direct download\n",


Nit: make this option 1, since it's easier and is the Census Way. :)

atolopko-czi · 2023-04-14T15:34:53Z

api/python/notebooks/api_demo/census_query_extract.ipynb

+    "- `column_names` — list of strings indicating what metadata columns to fetch. \n",
+    "- `value_filter` — Python expression with selection conditions to fetch rows, it is similar to [`pandas.DataFrame.query()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html), for full details see [`tiledb.QueryCondition`](https://tiledb-inc-tiledb.readthedocs-hosted.com/projects/tiledb-py/en/stable/python-api.html#query-condition) shortly:\n",
+    "   - Expressions are one or more comparisons\n",
+    "   - Comparisons are one of column op value or column op column\n",


Suggested change

" - Comparisons are one of column op value or column op column\n",

" - Comparisons are one of `<column> <op> <value>` or `<column> <op> <column>`\n",

formatting might help readability
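To make the two comparison forms concrete, here is a toy evaluator built on Python's `ast` module. This is not how TileDB parses `value_filter` (`tiledb.QueryCondition` defines the real grammar). It only illustrates `<column> <op> <value>` versus `<column> <op> <column>`, joined with `and`/`or`, over in-memory rows. The columns `count_a` and `count_b` and the helper `filter_rows` are hypothetical:

```python
import ast
import operator
from typing import Any, Callable, Dict, List

# Comparison operators this toy evaluator understands.
_OPS: Dict[type, Callable[[Any, Any], bool]] = {
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
    ast.Lt: operator.lt,
    ast.LtE: operator.le,
    ast.Gt: operator.gt,
    ast.GtE: operator.ge,
}


def _operand(node: ast.expr, row: Dict[str, Any]) -> Any:
    # A bare name is read as a column of the row; a literal is used as-is.
    if isinstance(node, ast.Name):
        return row[node.id]
    if isinstance(node, ast.Constant):
        return node.value
    raise ValueError(f"unsupported operand: {ast.dump(node)}")


def _matches(node: ast.expr, row: Dict[str, Any]) -> bool:
    # Several comparisons joined with `and` / `or`.
    if isinstance(node, ast.BoolOp):
        results = [_matches(v, row) for v in node.values]
        return all(results) if isinstance(node.op, ast.And) else any(results)
    # A single comparison: <column> <op> <value> or <column> <op> <column>.
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        op = _OPS.get(type(node.ops[0]))
        if op is None:
            raise ValueError(f"unsupported operator: {ast.dump(node.ops[0])}")
        return bool(op(_operand(node.left, row), _operand(node.comparators[0], row)))
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def filter_rows(rows: List[Dict[str, Any]], value_filter: str) -> List[Dict[str, Any]]:
    """Keep the rows for which `value_filter` evaluates to true."""
    tree = ast.parse(value_filter, mode="eval")
    return [row for row in rows if _matches(tree.body, row)]


# Toy rows; `count_a` and `count_b` are hypothetical numeric columns.
obs = [
    {"tissue_general": "lung", "is_primary_data": True, "count_a": 5, "count_b": 3},
    {"tissue_general": "lung", "is_primary_data": False, "count_a": 1, "count_b": 4},
    {"tissue_general": "brain", "is_primary_data": True, "count_a": 7, "count_b": 2},
]
lung_primary = filter_rows(obs, "tissue_general == 'lung' and is_primary_data == True")
a_above_b = filter_rows(obs, "count_a > count_b")
```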

atolopko-czi · 2023-04-14T15:50:40Z

api/python/notebooks/api_demo/census_summary_cell_counts.ipynb

-    "This dataframe is precomputed from the experiments in the Census, and is intended to simplify quick looks at the Census contents.\n",
+    "## Creating summary counts beyond pre-calculated values.\n",
+    "\n",
+    "The dataframe above is precomputed from the experiments in the Census, and is intended to simplify quick looks at the Census contents.\n",


Suggested change

"The dataframe above is precomputed from the experiments in the Census, and is intended to simplify quick looks at the Census contents.\n",

"The dataframe above is precomputed from the experiments in the Census, providing a quick overview of the Census contents.\n",

Co-authored-by: Emanuele Bezzi <[email protected]>
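As a rough illustration of counting beyond the precomputed table, here is a standard-library sketch that tallies cells per combination of metadata values. It assumes the relevant obs columns have already been read into a list of row dicts. `count_cells_by` and the toy rows are hypothetical, and the same grouping could equally be done on a `pandas.DataFrame`:

```python
from collections import Counter
from typing import Dict, Iterable


def count_cells_by(rows: Iterable[Dict[str, str]], *columns: str) -> Counter:
    """Count cells for every observed combination of values in `columns`."""
    return Counter(tuple(row[c] for c in columns) for row in rows)


# Toy obs rows standing in for metadata read from the Census.
obs = [
    {"tissue_general": "lung", "cell_type": "alveolar macrophage"},
    {"tissue_general": "lung", "cell_type": "alveolar macrophage"},
    {"tissue_general": "lung", "cell_type": "T cell"},
    {"tissue_general": "blood", "cell_type": "T cell"},
]
counts = count_cells_by(obs, "tissue_general", "cell_type")
```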

pablo-gar · 2023-04-18T01:13:10Z

Thanks @atolopko-czi and @ebezzi, I've addressed your comments.

atolopko-czi

LGTM!

pablo-gar added 2 commits April 12, 2023 17:30

update notebooks

fa5eee4

rename notebooks, remove unnecessary notebook

93b1a29

pablo-gar requested review from atolopko-czi and ebezzi April 13, 2023 00:33

pablo-gar marked this pull request as ready for review April 13, 2023 00:33

lint fix

e79dab8

ebezzi changed the title ~~Pablo gar/audit notebook 4~~ Audit notebook 4 Apr 14, 2023

ebezzi reviewed Apr 14, 2023

api/python/notebooks/api_demo/census_compute_over_X.ipynb (outdated, resolved)

ebezzi reviewed Apr 14, 2023

api/python/notebooks/api_demo/census_dataset_presence.ipynb (outdated, resolved)

ebezzi approved these changes Apr 14, 2023

atolopko-czi requested changes Apr 14, 2023

pablo-gar and others added 3 commits April 14, 2023 09:50

Editorial change

93983c4

Co-authored-by: Emanuele Bezzi <[email protected]>

Editorial change

2f36dd5

Co-authored-by: Emanuele Bezzi <[email protected]>

address review comments

194c8e6

pablo-gar requested a review from atolopko-czi April 18, 2023 01:12

atolopko-czi approved these changes Apr 18, 2023

pablo-gar merged commit 0f49ea1 into main Apr 18, 2023

pablo-gar deleted the pablo-gar/audit-notebook-4 branch April 18, 2023 17:22

pablo-gar mentioned this pull request Apr 18, 2023

Update python notebooks to comply with notebook editorial guidelines #360

Closed

12 tasks

mlin mentioned this pull request Apr 20, 2023

[r] revise census_query_extract.Rmd #393

Merged

mlin added a commit that referenced this pull request Apr 21, 2023

revise census_query_extract.Rmd (#393)

5f5deac

Matching changes in #377
