Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audit notebook 4 #377

Merged
merged 6 commits into from
Apr 18, 2023
Merged

Audit notebook 4 #377

merged 6 commits into from
Apr 18, 2023

Conversation

pablo-gar
Copy link
Contributor

Completes work for #360

Removes notebook census_rank_gene_groups.ipynb as it is trivial and doesn't add significant value.

@pablo-gar pablo-gar marked this pull request as ready for review April 13, 2023 00:33
@codecov
Copy link

codecov bot commented Apr 13, 2023

Codecov Report

❗ No coverage uploaded for pull request base (main@281ffdf). Click here to learn what that means.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #377   +/-   ##
=======================================
  Coverage        ?   88.33%           
=======================================
  Files           ?       50           
  Lines           ?     2727           
  Branches        ?        0           
=======================================
  Hits            ?     2409           
  Misses          ?      318           
  Partials        ?        0           
Flag Coverage Δ
unittests 88.33% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@ebezzi ebezzi changed the title Pablo gar/audit notebook 4 Audit notebook 4 Apr 14, 2023
Copy link
Member

@ebezzi ebezzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments but LGTM otherwise.

"\n",
"*Goal:* look up all datasets that have a feature_id present."
"Similarly we can check what datasets measured a gene or set of genes."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem very clear to me.

"\n",
"The \"locator\" returned by this API will include a `uri` and additional information that may be necessary to use the URI (eg, the S3 region).\n",
"You can download the original H5AD file for any given dataset. This is the same H5AD you can download from the CELLxGENE Portal, and may contain additional data-submitter provided information which was not included in the Census.\n",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from CELLxGENE Discover? Otherwise portal should be lowercase IMHO.

Copy link
Collaborator

@atolopko-czi atolopko-czi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one unfinished sentence and there may be cells that are lacking output (though I may be confused by the complicated diff of the notebook changes). Other than that, just a number of minor corrections and suggestions.

"\n",
"## Learning about the lung data in the Census\n",
"\n",
"First we open the Census, if you are not familiar with the basics of Census API you should take a look at notebook \"Learning about the CELLxGENE Census\" at `comp_bio_census_info.ipynb`.\n"
"First we open the Census, if you are not familiar with the basics of the Census API you should take a look at notebook [Learning about the CZ CELLxGENE Census](https://cellxgene-census.readthedocs.io/en/latest/notebooks/analysis_demo/comp_bio_census_info.html)\n"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"First we open the Census, if you are not familiar with the basics of the Census API you should take a look at notebook [Learning about the CZ CELLxGENE Census](https://cellxgene-census.readthedocs.io/en/latest/notebooks/analysis_demo/comp_bio_census_info.html)\n"
"First, we will open the Census. If you are not familiar with the basics of the Census API you should take a look at notebook [Learning about the CZ CELLxGENE Census](https://cellxgene-census.readthedocs.io/en/latest/notebooks/analysis_demo/comp_bio_census_info.html)\n"

@@ -2190,7 +2185,7 @@
" - Mostly data from cells (\\~80%) rather than nucleus (\\~20%)\n",
"- A total of **~12k** genes were measured across all cells.\n",
"\n",
"## Fetching a sample of all human lung data from the Census.\n",
"## Fetching all single-cell human lung data from the Census.\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"## Fetching all single-cell human lung data from the Census.\n",
"## Fetching all single-cell human lung data from the Census.\n",

@@ -2304,7 +2297,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### QC metrics on gene expression of all Lung data\n",
"## Calculating QC metrics of the lung data.\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"## Calculating QC metrics of the lung data.\n",
"## Calculating QC metrics of the lung data\n",

"\n",
"This notebook computes a variety of per-gene and per-cell statistics for a user-defined query.\n",
"*NOTE*: when query results are small, it may be easier to use the `SOMAExperiment` Query class to extract an AnnData, and then just compute over that. This tutorial shows means of incrementally processing larger-than-core (RAM) data, where incremental (online) algorithms are used.\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"*NOTE*: when query results are small, it may be easier to use the `SOMAExperiment` Query class to extract an AnnData, and then just compute over that. This tutorial shows means of incrementally processing larger-than-core (RAM) data, where incremental (online) algorithms are used.\n",
"*NOTE*: when query results are small enough to fit in memory, it may be easier to use the `SOMAExperiment` Query class to extract an AnnData, and then just compute over that. This tutorial shows means of incrementally processing larger-than-core (RAM) data, where incremental (online) algorithms are used.\n",

@@ -4,11 +4,27 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Census datasets presence\n",
"# Genes measured in each cell (presence matrix)\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"# Genes measured in each cell (presence matrix)\n",
"# Genes measured in each cell (dataset presence matrix)\n",

"## Fetching the datasets table\n",
"\n",
"\n",
"Each Census contains a top-level dataframe itemizing the datasets contained therein. You can read this into a `pandas.DataFrane`."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Each Census contains a top-level dataframe itemizing the datasets contained therein. You can read this into a `pandas.DataFrane`."
"Each Census contains a top-level dataframe itemizing the datasets contained therein. You can read this into a `pandas.DataFrame`."

@@ -535,13 +542,24 @@
}
],
"source": [
"# Option 1: Get location\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth also showing a request.get call or a shell command using curl? Minimally, add a comment like # Download the object using a command of your choice, like curl.

"metadata": {},
"outputs": [],
"source": [
"# Option 2: Direct download\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: make this option 1, since it's easier and is the Census Way. :)

"- `column_names` — list of strings indicating what metadata columns to fetch. \n",
"- `value_filter` — Python expression with selection conditions to fetch rows, it is similar to [`pandas.DataFrame.query()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html), for full details see [`tiledb.QueryCondition`](https://tiledb-inc-tiledb.readthedocs-hosted.com/projects/tiledb-py/en/stable/python-api.html#query-condition) shortly:\n",
" - Expressions are one or more comparisons\n",
" - Comparisons are one of column op value or column op column\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
" - Comparisons are one of column op value or column op column\n",
" - Comparisons are one of `<column> <op> <value>` or `<column> <op> <column>`\n",

formatting might help readability

"This dataframe is precomputed from the experiments in the Census, and is intended to simplify quick looks at the Census contents.\n",
"## Creating summary counts beyond pre-calculated values.\n",
"\n",
"The dataframe above is precomputed from the experiments in the Census, and is intended to simplify quick looks at the Census contents.\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"The dataframe above is precomputed from the experiments in the Census, and is intended to simplify quick looks at the Census contents.\n",
"The dataframe above is precomputed from the experiments in the Census, providing a quick overview of the Census contents.\n",

pablo-gar and others added 3 commits April 14, 2023 09:50
Co-authored-by: Emanuele Bezzi <[email protected]>
Co-authored-by: Emanuele Bezzi <[email protected]>
@pablo-gar
Copy link
Contributor Author

Thanks @atolopko-czi and @ebezzi I've addressed your comments

Copy link
Collaborator

@atolopko-czi atolopko-czi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@pablo-gar pablo-gar merged commit 0f49ea1 into main Apr 18, 2023
@pablo-gar pablo-gar deleted the pablo-gar/audit-notebook-4 branch April 18, 2023 17:22
mlin added a commit that referenced this pull request Apr 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants