-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Audit notebook 4 #377
Audit notebook 4 #377
Conversation
Codecov Report
@@ Coverage Diff @@
## main #377 +/- ##
=======================================
Coverage ? 88.33%
=======================================
Files ? 50
Lines ? 2727
Branches ? 0
=======================================
Hits ? 2409
Misses ? 318
Partials ? 0
Flags with carried forward coverage won't be shown. Click here to find out more. 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments but LGTM otherwise.
"\n", | ||
"*Goal:* look up all datasets that have a feature_id present." | ||
"Similarly we can check what datasets measured a gene or set of genes." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem very clear to me.
"\n", | ||
"The \"locator\" returned by this API will include a `uri` and additional information that may be necessary to use the URI (eg, the S3 region).\n", | ||
"You can download the original H5AD file for any given dataset. This is the same H5AD you can download from the CELLxGENE Portal, and may contain additional data-submitter provided information which was not included in the Census.\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from CELLxGENE Discover
? Otherwise portal should be lowercase IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found one unfinished sentence and there may be cells that are lacking output (though I may be confused by the complicated diff of the notebook changes). Other than that, just a number of minor corrections and suggestions.
"\n", | ||
"## Learning about the lung data in the Census\n", | ||
"\n", | ||
"First we open the Census, if you are not familiar with the basics of Census API you should take a look at notebook \"Learning about the CELLxGENE Census\" at `comp_bio_census_info.ipynb`.\n" | ||
"First we open the Census, if you are not familiar with the basics of the Census API you should take a look at notebook [Learning about the CZ CELLxGENE Census](https://cellxgene-census.readthedocs.io/en/latest/notebooks/analysis_demo/comp_bio_census_info.html)\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"First we open the Census, if you are not familiar with the basics of the Census API you should take a look at notebook [Learning about the CZ CELLxGENE Census](https://cellxgene-census.readthedocs.io/en/latest/notebooks/analysis_demo/comp_bio_census_info.html)\n" | |
"First, we will open the Census. If you are not familiar with the basics of the Census API you should take a look at notebook [Learning about the CZ CELLxGENE Census](https://cellxgene-census.readthedocs.io/en/latest/notebooks/analysis_demo/comp_bio_census_info.html)\n" |
@@ -2190,7 +2185,7 @@ | |||
" - Mostly data from cells (\\~80%) rather than nucleus (\\~20%)\n", | |||
"- A total of **~12k** genes were measured across all cells.\n", | |||
"\n", | |||
"## Fetching a sample of all human lung data from the Census.\n", | |||
"## Fetching all single-cell human lung data from the Census.\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"## Fetching all single-cell human lung data from the Census.\n", | |
"## Fetching all single-cell human lung data from the Census.\n", |
@@ -2304,7 +2297,7 @@ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"### QC metrics on gene expression of all Lung data\n", | |||
"## Calculating QC metrics of the lung data.\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"## Calculating QC metrics of the lung data.\n", | |
"## Calculating QC metrics of the lung data\n", |
"\n", | ||
"This notebook computes a variety of per-gene and per-cell statistics for a user-defined query.\n", | ||
"*NOTE*: when query results are small, it may be easier to use the `SOMAExperiment` Query class to extract an AnnData, and then just compute over that. This tutorial shows means of incrementally processing larger-than-core (RAM) data, where incremental (online) algorithms are used.\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"*NOTE*: when query results are small, it may be easier to use the `SOMAExperiment` Query class to extract an AnnData, and then just compute over that. This tutorial shows means of incrementally processing larger-than-core (RAM) data, where incremental (online) algorithms are used.\n", | |
"*NOTE*: when query results are small enough to fit in memory, it may be easier to use the `SOMAExperiment` Query class to extract an AnnData, and then just compute over that. This tutorial shows means of incrementally processing larger-than-core (RAM) data, where incremental (online) algorithms are used.\n", |
@@ -4,11 +4,27 @@ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"# Census datasets presence\n", | |||
"# Genes measured in each cell (presence matrix)\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"# Genes measured in each cell (presence matrix)\n", | |
"# Genes measured in each cell (dataset presence matrix)\n", |
"## Fetching the datasets table\n", | ||
"\n", | ||
"\n", | ||
"Each Census contains a top-level dataframe itemizing the datasets contained therein. You can read this into a `pandas.DataFrane`." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Each Census contains a top-level dataframe itemizing the datasets contained therein. You can read this into a `pandas.DataFrane`." | |
"Each Census contains a top-level dataframe itemizing the datasets contained therein. You can read this into a `pandas.DataFrame`." |
@@ -535,13 +542,24 @@ | |||
} | |||
], | |||
"source": [ | |||
"# Option 1: Get location\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth also showing a request.get
call or a shell command using curl
? Minimally, add a comment like # Download the object using a command of your choice, like curl
.
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Option 2: Direct download\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: make this option 1, since it's easier and is the Census Way. :)
"- `column_names` — list of strings indicating what metadata columns to fetch. \n", | ||
"- `value_filter` — Python expression with selection conditions to fetch rows, it is similar to [`pandas.DataFrame.query()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html), for full details see [`tiledb.QueryCondition`](https://tiledb-inc-tiledb.readthedocs-hosted.com/projects/tiledb-py/en/stable/python-api.html#query-condition) shortly:\n", | ||
" - Expressions are one or more comparisons\n", | ||
" - Comparisons are one of column op value or column op column\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
" - Comparisons are one of column op value or column op column\n", | |
" - Comparisons are one of `<column> <op> <value>` or `<column> <op> <column>`\n", |
formatting might help readability
"This dataframe is precomputed from the experiments in the Census, and is intended to simplify quick looks at the Census contents.\n", | ||
"## Creating summary counts beyond pre-calculated values.\n", | ||
"\n", | ||
"The dataframe above is precomputed from the experiments in the Census, and is intended to simplify quick looks at the Census contents.\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"The dataframe above is precomputed from the experiments in the Census, and is intended to simplify quick looks at the Census contents.\n", | |
"The dataframe above is precomputed from the experiments in the Census, providing a quick overview of the Census contents.\n", |
Co-authored-by: Emanuele Bezzi <[email protected]>
Co-authored-by: Emanuele Bezzi <[email protected]>
Thanks @atolopko-czi and @ebezzi I've addressed your comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Completes work for #360
Removes notebook
census_rank_gene_groups.ipynb
as it is trivial and doesn't add significant value.