adds 'duplicate cells' notebooks #448
Codecov Report
@@ Coverage Diff @@
## main #448 +/- ##
=======================================
Coverage 88.51% 88.51%
=======================================
Files 50 50
Lines 2770 2770
=======================================
Hits 2452 2452
Misses 318 318
Minor suggestions
"\n", | ||
"* There are superset datasets containing data from multiple datasets.\n", | ||
"> *For example [Tabula Sapiens](https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5) has one dataset with all of its cells and separate datasets with cells divided by high-level lineage (i.e. immune, epithelial, stromal, endothelial)*\n", | ||
"* There are datasets with meta-analysis of pre-existing datasets.\n", |
"* There are datasets with meta-analysis of pre-existing datasets.\n", | |
"* A dataset may provide a meta-analysis of a pre-existing datasets.\n", |
} | ||
], | ||
"source": [ | ||
"adata" |
len(adata.obs)
changed
"outputs": [], | ||
"source": [ | ||
"with cellxgene_census.open_soma() as census:\n", | ||
" nk_cells_unique = census[\"census_data\"][\"homo_sapiens\"].obs.read(\n", |
Why not `nk_cells_primary` instead of introducing a new term (`unique`)? (Same for `adata` below.)
good idea
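To make the naming suggestion concrete, here is a minimal sketch that uses invented toy rows rather than a live Census query. `obs_rows` and `select_cells` are hypothetical names introduced only for illustration and are not part of `cellxgene_census`; the point is just that the variable name `nk_cells_primary` matches the `is_primary_data` flag it is filtered on.

```python
# Toy stand-in for Census obs rows; all values are invented for illustration.
obs_rows = [
    {"soma_joinid": 0, "cell_type": "natural killer cell", "is_primary_data": True},
    {"soma_joinid": 1, "cell_type": "natural killer cell", "is_primary_data": False},
    {"soma_joinid": 2, "cell_type": "B cell", "is_primary_data": True},
    {"soma_joinid": 3, "cell_type": "natural killer cell", "is_primary_data": True},
]


def select_cells(rows, cell_type, primary_only=False):
    """Hypothetical helper: keep rows of one cell type, optionally primary only."""
    return [
        row
        for row in rows
        if row["cell_type"] == cell_type
        and (row["is_primary_data"] or not primary_only)
    ]


# All NK cells, duplicates included.
nk_cells = select_cells(obs_rows, "natural killer cell")

# Only the NK cells flagged as primary data.
nk_cells_primary = select_cells(obs_rows, "natural killer cell", primary_only=True)

print(len(nk_cells), len(nk_cells_primary))
```

With a name like `nk_cells_primary`, the reader does not need to learn a second term ("unique") for the same concept.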
" )\n", | ||
"\n", | ||
" # get iterator for X\n", | ||
" iterator = query.X(\"raw\").tables()\n", |
Alternately, you could retrieve the obs data, concat it, and show the count, as done for the other examples. This would de-emphasize the "out-of-core" purpose. Or retrieve the obs data and show that the is_primary_data values for a single chunk are all True.
I would still like to emphasize the out-of-core aspect of this, the reason being that "repetition is our ally" for communicating to the user how to work out-of-core and exclude non-primary cells.
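For readers following this thread, the out-of-core pattern under discussion can be sketched without the Census at all. This is only an illustration under stated assumptions: `iter_obs_chunks`, `count_primary_out_of_core`, and `toy_rows` are hypothetical names, and a plain Python generator stands in for the chunked iterator that the notebook obtains from the query.

```python
from typing import Iterable, Iterator, List


def iter_obs_chunks(rows: List[dict], chunk_size: int) -> Iterator[List[dict]]:
    """Hypothetical stand-in for a chunked reader: yield rows a slice at a time."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def count_primary_out_of_core(chunks: Iterable[List[dict]]) -> int:
    """Count primary cells while holding only one chunk in memory at a time."""
    total = 0
    for chunk in chunks:
        # Non-primary (duplicate) cells are skipped within each chunk.
        total += sum(1 for row in chunk if row["is_primary_data"])
    return total


# Invented data: every third cell is flagged as non-primary.
toy_rows = [{"soma_joinid": i, "is_primary_data": i % 3 != 0} for i in range(10)]

primary_count = count_primary_out_of_core(iter_obs_chunks(toy_rows, chunk_size=4))
print(primary_count)
```

The filter is applied per chunk, so the full table never has to be concatenated before duplicates are dropped.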
Co-authored-by: Andrew Tolopko <[email protected]>
"\n", | ||
"## An example: duplicate cells in the Tabula Muris Senis data\n", | ||
"\n", | ||
"Let's take a look at an example from the Census using the Tabula Muris Senis data. Some datasets contain non-primary cell data.\n", |
this introduces a new term: "non-primary cell data". Is this the same as "duplicated cell data"? If the same, I suggest sticking with the defined term ("duplicate"), or defining primary/non-primary in the intro.
changed
"\n", | ||
"Let's take a look at an example from the Census using the Tabula Muris Senis data. Some datasets contain non-primary cell data.\n", | ||
"\n", | ||
"We can obtain cell metadata for the **main** Tabula Muris Senis dataset: \"All - A single-cell transcriptomic atlas characterizes ageing tissues in the mouse - 10x\", which contains only primary cell data\n", |
same issue here - the term "primary" is used. If these are helpful, maybe we just need to add them to the definition above (i.e., what is a "primary" or "non-primary" cell?)
changed
"1. This dataset only contains cells from liver.\n", | ||
"2. All cells are labelled as `False` for `is_primary_data`. **This is because the cells are marked as duplicate cells of the main Tabula Muris Senis dataset.**\n", | ||
"\n", | ||
"## Filtering out duplicates cells\n", |
typo (extra 's'): suggest: "duplicate cells"
changed
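As a rough illustration of why the notebook section on filtering matters, the sketch below shows how a superset dataset plus a tissue-specific subset inflates a naive count. The dataset labels `all_tissues` and `liver_only` and the rows themselves are invented for this example and do not come from the Census.

```python
from collections import Counter

# Invented rows: a superset dataset and a liver-only subset of the same cells.
cells = [
    {"dataset": "all_tissues", "tissue": "liver", "is_primary_data": True},
    {"dataset": "all_tissues", "tissue": "liver", "is_primary_data": True},
    {"dataset": "all_tissues", "tissue": "lung", "is_primary_data": True},
    {"dataset": "liver_only", "tissue": "liver", "is_primary_data": False},
    {"dataset": "liver_only", "tissue": "liver", "is_primary_data": False},
]

# Naive count: every liver row is counted, so each liver cell appears twice.
naive_liver = sum(1 for c in cells if c["tissue"] == "liver")

# Filtered count: keep only rows flagged as primary data.
dedup_liver = sum(1 for c in cells if c["tissue"] == "liver" and c["is_primary_data"])

# Which datasets contribute the duplicate rows?
duplicates_per_dataset = Counter(c["dataset"] for c in cells if not c["is_primary_data"])

print(naive_liver, dedup_liver, dict(duplicates_per_dataset))
```

In this toy setup all duplicate rows come from the subset dataset, which mirrors the liver example described in the notebook text above.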
Just a couple of minor suggestions
Adds a notebook to increase the visibility of the cell metadata variable `is_primary_data`.