integrate cell types and reclassify #13

stemangiola · 2022-10-08T12:48:31Z

Divide cells based on macroclusters (e.g. B cells, CD8 T, monocytes). This is not always trivial: we have some high-confidence annotations, but some cells cannot be easily classified as T, B, or monocytes.

Select a small set of gene markers for the major immune cells (B, T, mono, DC) and integrate all 11M cells. This is procedurally cleaner; however, I don't know if it's possible with so many cells, even if we use just 20 genes.
@multimeric list in a comment below the best candidate algorithms that have been shown to suit atlas-level integration.
@ALL we will pick a couple from that list
@ConnieLWS select two optimal datasets to test the initial classification (@ConnieLWS) and the integration (@multimeric).
@multimeric implement/install those methods, and test with the small sample selection that @ConnieLWS is using.
@multimeric then select high-confidence NK cells (start testing with a small random selection of cells from the full NK database) and try to produce an integrated PCA and UMAP (colouring by file_id and .sample, omitting the legends to save space in the plot).
@ConnieLWS run PCA and tSNE (built from the first ~5 PCs) with the gene signature we have (colouring by file_id and .sample, omitting the legends to save space in the plot), without integration.
Integrate cells in the same space, and do clustering and trajectory analysis.
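
The un-integrated PCA step in the list above could be prototyped along these lines. This is only a minimal base-R sketch on simulated data: the `expr` matrix, the two made-up `file_id` values and the marker subset are placeholders rather than the project's real objects, and `prcomp` stands in for whatever PCA implementation ends up being used.

```r
# Sketch only: simulated log-abundance values stand in for the real data.
set.seed(42)
markers <- c("CD3G", "CD79A", "CD14", "FCER1A", "GNLY", "NKG7")
n_cells <- 300

# Hypothetical per-cell annotation; file_id mirrors the column named above.
file_id <- factor(sample(c("file_a", "file_b"), n_cells, replace = TRUE))

# Cells x genes matrix restricted to the marker signature.
expr <- matrix(rnorm(n_cells * length(markers)), nrow = n_cells,
               dimnames = list(NULL, markers))

# PCA on centred, scaled markers; keep the first 5 PCs (e.g. as tSNE input).
pca <- prcomp(expr, center = TRUE, scale. = TRUE)
top_pcs <- pca$x[, 1:5]

# Colour by file_id and deliberately draw no legend to save plot space.
pdf(NULL)  # discards the plot; remove this line to draw on screen
plot(top_pcs[, 1], top_pcs[, 2], col = as.integer(file_id), pch = 16,
     xlab = "PC1", ylab = "PC2")
invisible(dev.off())
```

The same `top_pcs` matrix is what a tSNE or UMAP step would take as input; colouring by `.sample` works the same way with a different factor.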

stemangiola · 2022-10-20T07:10:48Z (edited by ConnieLWS)

I'll start by proposing a small number of transcriptomic markers; please extend this list if you can.

t: CD3G, CD4, CD8A, ...
nk: GNLY, NCAM1, ...
b: CD79A, ...
monocyte: CD68, CD14, S100A9, NKG7...
dc: FCER1A, ...

@ConnieLWS could you please add your gene list here?

This is the current gene list but it's still being refined:

Tcell.sig <- c("CD3G", "CD4", "CAMK4", "CD2", "CD3D", "CD3E")
Bcell.sig <- c("CD79A", "BANK1", "BLK", "CD19", "CD22", "CD79B", "CPNE5", "FCRL1")
Monocyte.sig <- c("CD68", "CD14", "S100A9", "NKG7")
DC.sig <- c("FCER1A", "CLEC4C", "CIITA", "BCL11A")
NK.sig <- c("GNLY", "KLRF1", "NKG7", "KLRD1", "PRF1")

FYI @goknurginer

ConnieLWS · 2022-10-21T05:51:55Z

Do you want tissue-specific marker genes for immune cells? If so, which tissue types would you like to focus on first?

stemangiola · 2022-10-21T07:59:59Z

> Do you want tissue-specific marker genes for immune cells? If so, which tissue types would you like to focus on first?

No, just a very small list of generic markers that would cluster the integrated 11M cells from all tissues. After we divide cells into major macroclusters, we will integrate them separately using all genes.

stemangiola · 2022-10-24T08:52:32Z

We should "validate" our small gene signature on the high-confidence cell types, for example using boxplots of the scaled gene-transcript abundance.

To obtain the high-confidence cells, you can do

metadata |> filter(confidence_class==1)
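
The validation idea above might look roughly like the following. This is a hedged sketch on simulated data, not the actual pipeline: the `abundance` matrix, the `cell` and `cell_type` columns and the `marker_of` mapping are all assumptions made for illustration, and base-R subsetting is used in place of the `filter()` call so that the block runs without extra packages.

```r
# Sketch only: simulated data stand in for the real atlas objects.
set.seed(1)
genes <- c("CD3G", "CD79A", "CD14", "GNLY")
n_cells <- 200

# Hypothetical metadata: confidence_class as in the filter above,
# plus assumed cell and cell_type columns.
metadata <- data.frame(
  cell = paste0("cell", seq_len(n_cells)),
  cell_type = sample(c("T", "B", "monocyte", "NK"), n_cells, replace = TRUE),
  confidence_class = sample(1:3, n_cells, replace = TRUE)
)

# Simulated log-abundance matrix (cells x genes).
abundance <- matrix(rnorm(n_cells * length(genes)), nrow = n_cells,
                    dimnames = list(metadata$cell, genes))

# Raise one marker per cell type so the simulation has a signal to detect.
marker_of <- c(T = "CD3G", B = "CD79A", monocyte = "CD14", NK = "GNLY")
for (ct in names(marker_of)) {
  idx <- metadata$cell_type == ct
  abundance[idx, marker_of[[ct]]] <- abundance[idx, marker_of[[ct]]] + 3
}

# Keep only high-confidence cells, then scale each gene across those cells.
high_conf <- metadata[metadata$confidence_class == 1, ]
scaled <- scale(abundance[high_conf$cell, , drop = FALSE])

# One boxplot per marker, split by annotated cell type.
pdf(NULL)  # discards the plots; remove this line to draw on screen
for (g in genes) {
  boxplot(scaled[, g] ~ high_conf$cell_type, main = g,
          xlab = "cell type", ylab = "scaled abundance")
}
invisible(dev.off())
```

A marker that behaves well should sit clearly higher in its own cell type than in the others; a marker whose boxes overlap across types is a candidate for removal from the signature.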

stemangiola · 2022-10-25T05:51:24Z

In the meantime, while @multimeric adds a couple of features we need, let's start with the MNN (scater) integration method using 10-50 genes and 100K cells (we have 11M immune cells in total).
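
For orientation, MNN-based integration rests on finding pairs of cells from different batches that are each among the other's nearest neighbours. The block below is a base-R illustration of that pair-finding step only, on two small simulated batches. It is not the implementation from any package and it omits the batch-correction step; the batch sizes, the value of `k` and the `nearest` helper are assumptions made for the sketch.

```r
# Sketch only: two simulated batches of cells measured on 20 marker genes.
set.seed(7)
n_genes <- 20
batch1 <- matrix(rnorm(50 * n_genes), nrow = 50)
batch2 <- matrix(rnorm(60 * n_genes), nrow = 60) + 0.5  # crude batch shift

# For every row of `query`, return the indices of its k nearest rows in `ref`
# (Euclidean distance). Assumes k > 1 so that apply() returns a matrix.
nearest <- function(query, ref, k) {
  d <- as.matrix(dist(rbind(query, ref)))
  d <- d[seq_len(nrow(query)), nrow(query) + seq_len(nrow(ref)), drop = FALSE]
  t(apply(d, 1, function(row) order(row)[seq_len(k)]))
}

k <- 5
nn_1_in_2 <- nearest(batch1, batch2, k)  # neighbours in batch2 of batch1 cells
nn_2_in_1 <- nearest(batch2, batch1, k)  # neighbours in batch1 of batch2 cells

# A pair (i, j) is mutual when j is among i's neighbours and i among j's.
pairs <- do.call(rbind, lapply(seq_len(nrow(batch1)), function(i) {
  js <- nn_1_in_2[i, ]
  js <- js[vapply(js, function(j) i %in% nn_2_in_1[j, ], logical(1))]
  if (length(js) == 0) return(NULL)
  cbind(batch1_cell = i, batch2_cell = js)
}))

head(pairs)
```

The full distance matrix used here scales quadratically with the number of cells, so this form is only suitable for toy inputs; at 100K cells an approximate nearest-neighbour search would be needed.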

stemangiola · 2022-10-25T08:59:24Z

@ConnieLWS @multimeric FYI

"A unified analysis of atlas single cell data"

https://www.biorxiv.org/content/10.1101/2022.08.06.503038v1.full

multimeric · 2022-11-04T02:38:50Z

Here are some I think I'll try to benchmark, based on Connie's literature review:

Scanorama (Python)
scVI (Python)
LIGER (R)
Seurat (R)

stemangiola · 2022-11-04T02:51:29Z

> Here are some I think I'll try to benchmark, based on Connie's literature review:
>
> Scanorama (Python)
> scVI (Python)
> LIGER (R)
> Seurat (R)

Great.

Please select one of scVI and Scanorama, based on your reading of what has been reported about their application to atlas-level (millions of cells, multi-study) integration.
In parallel, proceed with LIGER for the moment (as @ConnieLWS is working on Seurat).

multimeric · 2022-11-04T02:54:02Z

You don't think we have scope for 2 Python tools?

stemangiola · 2022-11-04T03:00:48Z

> You don't think we have scope for 2 Python tools?

Potentially, but the goal at this stage is to get the "minimum viable product", so we have to be careful to use our time sparingly. If you find yourself waiting for computation (which we should avoid by testing on small chunks of data), you can work on your figure for the paper (in the to-do list).

multimeric · 2022-11-04T03:02:16Z

Currently I have no data set to test these tools on anyway.

stemangiola · 2022-11-04T03:35:30Z

> Currently I have no data set to test these tools on anyway.

You can first implement the tool with dummy data (the dataset queries in the README file). This initial dataset selection should not be a bottleneck.

ConnieLWS · 2022-11-08T06:27:47Z

Tested initial classification using 27 marker genes. The gene signature is still being refined.

Tcell.sig <- c("CD3G", "CD4", "CAMK4", "CD2", "CD3D", "CD3E")
Bcell.sig <- c("CD79A", "BANK1", "BLK", "CD19", "CD22", "CD79B", "CPNE5", "FCRL1")
Monocyte.sig <- c("CD68", "CD14", "S100A9", "NKG7")
DC.sig <- c("FCER1A", "CLEC4C", "CIITA", "BCL11A")
NK.sig <- c("GNLY", "KLRF1", "NKG7", "KLRD1", "PRF1")
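
Since the five signatures are combined into a single marker panel, it may be worth checking whether any gene appears in more than one of them. The following is a small base-R check; the `signatures` list and the `all_genes` and `shared` variables are names introduced here for illustration only.

```r
# Signatures as listed above.
Tcell.sig <- c("CD3G", "CD4", "CAMK4", "CD2", "CD3D", "CD3E")
Bcell.sig <- c("CD79A", "BANK1", "BLK", "CD19", "CD22", "CD79B", "CPNE5", "FCRL1")
Monocyte.sig <- c("CD68", "CD14", "S100A9", "NKG7")
DC.sig <- c("FCER1A", "CLEC4C", "CIITA", "BCL11A")
NK.sig <- c("GNLY", "KLRF1", "NKG7", "KLRD1", "PRF1")

# Collect them in one named list so they can be inspected together.
signatures <- list(Tcell = Tcell.sig, Bcell = Bcell.sig,
                   Monocyte = Monocyte.sig, DC = DC.sig, NK = NK.sig)
all_genes <- unlist(signatures, use.names = FALSE)

length(all_genes)          # 27 entries in total
length(unique(all_genes))  # 26 distinct genes

# Genes listed in more than one signature, and where they occur.
shared <- unique(all_genes[duplicated(all_genes)])
shared                     # "NKG7"
names(signatures)[vapply(signatures, function(s) "NKG7" %in% s, logical(1))]
```

On the lists above this reports 27 entries, 26 of them distinct, with NKG7 present in both the monocyte and NK signatures.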

Initial testing was performed on 2 samples (~10k cells each) from one dataset:

stemangiola assigned ConnieLWS Oct 19, 2022

stemangiola assigned multimeric Oct 28, 2022

ConnieLWS moved this from Todo to In Progress in human-cell-atlas Nov 8, 2022
