integrate cell types and reclassify #13
Comments
I'm starting by proposing a small number of transcriptomic markers; please extend this list if you can.
@ConnieLWS could you please add your gene list here? This is the current gene list, but it's still being refined:
FYI @goknurginer |
Do you want tissue-specific marker genes for immune cells? If so, which tissue types would you like to focus on first? |
No, just a very small list of generic markers that would cluster the integrated 11M cells from all tissues. After we divide the cells into major macro clusters, we will integrate them separately using all genes. |
We should "validate" our small gene signature on the high-confidence cell types, for example using boxplots of the scaled gene-transcript abundance. To obtain the high-confidence cells, you can use `metadata |> filter(confidence_class == 1)` |
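As a rough illustration of the validation idea above, here is a minimal sketch in plain Python (not the R pipeline used in this project). It keeps only cells with `confidence_class == 1`, z-scores one marker across them, and returns the five numbers a boxplot would draw per cell type. The toy records, the `cell_type` column, the `CD3E` marker choice, and the function names are all assumptions made for the example.

```python
import statistics
from collections import defaultdict

# Hypothetical toy metadata, one record per cell. Only `confidence_class`
# comes from the thread; `cell_type`, `CD3E` and the values are made up.
cells = [
    {"cell_type": "t_cell", "confidence_class": 1, "CD3E": 9.0},
    {"cell_type": "t_cell", "confidence_class": 1, "CD3E": 8.0},
    {"cell_type": "t_cell", "confidence_class": 1, "CD3E": 10.0},
    {"cell_type": "b_cell", "confidence_class": 1, "CD3E": 1.0},
    {"cell_type": "b_cell", "confidence_class": 1, "CD3E": 2.0},
    {"cell_type": "b_cell", "confidence_class": 1, "CD3E": 0.0},
    {"cell_type": "t_cell", "confidence_class": 3, "CD3E": 1.0},
    {"cell_type": "b_cell", "confidence_class": 2, "CD3E": 9.0},
]


def scale(values):
    """Z-score a list of abundances (mean 0, standard deviation 1)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]


def boxplot_stats(values):
    """Return (min, Q1, median, Q3, max), the numbers a boxplot displays."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def validate_marker(cells, gene):
    """Summarise the scaled abundance of `gene` per high-confidence cell type."""
    # Mirror of filter(confidence_class == 1): drop low-confidence cells.
    high_conf = [c for c in cells if c["confidence_class"] == 1]
    scaled = scale([c[gene] for c in high_conf])
    by_type = defaultdict(list)
    for cell, value in zip(high_conf, scaled):
        by_type[cell["cell_type"]].append(value)
    return {cell_type: boxplot_stats(v) for cell_type, v in by_type.items()}


summary = validate_marker(cells, "CD3E")
```

A useful marker should show clearly separated boxes: here the scaled median for the hypothetical `t_cell` group is above zero and the one for `b_cell` is below it.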
In the meantime, while @multimeric adds a couple of features we need, let's start with the MNN (scater) integration method using 10-50 genes, and start with 100K cells (we have 11M immune cells in total). |
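To make the MNN idea concrete, here is a minimal sketch of the pair-finding step only: a cell in one batch and a cell in another form a mutual nearest neighbour pair when each is among the other's k nearest cells. This is not the Bioconductor implementation and applies no batch correction. The two-gene toy coordinates, the brute-force search, and the `subsample` helper (standing in for "start with 100K cells") are assumptions for illustration.

```python
import math
import random


def nearest(query, reference, k):
    """Indices of the k cells in `reference` closest to `query` (Euclidean)."""
    order = sorted(range(len(reference)),
                   key=lambda j: math.dist(query, reference[j]))
    return set(order[:k])


def mnn_pairs(batch_a, batch_b, k=2):
    """Return (i, j) where cell i of A and cell j of B are mutual k-nearest neighbours."""
    a_to_b = [nearest(cell, batch_b, k) for cell in batch_a]
    b_to_a = [nearest(cell, batch_a, k) for cell in batch_b]
    return [(i, j)
            for i, neighbours in enumerate(a_to_b)
            for j in sorted(neighbours)
            if i in b_to_a[j]]


def subsample(cells, n, seed=0):
    """Draw at most n cells at random so early tests stay small."""
    rng = random.Random(seed)
    return rng.sample(cells, min(n, len(cells)))


# Toy expression profiles over two hypothetical marker genes.
# Batch B resembles batch A shifted by a small batch effect.
batch_a = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
batch_b = [(0.5, 0.5), (5.5, 5.5), (5.4, 5.6)]

pairs = mnn_pairs(batch_a, batch_b, k=1)
```

With `k=1`, cell 0 of batch A picks cell 0 of batch B, but that cell's own nearest neighbour is cell 1 of batch A, so only the mutual matches `(1, 0)` and `(2, 1)` are kept. The brute-force search is quadratic in the number of cells, which is one more reason to begin with a subsample and a short gene list.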
"A unified analysis of atlas single cell data" https://www.biorxiv.org/content/10.1101/2022.08.06.503038v1.full |
Great,
|
You don't think we have scope for 2 Python tools? |
Potentially, but the goal at this stage is to get the "minimum viable product", so we have to be careful to use our time parsimoniously. If you find yourself waiting for computation (which we should avoid by testing on small chunks of data), you can work on your figure for the paper (in the to-do list). |
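One way to keep tests on small chunks is to read only the first few records from a lazy source instead of loading everything. The sketch below assumes a hypothetical `cell_stream` generator standing in for whatever reader the real tool exposes; only `itertools.islice` is a real library call.

```python
from itertools import islice


def take_chunk(cells, size):
    """Return the first `size` items of an iterable without reading the rest."""
    return list(islice(cells, size))


def cell_stream(n_cells):
    """Hypothetical stand-in for a lazy reader over a very large atlas."""
    for i in range(n_cells):
        yield {"cell_id": f"cell_{i}"}


# Only 1,000 records are ever generated, even though the stream is 11M long.
chunk = take_chunk(cell_stream(11_000_000), 1_000)
```

Because the generator is consumed lazily, the cost of the test run depends on the chunk size rather than on the size of the full dataset.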
Currently I have no data set to test these tools on anyway. |
You can first implement the tool with dummy data (the dataset queries in the README file). This initial dataset selection should not be a bottleneck. |
Tested initial classification using 27 marker genes. The gene signature is still being refined.
Initial testing was performed on 2 samples (~10k cells each) from one dataset: |
[Plot: `file_id` and `.sample`, omitting the legends to save space in the plot]
[Plot: `file_id` and `.sample`, omitting the legends to save space in the plot, without integration]