Using STdeconvolve with normalized / integrated data #25
Hi @Acaro12,

Thanks so much for using STdeconvolve!

With respect to combining multiple datasets, you could follow our strategy when analyzing the 4 breast cancer sections. Essentially, we take the union of the overdispersed genes determined for each of the sections separately, then fit the LDA models on the merged dataset, which is all of the spots and the combined set of overdispersed genes. Note that in this case, all of the sections were taken from the same biopsy, so it is reasonable to assume that the technical variation between them is low.

If the sections are from different samples, then it might be more appropriate to analyze each separately. We have done this on datasets generated from different samples of the same tissue type (mouse olfactory bulb) and found high concordance between the deconvolved cell types (see Supplementary Figure S7). So although each sample is processed separately, the deconvolved cell types can still be compared across samples.

Hope this helps, and let me know if you have any other questions.
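For anyone wanting to try this, here is a minimal sketch of the merged-corpus strategy described above, assuming two raw genes x spots count matrices `counts1` and `counts2` with unique spot names; the functions are from STdeconvolve, but the cutoffs and Ks shown are only illustrative:

```r
library(STdeconvolve)

## clean each section separately: remove poorly captured genes and low-quality spots
cc1 <- cleanCounts(counts1, min.lib.size = 100)
cc2 <- cleanCounts(counts2, min.lib.size = 100)

## overdispersed genes determined for each section separately
od1 <- rownames(restrictCorpus(cc1, removeAbove = 1.0, removeBelow = 0.05))
od2 <- rownames(restrictCorpus(cc2, removeAbove = 1.0, removeBelow = 0.05))

## union of the per-section overdispersed genes, restricted to genes present in both sections
odGenes <- intersect(union(od1, od2), intersect(rownames(cc1), rownames(cc2)))

## merged corpus: all spots from both sections over the combined gene set
merged <- cbind(as.matrix(cc1[odGenes, ]), as.matrix(cc2[odGenes, ]))

## fit LDA models on the merged dataset (fitLDA expects spots x genes)
ldas <- fitLDA(t(merged), Ks = seq(2, 12, by = 1))
optLDA <- optimalModel(models = ldas, opt = "min")
results <- getBetaTheta(optLDA, perc.filt = 0.05, betaScale = 1000)
```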
Hi, how do you think such variance could potentially affect the gene expression profiles of the STdeconvolve-resolved topics or cell types?
Hi @joachimsiaw,

Thanks for your question! Essentially, the total counts per spot are treated as independent from all of the other data-generating variables in the LDA model. Therefore, there is no need to depth-normalize the total counts in each spot the way there is for scRNA-seq data, for example. Additionally, LDA requires frequency counts of words or terms, specified as a matrix of nonnegative integers, so transforming the values to non-integers would be incompatible.

We do, however, preprocess the data to remove poorly captured genes and low-quality spots, and we feature-select for overdispersed genes across spots as a proxy for cell-type-specific gene expression. It's possible that large variations in cell density, and thus in total gene counts per spot, could affect which genes are detected as overdispersed.

Hope this helps.
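To make that concrete, a minimal sketch of the preprocessing on a raw genes x spots integer count matrix `counts` (the cutoff values here are only examples):

```r
library(STdeconvolve)

## LDA needs nonnegative integer counts, so no log/scaled/SCTransform values here
stopifnot(all(counts >= 0), all(counts == round(counts)))

## remove poorly captured genes and low-quality spots; values stay raw integers
counts <- cleanCounts(counts, min.lib.size = 100, min.detected = 1)

## feature-select overdispersed genes across spots (no depth normalization needed)
corpus <- restrictCorpus(counts, removeAbove = 1.0, removeBelow = 0.05, nTopOD = 1000)
```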
@bmill3r, can you comment on how the gene expression profiles of the topics are generated?
Hi @joachimsiaw,

The gene expression profiles of the deconvolved cell types are essentially probability distributions of each deconvolved cell type over the genes (not means or medians). In the context of LDA, these are the topic-over-gene (beta) distributions.

Hope this answers your question.
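As a short sketch of how to inspect those distributions, assuming `optLDA` is an LDA model returned by `fitLDA()`/`optimalModel()` (the betaScale shown is just an example):

```r
library(STdeconvolve)

## beta: deconvolved cell types (rows) x genes (columns)
results <- getBetaTheta(optLDA, perc.filt = 0.05, betaScale = 1000)
beta <- results$beta

## each row is a probability distribution over genes, scaled by betaScale,
## so each row sums to ~1000 here
rowSums(beta)

## genes with the highest probability in the first deconvolved cell type
head(sort(beta[1, ], decreasing = TRUE), n = 10)
```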
Hi everyone, this blog post and accompanying video, which walk through a simulation-based approach to why normalization is not needed with STdeconvolve, may be useful as you explore these questions in the context of your own research: https://jef.works/blog/2023/05/04/normalization-clustering-vs-deconvolution/

Hope it helps.
Dear Brendan,
I am using STdeconvolve with an 8-sample Visium dataset integrated into a single Seurat object. All samples were individually normalized with Seurat's SCTransform algorithm before anchor-based integration (also with the Seurat toolkit).
The output of SCTransform (and consequently also of the integration) can be negative and is stored as doubles, so it cannot be used with STdeconvolve.
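For illustration, a minimal sketch of the issue, assuming an integrated Seurat object `seu` with the default assay names for Visium data and anchor-based integration (`Spatial` and `integrated`; adjust to your object):

```r
library(Seurat)

## SCTransform/integrated values are doubles and can be negative,
## so they are not valid input for STdeconvolve's LDA fitting
integrated <- GetAssayData(seu, assay = "integrated", slot = "data")
any(integrated < 0)
all(integrated == round(integrated))

## the raw nonnegative integer Visium counts are still kept in the Spatial assay
raw <- GetAssayData(seu, assay = "Spatial", slot = "counts")
```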
I have two questions:
Thank you so much in advance for your time!
Best,
Christoph