
Will scanpy implement ICA? #767

Open
chris-rands opened this issue Aug 5, 2019 · 10 comments · May be fixed by #941
Comments

@chris-rands (Contributor) commented Aug 5, 2019

Are there plans to include independent component analysis (ICA) in Scanpy? This would be useful for me. I know that it is already implemented in Seurat. Thank you

@ivirshup (Member) commented Aug 7, 2019

Could you tell us about how you've found it useful, or point us towards some literature on it being used? I think this would be easy enough to implement, but when I tried a naive implementation the results weren't that compelling. It's very possible I should've played around with the parameters more. @flying-sheep, do you have thoughts on this?

Here's a simple implementation:

def ica(adata, n_components, inplace=True, **kwargs):
    """Run FastICA on adata.X, storing cell loadings in .obsm and components in .varm."""
    from sklearn.decomposition import FastICA
    from scipy.sparse import issparse

    ica_transformer = FastICA(n_components=n_components, **kwargs)
    # FastICA requires a dense array, so densify sparse expression matrices first
    X = adata.X.toarray() if issparse(adata.X) else adata.X
    X_ica = ica_transformer.fit_transform(X)
    if inplace:
        adata.obsm["X_ica"] = X_ica
        adata.varm["ICs"] = ica_transformer.components_.T
    else:
        return ica_transformer
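A sketch of what the in-place branch of that helper does, exercised on synthetic data (the `SimpleNamespace` stand-in and all parameter values here are illustrative, not part of any scanpy API):

```python
import numpy as np
from types import SimpleNamespace
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
# Hypothetical stand-in for an AnnData object: an X matrix (cells x genes)
# plus obsm/varm dicts, just to exercise the helper without anndata installed.
adata = SimpleNamespace(X=rng.standard_normal((200, 50)), obsm={}, varm={})

ica_transformer = FastICA(n_components=10, max_iter=1000, random_state=0)
adata.obsm["X_ica"] = ica_transformer.fit_transform(adata.X)   # cell loadings
adata.varm["ICs"] = ica_transformer.components_.T              # gene components
```

With a real AnnData object the calls are the same; only the container differs.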

@Fougere87

Hi !
To answer about how it is useful: we are using ICA in our lab on a dataset of more than 100k cells with a lot of complexity, and the main advantage of ICA over PCA is that it helps us detect small populations of cells. Because these small populations do not account for much variance in the dataset, when we used a threshold on PCs we discarded the very components that would have allowed us to separate them.
Another advantage is that we no longer rely on a selection of "highly variable genes"; instead we use all genes expressed in more than 100 cells for the whole analysis. Doing the same with PCA gave us quite poor results.

We used Seurat's implementation, and I tried FastICA from sklearn once but couldn't obtain similar results. I have not looked thoroughly into Seurat's code, though.

Hope it helps !
Best

@chris-rands (Contributor, Author)

Thank you both. In terms of literature, see this paper (PMID: 30096299), in particular Figure S2.

@ivirshup (Member) commented Aug 8, 2019

I looked through the R fastICA package, and I think one of the main differences was the default tolerance. I've changed that, and the results seem a bit better. I don't think I have a great reference point to evaluate it, though. Any chance you could provide a vignette of ICA being used with single-cell data, so we can see if the Python version can recapitulate the results?
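For reference, tolerance and iteration count are explicit keyword arguments in sklearn's FastICA, so they are easy to experiment with. A minimal sketch on a synthetic mixing problem (the `tol`/`max_iter` values here are illustrative, not the R defaults, which I haven't verified):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
S = rng.laplace(size=(1000, 3))   # independent non-Gaussian sources
A = rng.standard_normal((3, 3))   # mixing matrix
X = S @ A.T                       # observed mixtures

# Tightening tol and raising max_iter changes when the fixed-point
# iteration declares convergence; 1e-6 / 2000 are illustrative values.
ica = FastICA(n_components=3, tol=1e-6, max_iter=2000, random_state=0)
S_est = ica.fit_transform(X)      # recovered sources, up to sign/permutation
```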

@fidelram (Collaborator)

@chris-rands Any update on this? Does the code from @ivirshup give you any meaningful results?

@gokceneraslan (Collaborator) commented Aug 30, 2019

BTW, there is a nice "faster ICA" algorithm now, called Picard. It's implemented as a Python package here:

https://pierreablin.github.io/picard/generated/picard.picard.html#picard.picard

It might be worthwhile to try this too.

@ivirshup (Member)

@gokceneraslan, got around to giving Picard a shot. For the pbmc3k dataset, it's a little slower than sklearn's FastICA, though sklearn gives non-convergence warnings more often. Have you tried it before?

@chris-rands (Contributor, Author)

Thanks for the feedback, and sorry for the delay. The code snippet using sklearn.decomposition.FastICA gave me quite similar results to PCA for my data in terms of the downstream UMAP visualisations (when I simply embedded 50 components in the neighbourhood graph). One difference is that the ICA was slower to compute. I am not confident creating a vignette, as I'm unclear what the 'correct' results should look like. I have not tried the Picard implementation yet.

@Fougere87

Hi, I tried the snippet with both FastICA and Picard, and with more than 30,000 cells the whitening step cannot be completed. This seems to be due to a LAPACK limitation:
ValueError: Too large work array required -- computation cannot be performed with standard 32-bit LAPACK.
I don't know how to get around this.
Best,
Chloé

@ivirshup (Member)

Sorry for the late response; I completely forgot to post my response here.

@Fougere87, did that whitening issue occur with picard as well? I saw that with sklearn. I think we could get around that by whitening ourselves with ARPACK.
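A sketch of that workaround: whiten the data ourselves with ARPACK (via a truncated SVD), then run FastICA with `whiten=False`, so the dense full-rank SVD that trips the 32-bit LAPACK work-array limit is never built. The matrix sizes and component count here are illustrative:

```python
import numpy as np
from scipy.sparse.linalg import svds
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 500))   # stand-in for adata.X (cells x genes)

n_components = 30
Xc = X - X.mean(axis=0)                # center each gene
U, s, Vt = svds(Xc, k=n_components)    # ARPACK-based truncated SVD
order = np.argsort(s)[::-1]            # svds returns singular values ascending
U, s = U[:, order], s[order]

# Orthonormal U scaled to unit variance gives whitened data directly.
X_white = U * np.sqrt(X.shape[0])

ica = FastICA(n_components=n_components, whiten=False, max_iter=1000, random_state=0)
X_ica = ica.fit_transform(X_white)     # cell loadings, cells x components
```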

Picard and sklearn look pretty similar to me in a quick comparison. Below are the top 16 of 30 components (ranked by Geary's C, an autocorrelation statistic on the connectivity graph), shown as cell loadings on the pbmc3k dataset. The UMAP and connectivity matrix here were computed on top of a PCA, which I should maybe do differently. However, I think the results are similar enough that it's probably not of consequence.

sklearn FastICA
[image: sklearn_ica]

Picard ICA
[image: picard_ica2]

@ivirshup ivirshup linked a pull request Nov 30, 2019 that will close this issue