novosparc exhausts memory when anndata.X in sparse matrix format (CSC or CSR) #54

Open
sztankatt opened this issue Sep 20, 2021 · 0 comments

Hi, I've been trying to run novosparc on an anndata object whose '.X' matrix is stored in a sparse format. I've tried both 'scipy.sparse.csc_matrix' and 'scipy.sparse.csr_matrix'; in both cases the 'tissue.reconstruct' call hangs and requests an enormous amount of memory (I killed the process after it reached 250GB). If I convert the sparse matrix to a regular 'numpy' array, there is no problem and novosparc runs.

Below is a code example:


import numpy as np
import pandas as pd
import scanpy as sc

import novosparc

data_dir = '/data/local/rajewsky/home/tsztank/projects/spatial/repos/sts-paper/'
data_dir += 'projects/slideseq_v2/processed_data/slideseq_v2_mouse_hippo_new/'
data_dir += 'illumina/complete_data/automated_analysis/slideseq/umi_cutoff_50'

dataset = sc.read(f'{data_dir}/results.h5ad')
num_cells = 2000

#sc.pp.subsample(dataset, n_obs=num_cells)
dataset = dataset[dataset.obs.total_counts>1000, :]

gene_names = dataset.var.index.tolist()

num_cells, num_genes = dataset.shape

is_var_gene = dataset.var['highly_variable']
# select only 100 genes
var_genes = list(is_var_gene.index[is_var_gene])[:100]

dge_rep = dataset.to_df()[var_genes]

from scipy.sparse import csc_matrix, csr_matrix

# If you uncomment the three lines below, dataset.X is stored as a regular
# numpy array instead of a sparse matrix, and the script runs without error.
# M = dataset.X.toarray()
# del dataset.X
# dataset.X = M

num_locations = 1000
locations_circle = novosparc.gm.construct_circle(num_locations = num_locations)

tissue = novosparc.cm.Tissue(dataset=dataset, locations=locations_circle)

num_neighbors_s = num_neighbors_t = 5

tissue.setup_smooth_costs(dge_rep=dge_rep, num_neighbors_s=num_neighbors_s, 
    num_neighbors_t=num_neighbors_t)

tissue.reconstruct(alpha_linear=0, epsilon=5e-3)
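
For reference, the workaround described above (converting '.X' to a dense array before building the 'Tissue') can be sketched roughly as follows. This is only a minimal sketch, not a fix inside novosparc: the helper names `dense_nbytes` and `densify_if_sparse` and the `max_bytes` limit are hypothetical and chosen for illustration, and the small identity matrix stands in for the real `dataset.X`.

```python
import numpy as np
from scipy import sparse


def dense_nbytes(X):
    """Bytes needed to hold X as a dense array of the same dtype."""
    n_rows, n_cols = X.shape
    return n_rows * n_cols * np.dtype(X.dtype).itemsize


def densify_if_sparse(X, max_bytes=8 * 1024**3):
    """Return X as a dense ndarray (hypothetical helper, not a novosparc API).

    Refuses to convert if the dense copy would need more than max_bytes,
    so the conversion itself cannot silently exhaust memory.
    """
    if not sparse.issparse(X):
        return np.asarray(X)
    needed = dense_nbytes(X)
    if needed > max_bytes:
        raise MemoryError(
            f"dense copy would need {needed} bytes (limit {max_bytes})"
        )
    return X.toarray()


# Small stand-in for dataset.X: a 4 x 4 CSR matrix of float32 values.
X_sparse = sparse.csr_matrix(np.eye(4, dtype=np.float32))
X_dense = densify_if_sparse(X_sparse)
```

With the anndata object from the example, this would presumably be applied as `dataset.X = densify_if_sparse(dataset.X)` before calling `novosparc.cm.Tissue(...)`, which mirrors the commented-out lines in the script.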