Replies: 1 comment
Hi Tu,
Thanks for your question! Could you provide more details about your
intended use case? For example, are you performing a meta-analysis,
training a model, or something else?
In general, rather than moving to a machine with more memory, I would
recommend executing your query lazily, so you never need to hold the
entire dataset in memory. You can do this with our ExperimentDataPipe
class, which streams data from the Census in batches instead of loading
the full training set at once.
You can find more details about using ExperimentDataPipe in the Census
documentation: Training a PyTorch Model (Create an ExperimentDataPipe),
https://chanzuckerberg.github.io/cellxgene-census/notebooks/experimental/pytorch.html#Create-an-ExperimentDataPipe
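For reference, here is a minimal sketch of what that could look like for the
same cells your get_anndata() call below selects. It follows the linked
notebook; the batch_size, shuffle, and obs column choices are illustrative
assumptions you would tune for your own model and hardware:

import cellxgene_census
import cellxgene_census.experimental.ml as census_ml
import tiledbsoma as soma

census = cellxgene_census.open_soma(census_version="2024-07-01")

# Lazily select the same cells as the eager get_anndata() query below.
datapipe = census_ml.ExperimentDataPipe(
    census["census_data"]["homo_sapiens"],
    measurement_name="RNA",
    X_name="raw",
    obs_query=soma.AxisQuery(
        value_filter="tissue == 'lung' and cell_type == 'CD4-positive, alpha-beta T cell'"
    ),
    obs_column_names=["assay", "cell_type", "tissue", "disease"],
    batch_size=128,  # illustrative; tune to your model and hardware
    shuffle=True,
)

# Stream batches on demand instead of materializing all ~323k cells at once.
dataloader = census_ml.experiment_dataloader(datapipe)
for X_batch, obs_batch in dataloader:
    ...  # train on / process one batch at a time

census.close()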
Let us know if this solution works for your use case, or feel free to
provide more context if you need additional guidance.
On Mon, Oct 14, 2024 at 12:10 PM Tu Hu wrote:
Hi there,
I plan to read 323 thousand cells into memory. Is that too much? It
takes a really long time to respond.
I am running on a cluster with 16 vCPUs and 96 GB of memory. Should I
use more RAM?
Any advice on how I can optimize?
import cellxgene_census

with cellxgene_census.open_soma(census_version="2024-07-01") as census:
    adata = cellxgene_census.get_anndata(
        census=census,
        organism="Homo sapiens",
        obs_value_filter="tissue == 'lung' and cell_type == 'CD4-positive, alpha-beta T cell'",
        column_names={"obs": ["assay", "cell_type", "tissue", "tissue_general", "suspension_type", "disease"]},
    )
    print(adata)