error running pipeline #44

Closed
littleju714 opened this issue Jul 8, 2022 · 4 comments

Comments

@littleju714

littleju714 commented Jul 8, 2022

Hi! Thanks again for your excellent work!

I am running the pipeline on my data, which combines 4 studies of myeloid cells from different labs. Their cell types were labeled using different methods. For example, one study labels all cells as "myeloid cells", another uses "TYPE1, TYPE2, TYPE3", and another uses "TAM1(PD-L1), TAM2".

I have removed the "scanvi" and "scgen" methods from my config since they use the cell type labels. However, I kept the original celltype column in obs, because otherwise the embedding step breaks.
Can I still run the pipeline with my data?

It fails with errors like these:
1.

Traceback (most recent call last):
  File "scripts/integration/runIntegration.py", line 81, in <module>
    runIntegration(file, out, run, hvg, batch, celltype)
  File "scripts/integration/runIntegration.py", line 36, in runIntegration
    integrated = method(adata, batch)
  File "/data/msun/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/integration.py", line 317, in mnn
    **kwargs,
  File "/data/msun/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/mnnpy/mnn.py", line 126, in mnn_correct
    svd_mode=svd_mode, do_concatenate=do_concatenate, **kwargs)
  File "/data/msun/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/mnnpy/mnn.py", line 182, in mnn_correct
    new_batch_in, sigma)
IndexError: arrays used as indices must be of integer (or boolean) type

2.

Traceback (most recent call last):
  File "scripts/metrics/metrics.py", line 263, in <module>
    trajectory_=trajectory_
  File "/data/msun/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/metrics/metrics.py", line 340, in metrics
    verbose=False,
  File "/data/msun/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/metrics/silhouette.py", line 115, in silhouette_batch
    sil_means = sil_all.groupby("group").mean()
  File "/data/msun/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1499, in mean
    numeric_only=numeric_only,
  File "/data/msun/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 1016, in _cython_agg_general
    how, alt=alt, numeric_only=numeric_only, min_count=min_count
  File "/data/msun/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 1121, in _cython_agg_blocks
    raise DataError("No numeric types to aggregate")
pandas.core.base.DataError: No numeric types to aggregate

And here is my config:

ROOT: /data/msun/01_integration
r_env : scib-R4.0
py_env : scib-pipeline-R4.0
timing: false

unintegrated_metrics: false

FEATURE_SELECTION:
  hvg: 2000
  full_feature: 0

SCALING:
  - unscaled
  - scaled

METHODS:
# python methods : bbknn, combat, desc, mnn, saucie, scanorama, scanvi, scgen, scvi, trvae, trvaep
  bbknn:
    output_type: knn
  combat:
    output_type: full
  desc:
    output_type: embed
  mnn:
    output_type: full
  saucie:
    output_type:
      - full
      - embed
  scanorama:
    output_type:
      - embed
      - full
  #scanvi:
  #  output_type: embed
  #  no_scale: true
  #  use_celltype: true
  #scgen:
  #  output_type: full
  #  use_celltype: true
  scvi:
    no_scale: true
    output_type: embed
  #trvae:
  #  no_scale: true
  #  output_type:
  #    - embed
  #    - full
  #trvaep:
  #  no_scale: true
  #  output_type:
  #    - embed
  #    - full
# R methods : conos, fastmnn, harmony, liger, seurat, seuratrpca
  conos: 
    R: true
    output_type: knn
  fastmnn:
    R: true
    output_type:
      - embed
      - full
  harmony:
    R: true
    output_type: embed
  liger:
    no_scale: true
    R: true
    output_type: embed
  seurat:
    R: true
    output_type: full
  seuratrpca:
      R: true
      output_type: full

DATA_SCENARIOS:
  integrate_output:
    batch_key: batch # name of key on anndata.obs that annotates the batches
    label_key: celltype  # name of key on anndata.obs that annotates the cell identity labels
    organism: mouse
    assay: expression
    file: /data/msun/01_integration/ori_data/with_layers/pure_adatas.h5ad

Could you help me with this? Are these errors caused by the cell type labels or by something else? Is it necessary to relabel the cell types?

Thank you for your time!!!!

@littleju714
Author

I found out how to fix error 1:
chriscainx/mnnpy#30
I would need to pin numba=0.45.0 and llvmlite=0.30.0, but those versions may be incompatible with other packages in the environment, so I gave up on mnn.
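
Before deciding whether the pins are feasible, it can help to see which versions the environment currently has. The following is a minimal sketch, not part of scib or mnnpy: the helper names `module_version` and `check_pins` are hypothetical, and it assumes each package exposes a `__version__` attribute (numba and llvmlite do).

```python
import importlib

# Versions reported in this thread as fixing error 1 (see chriscainx/mnnpy#30).
PINS = {"numba": "0.45.0", "llvmlite": "0.30.0"}


def module_version(name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_pins(pins, get_version=module_version):
    """Map each package name to a tuple (installed, wanted, matches)."""
    report = {}
    for name, wanted in pins.items():
        installed = get_version(name)
        report[name] = (installed, wanted, installed == wanted)
    return report
```

Calling `check_pins(PINS)` inside the pipeline environment returns one `(installed, wanted, matches)` tuple per package; an `installed` value of `None` means the package could not be imported there.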

@littleju714
Author

littleju714 commented Jul 18, 2022

I have updated metrics.py from the scib repository on GitHub, and error 2 has now become:

Traceback (most recent call last):
  File "scripts/metrics/metrics.py", line 263, in <module>
    trajectory_=trajectory_
  File "/data/msun/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/metrics/metrics.py", line 340, in metrics
    verbose=False,
  File "/data/msun/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/metrics/silhouette.py", line 113, in silhouette_batch
    sil_df = pd.concat(sil_dfs).reset_index(drop=True)
  File "/data/msun/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 295, in concat
    sort=sort,
  File "/data/msun/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 342, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

But I don't know how to fix it.

@mumichae
Collaborator

Hi, it seems like you might not be getting any values for the batch silhouette (batch ASW) score. Could you check what the result of the metric is on the integrated output that is causing the error?

import scib

asw_batch = scib.me.silhouette_batch(
    adata_int,
    batch_key=batch_key,
    group_key=label_key,
    embed='X_emb',
    return_all=True,
    verbose=True,
)

If return_all is True, you will get a DataFrame instead of an overall metric. I'm guessing it is empty in your case.

If 'X_emb' is not available, try computing the PCA and using it instead:

import scanpy as sc

sc.pp.pca(adata_int)  # stores the PCA in adata_int.obsm['X_pca']

asw_batch = scib.me.silhouette_batch(
    adata_int,
    batch_key=batch_key,
    group_key=label_key,
    embed='X_pca',
    return_all=True,
    verbose=True,
)
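
One possible explanation for an empty result, given the labelling described in this issue: batch ASW is computed within each label group, and groups whose cells all come from a single batch appear to be skipped. That skipping behaviour is an assumption about scib's implementation and is worth verifying against the installed version. If every study uses its own label vocabulary, no label would span two batches and nothing would be left to score. A minimal plain-Python sketch of that check, on hypothetical toy data:

```python
from collections import defaultdict


def batches_per_label(cells):
    """Map each label to the set of batches in which it occurs.

    `cells` is an iterable of (batch, label) pairs, one pair per cell.
    """
    seen = defaultdict(set)
    for batch, label in cells:
        seen[label].add(batch)
    return dict(seen)


def labels_shared_across_batches(cells):
    """Return the sorted labels that occur in at least two batches."""
    return sorted(
        label
        for label, batches in batches_per_label(cells).items()
        if len(batches) > 1
    )


# Hypothetical toy data mimicking the labelling described in this issue:
# every study uses its own label vocabulary.
toy_cells = [
    ("study1", "myeloid cells"),
    ("study1", "myeloid cells"),
    ("study2", "TYPE1"),
    ("study2", "TYPE2"),
    ("study3", "TAM1(PD-L1)"),
    ("study3", "TAM2"),
]

print(labels_shared_across_batches(toy_cells))  # prints []
```

On real data the pairs could be built with `zip(adata_int.obs[batch_key], adata_int.obs[label_key])`. If the returned list is empty, harmonizing the labels across studies would be one way to obtain groups that span several batches.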

@mumichae
Collaborator

I changed the code so that you get NaN if the dataframe is empty. Feel free to update scib and rerun the pipeline.
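
The change itself is not shown in this thread. As a rough illustration only (a plain-Python sketch, not scib's actual code; the function name `batch_asw_from_groups` and its input format are hypothetical), returning NaN when no group could be scored might look like this:

```python
import math
from statistics import mean


def batch_asw_from_groups(group_scores):
    """Average per-group scores; return NaN when no group could be scored.

    `group_scores` maps a label to the list of per-cell scores for that label.
    """
    # Mean score per label, ignoring labels that have no scores at all.
    group_means = [mean(scores) for scores in group_scores.values() if scores]
    if not group_means:
        # Nothing to aggregate: report NaN instead of raising an error.
        return math.nan
    return mean(group_means)
```

With this kind of guard, a downstream metrics table would contain NaN for batch ASW on such data, while the other metrics can still be computed.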
