Predicting using trained model. #27

Open
faith-8 opened this issue Sep 20, 2023 · 5 comments

@faith-8

faith-8 commented Sep 20, 2023

Hi, I've successfully trained a model from scratch by following this tutorial:
https://cpa-tools.readthedocs.io/en/latest/tutorials/combosciplex_Rdkit_embeddings.html

However, I'm lost on how to use the trained model to predict on an unseen dataset. I tried creating a new AnnData with an unseen perturbation, but the following error occurred.

INFO     Input AnnData not setup with scvi-tools. attempting to transfer AnnData setup                             
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[48], line 1
----> 1 model.predict(ood_adata, batch_size=1024)

File c:\Users\Ardo\.conda\envs\env.cpa\lib\site-packages\torch\autograd\grad_mode.py:27, in _DecoratorContextManager.__call__.<locals>.decorate_context(*args, **kwargs)
     24 @functools.wraps(func)
     25 def decorate_context(*args, **kwargs):
     26     with self.clone():
---> 27         return func(*args, **kwargs)

File c:\Users\Ardo\.conda\envs\env.cpa\lib\site-packages\cpa\_model.py:679, in CPA.predict(self, adata, indices, batch_size, n_samples, return_mean)
    676 assert self.module.recon_loss in ["gauss", "nb", "zinb"]
    677 self.module.eval()
--> 679 adata = self._validate_anndata(adata)
    680 if indices is None:
    681     indices = np.arange(adata.n_obs)

File c:\Users\Ardo\.conda\envs\env.cpa\lib\site-packages\scvi\model\base\_base_model.py:415, in BaseModelClass._validate_anndata(self, adata, copy_if_view)
    409 if adata_manager is None:
    410     logger.info(
    411         "Input AnnData not setup with scvi-tools. "
    412         + "attempting to transfer AnnData setup"
    413     )
    414     self._register_manager_for_instance(
...
    230     self.attr_key,
    231     categorical_dtype=cat_dtype,
    232 )

ValueError: Category CHEMBL1213492+CHEMBL491473 not found in source registry. Cannot transfer setup without `extend_categories = True`.

Any help would be appreciated.
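
The error above says that a perturbation label in the new data is missing from the categories registered at training time. A minimal sketch of how one might list such labels before calling `model.predict` is given here. It assumes the labels are plain strings taken from whichever `obs` column was used as the perturbation key; the helper name `unseen_categories` and the label `CHEMBL109480` are hypothetical and only serve the illustration.

```python
def unseen_categories(train_labels, new_labels):
    """Return labels that occur in new_labels but not in train_labels.

    Matching is by exact string, so a combination such as "A+B" counts as
    unseen even when "A" and "B" were each seen on their own.
    """
    known = set(train_labels)
    missing = []
    for label in new_labels:
        if label not in known and label not in missing:
            missing.append(label)  # keep first-seen order, no duplicates
    return missing


# Hypothetical label lists standing in for the training and new obs columns.
train_labels = ["CHEMBL1213492", "CHEMBL491473", "CHEMBL1213492+CHEMBL109480"]
new_labels = ["CHEMBL1213492+CHEMBL491473", "CHEMBL491473"]

print(unseen_categories(train_labels, new_labels))
```

Any label this prints would need to be handled before prediction, since it is absent from the training categories.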

@HelloWorldLTY

HelloWorldLTY commented Dec 1, 2023

Hi, same question here. The authors seem to treat data with a known combination but a different dosage as OOD data, as shown in the default tutorial. This should work since dosage is encoded by an independent encoder. However, as users, we believe OOD should mean samples whose drug perturbation, cell type, or dosage is unknown, and the authors have another tutorial to handle this case.
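
To make the two notions of OOD in this comment concrete, here is a small sketch that sorts a condition into one of the cases discussed. It is not part of CPA. The function name, the `+` delimiter for combinations, and the representation of dosage as a single number per condition are all assumptions for illustration.

```python
def classify_condition(label, dose, train_doses_by_label, delimiter="+"):
    """Describe how a (label, dose) pair relates to the training conditions.

    train_doses_by_label maps each training label to the set of doses seen
    for it. Labels are compared as exact strings, so "A+B" and "B+A" are
    treated as different combinations.
    """
    known_drugs = {
        drug for lbl in train_doses_by_label for drug in lbl.split(delimiter)
    }
    if any(drug not in known_drugs for drug in label.split(delimiter)):
        return "unseen drug"
    if label not in train_doses_by_label:
        return "unseen combination of known drugs"
    if dose not in train_doses_by_label[label]:
        return "known combination, unseen dosage"
    return "seen in training"


# Hypothetical training conditions.
train = {"drugA+drugB": {1.0, 10.0}, "drugC": {1.0}}

print(classify_condition("drugA+drugB", 100.0, train))
print(classify_condition("drugA+drugC", 1.0, train))
print(classify_condition("drugD", 1.0, train))
```

Only the "known combination, unseen dosage" case corresponds to the OOD setting attributed to the default tutorial above; the other two are the cases users in this thread are asking about.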

@HelloWorldLTY

I just noticed that they have a version with a drug embeddings database, which would at least allow us to predict the contributions of drugs in this database:
https://colab.research.google.com/github/theislab/cpa/blob/master/docs/tutorials/combosciplex_Rdkit_embeddings.ipynb#scrollTo=79062e65-3de9-4916-8999-449ef2df3edf

@M0hammadL
Member

Hi, you can use these embeddings as an example, or any other gene or drug embeddings, to generalize to unseen embeddings.
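
As a rough illustration of this suggestion, the sketch below checks that every drug of interest, including ones not seen in training, has a precomputed embedding, and collects the vectors in a fixed order. It does not use the CPA API. The helper name, the drug names, and the toy 3-dimensional vectors are hypothetical; real embeddings (for example RDKit-derived ones, as in the linked tutorial) would be much longer.

```python
def build_embedding_rows(drugs, embedding_by_drug):
    """Return one embedding row per drug, in the order given.

    Raises KeyError listing every drug that has no embedding, since a drug
    without an embedding cannot be represented at all.
    """
    missing = [drug for drug in drugs if drug not in embedding_by_drug]
    if missing:
        raise KeyError(f"no embedding available for: {missing}")
    return [list(embedding_by_drug[drug]) for drug in drugs]


# Hypothetical embedding table; "drugX" stands for a drug absent from training.
embedding_by_drug = {
    "drugA": (0.1, 0.2, 0.3),
    "drugB": (0.0, 0.5, 0.1),
    "drugX": (0.4, 0.4, 0.2),
}

rows = build_embedding_rows(["drugA", "drugX"], embedding_by_drug)
print(rows)
```

The point of the check is that an unseen drug is only usable if it already has an entry in the embedding table.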

@M0hammadL
Member

> Hi, same question here. I think the authors and users may define OOD differently. The authors seem to treat data with a known combination but a different dosage as OOD data. This should work since dosage is encoded by an independent encoder. However, as users, we believe OOD should mean samples whose drug perturbation, cell type, or dosage is unknown. Therefore, I think CPA does not have a function that precisely matches our definition.

I suggest you read the tutorials; we cover all sorts of scenarios: dosages, cell types, unseen drugs and combinations, genes, etc.

@HelloWorldLTY

Thanks for your notes; I have just clarified my wording.
