Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in loss function? #742

Open
paula-tataru opened this issue Oct 29, 2021 · 0 comments
Open

Error in loss function? #742

paula-tataru opened this issue Oct 29, 2021 · 0 comments
Labels
bug Something isn't working

Comments

@paula-tataru
Copy link

Hi,

I was looking into re-fitting the dynamical model starting from the previous estimated parameters using load_pars = True in recover_dynamics, to see if the fit increases.

This lead me to noticing some presumably unexpected behavior.

To illustrate, I used only some of the top velocity genes mentioned in the tutorial on the pancreas dataset https://scvelo.readthedocs.io/DynamicalModeling/#Top-likelihood-genes

import scvelo as scv
import numpy as np

adata = scv.datasets.pancreas()
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
scv.pp.moments(adata, n_neighbors=30)

genes = ["Pcsk2", "Ank", "Cpe"]
all_genes = adata.var_names.tolist()
index = [all_genes.index(g) for g in genes]

gene = 'Ank'

scv.tl.recover_dynamics(adata, var_names=genes, n_jobs=4)
scv.tl.velocity(adata, mode='dynamical')
scv.tl.latent_time(adata)
scv.pl.scatter(adata, [gene], color=['latent_time', 'velocity'], size=100, add_linfit=True)
before_loss = adata.varm['loss']
before_lk = adata.var["fit_likelihood"]

copy = adata.copy()
scv.tl.recover_dynamics(copy, var_names=genes, n_jobs=4, load_pars=True)
scv.tl.velocity(copy, mode='dynamical')
scv.tl.latent_time(copy)
scv.pl.scatter(copy, [gene], color=['latent_time', 'velocity'], size=100, add_linfit=True)
after_loss = copy.varm['loss']
after_lk = copy.var["fit_likelihood"]

for i in index:
    b = before_loss[i][np.where(~np.isnan(before_loss[i]))]
    a = after_loss[i][np.where(~np.isnan(after_loss[i]))]
    print("\n", i, all_genes[i])
    print("First run", b)
    print("Second run", a)
    print("Likelihoods", before_lk[i], after_lk[i])

This gives the following output

1065 Pcsk2
First run [685.46052609 300.28369447 253.4203696  182.610123   172.84769704
 169.95443291 169.73621456 170.54997998 129.88252498]
Second run [170.55002374 168.97730684 167.76548566 167.03546193 166.85076305
 131.60626496 131.46394603]
Likelihoods 0.7811821275312446 0.7678911738095799

 641 Ank
First run [818.5772059  442.53292667 481.76190086 422.24417444 345.3885125
 339.45685147 310.46193836 301.67644873 297.81560142 246.655008  ]
Second run [281.63963773 280.58363005 289.80538517 271.09269915 267.56137928
 259.15811369 257.32628282 264.28637709 259.19052107 261.31952427
 210.70557714]
Likelihoods 0.5955842645891559 0.6525387496831958

 1710 Cpe
First run [762.53913158 788.10980335 813.85277972 648.39829186 551.63063129
 552.19661997 552.10028203 541.45357252 540.89984471 454.89913062
 437.58513745 404.53944133 396.40565625 388.91901759 395.19910108]
Second run [378.19669228 374.73382683 353.45290071 339.7233515  330.39074211]
Likelihoods 0.5131029367955839 0.558141231252011

A few things I noticed with respect to the loss, which might indicate the presence of a bug:

  • The loss function doesn't seem to always decrease, there are situations where it goes up rather than down, for example for Cpe it goes from 788.11 to 813.85
  • When retraining the model starting from the previous estimated parameters, the initial loss is not the same as the best loss from the previous run. For Pcsk2, the initial estimation ended with a loss of 129.88, while the second estimation started with a worse loss of 170.55. For Cpe, the opposite happens, where the second estimation started with a better loss of 378.2, vs 395.2.
  • When retraining the model starting from the previous estimated parameters, we can end up with estimated parameters that are worse than the starting values, as shown by both the loss and likelihood for gene Pcsk2.

Additionally, it seems that for Ank, the likelihood increased by a bit more than 0.05 when running the second round of parameter estimation. I generated the phase plots with the different sets of parameters, coloring the cells by the latent time and velocity. The values seem to change quite a bit. Would this indicate that the model returns sub-optimal parameters?
Initial estimation:
before
Second estimation:
after

Thanks,
Paula

@paula-tataru paula-tataru added the bug Something isn't working label Oct 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant