Error in loss function? #742

paula-tataru · 2021-10-29T12:06:07Z

Hi,

I was looking into re-fitting the dynamical model starting from the previous estimated parameters using load_pars = True in recover_dynamics, to see if the fit increases.

This lead me to noticing some presumably unexpected behavior.

To illustrate, I used only some of the top velocity genes mentioned in the tutorial on the pancreas dataset https://scvelo.readthedocs.io/DynamicalModeling/#Top-likelihood-genes

import scvelo as scv
import numpy as np

adata = scv.datasets.pancreas()
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
scv.pp.moments(adata, n_neighbors=30)

genes = ["Pcsk2", "Ank", "Cpe"]
all_genes = adata.var_names.tolist()
index = [all_genes.index(g) for g in genes]

gene = 'Ank'

scv.tl.recover_dynamics(adata, var_names=genes, n_jobs=4)
scv.tl.velocity(adata, mode='dynamical')
scv.tl.latent_time(adata)
scv.pl.scatter(adata, [gene], color=['latent_time', 'velocity'], size=100, add_linfit=True)
before_loss = adata.varm['loss']
before_lk = adata.var["fit_likelihood"]

copy = adata.copy()
scv.tl.recover_dynamics(copy, var_names=genes, n_jobs=4, load_pars=True)
scv.tl.velocity(copy, mode='dynamical')
scv.tl.latent_time(copy)
scv.pl.scatter(copy, [gene], color=['latent_time', 'velocity'], size=100, add_linfit=True)
after_loss = copy.varm['loss']
after_lk = copy.var["fit_likelihood"]

for i in index:
    b = before_loss[i][np.where(~np.isnan(before_loss[i]))]
    a = after_loss[i][np.where(~np.isnan(after_loss[i]))]
    print("\n", i, all_genes[i])
    print("First run", b)
    print("Second run", a)
    print("Likelihoods", before_lk[i], after_lk[i])

This gives the following output

1065 Pcsk2
First run [685.46052609 300.28369447 253.4203696  182.610123   172.84769704
 169.95443291 169.73621456 170.54997998 129.88252498]
Second run [170.55002374 168.97730684 167.76548566 167.03546193 166.85076305
 131.60626496 131.46394603]
Likelihoods 0.7811821275312446 0.7678911738095799

 641 Ank
First run [818.5772059  442.53292667 481.76190086 422.24417444 345.3885125
 339.45685147 310.46193836 301.67644873 297.81560142 246.655008  ]
Second run [281.63963773 280.58363005 289.80538517 271.09269915 267.56137928
 259.15811369 257.32628282 264.28637709 259.19052107 261.31952427
 210.70557714]
Likelihoods 0.5955842645891559 0.6525387496831958

 1710 Cpe
First run [762.53913158 788.10980335 813.85277972 648.39829186 551.63063129
 552.19661997 552.10028203 541.45357252 540.89984471 454.89913062
 437.58513745 404.53944133 396.40565625 388.91901759 395.19910108]
Second run [378.19669228 374.73382683 353.45290071 339.7233515  330.39074211]
Likelihoods 0.5131029367955839 0.558141231252011

A few things I noticed with respect to the loss, which might indicate the presence of a bug:

The loss function doesn't seem to always decrease, there are situations where it goes up rather than down, for example for Cpe it goes from 788.11 to 813.85
When retraining the model starting from the previous estimated parameters, the initial loss is not the same as the best loss from the previous run. For Pcsk2, the initial estimation ended with a loss of 129.88, while the second estimation started with a worse loss of 170.55. For Cpe, the opposite happens, where the second estimation started with a better loss of 378.2, vs 395.2.
When retraining the model starting from the previous estimated parameters, we can end up with estimated parameters that are worse than the starting values, as shown by both the loss and likelihood for gene Pcsk2.

Additionally, it seems that for Ank, the likelihood increased by a bit more than 0.05 when running the second round of parameter estimation. I generated the phase plots with the different sets of parameters, coloring the cells by the latent time and velocity. The values seem to change quite a bit. Would this indicate that the model returns sub-optimal parameters?
Initial estimation:

Second estimation:

Thanks,
Paula

The text was updated successfully, but these errors were encountered:

paula-tataru added the bug Something isn't working label Oct 29, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error in loss function? #742

Error in loss function? #742

paula-tataru commented Oct 29, 2021

Error in loss function? #742

Error in loss function? #742

Comments

paula-tataru commented Oct 29, 2021