Error on using 'groupby' covariate with 'stochastic' mode within scv.tl.velocity() #874

Open
JesseRop opened this issue May 4, 2022 · 3 comments
Labels
bug Something isn't working

Comments


JesseRop commented May 4, 2022

Hi @WeilerP, thanks for the great package and all your efforts in maintaining it!

I was able to reproduce the error above with the script below.
I was wondering whether there is a way to calculate RNA velocities and velocity streams independently per cluster (or batch) in the stochastic/deterministic mode, so that the different dynamics of each cluster (or batch) can be accounted for. I'm aware that velocities can be recalculated while accounting for differential kinetics per cluster in the dynamical mode, but that mode doesn't work well for my data.
Do I have to subset the different clusters into separate AnnData objects, run RNA velocity on each, and then merge the objects back together (a sketch of what I mean is below)?
I was thinking 'groupby' could be used to condition on the different clusters, but it seems it has to be used in combination with 'groups' to restrict the RNA velocity calculation to just one (or more) clusters, rather than computing RNA velocity for all cells while conditioning on each cluster.
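A minimal, untested sketch of the per-cluster subsetting I have in mind (assuming adata has already been preprocessed with scv.pp.moments() as in the script below; only the 'velocity' layer is merged back, the per-gene fit parameters are not):

import numpy as np
import scvelo as scv

velocity = np.zeros(adata.shape)
for cluster in adata.obs['clusters'].cat.categories:
    mask = (adata.obs['clusters'] == cluster).values
    sub = adata[mask].copy()  # per-cluster AnnData carrying the precomputed Ms/Mu moments
    scv.tl.velocity(sub, mode='stochastic', min_r2=0.1)
    velocity[mask] = sub.layers['velocity']  # write the per-cluster velocities back
adata.layers['velocity'] = velocity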

import scvelo as scv

adata = scv.datasets.dentategyrus()

scv.pp.filter_and_normalize(adata, min_shared_counts=30, n_top_genes=2000)

scv.pp.moments(adata, n_pcs=30, n_neighbors=30)

scv.tl.velocity(adata, mode='stochastic', min_r2=0.1, groupby='clusters')
Error output
computing velocities

---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
Input In [408], in <cell line: 1>()
----> 1 scv.tl.velocity(adata, mode='stochastic', min_r2=0.1, groupby='clusters')

File ~/envs/scv_jupcon/lib/python3.9/site-packages/scvelo/tools/velocity.py:428, in velocity(data, vkey, mode, fit_offset, fit_offset2, filter_genes, groups, groupby, groups_for_fit, constrain_ratio, use_raw, use_latent_time, perc, min_r2, min_likelihood, r2_adjusted, use_highly_variable, diff_kinetics, copy, **kwargs)
    419         _adata = adata if groups is None else adata[cell_subset]
    420         velo = Velocity(
    421             _adata,
    422             residual=residual,
   (...)
    426             use_highly_variable=use_highly_variable,
    427         )
--> 428     velo.compute_stochastic(fit_offset, fit_offset2, mode, perc=perc)
    430 write_residuals(adata, vkey, velo._residual, cell_subset)
    431 write_residuals(adata, f"variance_{vkey}", velo._residual2, cell_subset)

File ~/envs/scv_jupcon/lib/python3.9/site-packages/scvelo/tools/velocity.py:147, in Velocity.compute_stochastic(self, fit_offset, fit_offset2, mode, perc)
    141 res2_std = (cov_us - _gamma2 * var_ss - _offset2).std(0)
    143 # solve multiple regression
    144 self._offset[idx], self._offset2[idx], self._gamma[idx] = (
    145     maximum_likelihood(_Ms, _Mu, _Mus, _Mss, fit_offset, fit_offset2)
    146     if mode == "bayes"
--> 147     else leastsq_generalized(
    148         _Ms,
    149         _Mu,
    150         var_ss,
    151         cov_us,
    152         res_std,
    153         res2_std,
    154         fit_offset,
    155         fit_offset2,
    156         perc,
    157     )
    158 )
    160 self._residual = self._Mu - self._gamma * self._Ms
    161 if fit_offset:

File ~/envs/scv_jupcon/lib/python3.9/site-packages/scvelo/tools/optimization.py:173, in leastsq_generalized(x, y, x2, y2, res_std, res2_std, fit_offset, fit_offset2, perc)
    171     for i in range(n_var):
    172         A = np.c_[x[:, i]]
--> 173         gamma[i] = np.linalg.pinv(A.T.dot(A)).dot(A.T.dot(y[:, i]))
    175 offset[np.isnan(offset)] = 0
    176 offset_ss[np.isnan(offset_ss)] = 0

File <__array_function__ internals>:5, in pinv(*args, **kwargs)

File ~/envs/scv_jupcon/lib/python3.9/site-packages/numpy/linalg/linalg.py:2002, in pinv(a, rcond, hermitian)
   2000     return wrap(res)
   2001 a = a.conjugate()
-> 2002 u, s, vt = svd(a, full_matrices=False, hermitian=hermitian)
   2004 # discard small singular values
   2005 cutoff = rcond[..., newaxis] * amax(s, axis=-1, keepdims=True)

File <__array_function__ internals>:5, in svd(*args, **kwargs)

File ~/envs/scv_jupcon/lib/python3.9/site-packages/numpy/linalg/linalg.py:1660, in svd(a, full_matrices, compute_uv, hermitian)
   1657         gufunc = _umath_linalg.svd_n_s
   1659 signature = 'D->DdD' if isComplexType(t) else 'd->ddd'
-> 1660 u, s, vh = gufunc(a, signature=signature, extobj=extobj)
   1661 u = u.astype(result_t, copy=False)
   1662 s = s.astype(_realType(result_t), copy=False)

File ~/envs/scv_jupcon/lib/python3.9/site-packages/numpy/linalg/linalg.py:97, in _raise_linalgerror_svd_nonconvergence(err, flag)
     96 def _raise_linalgerror_svd_nonconvergence(err, flag):
---> 97     raise LinAlgError("SVD did not converge")

LinAlgError: SVD did not converge
Versions
scvelo==0.2.5.dev71+g85295fc  scanpy==1.9.1  anndata==0.7.8  loompy==3.0.7  numpy==1.21.5  scipy==1.8.0  matplotlib==3.5.1  sklearn==1.0.2  pandas==1.4.2  

Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.
ERROR: XMLRPC request failed [code: -32500]
RuntimeError: PyPI's XMLRPC API is currently disabled due to unmanageable load and will be deprecated in the near future. See https://status.python.org/ for more information.

WeilerP commented May 4, 2022

@JesseRop, thanks for reporting this. Looks similar to #856. I'll check if I can reproduce the error on my side and get back to you ASAP.
In general: if the dynamical mode does not work on your data, you shouldn't simply switch to another (less sophisticated) one. Both approaches rely on phase portraits having a specific shape and on cell types being aligned along it. See e.g. #216 or #462 for possible reasons for failure.


WeilerP commented May 4, 2022

@JesseRop, the numpy version is the issue. Not sure why, though. After downgrading to numpy<1.21, the error is no longer thrown.
@michalk8, @ivirshup, did you encounter something similar before by chance, or are you aware of anything introduced in numpy>1.21 that could be causing this?
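In case it helps to verify the environment before downgrading (the pip pin in the comment above is just an example command, not specific to scVelo):

import numpy as np

# the error was reportedly gone after pinning numpy below 1.21,
# e.g. pip install "numpy<1.21"
print('numpy', np.__version__)
major, minor = (int(v) for v in np.__version__.split('.')[:2])
if (major, minor) >= (1, 21):
    print('numpy >= 1.21 detected; this version has been reported to trigger the LinAlgError')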


monica-nicolau commented May 24, 2023

Hi everyone!
@WeilerP, are you sure the numpy version is the only issue? I've been running scv.tl.velocity on multiple samples (5 others) and it raises this error for only one of them; how could it work for the other samples if the numpy version is the same?
It's not a memory issue either, because I have run the command several times and it always raises the same error. Could you help me understand what the problem with this object is?

My current np.__version__ is '1.21.1'
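In case it helps narrow things down, a generic numpy sanity check (not part of scVelo): NaN or infinite values reaching np.linalg.pinv are a common cause of "SVD did not converge", so the layers involved in the stochastic fit can be inspected like this:

import numpy as np
import scipy.sparse as sp

# count NaN/inf entries in the layers used for the stochastic fit
for layer in ('Ms', 'Mu', 'spliced', 'unspliced'):
    if layer not in adata.layers:
        continue
    X = adata.layers[layer]
    X = X.toarray() if sp.issparse(X) else np.asarray(X)
    print(layer, 'NaNs:', int(np.isnan(X).sum()), 'infs:', int(np.isinf(X).sum()))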

UPDATE ***
I managed to avoid the error by specifying enforce=True during the normalization step carried out by scv.pp.filter_and_normalize().
With enforce=True, scv.tl.velocity() no longer raised the "SVD did not converge" error :)
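For anyone running into the same thing, this is roughly what the working call sequence looks like (reusing the dentate gyrus reproduction from above; the only change is passing enforce=True to the normalization step):

import scvelo as scv

adata = scv.datasets.dentategyrus()

# enforce=True re-normalizes the layers even if they already look normalized (the workaround above)
scv.pp.filter_and_normalize(adata, min_shared_counts=30, n_top_genes=2000, enforce=True)

scv.pp.moments(adata, n_pcs=30, n_neighbors=30)

scv.tl.velocity(adata, mode='stochastic', min_r2=0.1)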
