Running demo code results in "LinAlgError: SVD did not converge" or "ValueError: array must not contain infs or NaNs" #14

Open
styler00dollar opened this issue Apr 1, 2021 · 1 comment

@styler00dollar

As I already mentioned in Issue 13, the demo code crashes with an error.

from torchvision.models import resnet50
from flopco import FlopCo
from musco.pytorch import CompressorVBMF, CompressorPR, CompressorManual

model = resnet50(pretrained=True)
model.cuda()
model_stats = FlopCo(model, device='cuda')

compressor = CompressorVBMF(model,
                            model_stats,
                            ft_every=5,
                            nglobal_compress_iters=2)
while not compressor.done:
    compressor.compression_step()
compressed_model = compressor.compressed_model

Running this fails with:

~/anaconda3/lib/python3.8/site-packages/numpy/linalg/linalg.py in _raise_linalgerror_svd_nonconvergence(err, flag)
    104 
    105 def _raise_linalgerror_svd_nonconvergence(err, flag):
--> 106     raise LinAlgError("SVD did not converge")
    107 
    108 def _raise_linalgerror_lstsq(err, flag):

LinAlgError: SVD did not converge

or

~/anaconda3/lib/python3.8/site-packages/numpy/lib/function_base.py in asarray_chkfinite(a, dtype, order)
    495     a = asarray(a, dtype=dtype, order=order)
    496     if a.dtype.char in typecodes['AllFloat'] and not np.isfinite(a).all():
--> 497         raise ValueError(
    498             "array must not contain infs or NaNs")
    499     return a

ValueError: array must not contain infs or NaNs

Which of the two errors appears seems to be random when the code is run multiple times.
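
For readers trying to narrow this down: the second traceback comes from `np.asarray_chkfinite`, which raises as soon as its input contains an inf or NaN. The sketch below only illustrates that check on a toy array; `first_nonfinite` is a hypothetical helper, not part of musco or numpy, and it says nothing about where the non-finite values in the demo actually originate.

```python
import numpy as np


def first_nonfinite(a):
    """Return the index of the first inf/NaN entry in `a`, or None if all are finite.

    Hypothetical helper for illustration only; not part of musco or numpy.
    """
    a = np.asarray(a, dtype=float)
    bad = np.argwhere(~np.isfinite(a))
    return tuple(int(i) for i in bad[0]) if len(bad) else None


clean = np.eye(3)
broken = clean.copy()
broken[1, 2] = np.nan  # inject a single NaN

np.asarray_chkfinite(clean)  # passes silently for finite input

try:
    np.asarray_chkfinite(broken)
    raised = False
    message = ""
except ValueError as exc:
    # Same check that produces the ValueError in the traceback above.
    raised = True
    message = str(exc)

print(raised, first_nonfinite(broken))
```

Calling a helper like this on the matrix that is about to be decomposed would show whether the input is already non-finite before the SVD is attempted.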

@engharat

engharat commented Jun 7, 2022

I managed to fix it by replacing the scikit-tensor-py3 calls with tensorly calls. The example works fine now, and I also avoided the ugly numpy and scipy downgrade that scikit-tensor-py3 requires.
For anyone interested, here is what I did:
In musco/pytorch/compressor/decompositions/tucker2.py, remove every import of scikit-tensor-py3 functions.
Add:
import tensorly
tensorly.set_backend("pytorch")
In get_tucker_factors, the weights line becomes:
weights = tensorly.tensor(self.weight.cpu())
The Tucker call changes to use tensorly.decomposition.tucker:
core, (U_cout, U_cin, U_dd) = tensorly.decomposition.tucker(weights, [self.ranks[0], self.ranks[1], weights.shape[-1]], init='nvecs')
Finally, a few lines down in the same function, change core = core.dot(U_dd.T) to core = core.matmul(U_dd.T) so that PyTorch matrix multiplication is used (.dot works only for 1D vectors).
