Singular Ve matrix #61

davidhoule · 2017-07-28T20:08:44Z

I have installed the program and can run your example data set without problem. Similarly, I can run the program on small subsets of the SNPs with my phenotypic data. However, when I calculate relatedness based on the whole genome, the estimates of the relatedness matrix calculated in GEMMA appear to cause problems. I have diagnosed the relatedness matrix as the problem by using the full relatedness matrices in the sample example analyses of my data (two traits, three snps) that run with a relatedness matrix calculated from just the three snps.

The error is consistent, in that the program estimates a 0 Ve matrix, then crashes because of a singular matrix error:
Start Eigen-Decomposition...
REMLE estimate for Vg in the null model:
1.6815
0.1562 1.6745
se(Vg):
0.3784
-nan -nan
REMLE estimate for Ve in the null model:
0.0000
0.0000 0.0000
se(Ve):
0.1228
-nan -nan
REMLE likelihood = -478.9822
MLE estimate for Vg in the null model:
1.6815
0.1563 1.6745
se(Vg):
0.1758
0.1246 0.1751
MLE estimate for Ve in the null model:
0.0000
0.0000 0.0000
se(Ve):
-nan
0.0000 0.0000
MLE likelihood = -140737488355560.3750
gsl: lu.c:262: ERROR: matrix is singular========================100.00%

It may be that the relatedness in my data set is the problem. It is certainly not what you find in human data. I am studying a set of 184 largely inbred lines, the Drosophila Genome Reference Panel. The majority of sites are fixed, but the proportion of heterozygotes is maybe 5% on average, but that varies among lines. In addition, about 5% of the line pairs are more related than second cousins, and a few seem to be full sibs. Bottom line is that genotypes are always very far from Hardy-Weinberg. I have not filtered the genome for high LD SNP pairs for the calculation of relatedness, although I am aware that this will pose problems for the actual genome-wide association analysis with more than a few SNPs.

I would be happy to furnish example data sets that create the problem if that would be helpful.

pcarbo · 2017-07-29T03:12:49Z

@davidhoule Your problem may be related to Issue #58. What version of GEMMA are you using, and which operating system?

davidhoule · 2017-07-29T18:45:43Z

I am using Ubuntu 16.04.2. I updated to version GEMMA 0.97, with the same results. Relatedness matrix that causes problems has just one (very near 0) negative eigenvalue. Bending matrix to all positive eigenvalues has no effect on the error. Other toy relatedness matrices with many negative near 0 eigenvalues run without problem.

pcarbo · 2017-07-29T19:36:53Z

@davidhoule Can you please try downloading the pre-compiled binary for GEMMA v0.96 Linux x86 and see if that works for you? We recently uncovered a bug with linking to GSL 2.x and the bugs looks very similar to the one you reported.

davidhoule · 2017-07-31T19:17:44Z

Tried the binary you furnished with the same result.

Then did an analysis using a relatedness matrix estimated in smartpca, part of the Eigensoft package. With this matrix, all versions of the program will run on my toy data set. Estimates of Ve are still 0, but tests are produced, and there are no errors:

Reading Files ...
## number of total individuals = 184
## number of analyzed individuals = 184
## number of covariates = 1
## number of phenotypes = 2
## number of total SNPs = 3
## number of analyzed SNPs = 3
Start Eigen-Decomposition...
REMLE estimate for Vg in the null model:
0.8298
0.0682  0.7795
se(Vg):
-nan
0.1064  0.1491
REMLE estimate for Ve in the null model:
0.0000
0.0000  0.0388
se(Ve):
-nan
0.0804  0.1126
REMLE likelihood = -475.4224
MLE estimate for Vg in the null model:
0.8298
0.0685  0.8233
se(Vg):
0.0868
0.0613  0.0861
MLE estimate for Ve in the null model:
0.0000
0.0000  0.0000
se(Ve):
0.0000
0.0000  0.0000
MLE likelihood = 70368744177431.5234
Reading SNPs  ==================================================100.00%

I assume that the two nan SE are still indicating a problem?

pcarbo · 2017-08-01T01:09:34Z

@davidhoule Great progress. It sounds like your siutation is related to Issue #45. In some cases it appears that a different method for computing the relatedness matrix yields more or less numerically stable results, although it is hard to say exactly what the root cause is. I believe that the computation requires tha all the eigenvalues to be positive. It might be useful to check this before running the LMM analysis in GEMMA with the -eigen option.

It is useful to look at output file result.log.txt rather than the console output because it gives you the numbers in more precision.

Did you try the univariate LMM analysis? Do you get this problem in the univariate LMM analysis?

davidhoule · 2017-08-01T02:53:59Z

Univariate analyses do not seem to have the problem. The actual log.txt file does show very slightly positive estimates of Ve terms, and of all VE SE, except the one nan.

I suspect that there must be an 'effective 0' tolerance, as I have examples that run with many very slightly negative eigenvalues.

Thanks for your help with this issue.

pjotrp · 2017-08-14T11:44:54Z

Possibly this is fixed by #69. Please try the next release. If it does not resolve send me a dataset.

pjotrp · 2017-08-20T07:50:14Z

@davidhoule can you send me the toy dataset?

pjotrp · 2017-09-12T02:18:58Z

No response

pjotrp · 2018-06-29T13:43:49Z

We have a dataset now for testing that shows similar characteristics. Running -lmm 2 renders

##
## GEMMA Version    = 0.98-pre1 (2018/06/29)
## GSL Version      = 2.4
## Eigen Version    = 3.2.9
## OpenBlas         = OpenBLAS 0.2.19  - NO_LAPACK NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell
##   arch           = Haswell
##   threads        = 8
##   parallel type  = threaded
##
## Command Line Input = ./bin/gemma -bfile ./frontiers -lmm 2 -k ./GWA_2017.centered.cXX.txt -maf 0.05 -outdir ./skull/ -snps selected_200_SNPs.txt -debug -legacy -no-check 
##
## Date = Fri Jun 29 13:38:52 2018
##
## Summary Statistics:
## number of total individuals = 49
## number of analyzed individuals = 49
## number of covariates = 1
## number of phenotypes = 1
## number of total SNPs/var = 73527
## number of analyzed SNPs/var = 200
## REMLE log-likelihood in the null model = 135.494
## MLE log-likelihood in the null model = 140.231
## pve estimate in the null model = 0.958692
## se(pve) in the null model = 0.037907
## vg estimate in the null model = 0.000473816
## ve estimate in the null model = 1.46539e-05
## beta estimate in the null model =   0.00169357
## se(beta) =   0.000546862
##
## Computation Time:
## total computation time = 0.0139996 min 
## computation time break down: 
##      time on eigen-decomposition = 0.00010455 min 
##      time on calculating UtX = 1.55333e-05 min 
##      time on optimization = 0.00404858 min 
##

while

~/.guix-profile/bin/gemma -bfile ./frontiers -lmm 1 -n 1 2 3 4 5 6 7 8 9 10 -k ./GWA_2017.centered.cXX.txt -maf 0.3 -outdir ./skull/ -snps selected_200_SNPs.txt -debug

renders

MLE estimate for Vg in the null model: 
0.0004
-0.0001 0.0002
-0.0000 -0.0000 0.0002
0.0000  0.0001  0.0000  0.0002
0.0000  0.0000  -0.0000 -0.0000 0.0001
-0.0000 0.0000  -0.0000 -0.0000 0.0000  0.0002
-0.0000 -0.0000 -0.0000 -0.0000 -0.0000 -0.0000 0.0001
-0.0000 0.0000  -0.0000 0.0000  -0.0000 0.0000  0.0001  0.0001
-0.0000 0.0000  0.0000  0.0000  0.0000  0.0000  -0.0000 0.0000  0.0001
-0.0000 -0.0000 0.0000  -0.0000 -0.0000 -0.0000 -0.0000 0.0000  0.0000  0.0001
se(Vg): 
0.0001
0.0001  0.0001
0.0001  0.0001  0.0001
0.0001  0.0000  0.0001  0.0001
0.0000  0.0001  -nan    0.0000  0.0001
0.0000  0.0000  0.0001  0.0000  0.0000  0.0000
0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
0.0000  0.0000  0.0001  0.0000  0.0000  0.0000  0.0000  0.0000
0.0000  0.0000  0.0001  0.0000  0.0001  0.0000  0.0000  0.0000  0.0000
0.0000  0.0000  -nan    0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  -nan
MLE estimate for Ve in the null model: 
0.0000
0.0000  0.0001
-0.0000 0.0000  0.0001
-0.0000 -0.0000 -0.0000 0.0000
0.0000  -0.0000 0.0000  0.0000  0.0001
-0.0000 0.0000  0.0000  -0.0000 -0.0000 0.0000
0.0000  0.0000  -0.0000 0.0000  0.0000  0.0000  0.0000
0.0000  0.0000  -0.0000 -0.0000 0.0000  -0.0000 -0.0000 0.0000
0.0000  -0.0000 -0.0000 0.0000  -0.0000 0.0000  -0.0000 -0.0000 0.0000
0.0000  0.0000  -0.0000 0.0000  0.0000  -0.0000 0.0000  0.0000  -0.0000 0.0000
se(Ve): 
0.0000
0.0000  0.0000
0.0000  0.0000  0.0000
0.0000  0.0000  -nan    -nan
0.0000  0.0000  -nan    -nan    0.0000
0.0000  0.0000  0.0000  -nan    -nan    0.0000
0.0000  0.0000  -nan    -nan    0.0000  -nan    -nan
-nan    -nan    -nan    -nan    0.0000  -nan    0.0000  -nan
0.0000  0.0000  -nan    -nan    -nan    -nan    0.0000  -nan    -nan
0.0000  0.0000  -nan    -nan    -nan    0.0000  0.0000  -nan    0.0000  0.0000
MLE likelihood = -9060217478156.0625
GSL ERROR: matrix is singular in lu.c at line 266 errno 1

When I run with less phenotypes we get

 ~/.guix-profile/bin/gemma -bfile ./frontiers -lmm 1 -n 1 4 6   -k ./GWA_2017.centered.cXX.txt -maf 0.1 -outdir ./skull/ -snps selected_200_SNPs.txt -debug

##
## GEMMA Version    = 0.97 (2017/12/19)
## GSL Version      = 2.4
## Eigen Version    = 3.3.4
## OpenBlas         = OpenBLAS 0.3.0.dev  - NO_AFFINITY HASWELL
##   arch           = HASWELL
##   threads        = 8
##   parallel type  = threaded
##
## Command Line Input = /home/wrk/.guix-profile/bin/gemma -bfile ./frontiers -lmm 1 -n 1 4 6 -k ./GWA_2017.centered.cXX.txt -maf 0.1 -outdir ./skull/ -snps selected_200_SNPs.txt -debug 
##
## Date = Fri Jun 29 13:42:48 2018
##
## Summary Statistics:
## number of total individuals = 49
## number of analyzed individuals = 49
## number of covariates = 1
## number of phenotypes = 3
## number of total SNPs/var = 73527
## number of analyzed SNPs/var = 182
## REMLE log-likelihood in the null model = 447.058
## MLE log-likelihood in the null model = -1.78495e+09
## REMLE estimate for Vg in the null model: 
0.000441807
-7.55708e-06    0.000265048
1.5968e-05      -1.5201e-05     0.000153374
## se(Vg): 
0.000111597
6.46807e-05     6.10626e-05
5.13028e-05     2.98818e-05     5.03876e-05
## REMLE estimate for Ve in the null model: 
2.14138e-05
-3.21985e-06    1.68684e-06
-1.21203e-05    -1.80521e-06    1.82497e-05
## se(Ve): 
1.72819e-05
5.62131e-06     2.21151e-06
1.07007e-05     -nan    1.17464e-05
## MLE estimate for Vg in the null model: 
0.000455099     -1.55256e-05    1.40252e-05
-1.55256e-05    0.000269695     -1.75401e-05
1.40252e-05     -1.75401e-05    0.00016208
## se(Vg): 
0.000104231
5.57158e-05     5.63383e-05
4.66896e-05     3.62062e-05     4.24585e-05
## MLE estimate for Ve in the null model: 
1.62625e-05     -1.99967e-06    -1.05415e-05
-1.99967e-06    1.05748e-06     -1.24257e-06
-1.05415e-05    -1.24257e-06    1.47746e-05
## se(Ve): 
2.94059e-23
5.38615e-24     5.59049e-24
4.19386e-24     4.11504e-24     -nan
## estimate for B (d by c) in the null model (columns correspond to the covariates provided in the file): 
0.00169357
0.000552855
-0.000453467
## se(B): 
0.000661073
0.000185541
0.000610281
##
## Computation Time:
## total computation time = 0.0532221 min 
## computation time break down: 
##      time on eigen-decomposition = 7.95333e-05 min 
##      time on calculating UtX = 1.60833e-05 min 
##      time on optimization = 0.0286565 min 
##

pjotrp · 2018-07-27T01:48:31Z

Commit 70f4196 shows that in all cases eigen decomposition throws FP exceptions when enabling hardware checking. This follows #161.

pjotrp · 2018-08-30T15:18:11Z

The problem is in calcPab which generates NaN in some cases. Not completely clear what it is yet, but I think it has to do with uninitialized W. Related to #94

Closes genetics-statistics#61

pjotrp · 2018-09-06T13:06:32Z

I fixed the NaN issue in se(Ve) with above commit. Also for the multiple phenotypes we get an error where it matters:

./bin/gemma -bfile ./frontiers -lmm 1 -n 1 2 3 4 5 6 7 8 9 10 -k ./GWA_2017.centered.cXX.txt -maf 0.3 -outdir ./skull/ -snps selected_200_SNPs.txt -no-check
GEMMA 0.98-beta1 (2018-09-06) by Xiang Zhou and team (C) 2012-2018
Reading Files ... 
## number of total individuals = 49
## number of analyzed individuals = 49
## number of covariates = 1
## number of phenotypes = 10
## number of total SNPs/var        =    73527
## number of considered SNPS       =      200
## number of analyzed SNPs         =       82
Start Eigen-Decomposition...
**** WARNING: Matrix G has 3 eigenvalues close to zero in src/lapack.cpp at line 280 in EigenDecomp_Zeroed
**** WARNING: Matrix G has more than one negative eigenvalues! in src/lapack.cpp at line 284 in EigenDecomp_Zeroed
**** WARNING: Matrix G has 3 eigenvalues close to zero in src/lapack.cpp at line 280 in EigenDecomp_Zeroed
**** WARNING: Matrix G has more than one negative eigenvalues! in src/lapack.cpp at line 284 in EigenDecomp_Zeroed
ERROR: Enforce failed for Trying to take the log of -0.138747 in src/mathfunc.cpp at line 108 in safe_log

It may be possible to solve this, but it is a different issue.

hans-recknagel · 2018-12-04T16:53:33Z

I am running into a similar error message when running the multivariate linear model. Running the same phenotypes individually using a LMM does not result in any errors. Only difference is that I have excluded Individuals that have any incomplete phenotype rows in the multivariate analysis. I am running the latest version of gemma on a macOS Sierra.

gemma -bfile males_dots_imputed_mlm -k centered_relationship_matrix.cXX.txt -lmm 1 -n 1 2 -o mlm_male_dots
GEMMA 0.98 (2018-09-28) by Xiang Zhou and team (C) 2012-2018
Reading Files ...

number of total individuals = 171

number of analyzed individuals = 171

number of covariates = 1

number of phenotypes = 2

number of total SNPs/var = 83655

number of analyzed SNPs = 83305

Start Eigen-Decomposition...
REMLE estimate for Vg in the null model:
0.0101
1.3313 568.4998
se(Vg):
0.0047
0.9901 341.0558
REMLE estimate for Ve in the null model:
0.0009
0.0298 1.0678
se(Ve):
0.0012
0.2447 86.1451
REMLE likelihood = -404.9952
MLE estimate for Vg in the null model:
0.0136
1.4578 573.3371
se(Vg):
0.0015
0.2417 62.1871
MLE estimate for Ve in the null model:
0.0000
0.0000 0.0000
se(Ve):
0.0000
0.0000 0.0000
MLE likelihood = -198158383604301984.0000
GSL ERROR: matrix is singular in lu.c at line 262 errno 1

pjotrp · 2018-12-04T23:48:02Z

@hans-recknagel do you mind sharing the dataset with me so I can replicate the problem?

hans-recknagel · 2018-12-05T10:50:03Z

Yes, I've sent them to you by mail. Thanks.

pjotrp · 2018-12-05T14:22:19Z

Looks to me like the covariates are highly correlated. mvlmm does not like that.

pjotrp · 2018-12-05T14:37:22Z

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Singular Ve matrix #61

Singular Ve matrix #61

davidhoule commented Jul 28, 2017

pcarbo commented Jul 29, 2017

davidhoule commented Jul 29, 2017

pcarbo commented Jul 29, 2017 •

edited

Loading

davidhoule commented Jul 31, 2017 •

edited by pcarbo

Loading

pcarbo commented Aug 1, 2017 •

edited

Loading

davidhoule commented Aug 1, 2017

pjotrp commented Aug 14, 2017

pjotrp commented Aug 20, 2017

pjotrp commented Sep 12, 2017

pjotrp commented Jun 29, 2018

pjotrp commented Jul 27, 2018

pjotrp commented Aug 30, 2018

pjotrp commented Sep 6, 2018

hans-recknagel commented Dec 4, 2018

pjotrp commented Dec 4, 2018

hans-recknagel commented Dec 5, 2018

pjotrp commented Dec 5, 2018

pjotrp commented Dec 5, 2018

hans-recknagel commented Dec 5, 2018

pjotrp commented Dec 5, 2018

hans-recknagel commented Dec 12, 2018

pjotrp commented Dec 13, 2018

Singular Ve matrix #61

Singular Ve matrix #61

Comments

davidhoule commented Jul 28, 2017

pcarbo commented Jul 29, 2017

davidhoule commented Jul 29, 2017

pcarbo commented Jul 29, 2017 • edited Loading

davidhoule commented Jul 31, 2017 • edited by pcarbo Loading

pcarbo commented Aug 1, 2017 • edited Loading

davidhoule commented Aug 1, 2017

pjotrp commented Aug 14, 2017

pjotrp commented Aug 20, 2017

pjotrp commented Sep 12, 2017

pjotrp commented Jun 29, 2018

pjotrp commented Jul 27, 2018

pjotrp commented Aug 30, 2018

pjotrp commented Sep 6, 2018

hans-recknagel commented Dec 4, 2018

number of total individuals = 171

number of analyzed individuals = 171

number of covariates = 1

number of phenotypes = 2

number of total SNPs/var = 83655

number of analyzed SNPs = 83305

pjotrp commented Dec 4, 2018

hans-recknagel commented Dec 5, 2018

pjotrp commented Dec 5, 2018

pjotrp commented Dec 5, 2018

hans-recknagel commented Dec 5, 2018

pjotrp commented Dec 5, 2018

hans-recknagel commented Dec 12, 2018

pjotrp commented Dec 13, 2018

pcarbo commented Jul 29, 2017 •

edited

Loading

davidhoule commented Jul 31, 2017 •

edited by pcarbo

Loading

pcarbo commented Aug 1, 2017 •

edited

Loading