Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Singular Ve matrix #61

Closed
davidhoule opened this issue Jul 28, 2017 · 22 comments
Closed

Singular Ve matrix #61

davidhoule opened this issue Jul 28, 2017 · 22 comments

Comments

@davidhoule
Copy link

I have installed the program and can run your example data set without problem. Similarly, I can run the program on small subsets of the SNPs with my phenotypic data. However, when I calculate relatedness based on the whole genome, the estimates of the relatedness matrix calculated in GEMMA appear to cause problems. I have diagnosed the relatedness matrix as the problem by using the full relatedness matrices in the sample example analyses of my data (two traits, three snps) that run with a relatedness matrix calculated from just the three snps.

The error is consistent, in that the program estimates a 0 Ve matrix, then crashes because of a singular matrix error:
Start Eigen-Decomposition...
REMLE estimate for Vg in the null model:
1.6815
0.1562 1.6745
se(Vg):
0.3784
-nan -nan
REMLE estimate for Ve in the null model:
0.0000
0.0000 0.0000
se(Ve):
0.1228
-nan -nan
REMLE likelihood = -478.9822
MLE estimate for Vg in the null model:
1.6815
0.1563 1.6745
se(Vg):
0.1758
0.1246 0.1751
MLE estimate for Ve in the null model:
0.0000
0.0000 0.0000
se(Ve):
-nan
0.0000 0.0000
MLE likelihood = -140737488355560.3750
gsl: lu.c:262: ERROR: matrix is singular========================100.00%

It may be that the relatedness in my data set is the problem. It is certainly not what you find in human data. I am studying a set of 184 largely inbred lines, the Drosophila Genome Reference Panel. The majority of sites are fixed, but the proportion of heterozygotes is maybe 5% on average, but that varies among lines. In addition, about 5% of the line pairs are more related than second cousins, and a few seem to be full sibs. Bottom line is that genotypes are always very far from Hardy-Weinberg. I have not filtered the genome for high LD SNP pairs for the calculation of relatedness, although I am aware that this will pose problems for the actual genome-wide association analysis with more than a few SNPs.

I would be happy to furnish example data sets that create the problem if that would be helpful.

@pcarbo
Copy link
Collaborator

pcarbo commented Jul 29, 2017

@davidhoule Your problem may be related to Issue #58. What version of GEMMA are you using, and which operating system?

@davidhoule
Copy link
Author

I am using Ubuntu 16.04.2. I updated to version GEMMA 0.97, with the same results. Relatedness matrix that causes problems has just one (very near 0) negative eigenvalue. Bending matrix to all positive eigenvalues has no effect on the error. Other toy relatedness matrices with many negative near 0 eigenvalues run without problem.

@pcarbo
Copy link
Collaborator

pcarbo commented Jul 29, 2017

@davidhoule Can you please try downloading the pre-compiled binary for GEMMA v0.96 Linux x86 and see if that works for you? We recently uncovered a bug with linking to GSL 2.x and the bugs looks very similar to the one you reported.

@davidhoule
Copy link
Author

davidhoule commented Jul 31, 2017

Tried the binary you furnished with the same result.

Then did an analysis using a relatedness matrix estimated in smartpca, part of the Eigensoft package. With this matrix, all versions of the program will run on my toy data set. Estimates of Ve are still 0, but tests are produced, and there are no errors:

Reading Files ...
## number of total individuals = 184
## number of analyzed individuals = 184
## number of covariates = 1
## number of phenotypes = 2
## number of total SNPs = 3
## number of analyzed SNPs = 3
Start Eigen-Decomposition...
REMLE estimate for Vg in the null model:
0.8298
0.0682  0.7795
se(Vg):
-nan
0.1064  0.1491
REMLE estimate for Ve in the null model:
0.0000
0.0000  0.0388
se(Ve):
-nan
0.0804  0.1126
REMLE likelihood = -475.4224
MLE estimate for Vg in the null model:
0.8298
0.0685  0.8233
se(Vg):
0.0868
0.0613  0.0861
MLE estimate for Ve in the null model:
0.0000
0.0000  0.0000
se(Ve):
0.0000
0.0000  0.0000
MLE likelihood = 70368744177431.5234
Reading SNPs  ==================================================100.00%

I assume that the two nan SE are still indicating a problem?

@pcarbo
Copy link
Collaborator

pcarbo commented Aug 1, 2017

@davidhoule Great progress. It sounds like your siutation is related to Issue #45. In some cases it appears that a different method for computing the relatedness matrix yields more or less numerically stable results, although it is hard to say exactly what the root cause is. I believe that the computation requires tha all the eigenvalues to be positive. It might be useful to check this before running the LMM analysis in GEMMA with the -eigen option.

It is useful to look at output file result.log.txt rather than the console output because it gives you the numbers in more precision.

Did you try the univariate LMM analysis? Do you get this problem in the univariate LMM analysis?

@davidhoule
Copy link
Author

Univariate analyses do not seem to have the problem. The actual log.txt file does show very slightly positive estimates of Ve terms, and of all VE SE, except the one nan.

I suspect that there must be an 'effective 0' tolerance, as I have examples that run with many very slightly negative eigenvalues.

Thanks for your help with this issue.

@pjotrp
Copy link
Member

pjotrp commented Aug 14, 2017

Possibly this is fixed by #69. Please try the next release. If it does not resolve send me a dataset.

@pjotrp pjotrp self-assigned this Aug 14, 2017
@pjotrp
Copy link
Member

pjotrp commented Aug 20, 2017

@davidhoule can you send me the toy dataset?

@pjotrp
Copy link
Member

pjotrp commented Sep 12, 2017

No response

@pjotrp pjotrp closed this as completed Sep 12, 2017
@pjotrp
Copy link
Member

pjotrp commented Jun 29, 2018

We have a dataset now for testing that shows similar characteristics. Running -lmm 2 renders

##
## GEMMA Version    = 0.98-pre1 (2018/06/29)
## GSL Version      = 2.4
## Eigen Version    = 3.2.9
## OpenBlas         = OpenBLAS 0.2.19  - NO_LAPACK NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell
##   arch           = Haswell
##   threads        = 8
##   parallel type  = threaded
##
## Command Line Input = ./bin/gemma -bfile ./frontiers -lmm 2 -k ./GWA_2017.centered.cXX.txt -maf 0.05 -outdir ./skull/ -snps selected_200_SNPs.txt -debug -legacy -no-check 
##
## Date = Fri Jun 29 13:38:52 2018
##
## Summary Statistics:
## number of total individuals = 49
## number of analyzed individuals = 49
## number of covariates = 1
## number of phenotypes = 1
## number of total SNPs/var = 73527
## number of analyzed SNPs/var = 200
## REMLE log-likelihood in the null model = 135.494
## MLE log-likelihood in the null model = 140.231
## pve estimate in the null model = 0.958692
## se(pve) in the null model = 0.037907
## vg estimate in the null model = 0.000473816
## ve estimate in the null model = 1.46539e-05
## beta estimate in the null model =   0.00169357
## se(beta) =   0.000546862
##
## Computation Time:
## total computation time = 0.0139996 min 
## computation time break down: 
##      time on eigen-decomposition = 0.00010455 min 
##      time on calculating UtX = 1.55333e-05 min 
##      time on optimization = 0.00404858 min 
##

while

~/.guix-profile/bin/gemma -bfile ./frontiers -lmm 1 -n 1 2 3 4 5 6 7 8 9 10 -k ./GWA_2017.centered.cXX.txt -maf 0.3 -outdir ./skull/ -snps selected_200_SNPs.txt -debug

renders

MLE estimate for Vg in the null model: 
0.0004
-0.0001 0.0002
-0.0000 -0.0000 0.0002
0.0000  0.0001  0.0000  0.0002
0.0000  0.0000  -0.0000 -0.0000 0.0001
-0.0000 0.0000  -0.0000 -0.0000 0.0000  0.0002
-0.0000 -0.0000 -0.0000 -0.0000 -0.0000 -0.0000 0.0001
-0.0000 0.0000  -0.0000 0.0000  -0.0000 0.0000  0.0001  0.0001
-0.0000 0.0000  0.0000  0.0000  0.0000  0.0000  -0.0000 0.0000  0.0001
-0.0000 -0.0000 0.0000  -0.0000 -0.0000 -0.0000 -0.0000 0.0000  0.0000  0.0001
se(Vg): 
0.0001
0.0001  0.0001
0.0001  0.0001  0.0001
0.0001  0.0000  0.0001  0.0001
0.0000  0.0001  -nan    0.0000  0.0001
0.0000  0.0000  0.0001  0.0000  0.0000  0.0000
0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
0.0000  0.0000  0.0001  0.0000  0.0000  0.0000  0.0000  0.0000
0.0000  0.0000  0.0001  0.0000  0.0001  0.0000  0.0000  0.0000  0.0000
0.0000  0.0000  -nan    0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  -nan
MLE estimate for Ve in the null model: 
0.0000
0.0000  0.0001
-0.0000 0.0000  0.0001
-0.0000 -0.0000 -0.0000 0.0000
0.0000  -0.0000 0.0000  0.0000  0.0001
-0.0000 0.0000  0.0000  -0.0000 -0.0000 0.0000
0.0000  0.0000  -0.0000 0.0000  0.0000  0.0000  0.0000
0.0000  0.0000  -0.0000 -0.0000 0.0000  -0.0000 -0.0000 0.0000
0.0000  -0.0000 -0.0000 0.0000  -0.0000 0.0000  -0.0000 -0.0000 0.0000
0.0000  0.0000  -0.0000 0.0000  0.0000  -0.0000 0.0000  0.0000  -0.0000 0.0000
se(Ve): 
0.0000
0.0000  0.0000
0.0000  0.0000  0.0000
0.0000  0.0000  -nan    -nan
0.0000  0.0000  -nan    -nan    0.0000
0.0000  0.0000  0.0000  -nan    -nan    0.0000
0.0000  0.0000  -nan    -nan    0.0000  -nan    -nan
-nan    -nan    -nan    -nan    0.0000  -nan    0.0000  -nan
0.0000  0.0000  -nan    -nan    -nan    -nan    0.0000  -nan    -nan
0.0000  0.0000  -nan    -nan    -nan    0.0000  0.0000  -nan    0.0000  0.0000
MLE likelihood = -9060217478156.0625
GSL ERROR: matrix is singular in lu.c at line 266 errno 1

When I run with less phenotypes we get

 ~/.guix-profile/bin/gemma -bfile ./frontiers -lmm 1 -n 1 4 6   -k ./GWA_2017.centered.cXX.txt -maf 0.1 -outdir ./skull/ -snps selected_200_SNPs.txt -debug

##
## GEMMA Version    = 0.97 (2017/12/19)
## GSL Version      = 2.4
## Eigen Version    = 3.3.4
## OpenBlas         = OpenBLAS 0.3.0.dev  - NO_AFFINITY HASWELL
##   arch           = HASWELL
##   threads        = 8
##   parallel type  = threaded
##
## Command Line Input = /home/wrk/.guix-profile/bin/gemma -bfile ./frontiers -lmm 1 -n 1 4 6 -k ./GWA_2017.centered.cXX.txt -maf 0.1 -outdir ./skull/ -snps selected_200_SNPs.txt -debug 
##
## Date = Fri Jun 29 13:42:48 2018
##
## Summary Statistics:
## number of total individuals = 49
## number of analyzed individuals = 49
## number of covariates = 1
## number of phenotypes = 3
## number of total SNPs/var = 73527
## number of analyzed SNPs/var = 182
## REMLE log-likelihood in the null model = 447.058
## MLE log-likelihood in the null model = -1.78495e+09
## REMLE estimate for Vg in the null model: 
0.000441807
-7.55708e-06    0.000265048
1.5968e-05      -1.5201e-05     0.000153374
## se(Vg): 
0.000111597
6.46807e-05     6.10626e-05
5.13028e-05     2.98818e-05     5.03876e-05
## REMLE estimate for Ve in the null model: 
2.14138e-05
-3.21985e-06    1.68684e-06
-1.21203e-05    -1.80521e-06    1.82497e-05
## se(Ve): 
1.72819e-05
5.62131e-06     2.21151e-06
1.07007e-05     -nan    1.17464e-05
## MLE estimate for Vg in the null model: 
0.000455099     -1.55256e-05    1.40252e-05
-1.55256e-05    0.000269695     -1.75401e-05
1.40252e-05     -1.75401e-05    0.00016208
## se(Vg): 
0.000104231
5.57158e-05     5.63383e-05
4.66896e-05     3.62062e-05     4.24585e-05
## MLE estimate for Ve in the null model: 
1.62625e-05     -1.99967e-06    -1.05415e-05
-1.99967e-06    1.05748e-06     -1.24257e-06
-1.05415e-05    -1.24257e-06    1.47746e-05
## se(Ve): 
2.94059e-23
5.38615e-24     5.59049e-24
4.19386e-24     4.11504e-24     -nan
## estimate for B (d by c) in the null model (columns correspond to the covariates provided in the file): 
0.00169357
0.000552855
-0.000453467
## se(B): 
0.000661073
0.000185541
0.000610281
##
## Computation Time:
## total computation time = 0.0532221 min 
## computation time break down: 
##      time on eigen-decomposition = 7.95333e-05 min 
##      time on calculating UtX = 1.60833e-05 min 
##      time on optimization = 0.0286565 min 
##

@pjotrp pjotrp reopened this Jun 29, 2018
@pjotrp
Copy link
Member

pjotrp commented Jul 27, 2018

Commit 70f4196 shows that in all cases eigen decomposition throws FP exceptions when enabling hardware checking. This follows #161.

@pjotrp
Copy link
Member

pjotrp commented Aug 30, 2018

The problem is in calcPab which generates NaN in some cases. Not completely clear what it is yet, but I think it has to do with uninitialized W. Related to #94

pjotrp added a commit to genenetwork/GEMMA that referenced this issue Sep 6, 2018
@pjotrp
Copy link
Member

pjotrp commented Sep 6, 2018

I fixed the NaN issue in se(Ve) with above commit. Also for the multiple phenotypes we get an error where it matters:

./bin/gemma -bfile ./frontiers -lmm 1 -n 1 2 3 4 5 6 7 8 9 10 -k ./GWA_2017.centered.cXX.txt -maf 0.3 -outdir ./skull/ -snps selected_200_SNPs.txt -no-check
GEMMA 0.98-beta1 (2018-09-06) by Xiang Zhou and team (C) 2012-2018
Reading Files ... 
## number of total individuals = 49
## number of analyzed individuals = 49
## number of covariates = 1
## number of phenotypes = 10
## number of total SNPs/var        =    73527
## number of considered SNPS       =      200
## number of analyzed SNPs         =       82
Start Eigen-Decomposition...
**** WARNING: Matrix G has 3 eigenvalues close to zero in src/lapack.cpp at line 280 in EigenDecomp_Zeroed
**** WARNING: Matrix G has more than one negative eigenvalues! in src/lapack.cpp at line 284 in EigenDecomp_Zeroed
**** WARNING: Matrix G has 3 eigenvalues close to zero in src/lapack.cpp at line 280 in EigenDecomp_Zeroed
**** WARNING: Matrix G has more than one negative eigenvalues! in src/lapack.cpp at line 284 in EigenDecomp_Zeroed
ERROR: Enforce failed for Trying to take the log of -0.138747 in src/mathfunc.cpp at line 108 in safe_log

It may be possible to solve this, but it is a different issue.

@pjotrp pjotrp closed this as completed Sep 6, 2018
@hans-recknagel
Copy link

I am running into a similar error message when running the multivariate linear model. Running the same phenotypes individually using a LMM does not result in any errors. Only difference is that I have excluded Individuals that have any incomplete phenotype rows in the multivariate analysis. I am running the latest version of gemma on a macOS Sierra.

gemma -bfile males_dots_imputed_mlm -k centered_relationship_matrix.cXX.txt -lmm 1 -n 1 2 -o mlm_male_dots
GEMMA 0.98 (2018-09-28) by Xiang Zhou and team (C) 2012-2018
Reading Files ...

number of total individuals = 171

number of analyzed individuals = 171

number of covariates = 1

number of phenotypes = 2

number of total SNPs/var = 83655

number of analyzed SNPs = 83305

Start Eigen-Decomposition...
REMLE estimate for Vg in the null model:
0.0101
1.3313 568.4998
se(Vg):
0.0047
0.9901 341.0558
REMLE estimate for Ve in the null model:
0.0009
0.0298 1.0678
se(Ve):
0.0012
0.2447 86.1451
REMLE likelihood = -404.9952
MLE estimate for Vg in the null model:
0.0136
1.4578 573.3371
se(Vg):
0.0015
0.2417 62.1871
MLE estimate for Ve in the null model:
0.0000
0.0000 0.0000
se(Ve):
0.0000
0.0000 0.0000
MLE likelihood = -198158383604301984.0000
GSL ERROR: matrix is singular in lu.c at line 262 errno 1

@pjotrp
Copy link
Member

pjotrp commented Dec 4, 2018

@hans-recknagel do you mind sharing the dataset with me so I can replicate the problem?

@hans-recknagel
Copy link

Yes, I've sent them to you by mail. Thanks.

@pjotrp
Copy link
Member

pjotrp commented Dec 5, 2018

Looks to me like the covariates are highly correlated. mvlmm does not like that.

@pjotrp
Copy link
Member

pjotrp commented Dec 5, 2018

See also #175

@hans-recknagel
Copy link

Ok, so correlated variables are not appropriate to use then? I thought because these two different variables capture the phenotype better than just one, but describe pretty much the same phenotype, it would be best to analyse them together.

@pjotrp
Copy link
Member

pjotrp commented Dec 5, 2018

You should not use correlated covariates/phenotypes. There is no benefit anyway. In #175 we'll write a validator which will emit the error.

@hans-recknagel
Copy link

Thanks for your response. I was under the impression that the mvlmm is actually for correlated phenotypes (this is what is says in the the paper by Zhou and Stevens 2014).
The benefit in my particular case would be that in both phenotypic measures there is a measurement error, but by using the two different measures, the accuracy of getting the "true" value of the overall phenotype (and detecting its genetic basis) increases. This would decrease the chances of finding spurious genetic associations resulting from measurement errors.

@pjotrp
Copy link
Member

pjotrp commented Dec 13, 2018

This issue is closed. Please use the mailing list for discussion. You may get help there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants