Singular Ve matrix #61
Comments
@davidhoule Your problem may be related to Issue #58. What version of GEMMA are you using, and which operating system? |
I am using Ubuntu 16.04.2. I updated to GEMMA version 0.97, with the same results. The relatedness matrix that causes problems has just one (very near 0) negative eigenvalue. Bending the matrix to all positive eigenvalues has no effect on the error. Other toy relatedness matrices with many negative near-0 eigenvalues run without problem. |
@davidhoule Can you please try downloading the pre-compiled binary for GEMMA v0.96 Linux x86 and see if that works for you? We recently uncovered a bug with linking to GSL 2.x, and the bug looks very similar to the one you reported. |
Tried the binary you furnished with the same result. Then did an analysis using a relatedness matrix estimated in smartpca, part of the Eigensoft package. With this matrix, all versions of the program will run on my toy data set. Estimates of Ve are still 0, but tests are produced, and there are no errors:
I assume that the two nan SE are still indicating a problem? |
@davidhoule Great progress. It sounds like your situation is related to Issue #45. In some cases it appears that a different method for computing the relatedness matrix yields more or less numerically stable results, although it is hard to say exactly what the root cause is. I believe that the computation requires that all the eigenvalues be positive. It might be useful to check this before running the LMM analysis in GEMMA with the It is useful to look at output file Did you try the univariate LMM analysis? Do you get this problem in the univariate LMM analysis? |
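One way to run the positivity check mentioned above outside GEMMA is to attempt a Cholesky factorization, which succeeds only for a symmetric matrix whose eigenvalues are all positive. The following is a minimal pure-Python sketch, not GEMMA's own check; the function name `is_positive_definite` and the pivot tolerance `tol` are assumptions made for illustration. If NumPy is available, `numpy.linalg.eigvalsh` would return the eigenvalues themselves.

```python
import math


def is_positive_definite(K, tol=1e-12):
    """Return True if the symmetric matrix K (list of lists) admits a Cholesky factor.

    `tol` is a hypothetical threshold on the Cholesky pivots: a pivot at or
    below it is treated as a failure, so nearly singular matrices are rejected.
    """
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = K[i][i] - s
                if pivot <= tol:
                    return False  # zero or negative pivot: not positive definite
                L[i][i] = math.sqrt(pivot)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return True


# Toy matrices for illustration only (not taken from the data in this thread).
K_ok = [[2.0, 1.0], [1.0, 2.0]]        # eigenvalues 1 and 3
K_singular = [[1.0, 1.0], [1.0, 1.0]]  # eigenvalues 0 and 2
print(is_positive_definite(K_ok))        # True
print(is_positive_definite(K_singular))  # False
```

A failed check only says that at least one eigenvalue is zero, negative, or too small for the chosen tolerance; it does not say how many, so an explicit eigendecomposition is still the better diagnostic when one is available.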
Univariate analyses do not seem to have the problem. The actual log.txt file does show very slightly positive estimates of Ve terms, and of all VE SE, except the one nan. I suspect that there must be an 'effective 0' tolerance, as I have examples that run with many very slightly negative eigenvalues. Thanks for your help with this issue. |
Possibly this is fixed by #69. Please try the next release. If that does not resolve it, send me a dataset. |
@davidhoule can you send me the toy dataset? |
No response |
We have a dataset now for testing that shows similar characteristics. Running -lmm 2 renders
while
When I run with fewer phenotypes we get
|
The problem is in calcPab which generates NaN in some cases. Not completely clear what it is yet, but I think it has to do with uninitialized W. Related to #94 |
I fixed the NaN issue in se(Ve) with above commit. Also for the multiple phenotypes we get an error where it matters:
It may be possible to solve this, but it is a different issue. |
I am running into a similar error message when running the multivariate linear model. Running the same phenotypes individually using an LMM does not result in any errors. The only difference is that I have excluded individuals that have any incomplete phenotype rows in the multivariate analysis. I am running the latest version of GEMMA on macOS Sierra.
gemma -bfile males_dots_imputed_mlm -k centered_relationship_matrix.cXX.txt -lmm 1 -n 1 2 -o mlm_male_dots
number of total individuals = 171
number of analyzed individuals = 171
number of covariates = 1
number of phenotypes = 2
number of total SNPs/var = 83655
number of analyzed SNPs = 83305
Start Eigen-Decomposition... |
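The complete-case filtering described in this comment can be sketched as follows. This is an illustration only, not what the poster actually ran: the function name `complete_rows`, the column layout, and the missing-value codes `NA` and `-9` are assumptions, and `-9` could be a legitimate value for some traits.

```python
# Assumed missing-value codes; adjust to match the phenotype file in use.
MISSING = {"NA", "-9"}


def complete_rows(rows, pheno_cols):
    """Return indices of rows with no missing value in any listed phenotype column.

    rows       -- list of rows, each a list of string fields
    pheno_cols -- zero-based column indices holding the phenotypes to analyze
    """
    keep = []
    for idx, row in enumerate(rows):
        if all(row[c] not in MISSING for c in pheno_cols):
            keep.append(idx)
    return keep


# Toy rows: family ID, individual ID, phenotype 1, phenotype 2.
rows = [
    ["f1", "i1", "1.2", "0.5"],
    ["f2", "i2", "NA", "0.7"],
    ["f3", "i3", "0.9", "-9"],
    ["f4", "i4", "1.1", "0.4"],
]
print(complete_rows(rows, [2, 3]))  # [0, 3]
```

If individuals are dropped this way, the genotype data and the relatedness matrix need to cover the same individuals in the same order.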
@hans-recknagel do you mind sharing the dataset with me so I can replicate the problem? |
Yes, I've sent them to you by mail. Thanks. |
Looks to me like the covariates are highly correlated. mvlmm does not like that. |
See also #175 |
Ok, so correlated variables are not appropriate to use then? I thought because these two different variables capture the phenotype better than just one, but describe pretty much the same phenotype, it would be best to analyse them together. |
You should not use correlated covariates/phenotypes. There is no benefit anyway. In #175 we'll write a validator which will emit the error. |
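A rough pre-flight check for near-collinear phenotype or covariate columns might look like the sketch below. It is not the validator planned in #175; the function names and the `threshold` of 0.95 are assumptions chosen for illustration, and what counts as "too correlated" for the multivariate model is not settled in this thread.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a column has zero variance")
    return sxy / math.sqrt(sxx * syy)


def highly_correlated_pairs(columns, threshold=0.95):
    """Return (i, j, r) for every column pair with |r| >= threshold."""
    pairs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            r = pearson(columns[i], columns[j])
            if abs(r) >= threshold:
                pairs.append((i, j, r))
    return pairs


# Toy columns: the second is almost a multiple of the first.
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.1],
    [4.0, 1.0, 3.0, 2.0],
]
for i, j, r in highly_correlated_pairs(columns):
    print(f"columns {i} and {j}: r = {r:.4f}")
```

Running such a check on the columns passed to `-n` before the multivariate analysis would at least show whether near-collinearity is a plausible explanation for a singular-matrix error.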
Thanks for your response. I was under the impression that the mvLMM is actually for correlated phenotypes (this is what it says in the paper by Zhou and Stephens 2014). |
This issue is closed. Please use the mailing list for discussion. You may get help there. |
I have installed the program and can run your example data set without problem. Similarly, I can run the program on small subsets of the SNPs with my phenotypic data. However, when I calculate relatedness based on the whole genome, the relatedness matrix estimated in GEMMA appears to cause problems. I diagnosed the relatedness matrix as the problem by substituting the full-genome relatedness matrices into the example analyses of my data (two traits, three SNPs) that run without error when the relatedness matrix is calculated from just those three SNPs.
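For reference, a centered relatedness matrix of the kind discussed here can be sketched in a few lines. The sketch assumes the standard centered form K = (1/p) Σ (x_i − mean_i)(x_i − mean_i)ᵀ over p SNPs; the function name `centered_relatedness` and the toy genotypes are assumptions, and this is not GEMMA's implementation. Under that definition every row of K sums to zero, so K has one exactly zero eigenvalue in exact arithmetic, which rounding can turn into a very slightly negative one. That might be one source of the single near-zero negative eigenvalue reported in this thread.

```python
def centered_relatedness(genotypes):
    """Centered relatedness matrix from a list of SNPs.

    genotypes -- list of p SNPs, each a list of n per-individual values (e.g. 0, 1, 2)
    Returns an n x n matrix as a list of lists.
    """
    p = len(genotypes)
    n = len(genotypes[0])
    K = [[0.0] * n for _ in range(n)]
    for snp in genotypes:
        mean = sum(snp) / n
        centered = [g - mean for g in snp]  # subtract the per-SNP mean
        for i in range(n):
            for j in range(n):
                K[i][j] += centered[i] * centered[j] / p
    return K


# Toy data: three SNPs scored in three individuals.
G = [
    [0, 2, 2],
    [2, 0, 2],
    [0, 0, 2],
]
K = centered_relatedness(G)
for row in K:
    # Each row sum is zero up to rounding, so the all-ones vector is a null vector of K.
    print(row, "row sum:", sum(row))
```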
The error is consistent, in that the program estimates a 0 Ve matrix, then crashes because of a singular matrix error:
Start Eigen-Decomposition...
REMLE estimate for Vg in the null model:
1.6815
0.1562 1.6745
se(Vg):
0.3784
-nan -nan
REMLE estimate for Ve in the null model:
0.0000
0.0000 0.0000
se(Ve):
0.1228
-nan -nan
REMLE likelihood = -478.9822
MLE estimate for Vg in the null model:
1.6815
0.1563 1.6745
se(Vg):
0.1758
0.1246 0.1751
MLE estimate for Ve in the null model:
0.0000
0.0000 0.0000
se(Ve):
-nan
0.0000 0.0000
MLE likelihood = -140737488355560.3750
gsl: lu.c:262: ERROR: matrix is singular========================100.00%
It may be that the relatedness in my data set is the problem. It is certainly not what you find in human data. I am studying a set of 184 largely inbred lines, the Drosophila Genome Reference Panel. The majority of sites are fixed; the proportion of heterozygotes is perhaps 5% on average, although that varies among lines. In addition, about 5% of the line pairs are more related than second cousins, and a few seem to be full sibs. The bottom line is that genotypes are always very far from Hardy-Weinberg. I have not filtered the genome for high-LD SNP pairs for the calculation of relatedness, although I am aware that this will pose problems for the actual genome-wide association analysis with more than a few SNPs.
I would be happy to furnish example data sets that create the problem if that would be helpful.