Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Different results using GEMMA 0.98 vs. GEMMA 0.96 #188

Closed
voichek opened this issue Dec 2, 2018 · 13 comments
Closed

Different results using GEMMA 0.98 vs. GEMMA 0.96 #188

voichek opened this issue Dec 2, 2018 · 13 comments

Comments

@voichek
Copy link

voichek commented Dec 2, 2018

Hi,

I have run the same data with GEMMA version 0.96 and version 0.98. The results are different (0.98 are incorrect) for some reason.

The command line:

$ gemma -bfile snps.plink -lmm 2 -k K  -o g96 -maf 0.05 -miss 0.5 -n 1
Reading Files ...

$ gemma-0.98-linux-static -bfile snps.plink -lmm 2 -k K  -o g98 -maf 0.05 -miss 0.5 -n 1
GEMMA 0.98 (2018-09-28) by Xiang Zhou and team (C) 2012-2018
Reading Files ...

Results:

output$ cat g98.assoc.txt | sort -gk10 | head
chr     rs      ps      n_miss  allele1 allele0 af      logl_H1 l_mle   p_lrt
2       .       216384  155     A       G       0.060   -3.354064e+03   1.000000e+05    1.646839e-07
1       .       29609974        25      T       A       0.094   -3.356651e+03   1.000000e+05    2.410560e-06
1       .       28431194        26      A       G       0.114   -3.356769e+03   1.000000e+05    2.727838e-06
4       .       521246  262     G       A       0.057   -3.357233e+03   1.000000e+05    4.423640e-06
1       .       29610020        34      C       T       0.100   -3.358208e+03   1.000000e+05    1.226556e-05
3       .       10188537        280     A       G       0.052   -3.358253e+03   1.000000e+05    1.285749e-05
3       .       11036701        428     T       C       0.074   -3.358303e+03   1.000000e+05    1.354579e-05
1       .       29609929        27      T       A       0.090   -3.358513e+03   1.000000e+05    1.688385e-05
2       .       4249391 146     G       A       0.163   -3.358633e+03   1.000000e+05    1.914765e-05

output$ cat g96.assoc.txt | sort -gk9 | head
chr     rs      ps      n_miss  allele1 allele0 af      l_mle   p_lrt
5       .       18590327        24      A       C       0.466   1.000000e+05    2.788929e-10
1       .       24339560        29      G       T       0.433   1.000000e+05    3.300386e-10
5       .       18592889        33      T       A       0.257   1.000000e+05    5.178977e-10
5       .       3188327 19      T       C       0.231   1.000000e+05    7.064749e-10
5       .       23193229        212     T       C       0.465   1.000000e+05    3.301685e-09
5       .       18600223        29      T       G       0.498   1.000000e+05    3.936730e-09
5       .       23184412        168     G       A       0.311   1.000000e+05    5.107127e-09
5       .       18590501        43      G       C       0.163   1.000000e+05    5.329060e-09
5       .       18600802        84      A       T       0.530   1.000000e+05    6.104173e-09

I am now uploading the data to my google drive and can share a link soon in a private mail.

Thanks for the help,
Yoav

@pjotrp
Copy link
Member

pjotrp commented Dec 4, 2018

@voichek please do share.

@pjotrp pjotrp self-assigned this Dec 4, 2018
@pjotrp pjotrp added the bug label Dec 4, 2018
@pjotrp pjotrp added this to the 0.99 release milestone Dec 4, 2018
@voichek
Copy link
Author

voichek commented Dec 5, 2018

@pjotrp I have sent you a mail a few days ago with the link (to [email protected])

@pjotrp
Copy link
Member

pjotrp commented Dec 5, 2018

@voichek it is not good practise to put E-mail addresses online.

@pjotrp
Copy link
Member

pjotrp commented Dec 5, 2018

@voichek thanks for sharing. I agree there is something wrong with the results. I just reproduced the issue.
Not sure about the reason, but

beta estimate in the null model =   3.38458e-13

is the main difference. I'll need to git bisect until I get to a commit that gives the difference.

I'll need to look into this and will make it a priority after 15/12. Can't really do it earlier because I'll be at http://2018.biohackathon.org/.

Suggest you use 0.96 with this dataset for now.

@pjotrp
Copy link
Member

pjotrp commented Dec 5, 2018

v0.97 gives the same result as v0.98. So it is between 0.96 and 0.97 something changed.

@voichek
Copy link
Author

voichek commented Dec 5, 2018

@voichek it is not good practise to put E-mail addresses online.

I am sorry for that, I didn't know it was a problem

@pjotrp
Copy link
Member

pjotrp commented Dec 9, 2018

I created a small test dataset which reproduces the problem with

in bim file :%s/\./\=printf("%d", line('.'))/g
~/opt/plink-ng/bin/plink-ng --bfile snps.plink --maf 0.05 --geno 0.05 --write-snplist
head -2000 plink.snplist > plink-2000.snplist
~/opt/plink-ng/bin/plink --bfile snps.plink --extract plink-2000.snplist --out 2000 --make-bed

The first line gives a unique name to each SNP. Now the head looks like

==> output/gemma-0.96-guix.assoc.txt <==
chr     rs      ps      n_miss  allele1 allele0 af      l_mle   p_lrt
1       32      363     46      G       C       0.069   1.000000e+05    2.532838e-01
1       118     1645    47      T       G       0.172   1.000000e+05    2.386120e-01
1       119     1646    41      G       A       0.171   1.000000e+05    1.824645e-01
1       134     2309    21      A       T       0.277   1.000000e+05    9.233790e-01
1       135     2310    25      A       C       0.278   1.000000e+05    9.337231e-01
1       144     2554    30      T       C       0.061   1.000000e+05    4.128753e-01
1       148     2673    18      T       C       0.087   1.000000e+05    4.771134e-01
1       162     3102    31      G       A       0.311   1.000000e+05    1.224502e-01
1       182     4648    18      A       C       0.295   1.000000e+05    6.451410e-01

==> output/gemma-0.97.assoc.txt <==
chr     rs      ps      n_miss  allele1 allele0 af      logl_H1 l_mle   p_lrt
1       32      363     46      G       C       0.069   -3.366741e+03   1.000000e+05    1.517668e-01
1       118     1645    47      T       G       0.172   -3.367567e+03   1.000000e+05    5.251397e-01
1       119     1646    41      G       A       0.171   -3.367638e+03   1.000000e+05    6.097905e-01
1       134     2309    21      A       T       0.277   -3.365637e+03   1.000000e+05    3.896258e-02
1       135     2310    25      A       C       0.278   -3.365487e+03   1.000000e+05    3.266581e-02
1       144     2554    30      T       C       0.061   -3.367360e+03   1.000000e+05    3.662623e-01
1       148     2673    18      T       C       0.087   -3.367667e+03   1.000000e+05    6.522588e-01
1       162     3102    31      G       A       0.311   -3.365939e+03   1.000000e+05    5.575837e-02
1       182     4648    18      A       C       0.295   -3.365339e+03   1.000000e+05    2.750539e-02

Now I can run git bisect to find the exact commit that creates the difference.

@voichek
Copy link
Author

voichek commented Dec 9, 2018

@pjotrp I had an idea, can it be that it just calculate the associations for the wrong phenotype in the file? there are a few phenotypes and I am trying to take the first one...

@pjotrp
Copy link
Member

pjotrp commented Dec 9, 2018

Certainly a possibility. It is easy to test by changing the number. i'll try tomorrow.

@pjotrp
Copy link
Member

pjotrp commented Dec 10, 2018

So, commit 4c21290 which is official 0.96 renders NaN on my machine weather I build with atlas or openblas. Sadly 0.96 was created in a time we can not reproduce builds. Annoying. I am trying a more recent commit.

@pjotrp
Copy link
Member

pjotrp commented Dec 10, 2018

So df1e049 is the commit that changed away from 0.96 results:

==> output/gemma-0.96-last.assoc.txt <==
chr     rs      ps      n_miss  allele1 allele0 af      l_mle   p_lrt
1       32      363     46      G       C       0.069   1.000000e+05    2.532838e-01
1       118     1645    47      T       G       0.172   1.000000e+05    2.386120e-01
1       119     1646    41      G       A       0.171   1.000000e+05    1.824645e-01
1       134     2309    21      A       T       0.277   1.000000e+05    9.233790e-01

==> output/gemma-0.97-last.assoc.txt <==
chr     rs      ps      n_miss  allele1 allele0 af      logl_H1 l_mle   p_lrt
1       32      363     46      G       C       0.069   -3.366741e+03   1.000000e+05    1.517668e-01
1       118     1645    47      T       G       0.172   -3.367567e+03   1.000000e+05    5.251397e-01
1       119     1646    41      G       A       0.171   -3.367638e+03   1.000000e+05    6.097905e-01
1       134     2309    21      A       T       0.277   -3.365637e+03   1.000000e+05    3.896258e-02

==> output/gemma-0.97.assoc.txt <==
chr     rs      ps      n_miss  allele1 allele0 af      logl_H1 l_mle   p_lrt
1       32      363     46      G       C       0.069   -3.366741e+03   1.000000e+05    1.517668e-01
1       118     1645    47      T       G       0.172   -3.367567e+03   1.000000e+05    5.251397e-01
1       119     1646    41      G       A       0.171   -3.367638e+03   1.000000e+05    6.097905e-01
1       134     2309    21      A       T       0.277   -3.365637e+03   1.000000e+05    3.896258e-02

==> output/gemma-bisect.assoc.txt <==
chr     rs      ps      n_miss  allele1 allele0 af      se      logl_H1 l_mle   p_lrt
1       119     1646    41      G       A       0.171   0.000000e+00    -3.366880e+03   1.000000e+05    1.824645e-01
1       134     2309    21      A       T       0.277   0.000000e+00    -3.367764e+03   1.000000e+05    9.233790e-01
1       135     2310    25      A       C       0.278   0.000000e+00    -3.367765e+03   1.000000e+05    9.337231e-01
1       144     2554    30      T       C       0.061   0.000000e+00    -3.367433e+03   1.000000e+05    4.128753e-01

And this is the commit that fails f8d6ad4

@pjotrp
Copy link
Member

pjotrp commented Dec 10, 2018

Built and compiled with

~/.config/guix/current/bin/guix environment -C guix --ad-hoc gcc gdb gfortran:lib gsl eigen lapack atlas openblas zlib bash ld-wrapper perl ldc
make EIGEN_INCLUDE_PATH=$GUIX_ENVIRONMENT/include/eigen3 WITH_OPENBLAS=1 FORCE_DYNAMIC=1 -j 8

pjotrp added a commit to genenetwork/GEMMA that referenced this issue Dec 10, 2018
@pjotrp
Copy link
Member

pjotrp commented Dec 10, 2018

@voichek thanks for reporting. There was a regression in f8d6ad4 that was never caught because I had no missing genotypes in the test files for plink. I'll make a binary available which you can test.

@pjotrp pjotrp closed this as completed in 48341d1 Dec 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants