<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>FaST-LMM and PySnpTools Project Home and Bibliography</title>
<!-- Bootstrap -->
<link href="css/bootstrap-4.3.1.css" rel="stylesheet">
</head>
<body>
<div class="container mt-2">
<div class="row">
<div class="col-12">
<div class="jumbotron">
<h1 class="text-center">FaST-LMM & PySnpTools</h1>
<h3 class="text-center">Project Home &
Bibliography </h3>
<p class="text-center">Established: October 14, 2006<br>
Last Update: November 3, 2024</p>
<img src="DNA-StrandNIST.1200x400.jpg" alt="" class="img-fluid">
</div>
</div>
</div>
</div>
<div class="container">
<div class="row">
<div class="text-center col-md-6 col-12">
<h3>FaST-LMM</h3>
<p class="text-left">FaST-LMM, which stands for Factored
Spectrally Transformed Linear Mixed Models, is a program for
performing genome-wide association studies (GWAS) on
datasets of all sizes, up to one million samples. </p>
<p class="text-left">Learn more about Python FaST-LMM and
install from:</p>
<ul>
<li>
<p class="text-left"><a href="https://pypi.org/project/fastlmm/">PyPI</a> or <a
href="https://github.com/fastlmm/FaST-LMM">GitHub</a></p>
</li>
</ul>
<p class="text-left">FaST-LMM runs on Python 3.10, 3.11, 3.12, & 3.13.
It runs on Linux (x64 or ARM), Windows (x64), and Mac (x64 or ARM).
<br>
</p>
<p class="text-left"><em>An older C++ version, including <a
href="https://www.microsoft.com/en-us/download/details.aspx?id=52614">Windows
binary</a>, <a href="https://www.microsoft.com/en-us/download/details.aspx?id=52588">Linux
binary</a>, and <a href="https://www.microsoft.com/en-us/download/details.aspx?id=52559">source</a>,
supports univariate GWAS and limited epistatic testing.</em></p>
</div>
<div class="text-center col-md-6 col-12">
<h3>PySnpTools</h3>
<p class="text-left">PySnpTools is a Python library for
reading and manipulating genetic data. It efficiently reads
genetic PLINK formats (including *.bed/bim/fam files) and the BGEN format. It
also efficiently reads parts of files, reads
kernel data, standardizes data, manipulates data in memory, and scales to cluster-sized
data.</p>
<p class="text-left">PySnpTools runs on Python 3.10, 3.11, 3.12, & 3.13.
It runs on Linux (x64 and ARM), Windows, and Mac (x64 and ARM).<br>
</p>
<p class="text-left">Learn more about PySnpTools and install
from:</p>
<ul>
<li>
<p class="text-left"> <a href="https://pypi.org/project/pysnptools/">PyPI</a> or <a
href="https://github.com/fastlmm/PySnpTools">GitHub</a></p>
</li>
</ul>
</div>
<div class="text-center col-md-6 col-12">
<h3>bed-reader</h3>
<p class="text-left">Read and write the PLINK BED format, simply and efficiently. Available for Python or Rust.
</p>
<p class="text-left">Learn more about bed-reader and install
from:</p>
<ul>
<li>
<p class="text-left">Python: <a href="https://pypi.org/project/bed-reader/">PyPI</a> or <a
href="https://github.com/fastlmm/bed-reader">GitHub</a>
(Python 3.10, 3.11, 3.12, & 3.13. Linux [x64, ARM], Windows [x64], & Mac [x64 and ARM])<br>
</p>
</li>
<li>
<p class="text-left">Rust: <a href="https://crates.io/crates/bed-reader">crates.io</a></p>
</li>
</ul>
</div>
</div>
</div>
<hr>
<section>
<h2 class="text-center">Contact</h2>
<div class="container">
<div class="row">
<ul>
<li>Email the developers at <a href="mailto:%[email protected]">
[email protected]</a>.</li>
<li>Join the Python user discussion and announcement list <a>via email</a> (or use <a
href="https://mail.python.org/mailman3/lists/fastlmm-user.python.org">web
sign up</a>).</li>
<li>Join the Rust <a href="https://github.com/fastlmm/bed-reader/discussions/">discussion of bed-reader</a>.</li>
<li>Open an issue on GitHub for <a href="https://github.com/fastlmm/FaST-LMM/issues">FaST-LMM</a>,
<a href="https://github.com/fastlmm/PySnpTools/issues">PySnpTools</a> or <a
href="https://github.com/fastlmm/bed-reader/issues">bed-reader</a>.
</li>
</ul>
</div>
</div>
</section>
<hr>
<section>
<h2 class="text-center">Full Annotated Bibliography</h2>
<div class="container">
<div class="row">
<p><strong>Univariate GWAS</strong></p>
<dl>
<li>
<font size="-1">[U1]</font> H. Kang, N. Zaitlen, C.
Wade, A. Kirby, D. Heckerman, M. Daly, and E. Eskin, <a
href="http://www.genetics.org/cgi/content/full/178/3/1709">Efficient
Control of Population Structure in Model Organism
Association Mapping</a>, <i>Genetics</i>,
178:1709-1723, March, 2008 (doi:
10.1534/genetics.107.080101).
</li>
<ul>
<li>Describes early efforts to make linear mixed models
more computationally efficient.</li>
</ul>
<li>
<font size="-1">[U2]</font> C. Lippert<strong><sup>*</sup></strong>,
J. Listgarten<strong><sup>*</sup></strong>, Y. Liu, C.M.
Kadie, R.I. Davidson, D. Heckerman<strong><sup>*</sup></strong>. <a
href="http://www.nature.com/nmeth/journal/v8/n10/abs/nmeth.1681.html">FaST
linear mixed models for genome-wide association studies</a>. <em>Nature
Methods</em>, 8: 833-835, Oct 2011
(doi:10.1038/nmeth.1681). (<sup>*</sup>equal
contributions)
</li>
<ul>
<li>Shows how exact linear-mixed-model computations can be
performed in time and memory <em>linear</em> in the
number of individuals when the number of SNPs used in
the similarity matrix is less than the number of
individuals (<em>i.e.,</em> when the similarity matrix
is low rank). This work also describes an approach to
select SNPs to achieve this condition with
linkage-disequilibrium-based pruning. In addition, this
work shows that computations are quadratic in time and
memory when the similarity matrix is full rank.</li>
</ul>
<li>
<font size="-1">[U3]</font> J. Listgarten<strong><sup>*</sup></strong>,
C. Lippert<strong><sup>*</sup></strong>, C.M. Kadie, R.I.
Davidson, E. Eskin, D. Heckerman<strong><sup>*</sup></strong>.
<a href="http://www.nature.com/nmeth/journal/v9/n6/abs/nmeth.2037.html">Improved
linear mixed models for genome-wide association studies</a>. <em>Nature
Methods</em>, 9: 525-526, June 2012
(doi:10.1038/nmeth.2037). (<sup>*</sup>equal
contributions)
</li>
<ul>
<li>Describes a method for selecting SNPs for the
linear-mixed-model similarity matrix by identifying SNPs
that are predictive of the phenotype. A later
publication [U6] shows this approach yields poor control
of type I error, whereas the original selection method
in [U2] performs well. This work also shows that the
inclusion of irrelevant SNPs in the similarity matrix
leads to inflated test statistics and reduced power, a
phenomenon called “dilution”. Although an incorrect
explanation for dilution is offered here, a correction
is given in [U5]. Finally, there is a bug in the
analysis of the synthetic data, which makes the
prediction-based selection method appear to perform
better than it actually does.</li>
</ul>
<li>
<font size="-1">[U4]</font> J. Listgarten<strong><sup>*</sup></strong>,
C. Lippert<strong><sup>*</sup></strong>, D. Heckerman<strong><sup>*</sup></strong>.
<a href="http://www.nature.com/ng/journal/v45/n5/abstract/ng.2620.html">FaST-LMM-Select
for addressing confounding from spatial structure and
rare variants</a>. <em>Nature Genetics</em> (2013)
doi:10.1038/ng.2620 (<sup>*</sup>equal contributions)
</li>
<ul>
<li>Shows how the feature-selection method in [U3]
addresses an open problem in statistical genetics that
had been published in Nature Genetics. Based on results
in [U6], however, we recommend that the selection
approach in [U2] be used instead.</li>
</ul>
<li>
<font size="-1">[U5]</font> C. Lippert<strong><sup>*</sup></strong>,
Gerald Quon, Eun Yong Kang, Carl M. Kadie, J. Listgarten<strong><sup>*</sup></strong>,
D. Heckerman<strong><sup>*</sup></strong>. <a
href="http://www.nature.com/srep/2013/130509/srep01815/full/srep01815.html">The
benefits of selecting phenotype-specific variants for
applications of mixed models in genomics</a>. <em>Scientific
Reports</em> (2013) doi:10.1038/srep01815 (<sup>*</sup>equal
contributions)
</li>
<ul>
<li>Describes additional experiments regarding the
feature-selection method in [U3] as applied to GWAS and
prediction. Again, based on the results in [U6], we
recommend that the selection approach in [U2] be used
instead.</li>
</ul>
<li>
<font size="-1">[U6]</font> C. Widmer*, C. Lippert*, O.
Weissbrod, N. Fusi, C.M. Kadie, R.I. Davidson, J.
Listgarten, and D. Heckerman*. <a
href="http://www.nature.com/srep/2014/141112/srep06874/full/srep06874.html">Further
Improvements to Linear Mixed Models for Genome-Wide
Association Studies</a>. <em>Scientific Reports</em>,
4, 6874, Nov 2014 (doi:10.1038/srep06874). (<sup>*</sup>equal
contributions)
</li>
<ul>
<li>Describes the latest version of FaST-LMM. It shows
that selecting SNPs for the linear-mixed-model
similarity matrix through pruning via linkage
disequilibrium (as in [U2]) works well to control type I
error, whereas selecting SNPs that are predictive of the
phenotype (as in [U3]) does not.</li>
</ul>
<li>
<font size="-1">[U7]</font> C. Lippert and D. Heckerman.
<a href="http://xrds.acm.org/article.cfm?aid=2788502">Computational
and statistical issues in personalized medicine</a>. <em>XRDS</em>
21, 24-27, Summer 2015 (doi:10.1145/2788502).
</li>
<ul>
<li>Describes statistical issues in GWAS with linear mixed
models from a graphical-model perspective.</li>
</ul>
<li>
<font size="-1">[U8]</font>
C. Kadie, D. Heckerman. <a href="https://www.biorxiv.org/content/early/2018/01/03/154682">Ludicrous
Speed Linear Mixed Models for Genome-Wide Association
Studies</a>. <i>bioRxiv</i>, Jan 2018.
</li>
<ul>
<li>Shows how to scale the FaST-LMM approach in [U2] to 1 million
samples on a cluster.</li>
</ul>
<li>
<font size="-1">[U9]</font> D. Heckerman. <a href="https://dl.acm.org/citation.cfm?id=3309720">Toward
accounting for hidden common causes when inferring cause
and effect from observational data</a>. <i>ACM
Transactions on Intelligent Systems and Technology</i>,
10, Sept 2019 (doi: 10.1145/3309720).
</li>
<ul>
<li>Describes how linear mixed models account for a hidden confounder
by aggregating small observed signals that reveal the confounder.<br>
</li>
</ul>
</dl>
<p><strong>Set Tests for GWAS</strong></p>
<ul>
<li>
<font size="-1">[S1]</font> J. Listgarten<strong><sup>*</sup></strong>,
C. Lippert<strong><sup>*</sup></strong>, Eun Yong Kang,
Jing Xiang, Carl M. Kadie, D. Heckerman<strong><sup>*</sup></strong>. <a
href="http://bioinformatics.oxfordjournals.org/content/29/12/1526">A
powerful and efficient set test for genetic markers that
handles confounders.</a> <em>Bioinformatics</em>,
29:1526-1533, April 2013
(doi:10.1093/bioinformatics/btt177). (<sup>*</sup>equal
contributions)
</li>
<ul>
<li>Shows that the LRT can be more powerful than a score
test for set association tests. This work is limited to
similarity matrices that are low rank and includes an
efficient algorithm for this case. This limitation is
relaxed in [S2].</li>
</ul>
<li>
<font size="-1">[S2]</font> C. Lippert, Jing Xiang,
Danilo Horta, Christian Widmer, Carl M. Kadie, D.
Heckerman*, J. Listgarten. <a href="http://bioinformatics.oxfordjournals.org/content/30/22/3206">Greater
power and computational efficiency for kernel-based
association testing of sets of genetic variants</a>. <em>Bioinformatics</em>,
2014 (doi: 10.1093/bioinformatics/btu504). (*corresponding
author)
</li>
<ul>
<li>Makes theoretical arguments and demonstrates
empirically that the LRT is often more powerful than the
traditionally used score test (e.g., SKAT). It also
explains how to efficiently perform several algebraic
computations for set tests with either a low- or
full-rank background kernel.</li>
</ul>
</ul>
<p><strong>Data Transformations/Pre-processing for GWAS</strong></p>
<ul>
<li>
<font size="-1">[D1]</font> N. Fusi*, C. Lippert, N. D.
Lawrence and O. Stegle*. <a
href="http://www.nature.com/ncomms/2014/140919/ncomms5890/full/ncomms5890.html">Warped
linear mixed models for the genetic analysis of
transformed phenotypes</a>. <em>Nature Communications</em>,
2014.
</li>
<ul>
<li>Shows how monotonically transforming the phenotype can
increase power in genome-wide association studies and
increase the accuracy of heritability estimation and
phenotype prediction.</li>
</ul>
<li>
<font size="-1">[D2]</font> O. Weissbrod, C. Lippert, D.
Geiger, and D. Heckerman. <a
href="http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.3285.html">Accurate
liability estimation improves power in ascertained
case-control studies</a>. <em>Nature Methods</em>,
Feb 2015 (doi:10.1038/nmeth.3285).
</li>
<ul>
<li>Describes an approach to pre-process ascertained
case-control-study data that leads to improved power
when analyzed with a linear mixed model.</li>
</ul>
</ul>
<p><strong>Epigenetic Cellular Heterogeneity Correction</strong></p>
<ul>
<li>
<font size="-1">[C1]</font> J. Zou, C. Lippert, D.
Heckerman, M. Aryee, Jennifer Listgarten. <a
href="http://www.nature.com/nmeth/journal/v11/n3/abs/nmeth.2815.html">Epigenome-wide
association studies without the need for cell-type
composition</a>. <em>Nature Methods</em>,
doi:10.1038/NMETH.2815.
</li>
<ul>
<li>Shows how FaST-LMM, with the inclusion of principal
components (PCs) as covariates, can correct for the
confounding effects of multiple cell types. Although a
method for selecting PCs is presented here, the method
in [U6] is now recommended.</li>
</ul>
</ul>
<p><strong>Epistatic Genome-Wide Association</strong></p>
<ul>
<li>
<font size="-1">[E1]</font> C. Lippert<strong><sup>*</sup></strong>,
J. Listgarten<strong><sup>*</sup></strong>, Robert
Davidson, Scott Baxter, Hoifung Poon, Carl M. Kadie, D.
Heckerman<strong><sup>*</sup></strong>. <a
href="http://www.nature.com/srep/2013/130122/srep01099/full/srep01099.html">An
Exhaustive Epistatic SNP Association Analysis on
Expanded Wellcome Trust Data</a>, <em>Scientific
Reports</em>, 2013, doi:10.1038/srep01099 (<sup>*</sup>equal
contributions)
</li>
<ul>
<li>Presents results for all pairwise-epistatic tests for
all phenotypes in the WTCCC1 data, using a linear mixed
model with a low-rank similarity matrix based on the
feature-selection method in [U3]. As described, based on
the results in [U6], we now recommend that the
feature-selection method in [U2] be used instead.</li>
</ul>
</ul>
<p><strong>GWAS for “Functional Traits” such as
Longitudinal Traits</strong></p>
<ul>
<li>
<font size="-1">[F1]</font> N. Fusi and J.
Listgarten. Leveraging Non-Linear Genetic
Effects on Functional Traits for GWAS, <em>Proceedings
of RECOMB 2016</em>.
</li>
<ul>
<li>Introduces a model for performing GWAS
on vector-valued traits that vary smoothly in
time. The framework is expressive
and computationally efficient, but the null model
is not nested inside the alternative model,
a limitation that is being addressed in ongoing
work.</li>
</ul>
</ul>
<p><strong>Heritability Estimation</strong></p>
<ul>
<li>
<font size="-1">[H1]</font> N. Furlotte, D. Heckerman,
and C. Lippert. <a
href="http://www.nature.com/jhg/journal/vaop/ncurrent/full/jhg201415a.html">Quantifying
the uncertainty in heritability</a>. <em>Journal
of Human Genetics</em> 27, March 2014 (doi:
10.1038/jhg.2014.15).
</li>
<ul>
<li>Applies the spectral-decomposition trick from FaST-LMM
[U2] to speed up Bayesian estimates of heritability.</li>
</ul>
<li>
<font size="-1">[H2]</font> D. Heckerman, D. Gurdasani, C.
Kadie, C. Pomilla, T. Carstensen, H. Martin, K. Ekoru,
R.N. Nsubuga, G. Ssenyomo, A. Kamali, P. Kaleebu, C.
Widmer, and M.S. Sandhu. <a href="http://www.pnas.org/content/113/27/7377.abstract">Linear
mixed model for heritability estimation that explicitly
addresses environmental variation</a>. <em>PNAS</em>,
113: 7377–7382 (doi: 10.1073/pnas.1510497113).
</li>
<ul>
<li>Describes a way to generalize linear mixed models to
take spatial location into account when jointly modeling
the influences of genomics and environment on traits.</li>
</ul>
</ul>
<p> </p>
</div>
</div>
</section>
<!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->
<script src="js/jquery-3.3.1.min.js"></script>
<!-- Include all compiled plugins (below), or include individual files as needed -->
<script src="js/popper.min.js"></script>
<script src="js/bootstrap-4.3.1.js"></script>
</body>
</html>