<!DOCTYPE html>
<html lang="en">

<head>
  <meta http-equiv="content-type" content="text/html; charset=UTF-8">
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>FaST-LMM and PySnpTools Project Home and Bibliography</title>
  <!-- Bootstrap -->
  <link href="css/bootstrap-4.3.1.css" rel="stylesheet">
</head>

<body>
  <div class="container mt-2">
    <div class="row">
      <div class="col-12">
        <div class="jumbotron">
          <h1 class="text-center">FaST-LMM &amp; PySnpTools</h1>
          <h3 class="text-center">Project Home &amp;
            Bibliography</h3>
          <p class="text-center">Established: October 14, 2006<br>
            Last Update: November 3, 2024</p>
          <img src="DNA-StrandNIST.1200x400.jpg" alt="" class="img-fluid">
        </div>
      </div>
    </div>
  </div>
  <div class="container">
    <div class="row">
      <div class="text-center col-md-6 col-12">
        <h3>FaST-LMM</h3>
        <p class="text-left">FaST-LMM, which stands for Factored
          Spectrally Transformed Linear Mixed Models, is a program for
          performing genome-wide association studies (GWAS) on
          datasets of all sizes, up to one million samples.</p>
        <p class="text-left">Learn more about Python FaST-LMM and
          install from:</p>
        <ul>
          <li>
            <p class="text-left"><a href="https://pypi.org/project/fastlmm/">PyPI</a> or <a
                href="https://github.com/fastlmm/FaST-LMM">GitHub</a></p>
          </li>
        </ul>
        <p class="text-left">FaST-LMM runs on Python 3.10, 3.11, 3.12, and 3.13.
          It runs on Linux (x64 or ARM), Windows (x64), and Mac (x64 or ARM).
        </p>
        <p class="text-left"><em>An older C++ version, including a <a
              href="https://www.microsoft.com/en-us/download/details.aspx?id=52614">Windows
              binary</a>, a <a href="https://www.microsoft.com/en-us/download/details.aspx?id=52588">Linux
              binary</a>, and <a href="https://www.microsoft.com/en-us/download/details.aspx?id=52559">source</a>,
            supports univariate GWAS and limited epistatic testing.</em></p>
      </div>
      <div class="text-center col-md-6 col-12">
        <h3>PySnpTools</h3>
        <p class="text-left">PySnpTools is a Python library for
          reading and manipulating genetic data. It efficiently reads
          genetic PLINK formats (including *.bed/bim/fam files) and the BGEN format. It
          also efficiently reads parts of files, reads
          kernel data, standardizes data, manipulates data in-memory, and scales to cluster-sized
          data.</p>
        <p class="text-left">PySnpTools runs on Python 3.10, 3.11, 3.12, and 3.13.
          It runs on Linux (x64 or ARM), Windows, and Mac (x64 or ARM).
        </p>
        <p class="text-left">Learn more about PySnpTools and install
          from:</p>
        <ul>
          <li>
            <p class="text-left"><a href="https://pypi.org/project/pysnptools/">PyPI</a> or <a
                href="https://github.com/fastlmm/PySnpTools">GitHub</a></p>
          </li>
        </ul>
      </div>
      <div class="text-center col-md-6 col-12">
        <h3>bed-reader</h3>
        <p class="text-left">Read and write the PLINK BED format, simply and efficiently. Available for Python or Rust.
        </p>

        <p class="text-left">Learn more about bed-reader and install
          from:</p>
        <ul>
          <li>
            <p class="text-left">Python: <a href="https://pypi.org/project/bed-reader/">PyPI</a> or <a
                href="https://github.com/fastlmm/bed-reader">GitHub</a>
                (Python 3.10, 3.11, 3.12, and 3.13; Linux [x64 or ARM], Windows [x64], and Mac [x64 or ARM])
              </p>
          </li>
          <li>
            <p class="text-left">Rust: <a href="https://crates.io/crates/bed-reader">crates.io</a></p>
          </li>
        </ul>
      </div>
    </div>
  </div>
  <hr>
  <section>
    <h2 class="text-center">Contact</h2>
    <div class="container">
      <div class="row">
        <ul>
          <li>Email the developers at <a href="mailto:fastlmm-dev@python.org">fastlmm-dev@python.org</a>.</li>
          <li>Join the Python user discussion and announcement list <a>via email</a> (or use the&nbsp;<a
              href="https://mail.python.org/mailman3/lists/fastlmm-user.python.org">web
              sign up</a>).</li>
          <li>Join the Rust <a href="https://github.com/fastlmm/bed-reader/discussions/">discussion of bed-reader</a>.</li>
          <li>Open an issue on GitHub for <a href="https://github.com/fastlmm/FaST-LMM/issues">FaST-LMM</a>,
            <a href="https://github.com/fastlmm/PySnpTools/issues">PySnpTools</a> or <a
              href="https://github.com/fastlmm/bed-reader/issues">bed-reader</a>.
          </li>
        </ul>
      </div>
    </div>
  </section>
  <hr>
  <section>
    <h2 class="text-center">Full Annotated Bibliography</h2>
    <div class="container">
      <div class="row">
        <p><strong>Univariate GWAS</strong></p>
        <dl>
          <li>
            <font size="-1">[U1]</font> H. Kang, N. Zaitlen, C.
            Wade, A. Kirby, D. Heckerman, M. Daly, and E. Eskin, <a
              href="http://www.genetics.org/cgi/content/full/178/3/1709">Efficient
              Control of Population Structure in Model Organism
              Association Mapping</a>, <i>Genetics</i>,
            178:1709-1723, March 2008
            (doi:10.1534/genetics.107.080101).
          </li>
          <ul>
            <li>Describes early efforts to make linear mixed models
              more computationally efficient.</li>
          </ul>
          <li>
            <font size="-1">[U2]</font> C. Lippert<strong><sup>*</sup></strong>,
            J. Listgarten<strong><sup>*</sup></strong>, Y. Liu, C.M.
            Kadie, R.I. Davidson, D. Heckerman<strong><sup>*</sup></strong>.&nbsp;<a
              href="http://www.nature.com/nmeth/journal/v8/n10/abs/nmeth.1681.html">FaST
              linear mixed models for genome-wide association studies</a>.&nbsp;<em>Nature
              Methods</em>, 8: 833-835, Oct 2011
            (doi:10.1038/nmeth.1681). (<sup>*</sup>equal
            contributions)
          </li>
          <ul>
            <li>Shows how exact linear-mixed-model computations can be
              performed in time and memory <em>linear</em> in the
              number of individuals when the number of SNPs used in
              the similarity matrix is less than the number of
              individuals (<em>i.e.,</em> when the similarity matrix
              is low rank). This work also describes an approach to
              select SNPs to achieve this condition with
              linkage-disequilibrium-based pruning. In addition, this
              work shows that computations are quadratic in time and
              memory when the similarity matrix is full rank.</li>
          </ul>
          <li>
            <font size="-1">[U3]</font> J. Listgarten<strong><sup>*</sup></strong>,
            C. Lippert<strong><sup>*</sup></strong>, C.M. Kadie, R.I.
            Davidson, E. Eskin, D. Heckerman<strong><sup>*</sup></strong>.
            <a href="http://www.nature.com/nmeth/journal/v9/n6/abs/nmeth.2037.html">Improved
              linear mixed models for genome-wide association studies</a>.&nbsp;<em>Nature
              Methods</em>, 9: 525-526, June 2012
            (doi:10.1038/nmeth.2037). (<sup>*</sup>equal
            contributions)
          </li>
          <ul>
            <li>Describes a method for selecting SNPs for the
              linear-mixed-model similarity matrix by identifying SNPs
              that are predictive of the phenotype. A later
              publication [U6] shows this approach yields poor control
              of type I error, whereas the original selection method
              in [U2] performs well. This work also shows that the
              inclusion of irrelevant SNPs in the similarity matrix
              leads to inflated test statistics and reduced power, a
              phenomenon called “dilution”. Although an incorrect
              explanation for dilution is offered here, a correction
              is given in [U5]. Finally, there is a bug in the
              analysis of the synthetic data, which makes the
              prediction-based selection method appear to perform
              better than it actually does.</li>
          </ul>
          <li>
            <font size="-1">[U4]</font> J. Listgarten<strong><sup>*</sup></strong>,
            C. Lippert<strong><sup>*</sup></strong>, D. Heckerman<strong><sup>*</sup></strong>.
            <a href="http://www.nature.com/ng/journal/v45/n5/abstract/ng.2620.html">FaST-LMM-Select
              for addressing confounding from spatial structure and
              rare variants</a>.&nbsp;<em>Nature Genetics</em>, 2013
            (doi:10.1038/ng.2620). (<sup>*</sup>equal contributions)
          </li>
          <ul>
            <li>Shows how the feature-selection method in [U3]
              addresses an open problem in statistical genetics that
              had been published in Nature Genetics. Based on results
              in [U6], however, we recommend that the selection
              approach in [U2] be used instead.</li>
          </ul>
          <li>
            <font size="-1">[U5]</font> C. Lippert<strong><sup>*</sup></strong>,
            Gerald Quon, Eun Yong Kang, Carl M. Kadie, J. Listgarten<strong><sup>*</sup></strong>,
            D. Heckerman<strong><sup>*</sup></strong>.&nbsp;<a
              href="http://www.nature.com/srep/2013/130509/srep01815/full/srep01815.html">The
              benefits of selecting phenotype-specific variants for
              applications of mixed models in genomics</a>.&nbsp;<em>Scientific
              Reports</em>, 2013 (doi:10.1038/srep01815). (<sup>*</sup>equal
            contributions)
          </li>
          <ul>
            <li>Describes additional experiments regarding the
              feature-selection method in [U3] as applied to GWAS and
              prediction. Again, based on the results in [U6], we
              recommend that the selection approach in [U2] be used
              instead.</li>
          </ul>
          <li>
            <font size="-1">[U6]</font> C. Widmer*, C. Lippert*, O.
            Weissbrod, N. Fusi, C.M. Kadie, R.I. Davidson, J.
            Listgarten, and D. Heckerman*.&nbsp;<a
              href="http://www.nature.com/srep/2014/141112/srep06874/full/srep06874.html">Further
              Improvements to Linear Mixed Models for Genome-Wide
              Association Studies</a>. <em>Scientific Reports</em>,
            4, 6874, Nov 2014 (doi:10.1038/srep06874). (<sup>*</sup>equal
            contributions)
          </li>
          <ul>
            <li>Describes the latest version of FaST-LMM. It shows
              that selecting SNPs for the linear-mixed-model
              similarity matrix through pruning via linkage
              disequilibrium (as in [U2]) works well to control type I
              error, whereas selecting SNPs that are predictive of the
              phenotype (as in [U3]) does not.</li>
          </ul>
          <li>
            <font size="-1">[U7]</font> C. Lippert and D. Heckerman.
            <a href="http://xrds.acm.org/article.cfm?aid=2788502">Computational
              and statistical issues in personalized medicine</a>. <em>XRDS</em>
            21, 24-27, Summer 2015 (doi:10.1145/2788502).
          </li>
          <ul>
            <li>Describes statistical issues in GWAS with linear mixed
              models from a graphical-model perspective.</li>
          </ul>
          <li>
            <font size="-1">[U8]</font>
            C. Kadie, D. Heckerman.&nbsp; <a href="https://www.biorxiv.org/content/early/2018/01/03/154682">Ludicrous
              Speed Linear Mixed Models for Genome-Wide Association
              Studies</a>. <i>bioRxiv</i>, Jan 2018.
          </li>
          <ul>
            <li>Shows how to scale the FaST-LMM method in [U2] to one million
              samples on a cluster.</li>
          </ul>
          <li>
            <font size="-1">[U9]</font> D. Heckerman.&nbsp; <a href="https://dl.acm.org/citation.cfm?id=3309720">Toward
              accounting for hidden common causes when inferring cause
              and effect from observational data</a>.&nbsp; <i>ACM
              Transactions on Intelligent Systems and Technology</i>,
            10, Sept 2019 (doi: 10.1145/3309720).
          </li>
          <ul>
            <li>Describes how linear mixed models account for a hidden confounder
              by aggregating small observed signals that reveal the confounder.</li>
          </ul>
        </dl>
        <p><strong>Set Tests for GWAS</strong></p>
        <ul>
          <li>
            <font size="-1">[S1]</font> J. Listgarten<strong><sup>*</sup></strong>,
            C. Lippert<strong><sup>*</sup></strong>, Eun Yong Kang,
            Jing Xiang, Carl M. Kadie, D. Heckerman<strong><sup>*</sup></strong>.&nbsp;<a
              href="http://bioinformatics.oxfordjournals.org/content/29/12/1526">A
              powerful and efficient set test for genetic markers that
              handles confounders</a>. <em>Bioinformatics</em>,
            29:1526-1533, April 2013
            (doi:10.1093/bioinformatics/btt177). (<sup>*</sup>equal
            contributions)
          </li>
          <ul>
            <li>Shows that the likelihood ratio test (LRT) can be more
              powerful than a score test for set association tests. This
              work is limited to similarity matrices that are low rank and
              includes an efficient algorithm for this case. This
              limitation is relaxed in [S2].</li>
          </ul>
          <li>
            <font size="-1">[S2]</font> C. Lippert, Jing Xiang,
            Danilo Horta, Christian Widmer, Carl M. Kadie, D.
            Heckerman*, J. Listgarten. <a href="http://bioinformatics.oxfordjournals.org/content/30/22/3206">Greater
              power and computational efficiency for kernel-based
              association testing of sets of genetic variants</a>.&nbsp;<em>Bioinformatics</em>,
            2014 (doi: 10.1093/bioinformatics/btu504). (*corresponding
            author)
          </li>
          <ul>
            <li>Makes theoretical arguments and demonstrates
              empirically that the LRT is often more powerful than the
              traditionally used score test (e.g., SKAT). It also
              explains how to perform several of the algebraic
              computations for set tests efficiently with either a low-
              or full-rank background kernel.</li>
          </ul>
        </ul>
        <p><strong>Data Transformations/Pre-processing for GWAS</strong></p>
        <ul>
          <li>
            <font size="-1">[D1]</font> N. Fusi*, C. Lippert, N. D.
            Lawrence and O. Stegle*. <a
              href="http://www.nature.com/ncomms/2014/140919/ncomms5890/full/ncomms5890.html">Warped
              linear mixed models for the genetic analysis of
              transformed phenotypes</a>. <em>Nature Communications</em>,
            2014.
          </li>
          <ul>
            <li>Shows how monotonically transforming the phenotype can
              increase power in genome-wide association studies and
              increase the accuracy of heritability estimation and
              phenotype prediction.</li>
          </ul>
          <li>
            <font size="-1">[D2]</font> O. Weissbrod, C. Lippert, D.
            Geiger, and D. Heckerman.&nbsp; <a
              href="http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.3285.html">Accurate
              liability estimation improves power in ascertained
              case-control studies</a>.&nbsp; <em>Nature Methods</em>,
            Feb 2015 (doi:10.1038/nmeth.3285).
          </li>
          <ul>
            <li>Describes an approach to pre-process ascertained
              case-control-study data that leads to improved power
              when analyzed with a linear mixed model.</li>
          </ul>
        </ul>
        <p><strong>Epigenetic Cellular Heterogeneity Correction</strong></p>
        <ul>
          <li>
            <font size="-1">[C1]</font> J. Zou, C. Lippert, D.
            Heckerman, M. Aryee, Jennifer Listgarten.&nbsp;<a
              href="http://www.nature.com/nmeth/journal/v11/n3/abs/nmeth.2815.html">Epigenome-wide
              association studies without the need for cell-type
              composition</a>.&nbsp;<em>Nature Methods</em>,
            doi:10.1038/nmeth.2815.
          </li>
          <ul>
            <li>Shows how FaST-LMM, with the inclusion of principal
              components (PCs) as covariates, can correct for the
              confounding effects of multiple cell types. Although a
              method for selecting PCs is presented here, the method
              in [U6] is now recommended.</li>
          </ul>
        </ul>
        <p><strong>Epistatic Genome-Wide Association</strong></p>
        <ul>
          <li>
            <font size="-1">[E1]</font> C. Lippert<strong><sup>*</sup></strong>,
            J. Listgarten<strong><sup>*</sup></strong>, Robert
            Davidson, Scott Baxter, Hoifung Poon, Carl M. Kadie, D.
            Heckerman<strong><sup>*</sup></strong>.&nbsp;<a
              href="http://www.nature.com/srep/2013/130122/srep01099/full/srep01099.html">An
              Exhaustive Epistatic SNP Association Analysis on
              Expanded Wellcome Trust Data</a>, <em>Scientific
              Reports</em>, 2013, doi:10.1038/srep01099 (<sup>*</sup>equal
            contributions)
          </li>
          <ul>
            <li>Presents results for all pairwise-epistatic tests for
              all phenotypes in the WTCCC1 data, using a linear mixed
              model with a low-rank similarity matrix based on the
              feature-selection method in [U3]. As described, based on
              the results in [U6], we now recommend that the
              feature-selection method in [U2] be used instead.</li>
          </ul>
        </ul>
        <p><strong>GWAS for&nbsp;“Functional Traits”&nbsp;such as
            Longitudinal Traits</strong></p>
        <ul>
          <li>
            <font size="-1">[F1]</font> N. Fusi and J.
            Listgarten. Leveraging Non-Linear Genetic
            Effects on Functional Traits for GWAS, <em>Proceedings
              of RECOMB 2016.</em>
          </li>
          <ul>
            <li>Introduces a model for performing GWAS
              for vector-valued traits that vary smoothly in
              time. The framework is expressive
              and computationally efficient, but the null model
              is not nested inside the alternative model,
              an issue we are addressing in ongoing
              work.</li>
          </ul>
        </ul>
        <p><strong>Heritability Estimation</strong></p>
        <ul>
          <li>
            <font size="-1">[H1]</font> N. Furlotte, D. Heckerman,
            and C. Lippert.&nbsp; <a
              href="http://www.nature.com/jhg/journal/vaop/ncurrent/full/jhg201415a.html">Quantifying
              the uncertainty in heritability</a>.&nbsp; <em>Journal
              of Human Genetics</em> 27, March 2014 (doi:
            10.1038/jhg.2014.15).
          </li>
          <ul>
            <li>Applies the spectral-decomposition trick from FaST-LMM
              [U2] to speed up Bayesian estimates of heritability.</li>
          </ul>
          <li>
            <font size="-1">[H2]</font> D. Heckerman, D. Gurdasani, C.
            Kadie, C. Pomilla, T. Carstensen, H. Martin, K. Ekoru,
            R.N. Nsubuga, G. Ssenyomo, A. Kamali, P. Kaleebu, C.
            Widmer, and M.S. Sandhu. <a href="http://www.pnas.org/content/113/27/7377.abstract">Linear
              mixed model for heritability estimation that explicitly
              addresses environmental variation</a>. <em>PNAS</em>,
            113: 7377–7382 (doi: 10.1073/pnas.1510497113).
          </li>
          <ul>
            <li>Describes a way to generalize linear mixed models to
              take spatial location into account when jointly modeling
              the influences of genomics and environment on traits.</li>
          </ul>
        </ul>
      </div>
    </div>
  </section>
  <!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->
  <script src="js/jquery-3.3.1.min.js"></script>
  <!-- Include all compiled plugins (below), or include individual files as needed -->
  <script src="js/popper.min.js"></script>
  <script src="js/bootstrap-4.3.1.js"></script>
</body>

</html>