Regression with Summary Statistics (RSS)

Overview

Multiple regression analyses typically assume that the response and covariates are observed for each individual, and use these data to infer the regression coefficients. Here, motivated by applications in genetics, we assume that such individual-level data are not available, and that only the summary statistics of univariate regressions (essentially, the effect size estimates and their standard errors) are provided. We also assume that information on the correlation structure among the covariates is available. The aim is to infer the multiple regression coefficients from these marginal regression summary statistics.
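To make the inputs concrete, the sketch below computes the three quantities described above (univariate effect size estimates, their standard errors, and the covariate correlation matrix) from simulated individual-level data. The function name `univariate_summary_stats`, the simulated data, and the chosen effect sizes are illustrative assumptions and are not part of this repository's code.

```python
import numpy as np

def univariate_summary_stats(X, y):
    """Regress y on each column of X separately (with an intercept).

    Returns (betahat, se): the per-covariate slope estimates and
    their standard errors from simple linear regression.
    """
    n, _ = X.shape
    Xc = X - X.mean(axis=0)              # centering absorbs the intercept
    yc = y - y.mean()
    xtx = np.sum(Xc**2, axis=0)          # per-column sum of squares
    betahat = Xc.T @ yc / xtx            # marginal least-squares slopes
    resid = yc[:, None] - Xc * betahat   # residuals of each simple regression
    sigma2 = np.sum(resid**2, axis=0) / (n - 2)  # residual variance per fit
    se = np.sqrt(sigma2 / xtx)
    return betahat, se

# Hypothetical individual-level data: 200 individuals, 5 covariates.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
beta = np.array([0.5, 0.0, 0.0, -0.3, 0.0])   # assumed true effects
y = X @ beta + rng.normal(size=200)

betahat, se = univariate_summary_stats(X, y)  # summary statistics
R = np.corrcoef(X, rowvar=False)              # covariate correlation matrix
```

In the setting considered here, only `betahat`, `se`, and an estimate of `R` would be available; `X` and `y` themselves would not.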

This work is motivated by applications in genome-wide association studies (GWAS). When a multiple regression model is fitted to individual-level GWAS data, the covariates are the genotypes typed at different genetic variants (typically SNPs), the response is a quantitative phenotype (e.g. height or blood lipid level), and the regression coefficients are the effects of each SNP on the phenotype. Because of privacy and logistical constraints, the individual-level data are often not easily available. In contrast, GWAS summary statistics (from standard single-SNP analysis) are widely available in the public domain (e.g. GIANT and PGC). Moreover, the correlation among covariates (genotypes of SNPs), known as linkage disequilibrium, can also be obtained from public databases (e.g. the 1000 Genomes Project). When the protected individual-level data are not available, can we perform "multiple-SNP" analysis using these public assets?

Here we provide a generally applicable framework for multiple-SNP analyses using GWAS single-SNP summary data. Specifically, we introduce a “Regression with Summary Statistics” (RSS) likelihood, which relates the multiple regression coefficients to the univariate regression results. We then combine the RSS likelihood with suitable priors to perform Bayesian inference for the regression coefficients.
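As a rough illustration of how such a likelihood can be evaluated, the sketch below assumes the multivariate normal form in which the vector of single-SNP estimates has mean `S R S^{-1} beta` and covariance `S R S`, where `S` is taken here to be the diagonal matrix of standard errors and `R` is the SNP correlation (LD) matrix. This form, the function name `rss_loglik`, and the toy inputs are assumptions made for illustration; please consult the RSS paper and the tutorials for the exact definitions used by the software.

```python
import numpy as np

def rss_loglik(beta, betahat, se, R):
    """Log-density of betahat under N(S R S^{-1} beta, S R S).

    beta    : multiple regression coefficients (length p)
    betahat : single-SNP effect size estimates (length p)
    se      : standard errors of betahat (length p), assumed to form S
    R       : p x p positive definite correlation matrix
    """
    S = np.diag(se)
    S_inv = np.diag(1.0 / se)
    mean = S @ R @ S_inv @ beta
    cov = S @ R @ S
    resid = betahat - mean
    L = np.linalg.cholesky(cov)          # fails if cov is not positive definite
    z = np.linalg.solve(L, resid)        # whitened residuals
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    p = betahat.size
    return -0.5 * (p * np.log(2.0 * np.pi) + logdet + z @ z)

# Toy example with two correlated SNPs (all values hypothetical).
R = np.array([[1.0, 0.6],
              [0.6, 1.0]])
se = np.array([0.1, 0.2])
beta = np.array([0.3, 0.0])              # only the first SNP has an effect

# Expected single-SNP estimates under the assumed model: the second SNP
# has a nonzero marginal effect only through its correlation with the first.
betahat = np.diag(se) @ R @ np.diag(1.0 / se) @ beta

ll_true = rss_loglik(beta, betahat, se, R)
ll_null = rss_loglik(np.zeros(2), betahat, se, R)
```

Combining a function like this with a prior on `beta` gives an unnormalized log-posterior, which is the starting point for the Bayesian inference described above.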

License

The repository is licensed under the MIT License.

Support

  1. Get started with the short tutorials.
  2. Refer to the FAQ for answers to some common questions.
  3. Create a new issue to report bugs or request features.

Citation

Collaboration

Here we have developed a likelihood function of multiple regression coefficients based on univariate regression summary data, which opens the door to a wide range of statistical machinery for inference. Using this likelihood, we have implemented Bayesian methods to estimate SNP heritability, detect genetic association, assess gene set or network enrichment, prioritize trait-associated genes and infer genetic architecture. Please check our progress updates regularly.

If you have specific applications that use GWAS summary data as input, and want to build new statistical methods based on the RSS likelihood, please feel free to contact us. We are glad to help!

Contact

Xiang Zhu, Ph.D.
Matthew Stephens Lab
Department of Statistics
University of Chicago