FactorGo

FactorGo is a scalable variational factor analysis model that learns pleiotropic factors using GWAS summary statistics!

We present Factor analysis model in Genetic assOciation (FactorGo) to learn latent pleiotropic factors from GWAS summary statistics. The model is implemented in Python with JAX, whose just-in-time (JIT) compilation produces heavily optimized machine code at run time and runs on CPU, GPU, or TPU. FactorGo is a command line tool; please see the example below and the full documentation.

For the published paper, please see:

Zhang, Z., Jung, J., Kim, A., Suboc, N., Gazal, S., and Mancuso, N. (2023). A scalable approach to characterize pleiotropy across thousands of human diseases and complex traits using GWAS summary statistics. Am. J. Hum. Genet. 110, 1863–1874. (https://www.cell.com/ajhg/abstract/S0002-9297(23)00353-1)

We are currently working on more detailed documentation. Feel free to contact me ([email protected]) if you need help running the tool or with further analysis. I am happy to schedule a Zoom call if needed.

Installation | Example | Notes | Support | Other Software

FactorGo model

FactorGo assumes the true genetic effect can be decomposed into latent pleiotropic factors. Briefly, we model test statistics at $p$ independent variants from the $i^{th}$ GWAS, $Z_i \approx \sqrt{N_i} \hat{\beta}_i$, as a linear combination of $k$ shared latent variant loadings $L \in R^{p \times k}$ with trait-specific factor scores $f_i \in R^{k \times 1}$ as

$$Z_i = \sqrt{N_i} \beta_i + \epsilon_i = \sqrt{N_i} (L f_i + \mu) + \epsilon_i $$

where $N_i$ is the sample size for the $i^{th}$ GWAS, $\mu$ is the intercept, and $\epsilon_i \sim N(0, \tau^{-1}I_p)$ reflects residual heterogeneity in statistical power across studies with precision scalar $\tau$. Given $Z = \{Z_i\}^n_{i=1}$ and model parameters $L$, $F = \{f_i\}^n_{i=1}$, $\mu$, $\tau$, we can compute the likelihood as

$$\mathcal{L}(L, F, \mu, \tau | Z) = \prod_i \mathcal{N}_p ( \sqrt{N_i} (L f_i + \mu), \tau^{-1} I_p)$$
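The sampling model and likelihood above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the FactorGo implementation: the function names `simulate_z` and `log_likelihood` are hypothetical, the toy sizes and parameter values are made up, and $F$ is stored as an $n \times k$ array whose $i^{th}$ row is $f_i$.

```python
import numpy as np

def simulate_z(L, F, mu, N, tau, rng):
    """Draw Z (n x p); row i ~ N_p(sqrt(N_i) (L f_i + mu), tau^{-1} I_p)."""
    mean = np.sqrt(N)[:, None] * (F @ L.T + mu[None, :])    # n x p matrix of means
    noise = rng.normal(scale=tau ** -0.5, size=mean.shape)  # sd = tau^{-1/2}
    return mean + noise

def log_likelihood(Z, L, F, mu, N, tau):
    """Log of the product over GWAS of the Gaussian densities above."""
    n, p = Z.shape
    resid = Z - np.sqrt(N)[:, None] * (F @ L.T + mu[None, :])
    return 0.5 * n * p * (np.log(tau) - np.log(2 * np.pi)) - 0.5 * tau * np.sum(resid ** 2)

rng = np.random.default_rng(0)
n, p, k = 20, 1000, 5                      # toy sizes, for illustration only
L = 0.01 * rng.normal(size=(p, k))         # variant loadings (p x k)
F = rng.normal(size=(n, k))                # factor scores, one row per GWAS
mu = np.zeros(p)                           # intercept
N = rng.integers(10_000, 100_000, size=n).astype(float)  # sample sizes
tau = 1.0                                  # residual precision

Z = simulate_z(L, F, mu, N, tau, rng)
ll_true = log_likelihood(Z, L, F, mu, N, tau)
ll_null = log_likelihood(Z, np.zeros_like(L), F, mu, N, tau)
print(ll_true > ll_null)  # the generating loadings should fit better than all-zero loadings
```

Because the covariance is $\tau^{-1} I_p$, the log-likelihood reduces to a single sum of squared residuals, which is why no matrix inverse appears in the sketch.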

To model our uncertainty in $L$, $F$, $\mu$, we take a full Bayesian approach in the lower dimension latent space similar to a Bayesian PCA model [1] as,

$$\Pr(F) = \prod_{i=1}^{n} \mathcal{N}_k (f_i | 0, I_k)$$

$$\Pr(L | \alpha) = \prod_{j=1}^{p} \mathcal{N}_k (l^j | 0, diag(\alpha^{-1}))$$

$$\Pr(\mu) = \mathcal{N}_p (\mu | 0, \phi^{-1} I_p)$$

where $\alpha \in R^{k \times 1}_{>0}$ ($\phi > 0$) controls the prior precision for variant loadings (intercept). To avoid overfitting and to “shut off” uninformative factors when $k$ is misspecified, we use automatic relevance determination (ARD) [1] and place a prior over $\alpha$ as

$$\Pr(\alpha | \alpha_a, \alpha_b) = \prod_{q=1}^{k} G(\alpha_q | \alpha_a, \alpha_b)$$
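To see why a large ARD precision "shuts off" a factor, the following sketch draws loadings from the prior $\mathcal{N}_k(0, diag(\alpha^{-1}))$ for three hand-picked values of $\alpha_q$ (illustrative values, not estimates from FactorGo). The empirical variance of column $q$ should be close to $1/\alpha_q$, so a factor with very large $\alpha_q$ has loadings that are effectively zero.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 5000                                   # number of variants in this toy example
alpha = np.array([1.0, 1e2, 1e6])          # hypothetical ARD precisions for k = 3 factors

# Each row l^j ~ N_k(0, diag(alpha^{-1})): column q has standard deviation alpha_q^{-1/2}
L = rng.normal(size=(p, alpha.size)) / np.sqrt(alpha)

col_var = L.var(axis=0)
for a, v in zip(alpha, col_var):
    print(f"alpha={a:g}  empirical loading variance={v:.2e}")
```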

Lastly, we place a prior over the shared residual precision across GWAS studies as

$$\Pr(\tau | \tau_a, \tau_b) = G(\tau | \tau_a, \tau_b)$$

We impose broad priors by setting hyperparameters $\phi = \alpha_a = \alpha_b = \tau_a = \tau_b = 10^{-5}$.
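As a quick check on what "broad" means here, assume the Gamma priors $G(a, b)$ use the shape-rate parameterization (an assumption; the text above does not say which parameterization is used). With $a = b = 10^{-5}$, a Gamma-distributed precision $x$ has

$$E[x] = \frac{a}{b} = 1, \qquad Var[x] = \frac{a}{b^2} = \frac{10^{-5}}{10^{-10}} = 10^{5}$$

so the prior is centered at 1 but nearly flat over many orders of magnitude. Likewise, $\phi = 10^{-5}$ gives the intercept a prior variance of $\phi^{-1} = 10^{5}$ per variant.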

Installation

We recommend first creating a conda environment with pip installed.

# clone over HTTPS
git clone https://github.com/mancusolab/FactorGo.git
# or clone over SSH
git clone git@github.com:mancusolab/FactorGo.git

# the clone creates a directory named FactorGo
cd FactorGo
pip install .

Example

For illustration, we use example data stored in ./example/data, including a Z score summary statistics file and a sample size file.

To run the factorgo command line tool, we specify the following input files and flags:

GWAS Zscore file: n20_p1k.Zscore.tsv.gz
Sample size file: n20_p1k.SampleN.tsv
-k 5: estimate 5 latent factors
--scale: center and standardize the SNP columns of the Zscore matrix
-o: output directory and prefix

For all available flags, please use factorgo -h.

factorgo \
    ./example/data/n20_p1k.Zscore.tsv.gz \
    ./example/data/n20_p1k.SampleN.tsv \
    -k 5 \
    --scale \
    -o ./example/result/demo_test
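The effect of --scale, as described in the flag list above, can be sketched as a column-wise standardization of an $n \times p$ Z score matrix. This is a minimal illustration, not the factorgo code: the function name `standardize_columns` is hypothetical, and the use of the population standard deviation (ddof=0) is an assumption, since the text does not say which estimator factorgo uses.

```python
import numpy as np

def standardize_columns(Z):
    """Center each column to mean 0 and scale it to standard deviation 1."""
    Z = np.asarray(Z, dtype=float)
    centered = Z - Z.mean(axis=0, keepdims=True)
    sd = Z.std(axis=0, keepdims=True)      # population SD (ddof=0); an assumption
    return centered / sd

rng = np.random.default_rng(2)
Z = rng.normal(loc=0.5, scale=3.0, size=(20, 1000))  # toy matrix: 20 studies x 1000 SNPs
Z_scaled = standardize_columns(Z)
print(Z_scaled.mean(axis=0)[:3], Z_scaled.std(axis=0)[:3])
```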

The output contains five result files:

demo_test.Wm.tsv.gz: posterior mean of loading matrix W (pxk)
demo_test.Zm.tsv.gz: posterior mean of factor score Z (nxk)
demo_test.Wvar.tsv.gz: posterior variance of loading matrix W (kx1)
demo_test.Zvar.tsv.gz: posterior variance of factor score Z (nxk)
demo_test.factor.tsv.gz: contains the following three columns:
a) factor index (ordered by R2)
b) posterior mean of ARD precision parameters
c) variance explained by each factor (R2)
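For downstream analysis, the factor summary file can be read with the Python standard library. The sketch below relies only on what is stated above (a gzipped, tab-separated file with three columns in the order listed). Whether the file has a header row is not stated here, so the hypothetical helper `read_factor_table` exposes that as a `skip_header` argument instead of guessing.

```python
import csv
import gzip

def read_factor_table(path, skip_header=False):
    """Return rows as (factor_index, ard_precision_mean, r2) tuples."""
    rows = []
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        if skip_header:
            next(reader, None)             # drop the first line if it holds column names
        for row in reader:
            if not row:                    # ignore blank lines
                continue
            idx, ard, r2 = row[:3]
            rows.append((int(float(idx)), float(ard), float(r2)))
    return rows

# Example call (path taken from the demo command above):
# factors = read_factor_table("./example/result/demo_test.factor.tsv.gz")
```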

Notes

The default computation device for factorgo is CPU. To run on another device, specify the platform (cpu/gpu/tpu) with the -p flag, for example -p gpu:

# -p gpu selects the GPU device
factorgo \
    ./example/data/n20_p1k.Zscore.tsv.gz \
    ./example/data/n20_p1k.SampleN.tsv \
    -k 5 \
    --scale \
    -p gpu \
    -o ./example/result/demo_test
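Before passing -p gpu or -p tpu, it can help to check which backend JAX picks up in the current environment. The sketch below uses `jax.default_backend()`, which reports the platform name of JAX's default backend. The wrapper function `default_jax_backend` is hypothetical and simply returns None when JAX is not installed.

```python
def default_jax_backend():
    """Return the platform name of JAX's default backend, or None without JAX."""
    try:
        import jax
    except ImportError:
        return None
    return jax.default_backend()

backend = default_jax_backend()
print(f"JAX default backend: {backend}")
```

If this reports cpu on a machine that has a GPU, the installed JAX build most likely lacks accelerator support, and -p gpu is unlikely to work until that is fixed.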

factorgo uses JAX with just-in-time compilation to achieve high-speed computation. However, JAX has some known issues on Macs with the M1 chip. As a workaround, set up conda through miniforge and then install factorgo with pip in the desired environment.

References

[1] Bishop, C.M. (1999). Variational principal components. 509–514.

Support

Please report any bugs or feature requests in the Issue Tracker. If you have any questions or comments please contact [email protected] and/or [email protected].

Other Software

Feel free to use other software developed by Mancuso Lab:

SuShiE: a Bayesian fine-mapping framework for molecular QTL data across multiple ancestries.
MA-FOCUS: a Bayesian fine-mapping framework using TWAS statistics across multiple ancestries to identify the causal genes for complex traits.
SuSiE-PCA: a scalable Bayesian variable selection technique for sparse principal component analysis.
twas_sim: Python software to simulate TWAS statistics.
HAMSTA: Python software to estimate heritability explained by local ancestry data from admixture mapping summary statistics.

Note

This project has been set up using PyScaffold 4.1.1. For details and usage information on PyScaffold see https://pyscaffold.org/.
