
jaceybronte/gene_dependency_representations

Gene Dependency Representations

Goal

Current cancer treatments tend to be toxic and can leave patients with lifelong side effects. A promising direction for drug development is synthetic lethality, in which the combination of two genetic events results in cell death even though neither event alone is lethal. The approach is already used in molecularly targeted cancer therapy: PARP inhibitors, the first targeted therapeutics to exploit a synthetic lethal interaction exposed by an inactivated tumor suppressor gene (BRCA1 or BRCA2), first received FDA approval in 2014. The benefits of synthetic lethality-based treatment strategies include applicability to a large share of cancer mutations, straightforward identification of patients likely to respond (because the therapy is selective for specific cancer cell mutations), and reduced toxicity compared to traditional chemotherapy.

The goal of this project is to discover multivariate gene vulnerability patterns in cancer that can inform the development of novel cancer treatments. We apply machine learning to gene knockout data from DepMap cancer cell lines to discover these patterns. We will then apply statistical analyses to determine how multivariate gene vulnerabilities differ between pediatric and adult cancers. Once we identify significant multigene vulnerability patterns, we hope to inform drug discovery efforts that target them.

Data

Access

All data are publicly available.

Source: Cancer Dependency Map resource.

Repository Structure:

This repository is structured as follows:

| Order | Module | Description |
| --- | --- | --- |
| 0.data-download | Download required files | Download gene effect data and cell line information, and download gene QC and construct gene filtering dictionary |
| 1.data-exploration | Explore and visualize data | Create figures to visualize cell line information and split gene effect data into balanced test and train dataframes |
| 2.train-VAE | Train Beta VAE and Beta TC VAE models | Optimize hyperparameters and train Beta Variational Autoencoder/Beta Total Correlation Variational Autoencoder with optimal hyperparameters and previously created test and train dataframes |
| 3.analysis | Analyze Beta VAE and Beta TC VAE outputs | Generate heatmaps to visualize death windows by cell line and by gene, run Gene Set Enrichment Analysis with BVAE and BTCVAE synthesized data, and analyze extracted BVAE/BTCVAE latent space data to compare similarity of cancer between different demographics |
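
For readers unfamiliar with the Beta VAE objective named in module 2, the following is a minimal sketch of the per-sample loss, not the repository's implementation. It assumes a squared-error reconstruction term and a diagonal Gaussian posterior compared against a standard normal prior. The function names `gaussian_kl` and `beta_vae_loss` are hypothetical and chosen only for illustration. The Beta TC VAE variant further decomposes the KL term and is not shown here.

```python
import math


def gaussian_kl(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I) for one sample."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def beta_vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Squared-error reconstruction plus beta-weighted KL (assumed form)."""
    # Reconstruction term: how far the decoded vector is from the input.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    # beta > 1 puts more weight on matching the prior, which is the
    # mechanism Beta VAE uses to encourage disentangled latent dimensions.
    return recon + beta * gaussian_kl(mu, log_var)


if __name__ == "__main__":
    # Toy values only; these are not taken from DepMap data.
    x = [0.2, -1.0, 0.5]
    x_hat = [0.1, -0.8, 0.4]
    mu = [0.3, -0.1]
    log_var = [0.0, -0.5]
    print(beta_vae_loss(x, x_hat, mu, log_var, beta=4.0))
```

With `beta=1.0` this reduces to the standard VAE objective, so the value of `beta` is one of the hyperparameters a search like the one in module 2 would typically tune.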

Environment Setup

Perform the following steps to set up the gene_dependency_representations environment necessary for processing data in this repository.

Step 1: Create Gene Dependency Representations Environment

# Run this command to create the proper conda environment (conda version 24.5.0)

conda env create --yes --file environment.yml

Step 2: Activate Gene Dependency Representations Environment

# Run this command to activate the conda environment for Gene Dependency Representations

conda activate gene_dependency_representations
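
As an optional sanity check before running the notebooks, a short Python sketch like the one below can confirm that the expected environment is active. It relies on the `CONDA_DEFAULT_ENV` variable that conda sets on activation. The helper names `active_conda_env` and `is_expected_env` are hypothetical and are not part of this repository.

```python
import os

# Environment name taken from the activation command above.
EXPECTED_ENV = "gene_dependency_representations"


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if unset."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


def is_expected_env(environ=None, expected=EXPECTED_ENV):
    """True when the active conda environment matches the expected name."""
    return active_conda_env(environ) == expected


if __name__ == "__main__":
    if is_expected_env():
        print("Environment is active:", EXPECTED_ENV)
    else:
        print("Active environment is", active_conda_env(), "but expected", EXPECTED_ENV)
```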
