This repository contains code to fully replicate the analysis from "Cancer phylogenetics using single-cell RNA-seq data" (Moravec et al. 2021). Alternatively, it can be used to perform a similar analysis on a new dataset.
Note that the analysis assumes a relatively uniform cell population; otherwise, the discretization method using the Highest Density Interval will not work.
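To make the uniformity assumption more concrete, the following Python sketch shows one way a Highest Density Interval (HDI) can be estimated from a sample and used to discretize values. It is an illustration only, not the code used in this repository: the function names `hdi` and `discretize`, the default mass of 0.9 and the three-level coding (-1, 0, 1) are assumptions made for the example.

```python
import math


def hdi(values, mass=0.9):
    """Return (low, high) of the narrowest interval holding `mass` of the sample."""
    if not 0 < mass <= 1:
        raise ValueError("mass must be in (0, 1]")
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("empty sample")
    # Number of sorted points that the interval has to cover.
    k = max(1, math.ceil(mass * n))
    # Slide a window of k consecutive sorted points and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


def discretize(values, mass=0.9):
    """Code each value as -1 (below the HDI), 0 (inside) or 1 (above).

    The three-level coding is a hypothetical choice for this sketch.
    """
    low, high = hdi(values, mass)
    return [-1 if v < low else 1 if v > high else 0 for v in values]


if __name__ == "__main__":
    sample = [0, 10, 11, 12, 20, 30]
    print(hdi(sample, mass=0.5))
    print(discretize(sample, mass=0.5))
```

A single interval like this describes the sample well only when the values cluster around one mode. With a mixed cell population the values are likely to be multimodal, and one narrowest interval may no longer separate typical from unusual values.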
- Linux operating system
- at least 30 GB RAM
- about 400 GB of free space for intermediate files and results
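Before starting a run that takes several days, it may be worth checking the machine against the requirements above. The following Python sketch is one way to do so. The helper names `system_resources` and `unmet_requirements` are hypothetical, and the script assumes a Linux system where `os.sysconf` reports physical memory.

```python
import os
import shutil

GB = 10 ** 9  # decimal gigabyte


def system_resources(path="."):
    """Return total RAM and free disk space at `path`, both in GB."""
    # SC_PHYS_PAGES and SC_PAGE_SIZE are available on Linux.
    ram_bytes = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    free_bytes = shutil.disk_usage(path).free
    return {"ram_gb": ram_bytes / GB, "free_gb": free_bytes / GB}


def unmet_requirements(resources, min_ram_gb=30, min_free_gb=400):
    """Return a list of messages, one per requirement that is not met."""
    problems = []
    if resources["ram_gb"] < min_ram_gb:
        problems.append(
            f"RAM: {resources['ram_gb']:.1f} GB found, {min_ram_gb} GB needed"
        )
    if resources["free_gb"] < min_free_gb:
        problems.append(
            f"Disk: {resources['free_gb']:.1f} GB free, {min_free_gb} GB needed"
        )
    return problems


if __name__ == "__main__":
    # Check the current directory; pass another path to check a different disk.
    issues = unmet_requirements(system_resources("."))
    print("\n".join(issues) if issues else "Hardware requirements look satisfied.")
```

The thresholds default to the 30 GB of RAM and 400 GB of free space listed above. Run the script from the directory where intermediate files will be written so that the right disk is checked.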
R, python3, Cellranger, bamtofastq, GATK, VCFtools, IQtree, BEAST2
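A quick way to confirm that the listed software is installed is to look for each executable on the `PATH`. The Python sketch below does this with `shutil.which`. The executable names in `TOOLS` are assumptions about how each program is typically installed, not names taken from this repository, so adjust them to match your system.

```python
import shutil

# Assumed executable names; your installation may use different ones
# (for example, IQ-TREE may be installed as `iqtree2`).
TOOLS = [
    "Rscript",
    "python3",
    "cellranger",
    "bamtofastq",
    "gatk",
    "vcftools",
    "iqtree",
    "beast",
]


def missing_tools(names):
    """Return the names that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

This only checks that the executables can be found. It does not check versions or whether the tools run correctly.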
phyloRNA, beter, data.table, devtools
Original data published at GEO database under the accession number GSE163210.
Human reference genome GRCh38v15, annotation and known variants.
Code from this repository.
Once you have installed the required software and prepared your data, navigate into the analysis directory and run:
Rscript run.r
After a few days, the analysis should finish.
Pre-processed FASTA files, trees and tests of phylogenetic clustering are available in the processed_files branch. These files are tracked with the Git Large File Storage (LFS) extension.
- install required software
- download R and Python packages
- download data and reference genome
- run the analysis
If anything is unclear or you need help with the analysis, raise an issue.