---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
## Overview
infercna provides functions for inferring copy-number alteration (CNA) values from single-cell RNA-seq (scRNA-seq) data and for related queries:

* `infercna()` to infer copy-number alterations from single-cell RNA-seq data
* `refCorrect()` to convert relative CNA values to absolute values
    + applied within `infercna()` if reference cells are provided
* `cnaPlot()` to plot a heatmap of CNA values
* `cnaScatterPlot()` to visualise malignant and non-malignant cell subsets
* `cnaCor()` to compute a first parameter for identifying cells with high CNAs
    + computed in `cnaScatterPlot()`
* `cnaSignal()` to compute a second parameter for identifying cells with high CNAs
    + computed in `cnaScatterPlot()`
* `findMalignant()` to find malignant subsets of cells
* `findClones()` to identify genetic subclones
* `fitBimodal()` to fit a bimodal Gaussian distribution
    + used in `findMalignant()`
    + used in `findClones()`
* `filterGenes()` to filter genes by their genome features
* `splitGenes()` to split genes by their genome features
* `orderGenes()` to order genes by their genomic position
* `useGenome()` to change the default genome configured with infercna
* `addGenome()` to configure infercna with a new genome specified by the user

*See the Reference tab for the full list of functions and their documentation pages.*
## Installation
To install `infercna`:
``` r
# install.packages("devtools")
devtools::install_github("jlaffy/infercna")
```
## References
The methodology behind infercna has been applied in several publications, listed below from most recent to earliest. The idea of inferring CNAs from single-cell RNA-seq data was first formulated in the earliest of these papers (Patel, Tirosh et al., 2014).

* [An Integrative Model of Cellular States, Plasticity, and Genetics for Glioblastoma (Neftel, Laffy et al., 2019, *Cell*)](https://doi.org/10.1016/j.cell.2019.06.024)
* [Developmental and oncogenic programs in H3K27M gliomas dissected by single-cell RNA-seq (Filbin, Tirosh et al., 2018, *Science*)](https://doi.org/10.1126/science.aao4750)
* [Decoupling genetics, lineages, and microenvironment in IDH-mutant gliomas by single-cell RNA-seq (Venteicher, Tirosh et al., 2017, *Science*)](https://doi.org/10.1126/science.aai8478)
* [Single-cell RNA-seq supports a developmental hierarchy in human oligodendroglioma (Tirosh, Venteicher et al., 2016, *Nature*)](https://doi.org/10.1038/nature20123)
* [Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma (Patel, Tirosh et al., 2014, *Science*)](https://doi.org/10.1126/science.1254257)
## Data requirements
At a minimum, infercna requires:

* a single-cell expression matrix of genes by cells
    * **not** centered
    * normalised for sequencing depth and, where applicable, gene length (e.g. TPM, RPKM or CPM)
    * optionally in log space, e.g. `log2(TPM/10 + 1)`
    * see also `infercna::TPM` and `infercna::logTPM`

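
As a rough illustration of these requirements, the sketch below builds a toy genes-by-cells matrix in base R and moves it into log space with the `log2(TPM/10 + 1)` transformation. The gene and cell names, the values, and the helpers `to_log_tpm()` and `from_log_tpm()` are invented for this example and are not part of infercna; in practice the package's own `infercna::TPM` and `infercna::logTPM` are probably the better choice.

``` r
# Toy TPM matrix: genes in rows, cells in columns (hypothetical names and values).
tpm <- matrix(
  c(0, 10, 30,     # cell_1
    70, 150, 0),   # cell_2
  nrow = 3,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C"), c("cell_1", "cell_2"))
)

# Hypothetical helpers (not infercna functions) for the log2(TPM/10 + 1) transform.
to_log_tpm <- function(tpm) log2(tpm / 10 + 1)
from_log_tpm <- function(log_tpm) 10 * (2^log_tpm - 1)

log_tpm <- to_log_tpm(tpm)

# An uncentered matrix in this log space cannot contain negative values,
# so negative entries would suggest the data has already been centered.
stopifnot(all(log_tpm >= 0))
```

The inverse helper is included only to show that the transformation loses no information, so either form of the matrix can be recovered from the other.
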
If you would like to compute *absolute* (rather than *relative*) CNA values, you should additionally provide:

* a list of length two or more containing the cell IDs of normal reference cells, for example `list(macrophages, oligodendrocytes)`
    * see the example reference `infercna::refCells`

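
A minimal sketch of what such a reference list might look like, using base R only. The cell IDs and group contents are hypothetical, and the checks are suggestions for sanity-testing your own input rather than anything infercna is known to enforce.

``` r
# Hypothetical cell IDs; in practice these would be the column names
# of the expression matrix.
cell_ids <- c("cell_1", "cell_2", "cell_3", "cell_4", "cell_5", "cell_6")

# Two reference groups of normal cells (illustrative names and members).
refs <- list(
  macrophages = c("cell_1", "cell_2"),
  oligodendrocytes = c("cell_3", "cell_4")
)

# Basic sanity checks before running the inference.
stopifnot(
  is.list(refs),
  length(refs) >= 2,                # two or more reference groups
  all(unlist(refs) %in% cell_ids)   # every reference cell exists in the matrix
)

# Cells that do not belong to any reference group.
query_cells <- setdiff(cell_ids, unlist(refs))
```
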
Finally, if your genome is not available in the current implementation of infercna, you should additionally provide:

* a genome data frame containing the columns `symbol`, `chromosome_name`, `start_position` and `arm`

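
The sketch below shows one way such a data frame could be assembled and checked in base R. The gene symbols, positions and the format of the `arm` labels are assumptions made for illustration; consult the documentation page of `addGenome()` for the exact input it expects. The ordering step only illustrates what ordering genes by genomic position means and is not a documented requirement.

``` r
# Hypothetical genome annotation with the four required columns.
genome <- data.frame(
  symbol = c("GENE_C", "GENE_A", "GENE_B"),
  chromosome_name = c("2", "1", "1"),
  start_position = c(5000L, 20000L, 1000L),
  arm = c("2p", "1q", "1p"),   # assumed label format
  stringsAsFactors = FALSE
)

# Check that every required column is present.
required <- c("symbol", "chromosome_name", "start_position", "arm")
stopifnot(all(required %in% colnames(genome)))

# Order genes by chromosome, then by start position within each chromosome.
chr_levels <- c(as.character(1:22), "X", "Y")
chr_rank <- match(genome$chromosome_name, chr_levels)
genome <- genome[order(chr_rank, genome$start_position), ]
```
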
## Example data
infercna ships with two example scRNA-seq datasets from two patients with glioblastoma, `infercna::bt771` and `infercna::mgh125`, along with two normal reference groups, `infercna::refCells`. The matrices are stored as sparse matrices; use `infercna::useData()` to load them as regular (dense) matrices. These patients are taken from a much larger cohort of 28 glioblastoma samples. The complete study is described in [Neftel, Laffy et al. (2019, *Cell*)](https://doi.org/10.1016/j.cell.2019.06.024), and the complete dataset can be downloaded via the [Single Cell Portal](https://portals.broadinstitute.org/single_cell/study/SCP393/single-cell-rna-seq-of-adult-and-pediatric-glioblastoma).
## Future implementations
Future implementations will include:
* more default genomes to choose from
* an option to correct CNA values (to absolute) when only *one* reference is available
* more stuff...