Feature request: RSEM outputs - counts matrix for isoforms #137

Closed
TBrownmiller opened this issue Jul 16, 2024 · 5 comments · Fixed by #149
Labels: ccbrpipeliner/7, enhancement, good first issue, RENEE

@TBrownmiller

Hello,

Would it be possible to add RSEM's feature for generating a counts matrix from the isoform data?
The RSEM documentation states that this can be done with the following command:
rsem-generate-data-matrix sampleA.[genes/isoforms].results sampleB.[genes/isoforms].results ... > output_name.counts.matrix

I was able to do this manually by loading RSEM as a module on Biowulf, so there is no rush, but I thought it would be useful since many downstream tools use count matrices.

Thanks!
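
For readers who want to reproduce the manual step described above, here is a minimal shell sketch that wraps the `rsem-generate-data-matrix` command quoted in this post. The function name `make_isoform_matrix`, the `RSEM_MATRIX_CMD` override, and the assumption that all per-sample `*.isoforms.results` files sit in one directory are illustrative and not part of RSEM or RENEE.

```shell
# Hypothetical helper: build one isoform counts matrix from per-sample RSEM results.
# Usage: make_isoform_matrix <results_dir> <output_file>
make_isoform_matrix() {
    results_dir=$1
    output_file=$2
    # RSEM_MATRIX_CMD is an assumed override (handy for dry runs);
    # by default the real RSEM command must be on PATH.
    cmd=${RSEM_MATRIX_CMD:-rsem-generate-data-matrix}
    # Collect every per-sample isoform result file in the directory.
    set -- "$results_dir"/*.isoforms.results
    if [ ! -e "$1" ]; then
        echo "no *.isoforms.results files in $results_dir" >&2
        return 1
    fi
    # RSEM writes the matrix to stdout, so redirect it to the output file.
    "$cmd" "$@" > "$output_file"
}
```

With RSEM loaded, a call might look like `make_isoform_matrix path/to/rsem_results isoforms.counts.matrix`, where both paths are placeholders.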

kopardev added the RENEE label on Jul 16, 2024
kelly-sovacool added the enhancement, good first issue, and ccbrpipeliner/7 labels on Jul 26, 2024
kelly-sovacool self-assigned this on Aug 5, 2024
kelly-sovacool added this to the 2024-08 milestone on Aug 5, 2024
@kelly-sovacool
Member

kelly-sovacool commented Aug 5, 2024

Hi @TBrownmiller, thanks for your request. RENEE outputs both gene and isoform counts -- the isoform count matrix is DEG_ALL/RSEM.isoforms.expected_count.all_samples.txt. Is this what you're looking for?

@TBrownmiller
Author

Sort of. I think it's a difference in the formatting of the outputs. One of the R packages I use (EBSeq), which is usually directly compatible with RSEM outputs, asks for a matrix file (file extension ".MATRIX"), but the RENEE-generated outputs are in either txt or tsv format, which isn't directly compatible.

@kelly-sovacool
Member

Gotcha. We'll make this available in the next release of RENEE -- v2.6.

@samarth8392
Contributor

Hello @TBrownmiller,
Just to follow up on your enquiry: what error do you receive when you try to use the RSEM.isoforms.expected_count.all_samples.txt file in EBSeq?

The package vignette says: "The object Data should be a G-by-S matrix containing the expression values for each gene and each sample, where G is the number of genes and S is the number of samples. These values should exhibit raw counts, without normalization across samples."

And the RSEM.isoforms.expected_count.all_samples.txt output file looks like:

gene_id GeneName        transcript_id   sample1 sample2
ENSG00000277411.1       5S_rRNA ENST00000614916.1       0.0     0.0
ENSG00000273730.1       5_8S_rRNA       ENST00000619779.1       11.88   12.62
...
ENSG00000268895.6       A1BG-AS1        ENST00000595302.1       33.26   12.7
ENSG00000268895.6       A1BG-AS1        ENST00000594950.5       0.0     0.0
ENSG00000268895.6       A1BG-AS1        ENST00000593960.6       21.17   10.29

You can create a new data matrix with just rownames and expression values. Try the following code:

library(dplyr)
library(tibble)
library(EBSeq)

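# Read the RENEE isoform count table (first row holds the column names)
df <- read.table("RSEM.isoforms.expected_count.all_samples.txt", header = TRUE)
gene.matrix <- df %>%
  # Build a unique row name from the three identifier columns
  mutate(gene = paste(gene_id, GeneName, transcript_id, sep = "_")) %>%
  # Drop the identifier columns so only the sample counts remain
  select(-c(gene_id, GeneName, transcript_id)) %>%
  column_to_rownames("gene") %>%
  as.matrix()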

The gene.matrix should work with EBSeq.

Let us know if that works.
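
The same reshaping can also be sketched on the command line for anyone who prefers to prepare the file before loading it into R. This is only an illustration: the function name `renee_table_to_matrix` is hypothetical, and it assumes the table is tab-delimited with gene_id, GeneName, and transcript_id as the first three columns, as in the excerpt above.

```shell
# Hypothetical helper: collapse the three ID columns into a single row name.
# Assumes tab-delimited input with gene_id, GeneName, transcript_id first.
# Usage: renee_table_to_matrix <input_table> > <output_matrix>
renee_table_to_matrix() {
    awk 'BEGIN { FS = OFS = "\t" }
         {
             # header row gets an empty corner cell; data rows get a joined ID
             id = (NR == 1) ? "" : $1 "_" $2 "_" $3
             line = id
             # append every sample column unchanged
             for (i = 4; i <= NF; i++) line = line OFS $i
             print line
         }' "$1"
}
```

The result keeps one row per transcript and one column per sample, with the joined identifier in the first column.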

@kelly-sovacool
Member

@samarth8392 thanks for posting the R code to transform the count table into a matrix.

Vishal and I discussed this issue and decided to go ahead and add a rule that creates the matrix with RSEM. It runs very quickly and adds very little overhead, and this way our users won't have to transform the other output themselves. See #149.
