Question about Gene Expression Training Preprocessing #161

wkl1990 · 2023-05-05T18:14:17Z

Hi Dave (@davek44),

I recently read your 2018 Basenji paper, where you referred to cell-type-specific gene expression. In the paper, you mentioned that you made predictions in the 128-bp bin containing each transcription start site (TSS), and for each gene outside the training set, you summed their various TSS values to compute accuracy statistics.

I was wondering if you could clarify whether you filtered the bigwig data outside the TSS or the training set outside the TSS. I'm new to Basenji and would greatly appreciate your help in understanding this aspect of preprocessing.

Thank you!

davek44 · 2023-05-06T16:39:15Z

I'm not sure what you mean by "filter the bigwig data". We train on the whole genome, other than highly repetitive and unmappable regions.

wkl1990 · 2023-05-06T23:10:26Z

Hello @davek44, thank you for your response. To clarify, do you mean that you train the model on the entire genome but only make predictions at the TSS regions? Additionally, I am curious how you generated the bigwig files for the expression data. Were they created in the same way as the DNase data, directly from the BAM files? If I use regular RNA-seq data, would I just keep the TSS reads to generate the bigwig signal?

davek44 · 2023-05-18T17:13:33Z

We train on the entire genome, and we make predictions across entire sequences. The model doesn't understand the concept of a TSS. You, the analyst, need to go in afterwards and pull out predictions at TSS if that's what you're interested in.

All BigWig files were created using a similar workflow from BAM files.

You cannot use RNA-seq. Only 5' RNA sequencing techniques like CAGE, GRO-seq, or PRO-seq will work.
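
Editor's sketch of the post-hoc step described above, where the analyst pulls predictions out at each TSS and sums them per gene. This is a minimal illustration, not Basenji's own code. The function names, the `tss_by_gene` mapping, and the layout of `preds` (one row per 128-bp bin, one column per target) are all assumptions made for the example. It also assumes that two TSSs of the same gene falling in the same bin should be counted once, which the thread does not specify.

```python
BIN_SIZE = 128  # bp per prediction bin, as mentioned in the question


def tss_bin_index(tss_pos, seq_start, bin_size=BIN_SIZE):
    """Return the index of the bin containing tss_pos, relative to seq_start."""
    return (tss_pos - seq_start) // bin_size


def gene_level_predictions(preds, seq_start, tss_by_gene, bin_size=BIN_SIZE):
    """Sum predictions over the TSS bins of each gene.

    preds:       list of rows, one per bin; each row holds one value per target
                 (hypothetical layout for this sketch).
    seq_start:   genomic coordinate of the first base covered by preds.
    tss_by_gene: dict mapping gene name -> list of TSS coordinates (hypothetical).
    Genes with no TSS inside the predicted window are omitted from the result.
    """
    num_bins = len(preds)
    num_targets = len(preds[0]) if num_bins else 0
    gene_preds = {}
    for gene, tss_positions in tss_by_gene.items():
        # Assumption: count each bin once, even if several TSSs fall in it.
        bins = {tss_bin_index(pos, seq_start, bin_size) for pos in tss_positions}
        bins = sorted(b for b in bins if 0 <= b < num_bins)
        if not bins:
            continue
        total = [0.0] * num_targets
        for b in bins:
            total = [t + p for t, p in zip(total, preds[b])]
        gene_preds[gene] = total
    return gene_preds


if __name__ == "__main__":
    # Toy window: 4 bins x 2 targets, starting at coordinate 1000.
    toy_preds = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
    toy_tss = {"geneA": [1000, 1300], "geneB": [5000]}
    print(gene_level_predictions(toy_preds, 1000, toy_tss))
```

The model itself is never told where the TSSs are. The annotation enters only in this lookup, after predictions have been made across the whole sequence.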
