Updated analysis: independent-samples module #89

logstar · 2021-07-02T15:02:02Z

What analysis module should be updated and why?

The independent-samples module needs to be updated for the following purposes.

Create independent sample lists by cohort and also by cancer_group. Output a merged table for each (experimental_strategy, primary/primary-plus/relapse) like before.

The reason for this update is that certain primary samples are not captured if we use all samples to create the lists. Some patients have primary samples in more than one cohort, so the previous lists randomly retained their primary samples from only one of those cohorts. For example,

> library(tidyverse)
> hdf <- read_tsv('data/histologies.tsv', guess_max = 100000)
Parsed with column specification:
cols(
  .default = col_character(),
  OS_days = col_double(),
  age_last_update_days = col_double(),
  normal_fraction = col_double(),
  tumor_fraction = col_double(),
  tumor_ploidy = col_double()
)
See spec(...) for full column specifications.
> hdf %>%
  filter(tumor_descriptor %in% c("Initial CNS Tumor", "Primary Tumor"),
         cancer_group == 'Neuroblastoma') %>%
  group_by(Kids_First_Participant_ID) %>%
  summarise(n_samples = n(),
            Kids_First_Biospecimen_ID = paste(Kids_First_Biospecimen_ID, collapse = '&'),
            cohort = paste(cohort, collapse = '&'),
            cancer_group = paste(cancer_group, collapse = '&'),
            age_at_diagnosis_days = paste(age_at_diagnosis_days, collapse = '&')) %>%
  arrange(desc(n_samples)) %>%
  head()

Kids_First_Participant_ID	n_samples	Kids_First_Biospecimen_ID	cohort	cancer_group	age_at_diagnosis_days
PASWYR	4	BS_8XDZQKSD&BS_MPE34NYZ&TARGET-30-PASWYR-01A-01R&TARGET-30-PASWYR-01A-01D	GMKF&GMKF&TARGET&TARGET	Neuroblastoma&Neuroblastoma&Neuroblastoma&Neuroblastoma	958&958&958&958
PASXHE	4	BS_EZRVK9ZQ&BS_V4VGG98Y&TARGET-30-PASXHE-01A-01R&TARGET-30-PASXHE-01A-01D	GMKF&GMKF&TARGET&TARGET	Neuroblastoma&Neuroblastoma&Neuroblastoma&Neuroblastoma	1438&1438&1438&1438
PASXIE	4	BS_2N95EW0G&BS_9TKGBJH7&TARGET-30-PASXIE-01A-01R&TARGET-30-PASXIE-01A-01D	GMKF&GMKF&TARGET&TARGET	Neuroblastoma&Neuroblastoma&Neuroblastoma&Neuroblastoma	837&837&837&837
PASXRG	4	BS_9JBYGRQW&BS_P6FPBJM8&TARGET-30-PASXRG-01A-01R&TARGET-30-PASXRG-01A-01D	GMKF&GMKF&TARGET&TARGET	Neuroblastoma&Neuroblastoma&Neuroblastoma&Neuroblastoma	1278&1278&1278&1278
PASXRJ	4	BS_1BKHK7AY&BS_KXRFQF5N&TARGET-30-PASXRJ-01A-01R&TARGET-30-PASXRJ-01A-01D	GMKF&GMKF&TARGET&TARGET	Neuroblastoma&Neuroblastoma&Neuroblastoma&Neuroblastoma	583&583&583&583
PATBMM	4	BS_3DJBSNGE&BS_D7442ACV&TARGET-30-PATBMM-01A-01R&TARGET-30-PATBMM-01A-01D	GMKF&GMKF&TARGET&TARGET	Neuroblastoma&Neuroblastoma&Neuroblastoma&Neuroblastoma	1112&1112&1112&1112

(Update Fri Jul 2 15:27:11 2021 by YZ) The GMKF Neuroblastoma primary samples that are not captured might be causing the ALK mutation frequency discrepancy between PediatricOpenTargets and pedcbio, as described by @jharenza at d3b-center/OpenPedCan-analysis#45 (review)

Update the experimental_strategy filter to %in% c("WGS", "WXS", "Targeted Sequencing") in 01-generate-independent-specimens.R and 00-repeated-samples.Rmd. In the v6 release, "Targeted Sequencing" and "Targeted-Capture" are harmonized to "Targeted Sequencing", as requested in histologies.tsv experimental_strategy update #62.
Update tumor_descriptor in independent-samples.R and independent_rna_samples.R to the following. The changes are requested in histologies.tsv tumor_descriptor update #61.

primary_descs <- c("Initial CNS Tumor", "Primary Tumor")
relapse_descs <- c("Recurrence", "Progressive", "Progressive Disease Post Mortem")
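As a rough illustration of how the two requested filters might be applied together, the sketch below selects DNA primary samples from a toy table. The toy data frame and the names `dna_strategies`, `toy_hdf`, and `primary_dna` are made up for this example. The module scripts themselves are not shown in this issue and may structure the filtering differently.

```r
library(dplyr)

# Values taken from the requested updates in this issue.
dna_strategies <- c("WGS", "WXS", "Targeted Sequencing")
primary_descs <- c("Initial CNS Tumor", "Primary Tumor")
relapse_descs <- c("Recurrence", "Progressive", "Progressive Disease Post Mortem")

# Toy stand-in for histologies.tsv (hypothetical rows, not real data).
toy_hdf <- data.frame(
  Kids_First_Biospecimen_ID = c("BS_A", "BS_B", "BS_C", "BS_D"),
  experimental_strategy = c("WGS", "RNA-Seq", "Targeted Sequencing", "WXS"),
  tumor_descriptor = c("Primary Tumor", "Primary Tumor",
                       "Initial CNS Tumor", "Recurrence"),
  stringsAsFactors = FALSE
)

# Keep only DNA specimens whose tumor_descriptor marks them as primary.
primary_dna <- toy_hdf %>%
  filter(experimental_strategy %in% dna_strategies,
         tumor_descriptor %in% primary_descs)
```

Swapping `primary_descs` for `relapse_descs` in the filter would give the corresponding relapse subset under the same assumptions.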

What changes need to be made? Please provide enough detail for another participant to make the update.

Run the same procedure on each cohort and cancer_group. Include cohort and cancer_group fields in the result lists. Combine all result lists into one list.
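A minimal sketch of the per-cohort, per-cancer_group procedure described above, assuming the existing selection step can be approximated by picking one random specimen per participant. The helper `pick_one_per_participant` and the toy data are hypothetical and only stand in for the module's actual selection logic.

```r
library(dplyr)

# Hypothetical helper: keep one randomly chosen specimen per participant.
pick_one_per_participant <- function(df) {
  df %>%
    group_by(Kids_First_Participant_ID) %>%
    slice_sample(n = 1) %>%
    ungroup()
}

# Toy data (not real): PT_1 has primary samples in two cohorts.
toy_hdf <- data.frame(
  Kids_First_Participant_ID = c("PT_1", "PT_1", "PT_1", "PT_1", "PT_2"),
  Kids_First_Biospecimen_ID = c("BS_1", "BS_2", "BS_3", "BS_4", "BS_5"),
  cohort = c("GMKF", "GMKF", "TARGET", "TARGET", "GMKF"),
  cancer_group = "Neuroblastoma",
  stringsAsFactors = FALSE
)

set.seed(2021)  # fixed seed so the random choice is reproducible

# Run the same selection within each (cohort, cancer_group) subset,
# then combine the per-subset results into one list that keeps both fields.
independent_each_cohort <- toy_hdf %>%
  group_split(cohort, cancer_group) %>%
  lapply(pick_one_per_participant) %>%
  bind_rows() %>%
  select(Kids_First_Participant_ID, Kids_First_Biospecimen_ID,
         cohort, cancer_group)
```

In this toy case PT_1 keeps one specimen in GMKF and one in TARGET, whereas a single pass over all samples would keep only one of the four.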

Update experimental_strategy and tumor_descriptor where they are used.

What input data should be used? Which data were used in the version being updated?

histologies.tsv, which is updated in v6.

When do you expect the revised analysis will be completed?

1-3 days.

Who will complete the updated analysis?

@runjin326

cc: @jharenza


logstar · 2021-07-08T16:14:05Z

The code and results are updated in d3b-center/OpenPedCan-analysis#48.

The README.md needs to be updated for the each-cohort procedure. We would probably need to clarify that in each-cohort function calls, we actually find independent samples from each cohort and each cancer_group.

jharenza · 2021-07-12T23:41:55Z

> The code and results are updated in d3b-center/OpenPedCan-analysis#48.
>
> The README.md needs to be updated for the each-cohort procedure. We would probably need to clarify that in each-cohort function calls, we actually find independent samples from each cohort and each cancer_group.

@runjin326 can you work on this?

runjin326 · 2021-07-12T23:43:15Z

@jharenza, absolutely! I will do that :)

logstar · 2021-07-13T16:18:19Z

The README.md is updated in d3b-center/OpenPedCan-analysis#51.

logstar assigned logstar and runjin326 and unassigned logstar Jul 2, 2021

logstar mentioned this issue Jul 2, 2021

Calculate TPM mean/z-score/SD/quantile summary statistics within each cancer group and cohort d3b-center/OpenPedCan-analysis#27

Merged

5 tasks

runjin326 mentioned this issue Jul 7, 2021

Independent Samples Analyses Updated d3b-center/OpenPedCan-analysis#48

Merged

5 tasks

runjin326 mentioned this issue Jul 13, 2021

README updated for independent samples module d3b-center/OpenPedCan-analysis#51

Merged

3 tasks

logstar closed this as completed Jul 13, 2021
