Annotate SNV table with mutation frequencies #45

logstar · 2021-06-30T23:48:27Z

Purpose/implementation Section

What scientific question is your analysis addressing?

Annotate each non-synonymous variant in snv-consensus-plus-hotspots.maf.tsv.gz with mutation frequencies per (cohort, cancer_group, primary/relapse).

What was your approach?

Subset snv-consensus-plus-hotspots.maf.tsv.gz to keep only samples with sample_type == 'Tumor'.

Subset snv-consensus-plus-hotspots.maf.tsv.gz to keep only non-synonymous variants with the following code.

Variant_Classification %in% c('Frame_Shift_Del',
                              'Frame_Shift_Ins',
                              'Splice_Site',
                              'Nonsense_Mutation',
                              'Nonstop_Mutation',
                              'In_Frame_Del',
                              'In_Frame_Ins',
                              'Missense_Mutation',
                              'Fusion',
                              'Multi_Hit',
                              'Multi_Hit_Fusion',
                              'Hom_Deletion',
                              'Hem_Deletion',
                              'Amp',
                              'Del',
                              'Translation_Start_Site')

Create a Variant_ID for each variant by concatenating Chromosome, Start_Position, Reference_Allele, and Tumor_Seq_Allele2 with '_'.

Add Gene_full_name and Protein_RefSeq_ID columns to each variant with annotations obtained from mygene.info.

For each cancer_group, get each cohort and all cohorts. Call each cancer_group and cohort(s) combination as a cancer_group_cohort. For example,

cancer_group	cohort	n_samples
Neuroblastoma	CBTN	2
Neuroblastoma	GMKF	541
Neuroblastoma	TARGET	889
Neuroblastoma	CBTN&GMKF&TARGET	1432

For each cancer_group_cohort with n_samples >= 5, compute Frequency_in_overall_dataset, Frequency_in_primary_tumors, and Frequency_in_relapse_tumors as following:

Frequency_in_overall_dataset:
- For each unique variant, count the number of patients (identified by Kids_First_Participant_ID) that have the variant, and call this number Total_mutations.
- Count the total number of patients in the cancer_group_cohort, and call this number Patients_in_dataset.
- Frequency_in_overall_dataset = Total_mutations / Patients_in_dataset.
Frequency_in_primary_tumors:
- For each unique variant, count the number of samples (identified by Kids_First_Biospecimen_ID) that are in the ../independent-samples/results/independent-specimens.wgs.primary.tsv, and call this number Total_primary_tumors_mutated.
- Count the total number of samples in the cancer_group_cohort that are also in the ../independent-samples/results/independent-specimens.wgs.primary.tsv, and call this number Primary_tumors_in_dataset.
- Frequency_in_primary_tumors = Total_primary_tumors_mutated / Primary_tumors_in_dataset.
Frequency_in_relapse_tumors:
- For each unique variant, count the number of samples (identified by Kids_First_Biospecimen_ID) that are in the ../independent-samples/results/independent-specimens.wgs.relapse.tsv, and call this number Total_relapse_tumors_mutated.
- Count the total number of samples in the cancer_group_cohort that are also in the ../independent-samples/results/independent-specimens.wgs.relapse.tsv, and call this number Relapse_tumors_in_dataset.
- Frequency_in_relapse_tumors = Total_relapse_tumors_mutated / Relapse_tumors_in_dataset.

Format the SNV mutation frequency table according to the latest spreadsheet that is attached in d3b-center/ticket-tracker-OPC#64.

Merge the SNV mutation frequency tables of all cancer_group_cohorts.

What GitHub issue does your pull request address?

Proposed Analysis: Annotate SNV table of mutation frequencies ticket-tracker-OPC#64
Proposed Analysis: Create mutation frequencies for Ped OT platform ticket-tracker-OPC#8. This issue is no longer compatible with the purpose of this module. This module intends to compute mutation frequencies for each variant, but this issue intents to compute the mutation frequencies for each gene. This issue is listed here for future reference.

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

The get_cg_ch_mut_freq_tbl function for computing frequencies.

The get_cg_cs_tbl function for getting cacner_group_cohorts.

Is there anything that you want to discuss further?

Is it OK to retain the chr prefix in the Chromosome when creating Identifer hg38 (See ClinVar ex) (Variant_ID in the result table)? The example value is 10_102599545_G_A in the spreadsheet, but it will be chr10_102599545_G_A in the output table.

I will add RMTL and COSMIC annotations when CosmicMutantExportCensus.tsv and RMTL_Expanded_3_RMTL_for_OT.csv are available in future data releases.

I will work on d3b-center/ticket-tracker-OPC#82 to add oncoKB annotations to the result table.

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes.

Results

What types of results are included (e.g., table, figure)?

Table

What is your summary of the results?

 wc -l results/snv-consensus-annotated-mut-freq.tsv 
  130644 results/snv-consensus-annotated-mut-freq.tsv

Reproducibility Checklist

The dependencies required to run the code in this pull request have been added to the project Dockerfile. Note: wait to be merged at Update Dockerfile #36.
This analysis has been added to continuous integration.

Documentation Checklist

This analysis module has a README and it is up to date.
This analysis is recorded in the table in analyses/README.md and the entry is up to date.
The analytical code is documented and contains comments.

@taylordm

Squashed commit of the following: commit d87986b7ce1a517f4807430ce6beaac5950b50ca Author: logstar <[email protected]> Date: Wed Jun 30 17:21:59 2021 -0400 Rename mutation-frequencies to snv-frequencies Rename module. commit b2d2fd5c391b43214825e7b458d0edcb5ac22f1a Author: logstar <[email protected]> Date: Wed Jun 30 17:13:18 2021 -0400 Annotate SNV table with mutation frequencies Issues addressed: - <d3b-center/ticket-tracker-OPC#64> - <d3b-center/ticket-tracker-OPC#8>. This issue is no longer compatible with the purpose of this module. This module intends to compute mutation frequencies for each variant, but this issue intents to compute the mutation frequencies for each gene. This issue is listed here for future reference. commit 84cacf28927121037f4b9ba895e5baa5d12c7b31 Author: logstar <[email protected]> Date: Wed Jun 30 16:23:20 2021 -0400 [WIP] Update run-mutation-frequencies.sh commit 29ae8ef19f2339ae08f78c26ab42e6cf75d3556e Author: logstar <[email protected]> Date: Wed Jun 30 16:14:50 2021 -0400 [WIP] Generate annotated SNV frequency table commit 2cb06741ca192f77a3043d03574649a184459b11 Author: logstar <[email protected]> Date: Wed Jun 30 14:54:39 2021 -0400 [WIP] Replace NA with blank string Also replaced HotSpot value 1 with Y and 0 with N. commit 57776f61e576a5e3e2672370370fd1090f3aa478 Author: logstar <[email protected]> Date: Wed Jun 30 14:02:13 2021 -0400 [WIP] Use mygene.info to query gene IDs mygene.info seems to be actively maintained. The query results are more comprehensive than [biomaRt](https://bioconductor.org/packages/release/bioc/html/biomaRt.html). Relevant URLs: - <http://mygene.info/about> - <https://bioconductor.org/packages/release/bioc/html/mygene.html> mygene.info is suggested by @taylordm and @jharenza. commit 76bb0f5236378648adce429e45d3827009735b58 Author: logstar <[email protected]> Date: Wed Jun 30 10:55:30 2021 -0400 [WIP] Generate SNV frequency tables Issue addressed: d3b-center/ticket-tracker-OPC#64

Add snv-frequencies module in the table.

When there is no primary or relapse samples, mutation frequencies are 0/0 = NaN in R. Replace 'NaN%' in the output with '', in order to be consistent with missing values in other columns.

logstar · 2021-07-01T03:44:54Z

Update to this PR:

Replace 'NaN%' frequencies with ''

When there is no primary or relapse sample in a cancer_group_cohort, mutation frequencies are computed with 0 / 0, which takes the value of NaN in R.

In the result table, replace 'NaN%' with empty string '' for the NaN frequencies, in order to be consistent with missing values in other columns.

jharenza

Hi @logstar! Thanks for the quick PR for this table!

Now that #34 is in, will you update this with the files in v6? This will change your CBTN dataset/cohort to PBTA and will condense a few cancer groups. In addition, per our call on Wednesday, we want to add the EFO and MONDO columns to this file, and those can be added with a merge of efo-mondo-map.tsv. Note, this is not yet complete - we only have about half of the cancer groups in there so far, but it is a start.

For CBTN&GMKF&TARGET in Dataset, can you recode this to PedOT - we will use this general designation to mean inclusive of all datasets.

I also noticed that you have some NAs in SIFT, PolyPhen, and some other columns when I read this into R using read_tsv(), but it looks like if I do:

freq <- read_tsv("~/Documents/GitHub/ped-ot/OpenPedCan-analysis/analyses/snv-frequencies/results/snv-consensus-annotated-mut-freq.tsv",
                 na=character(0))

the blanks are retained, so I wanted to note that here because when we do the JSON conversions, which may be in staggered PRs, we will have to add that argument.

Overall, the format looks good, however, I am worried about the results - for instance in neuroblastoma, the expected number of ALK mutations in primary tumors should be ~10%, but we are only seeing ~5.85% and this doesn't align with the current GMKF pedcbio study which has a frequency of 9%. There are some mutations in pedcbio we are not capturing here, but I have a suspicion that the study loaded in pedcbio is only using Strelka2 data and not a consensus of 4 callers, so, I have to check into this a bit. Regardless, even with consensus, this number should be higher. cc @migbro @yuankunzhu

logstar · 2021-07-02T03:45:26Z

Thank you for the review @jharenza !

Hi @logstar! Thanks for the quick PR for this table!

Now that #34 is in, will you update this with the files in v6? This will change your CBTN dataset/cohort to PBTA and will condense a few cancer groups. In addition, per our call on Wednesday, we want to add the EFO and MONDO columns to this file, and those can be added with a merge of efo-mondo-map.tsv. Note, this is not yet complete - we only have about half of the cancer groups in there so far, but it is a start.

Yes. I will update this with the v6 data release files and add EFO and MONDO columns.

For CBTN&GMKF&TARGET in Dataset, can you recode this to PedOT - we will use this general designation to mean inclusive of all datasets.

Sure. Will do.

I also noticed that you have some NAs in SIFT, PolyPhen, and some other columns when I read this into R using read_tsv(), but it looks like if I do:
freq <- read_tsv("~/Documents/GitHub/ped-ot/OpenPedCan-analysis/analyses/snv-frequencies/results/snv-consensus-annotated-mut-freq.tsv",
                 na=character(0))
the blanks are retained, so I wanted to note that here because when we do the JSON conversions, which may be in staggered PRs, we will have to add that argument.

Good catch! I will add it in the README.md. It will be helpful for anyone who wants to read the tsv table.

For JSON conversion, I was wondering if I could add jsonlite::write_json(result_tibble, 'result.json') in the end of the R script to output an additional JSON file, so we can skip the read_tsv step and staggered PRs.

Overall, the format looks good, however, I am worried about the results - for instance in neuroblastoma, the expected number of ALK mutations in primary tumors should be ~10%, but we are only seeing ~5.85% and this doesn't align with the current GMKF pedcbio study which has a frequency of 9%. There are some mutations in pedcbio we are not capturing here, but I have a suspicion that the study loaded in pedcbio is only using Strelka2 data and not a consensus of 4 callers, so, I have to check into this a bit. Regardless, even with consensus, this number should be higher. cc @migbro @yuankunzhu

I see a similar ~6.10% ALK overall SNV frequency in my updated oncoprint-landscape module result table.

Add note on how to read snv-consensus-annotated-mut-freq.tsv without converting blank strings to NAs.

Need the Dockerfile update to install bioc mygene.

logstar · 2021-07-02T18:45:41Z

Updates to this PR:

Added EFO, MONDO, and RMTL. For RMTL, add a single column combining rmtl and version in the format of rmtl (version), i.e. Relevant Molecular Target (RMTL version 1) or Non-Relevant Molecular Target (RMTL version 1).
Renamed cohort PBTA&GMKF&TARGET with PedOT.
Updated README.md to note na=character(0).

gzip JSON to save space.

logstar · 2021-07-03T00:34:05Z

Update to this PR:

Output SNV mutation frequency table in JSON format.

Output m_mut_freq_tbl in 01-snv-frequencies.R with the following code:

jsonlite::write_json(
  m_mut_freq_tbl,
  file.path(tables_dir, 'snv-consensus-annotated-mut-freq.json'))

gzip the JSON file in run-snv-frequencies.sh to save space with the following code:

# --no-name option stops the filename and timestamp from being stored in the
# output file. So rerun will have the same file.
gzip --no-name results/snv-consensus-annotated-mut-freq.json

There is one duplicate of ENSG-HUGO mapping. Also added additional assertions.

Move RMTL column after Gene_symbol. Rename Variant_ID to Variant_ID_hg38.

Use `%>% mutate_all(function(x) replace_na(x, replace = ''))` to replace NA/NaNs in all columns with ''. Remove the added column that is not used in the script.

In run-snv-frequencies.sh, remove existing results dir before running 01-snv-frequencies.R, so gzip will not ask for confirmation to overwrite.

kgaonkar6 · 2021-07-06T21:08:07Z

Hi @logstar just wanted to ask if you envision 1 module for each data type like snv-frequencies, cnv-frequencies fusion-frequencies etc OR can we rename this module so we can retain frequency/json files here for all data types?

logstar · 2021-07-06T21:33:33Z

Hi @logstar just wanted to ask if you envision 1 module for each data type like snv-frequencies, cnv-frequencies fusion-frequencies etc OR can we rename this module so we can retain frequency/json files here for all data types?

Thank you for asking. I think 1 module for each data type might be more efficient to implement, document, maintain, and review.

If we need to combine the tables into one in the future, we could add another module for combining snv, cnv, and fusion frequency tables together into one.

If there is any reason that we should have one module for all data types, we could also proceed with that.

@jharenza

Removed the `rm -r results` part in run-rna-seq-expression-summary-stats.sh, as suggested by @jharenza at d3b-center#27 (comment)

kgaonkar6

The code overall is well documented and implements all the checks 👍

Do you think we could divide the 01 script to have all your functions in 1 or multiple utils scripts in this module to gather the counts given a Kids_First_Biospecimen_ID and a Variant_ID/Alt_ID and another 01 script to read mutation/annotation files, call this function and left_join the counts?

Just wanted to discuss this since I think this would a good way to have reusable function for other data types ( I did play around your code and seem to have one for fusion. which I can add as PR for your review) . Thoughts?

Update README.md for recent patches.

logstar · 2021-07-07T12:40:50Z

The code overall is well documented and implements all the checks 👍

Do you think we could divide the 01 script to have all your functions in 1 or multiple utils scripts in this module to gather the counts given a Kids_First_Biospecimen_ID and a Variant_ID/Alt_ID and another 01 script to read mutation/annotation files, call this function and left_join the counts?

Just wanted to discuss this since I think this would a good way to have reusable function for other data types ( I did play around your code and seem to have one for fusion. which I can add as PR for your review) . Thoughts?

Thank you for the review @kgaonkar6 !

I agree that we could reuse the workflow design of this module. I will also extract a function to generate the frequency related columns and put it in utils, in order to reuse it to generate the gene-level tables that we discussed this morning. If you need the function immediately, you could adapt the linked code for your own purposes, https://github.com/logstar/OpenPedCan-analysis/blob/f70645b6c7e4eb15ea29e45e9ebf0adeb5798b9b/analyses/snv-frequencies/01-snv-frequencies.R#L146-L227.

However, I woud prefer to use the code from other modules by adapting them rather than directly resuing via source('../another-module/utils/some_function.R'), so I do not have rely on any code that has no guarantee to not change interface in the future.

test_num_to_pct_chr.R tests the num_to_pct_chr function in 01-snv-frequencies.R.

Test get_cohort_set_value in tests/test_get_cohort_set_value.R.

get_pcb_pot_csi is tested in tests/test_get_pcb_pot_csi.R.

Test get_pcb_pot_plot_url in tests/test_get_pcb_pot_plot_url.R.

Put the imported function in the same environment as the import_function being called, so the imported functions can call other imported functions.

Require Kids_First_Biospecimen_ID and Kids_First_Participant_ID columns exist and have no NA in overall_histology_df, primary_histology_df, and relapse_histology_df.

get_opr_mut_freq_tbl is the core function to compute mutation frequencies. get_opr_mut_freq_tbl is called by get_cg_ch_var_level_mut_freq_tbl and get_cg_ch_gene_level_mut_freq_tbl to compute mutation frequencies. get_cg_ch_var_level_mut_freq_tbl and get_cg_ch_gene_level_mut_freq_tbl are not tested, because they only format the mutation frequency tables generated by get_opr_mut_freq_tbl.

logstar · 2021-07-23T18:13:11Z

@jharenza @kgaonkar6 I have added unit testing for this PR and removed test case comments in 01-snv-frequencies.R.

All tests passed in the Docker image. See the commits from c58c8d5 for more details.

$ bash run-tests.sh
✔ |  OK F W S | Context
✔ |  23       | tests/test_get_cohort_set_value.R
✔ |  31       | tests/test_get_opr_mut_freq_tbl.R [0.8 s]
✔ |  33       | tests/test_get_pcb_pot_csi.R
✔ |  42       | tests/test_get_pcb_pot_plot_url.R
✔ |  21       | tests/test_helper_import_function.R
✔ |  12       | tests/test_num_to_pct_chr.R

══ Results ═════════════════════════════════════════════════════════════════════════════════════════════════════
Duration: 1.1 s

OK:       162
Failed:   0
Warnings: 0
Skipped:  0
Done running run-tests.sh

I will work on d3b-center/ticket-tracker-OPC#122 before incorporating the annotator API into this PR. If this PR is merged, I will create a new PR for refactoring the annotation part in this module.

In import_function, envir = parent.frame() makes the environment(imported_function) to be the same as the import function being called. Even though the eval documentation says the default envir parameter is parent.frame(), leaving envir = parent.frame() in the eval call will surprisingly make the environment(imported_function) to be the same as the environment of the import_function call. The tests also fail without specifying envir = parent.frame(). Maybe this is caused by the place where parent.frame() is evaluated? If specified, parent.frame() is evaluated in the calling function; if not specified, parent.frame() is evaluated in the eval call environment? Examples executed in terminal R, in order to avoid RStudio customizations. > print(environment()) <environment: R_GlobalEnv> > > foo <- function() { + print(parent.frame()) + print(environment()) + return(eval(quote(function() { return(2) }))) + } > > bar <- foo() <environment: R_GlobalEnv> <environment: 0x5645ead67670> > print(environment(bar)) <environment: 0x5645ead67670> > > baz <- function() { + print(parent.frame()) + print(environment()) + return(eval(quote(function() { return(2) }), + envir = parent.frame())) + } > > qux <- baz() <environment: R_GlobalEnv> <environment: 0x5645ead5fec0> > print(environment(qux)) <environment: R_GlobalEnv>

logstar · 2021-07-26T15:35:57Z

Added WIP label to this PR.

I will use the annotator API to add annotations to the SNV tables. This will reduce the number of lines in analyses/snv-frequencies/01-snv-frequencies.R, which may reduce the workload for the reviewers to review this PR.

I will keep using v6 data and independent sample lists in this PR, and I will create a ticket for rerunning the snv-frequency module using the v7 data and independent sample lists.

logstar · 2021-07-26T17:25:23Z

@jharenza @kgaonkar6 Sorry for changing the plan.

This PR will not be further changed for integrating the annotator API, because the v6 independent sample lists and the annotator API cannot be both available in this PR without including the analyses/long-format-table-utils directory in the "Files changed" tab of this PR.

I will create a new PR for integrating the annotator API and rerunning for v7, after the v7 data release is available.

Following is the clarification for the forced push:

The forced push is to completely revert merging with the lft-utils-ann-r-cli branch. I merged the lft-utils-ann-r-cli rather than the dev branch, because dev branch has v7 independent lists, in order to get the annotator API. However, the files added in lft-utils-ann-r-cli showed up in the "Files changed" tab, even though the lft-utils-ann-r-cli has already been merged to the base dev branch of this PR, which would make the "Files changed" tab and the commit history confusing.

The forced push completely reverted this PR to the state of this commit 861be07, which is the latest one before merging with the lft-utils-ann-r-cli branch. Following are the commands used for this forced push.

I assumed no one has pulled the snv-freq branch from my own fork after the merging was pushed and before the forced push. Otherwise, a forced pull may be necessary to sync with this PR/branch.

A merge can also be reverted with git --revert, but it may cause complex issues to the commit history.

$ git reset 861be07959c8590c25711d0c16e55fa92b016e8e
Unstaged changes after reset:
M       analyses/README.md

$ git st
On branch snv-freq
Your branch is behind 'origin/snv-freq' by 102 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   README.md

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        long-format-table-utils/

$ git reset --hard
HEAD is now at 861be07 Specify envir = parent.frame() in eval call

# long-format-table-utils is not tracked, so rm -r is necessary to remove it
$ rm -r long-format-table-utils

$ git st
On branch snv-freq
Your branch is behind 'origin/snv-freq' by 102 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)

nothing to commit, working tree clean

$ git push -f
Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
To https://github.com/logstar/OpenPedCan-analysis.git
 + 4f1781f...861be07 snv-freq -> snv-freq (forced update)

jharenza

Really nice job on this - two minor comments about removing redundant json files and renaming the variant level file, and once addressed, this can be merged.

analyses/README.md

analyses/snv-frequencies/run-snv-frequencies.sh

analyses/snv-frequencies/01-snv-frequencies.R

Remove results/gene-level-snv-consensus-annotated-mut-freq.json.gz and results/var-level-snv-consensus-annotated-mut-freq.json.gz.

@jharenza

As suggested by @jharenza at <d3b-center#45 (comment)>, use variant-level, rather than var-level, in the variant-level filenames.

logstar added 2 commits June 30, 2021 17:49

Update analyses/README.md

58c7ed0

Add snv-frequencies module in the table.

logstar requested review from jharenza and kgaonkar6 June 30, 2021 23:48

Replace 'NaN%' frequencies with ''

38629de

When there is no primary or relapse samples, mutation frequencies are 0/0 = NaN in R. Replace 'NaN%' in the output with '', in order to be consistent with missing values in other columns.

logstar mentioned this pull request Jul 1, 2021

Proposed Analysis: add oncoKB annotation to consensus MAF d3b-center/ticket-tracker-OPC#82

Closed

jharenza requested changes Jul 2, 2021

View reviewed changes

logstar mentioned this pull request Jul 2, 2021

Updated analysis: independent-samples module d3b-center/ticket-tracker-OPC#89

Closed

logstar added 6 commits July 2, 2021 15:46

Update README.md

6349398

Add note on how to read snv-consensus-annotated-mut-freq.tsv without converting blank strings to NAs.

[WIP] Rerun snv-frequencies module using v6 data

3922b22

[WIP] Rename cohort PBTA&GMKF&TARGET with PedOT

4e3a155

Add EFO, MONDO, and RMTL to SNV freq tsv

2f68367

Merge branch 'dev' into snv-freq

6547926

Need the Dockerfile update to install bioc mygene.

Print 'Done' after running 01-snv-frequencies.R

fe5c059

Output SNV mutation frequency table in JSON format

21c5d4d

gzip JSON to save space.

logstar added 4 commits July 3, 2021 05:44

Remove duplicates in ensg_hugo_rmtl_df

b136d38

There is one duplicate of ENSG-HUGO mapping. Also added additional assertions.

Rearrange and rename output table columns

c005538

Move RMTL column after Gene_symbol. Rename Variant_ID to Variant_ID_hg38.

Refactor 01-snv-frequencies.R

f47aafa

Use `%>% mutate_all(function(x) replace_na(x, replace = ''))` to replace NA/NaNs in all columns with ''. Remove the added column that is not used in the script.

Remove existing results dir before running Rscript

bbeed0b

In run-snv-frequencies.sh, remove existing results dir before running 01-snv-frequencies.R, so gzip will not ask for confirmation to overwrite.

jharenza mentioned this pull request Jul 6, 2021

Calculate TPM mean/z-score/SD/quantile summary statistics within each cancer group and cohort #27

Merged

5 tasks

Remove rm -r results in shell script

f043b3e

Removed the `rm -r results` part in run-rna-seq-expression-summary-stats.sh, as suggested by @jharenza at d3b-center#27 (comment)

kgaonkar6 reviewed Jul 7, 2021

View reviewed changes

Update README.md

f70645b

Update README.md for recent patches.

logstar added 8 commits July 22, 2021 23:27

Move num_to_pct_chr test cases to a test file

57ed514

test_num_to_pct_chr.R tests the num_to_pct_chr function in 01-snv-frequencies.R.

Move get_cohort_set_value tests to a test file

d8c0fea

Test get_cohort_set_value in tests/test_get_cohort_set_value.R.

Move get_pcb_pot_csi tests to a test file

20e9528

get_pcb_pot_csi is tested in tests/test_get_pcb_pot_csi.R.

Move get_pcb_pot_plot_url tests to a test file

fdd250d

Test get_pcb_pot_plot_url in tests/test_get_pcb_pot_plot_url.R.

Correct context of test_helper_import_function.R

17c04f0

Put imported function in the importing environment

16e5bca

Put the imported function in the same environment as the import_function being called, so the imported functions can call other imported functions.

Require biospec and patient IDs in sample metadata

ae30770

Require Kids_First_Biospecimen_ID and Kids_First_Participant_ID columns exist and have no NA in overall_histology_df, primary_histology_df, and relapse_histology_df.

logstar removed the work in progress label Jul 23, 2021

logstar requested review from jharenza and kgaonkar6 July 23, 2021 18:13

logstar added 2 commits July 23, 2021 14:54

Update README.md

ec62e1b

logstar added the work in progress label Jul 26, 2021

logstar force-pushed the snv-freq branch from 4f1781f to 861be07 Compare July 26, 2021 16:47

logstar added ready for review and removed work in progress labels Jul 26, 2021

jharenza approved these changes Jul 27, 2021

View reviewed changes

analyses/README.md Show resolved Hide resolved

analyses/snv-frequencies/run-snv-frequencies.sh Outdated Show resolved Hide resolved

analyses/snv-frequencies/01-snv-frequencies.R Show resolved Hide resolved

jharenza added the merge next label Jul 27, 2021

logstar added 2 commits July 27, 2021 14:53

Remove JSON files from output

4958039

Remove results/gene-level-snv-consensus-annotated-mut-freq.json.gz and results/var-level-snv-consensus-annotated-mut-freq.json.gz.

Rename var-level files to variant-level

3b5e635

As suggested by @jharenza at <d3b-center#45 (comment)>, use variant-level, rather than var-level, in the variant-level filenames.

logstar merged commit a6a1d0e into d3b-center:dev Jul 27, 2021

logstar deleted the snv-freq branch July 27, 2021 21:12

This was referenced Jul 27, 2021

Proposed Analysis: Create mutation frequencies for Ped OT platform d3b-center/ticket-tracker-OPC#8

Closed

Updated analysis: add pedcbio links to SNV tables d3b-center/ticket-tracker-OPC#83

Closed

logstar mentioned this pull request Sep 1, 2021

Updated analysis: Annotate SNV table of gene-level mutation frequencies d3b-center/ticket-tracker-OPC#91

Closed

1 task

runjin326 mentioned this pull request Sep 16, 2021

Proposed Analysis: Annotate SNV table of mutation frequencies d3b-center/ticket-tracker-OPC#64

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Annotate SNV table with mutation frequencies #45

Annotate SNV table with mutation frequencies #45

logstar commented Jun 30, 2021 •

edited

Loading

logstar commented Jul 1, 2021

jharenza left a comment

logstar commented Jul 2, 2021

logstar commented Jul 2, 2021

logstar commented Jul 3, 2021

kgaonkar6 commented Jul 6, 2021

logstar commented Jul 6, 2021

kgaonkar6 left a comment •

edited

Loading

logstar commented Jul 7, 2021

logstar commented Jul 23, 2021

logstar commented Jul 26, 2021

logstar commented Jul 26, 2021

jharenza left a comment

Annotate SNV table with mutation frequencies #45

Annotate SNV table with mutation frequencies #45

Conversation

logstar commented Jun 30, 2021 • edited Loading

Purpose/implementation Section

What scientific question is your analysis addressing?

What was your approach?

What GitHub issue does your pull request address?

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Is there anything that you want to discuss further?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Results

What types of results are included (e.g., table, figure)?

What is your summary of the results?

Reproducibility Checklist

Documentation Checklist

logstar commented Jul 1, 2021

jharenza left a comment

Choose a reason for hiding this comment

logstar commented Jul 2, 2021

logstar commented Jul 2, 2021

logstar commented Jul 3, 2021

kgaonkar6 commented Jul 6, 2021

logstar commented Jul 6, 2021

kgaonkar6 left a comment • edited Loading

Choose a reason for hiding this comment

logstar commented Jul 7, 2021

logstar commented Jul 23, 2021

logstar commented Jul 26, 2021

logstar commented Jul 26, 2021

jharenza left a comment

Choose a reason for hiding this comment

logstar commented Jun 30, 2021 •

edited

Loading

kgaonkar6 left a comment •

edited

Loading