Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added CLIs and WDL for python gCNV pipeline. #3925

Merged
merged 1 commit into from
Jan 5, 2018
Merged

Conversation

samuelklee
Copy link
Contributor

No description provided.

@codecov-io
Copy link

codecov-io commented Dec 6, 2017

Codecov Report

Merging #3925 into master will increase coverage by 0.016%.
The diff coverage is 88.519%.

@@               Coverage Diff               @@
##              master     #3925       +/-   ##
===============================================
+ Coverage     78.769%   78.786%   +0.016%     
+ Complexity     16501     16490       -11     
===============================================
  Files           1065      1075       +10     
  Lines          58788     59026      +238     
  Branches        9578      9597       +19     
===============================================
+ Hits           46307     46504      +197     
- Misses          8754      8788       +34     
- Partials        3727      3734        +7
Impacted Files Coverage Δ Complexity Δ
.../formats/collections/ModeledSegmentCollection.java 69.767% <ø> (ø) 3 <0> (ø) ⬇️
...ls/copynumber/utils/CombineSegmentBreakpoints.java 89.796% <ø> (ø) 15 <0> (ø) ⬇️
...formats/collections/HDF5SimpleCountCollection.java 90% <ø> (ø) 14 <0> (ø) ⬇️
...er/formats/collections/AllelicCountCollection.java 100% <ø> (ø) 5 <0> (ø) ⬇️
...collections/MultidimensionalSegmentCollection.java 42.857% <ø> (ø) 2 <0> (ø) ⬇️
...ber/formats/collections/SimpleCountCollection.java 100% <ø> (ø) 11 <0> (ø) ⬇️
...ormats/collections/CopyRatioSegmentCollection.java 80% <ø> (ø) 4 <0> (ø) ⬇️
...s/collections/AlleleFractionSegmentCollection.java 45.455% <ø> (ø) 2 <0> (ø) ⬇️
...der/tools/copynumber/models/CopyRatioModeller.java 88.525% <ø> (ø) 10 <0> (ø) ⬇️
...ools/copynumber/models/AlleleFractionModeller.java 92.754% <ø> (ø) 17 <0> (ø) ⬇️
... and 121 more

@mbabadi
Copy link
Contributor

mbabadi commented Dec 7, 2017

@samuelklee awh hehe... will get rid of that column

@mbabadi
Copy link
Contributor

mbabadi commented Dec 8, 2017

@samuelklee gCNV integration tests and WDL tests pass locally. There are a number of issues:

  • Somatic WDL tests fail on travis, and haven't tried them locally. Those tests were failing before I made changes -- it's unlikely to be related to my changes. Could you please take a look?
  • Germline WDL tests are extremely slow on travis, perhaps there is no g++ and it is using python fallback? I don't know how to see the cromwell log files on travis. @lbergelson any pro tips?
  • Python-related integration tests all fail on travis (though not locally w/ a properly setup python environment).
  • I noticed that the python env in Dockerfile is based on Miniconda2. We strictly require Python 3 for gCNV.

@samuelklee
Copy link
Contributor Author

samuelklee commented Dec 8, 2017

I’ll take a look at the somatic tests. They should be OK, probably just something related to kebab case changes. EDIT: Or hmm, maybe they weren't passing before. Something to do with annotated-interval validation, I think.

I think the WDL tests should be using the Docker, which has g++. Travis machines might be slower?

Integration tests will need to be in the python test group. Take a look at the python tests.

@samuelklee
Copy link
Contributor Author

samuelklee commented Dec 8, 2017

OK, I think I figured out what was going on. In 9b194a6 I changed PreprocessIntervals to drop intervals with all Ns (if you remember, this was giving me NaNs in AnnotateIntervals, which gCNV didn't like). But I must not have rebuilt the somatic WGS PoNs and updated the copies in the large test resources. I could've sworn that I tested the somatic pipeline locally, but perhaps I forgot to update the jar at some point.

Ideally, we should figure out some way to use the PoNs built by the panel WDL tests in the subsequent tests for the case/pair workflows.

@samuelklee
Copy link
Contributor Author

@mbabadi I pushed new PoNs, I think that should get somatic tests to pass. Sorry about that, hope you didn't go too far down the rabbit hole trying to debug!

If you need to make changes to the Docker or Travis environments, perhaps coordinate with @cmnbroad to make them directly in #3912, which is still open.

@mbabadi
Copy link
Contributor

mbabadi commented Dec 8, 2017

@samuelklee haha I just figured it out and was about to write to you. Hopefully the tests will pass now. The changes I made to travis yml were experimental -- I am reverting them.

@mbabadi
Copy link
Contributor

mbabadi commented Dec 8, 2017

(except for Miniconda2 -> Miniconda3)

@mbabadi
Copy link
Contributor

mbabadi commented Dec 8, 2017

@samuelklee darn; the wgs pon related tests are still failing and the germline WDL exceeds the time limit (it completes in ~ 10 mins on gsa5, which is already quite slow, and ~ 3 mins on my laptop!). Something strange is going on. I wish there was an easy way to inspect the cromwell logs; perhaps there's a way to set cromwell logging to stdout.

@mbabadi
Copy link
Contributor

mbabadi commented Dec 8, 2017

@samuelklee OK it is not a cromwell issue -- after your commit, the somatic denoising integration tests are failing locally too:
java.lang.IllegalArgumentException: Sample intervals must be identical to the original intervals used to build the panel of normals.

I wouldn't worry about it now. It is most likely related to test resource files.

@samuelklee
Copy link
Contributor Author

@mbabadi I think the somatic tests are fixed: https://travis-ci.org/broadinstitute/gatk/builds/313333659?utm_source=email&utm_medium=notification Did you pull my commit?

Feel free to cut out some of the test cases if the WDL tests are too slow. Especially now that there is no real distinction between WGS/WES besides -L (except for in PreprocessIntervals, which integration tests should cover).

- ``CNVGermlineCohortWorkflow.cohort_entity_id`` -- Name of the cohort. Will be used as a prefix for output filenames.
- ``CNVGermlineCohortWorkflow.contig_ploidy_priors`` -- TSV file containing prior probabilities for the ploidy of each contig, with column headers: CONTIG_NAME, PLOIDY_PRIOR_0, PLOIDY_PRIOR_1, ...
- ``CNVGermlineCohortWorkflow.gatk_docker`` -- GATK Docker image (e.g., ``broadinstitute/gatk:x.beta.x``).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update this.

@samuelklee
Copy link
Contributor Author

Rebased now that sl_wgs_acnv, sl_wgs_acnv_headers, and sl_delete_legacy_cnv code are all in. Phew!

@samuelklee samuelklee force-pushed the sl_gcnv_ploidy_cli branch 7 times, most recently from 63f9f8a to 84d937a Compare January 2, 2018 16:22
@samuelklee
Copy link
Contributor Author

Rebased and squashed on top of sl_wgs_acnv_headers_docs. Here is the log of squashed commits, for reference:

commit 3eda4b18888f38249be39f99901d8453a4de50d6
Author: Samuel Lee <[email protected]>
Date:   Thu Dec 28 14:56:27 2017 -0500

    updated command lines for WDL tests for C29

commit 7ce1369943cce4ae9cfb5e96455d18d3960e9b77
Author: Samuel Lee <[email protected]>
Date:   Thu Dec 28 13:30:21 2017 -0500

    use C29 and decrease gcnv_max_training_epochs

commit 68772cba486b44ebc8cf8bfc2b600c1e8a406c61
Author: Mehrtash Babadi <[email protected]>
Date:   Sun Dec 17 19:20:05 2017 +0330

    documentation update of GermlineCNVCaller and DetermineGermlineContigPloidy

commit c032281f8c43a80e4ec8cb96eb66397ad2acf9b7
Author: Samuel Lee <[email protected]>
Date:   Fri Dec 15 18:14:35 2017 -0500

    Fixed imr kebab case in WDL, moved argument classes, removed GenomeLocParser, fixed up gCNV WDL readme.

commit be84a804f6ab6fbb815995db9c116d1db950ab8b
Author: Samuel Lee <[email protected]>
Date:   Tue Dec 12 13:55:08 2017 -0500

    removed extra comma in gCNV Case WDL test JSON

commit cb379b866d425f12f5525ecb28ad0b636a528d44
Author: Samuel Lee <[email protected]>
Date:   Tue Dec 12 12:58:59 2017 -0500

    added missing cpu parameters to gCNV Case WDL tasks

commit eed85f6c70f4a7f15e0765b5f15a1bf8541c151e
Author: Samuel Lee <[email protected]>
Date:   Tue Dec 12 11:26:31 2017 -0500

    disabled some gCNV WDL tests

commit 6d8ca07fef41518b5b157fb9a214d4536c617156
Author: Samuel Lee <[email protected]>
Date:   Tue Dec 12 10:54:54 2017 -0500

    fixed DenoiseReadCountsIntegrationTest files

commit adfbef12f2ab90f93b49a4f786979549648e1f22
Author: Mehrtash Babadi <[email protected]>
Date:   Mon Dec 11 02:22:56 2017 -0500

    removed CNV evaluation code from this branch

commit 18c8d31f39a1964474c5d7b12ee8cbfafc4ac9e2
Author: Mehrtash Babadi <[email protected]>
Date:   Sun Dec 10 00:19:58 2017 -0500

    GS VCF parser outputs dict for samples instead of list

commit b138be39cd8428342668ee6678079021006f983b
Author: Mehrtash Babadi <[email protected]>
Date:   Sun Dec 10 00:15:19 2017 -0500

    renaming

commit eab5c90b74b4eb6bd11acb0fd1e0fa58a3b5b0c7
Author: Mehrtash Babadi <[email protected]>
Date:   Fri Dec 8 16:23:40 2017 -0500

    exposed a global preemptible_attempts to gCNV workflows
    set OMP_NUM_THREADS and MKL_NUM_THREADS to the number of requested CPUs

commit ad6fe348d6a7896c169b2b0499e2a4bca34021ad
Author: Mehrtash Babadi <[email protected]>
Date:   Fri Dec 8 10:21:25 2017 -0500

    reverted log level in germline CNV tests

commit d9eb4e504baab834a9efc07cc3479176db2946ce
Author: Mehrtash Babadi <[email protected]>
Date:   Fri Dec 8 10:20:30 2017 -0500

    the proper python environment yml for mkl and open -- leads to orders of magnitude speedup!

commit fea6bf874e0b62262a3b1d239ce4d76792d5c416
Author: Mehrtash Babadi <[email protected]>
Date:   Fri Dec 8 09:31:43 2017 -0500

    revert

commit 456c53f88d01b603f4175d8896a0dac036af03f8
Author: Mehrtash Babadi <[email protected]>
Date:   Fri Dec 8 08:17:22 2017 -0500

    enabled openmp g++ linking in theano

commit e2afef14ddb957f2dbdea76fd783d3bfb8d7a64e
Author: Mehrtash Babadi <[email protected]>
Date:   Fri Dec 8 08:04:19 2017 -0500

    mkl

commit 43e2a65201286161fcd5bfe7dbb21ae888e19dac
Author: Mehrtash Babadi <[email protected]>
Date:   Fri Dec 8 06:56:20 2017 -0500

    added cpu argument for germline tasks

commit 4433a62c2173c7f29d0f264c084bbaf2f6738782
Author: Mehrtash Babadi <[email protected]>
Date:   Fri Dec 8 02:45:38 2017 -0500

    revert travis yml forks
    verbose logging germline wdl

commit ae05801e33c37b3bf2685fba202032a270804873
Author: Samuel Lee <[email protected]>
Date:   Fri Dec 8 00:55:14 2017 -0500

    updated somatic PoNs for PreprocessIntervals drop Ns

commit cff64984d9fb42364001bda4c73d54cf68d85a5c
Author: Mehrtash Babadi <[email protected]>
Date:   Fri Dec 8 00:37:24 2017 -0500

    sudo travis yml

commit 89025941febd2089d426cfa1e0f0aa6a6712e2a9
Author: Mehrtash Babadi <[email protected]>
Date:   Fri Dec 8 00:23:22 2017 -0500

    travis/Docker config update (g++-6, Miniconda3)
    python test group assignment

commit 31f96398106c2b8577b8c25d110abea3ebe7f836
Author: Mehrtash Babadi <[email protected]>
Date:   Thu Dec 7 20:44:53 2017 -0500

    WDL test bugfix

commit 9b2fb820536ec355bea0256471bd093d547f5c99
Author: Mehrtash Babadi <[email protected]>
Date:   Thu Dec 7 20:20:36 2017 -0500

    update WDL test JSON files

commit e3d97644d1a2c40a5c364a96f8b67246154179c9
Author: Mehrtash Babadi <[email protected]>
Date:   Thu Dec 7 20:18:14 2017 -0500

    assertions in inference task base
    removed a ASCII > 128 character in log messages

commit 526cf92e623a3bbd5f9d375132b6ca046fc47620
Author: Mehrtash Babadi <[email protected]>
Date:   Thu Dec 7 20:03:04 2017 -0500

    redirect tqdm progress bar to python logger

commit 2e45bd30968b921fae225de3901fb97ece690b0c
Author: Mehrtash Babadi <[email protected]>
Date:   Thu Dec 7 19:45:49 2017 -0500

    more arg related fixes

commit bb89a3bb338d88199881e8aca65f656f2acd7c0a
Author: Mehrtash Babadi <[email protected]>
Date:   Thu Dec 7 19:41:20 2017 -0500

    arg related bugfixes in WDL, python, and java CLIs

commit 23569787ee2c8cc6c9227a44170cbbd02fe4427f
Author: Mehrtash Babadi <[email protected]>
Date:   Thu Dec 7 17:21:05 2017 -0500

    fixed issue with python boolean argparse (they use weird semantics)

commit ae841c9ed4cd9b2ca1ac0e9082d175ff8ea98298
Author: Mehrtash Babadi <[email protected]>
Date:   Thu Dec 7 16:44:02 2017 -0500

    shorter gCNV WDL tests

commit 5466b806e36df16cad2d045be074e7f9afec0957
Author: Mehrtash Babadi <[email protected]>
Date:   Thu Dec 7 16:38:15 2017 -0500

    fixed arg issues in somatic WDL
    exposed all missing args to java side
    major update to germline WDLs
    all optional python args exposed to WDLs as optional args

commit 50cb6fd08de15469a9080cbb27ff30c8b7ee7e21
Author: Mehrtash Babadi <[email protected]>
Date:   Thu Dec 7 13:50:45 2017 -0500

    missing serialVersionUID

commit 5f0f31eab63b0e6f6105708ded7f86c96c830781
Author: Mehrtash Babadi <[email protected]>
Date:   Thu Dec 7 13:35:33 2017 -0500

    annotated intervals kebab case
    updated germline WDL workflows

commit 29cc6234dbfb8db12559217a650c6ceb170c5797
Author: Mehrtash Babadi <[email protected]>
Date:   Thu Dec 7 13:15:28 2017 -0500

    cleanup test files

commit 08a35bb4e65eceb735adcd41a91132e9a34d2b66
Author: Mehrtash Babadi <[email protected]>
Date:   Thu Dec 7 02:50:19 2017 -0500

    update WDL scripts

commit 12bcfa192ee6fa6da21239ebf5b513633efe974f
Author: Mehrtash Babadi <[email protected]>
Date:   Thu Dec 7 02:47:33 2017 -0500

    significant updates to GermlineCNVCaller
    integration tests for GermlineCNVCaller w/ sim data in both run modes

commit 151416a4af735ca721bd75e4b54a780c17ac9397
Author: Mehrtash Babadi <[email protected]>
Date:   Thu Dec 7 01:42:05 2017 -0500

    hybrid ADVI abstract argument collection w/ flexible default values
    hybrid ADVI argument collection for contig ploidy model
    hybrid ADVI argument collection for germline denoising and calling model

commit 56e21bf955d3dc0c52aceb384f28cf6173959de0
Author: Mehrtash Babadi <[email protected]>
Date:   Wed Dec 6 23:18:39 2017 -0500

    rewritten python-side coverage metadata table reader using pandas to fix the issues with comment line
    change criterion for cohort/case based on whether a contig-ploidy model is provided or not
    simulated test files for ploidy determination tool
    proper integration test for ploidy determination tool and all edge cases
    updated docs for ploidy determination tool

commit 7fa104b2e9170770cfc5b338835e41215d7fd39c
Author: Mehrtash Babadi <[email protected]>
Date:   Wed Dec 6 18:43:17 2017 -0500

    kabab case for gCNV-related tools
    removed short args (this also partially affected PlotDenoisedCopyRatios and PlotModeledSegments and their integration tests)

commit f02cb024331a986213cfd9fae2da706bbc5ddbd9
Author: Mehrtash Babadi <[email protected]>
Date:   Wed Dec 6 18:02:40 2017 -0500

    synced with mb_gcnv_python_kernel

commit 2963bbf8c90418d9b88545c93771ae51cf542db9
Author: Samuel Lee <[email protected]>
Date:   Wed Dec 6 11:38:05 2017 -0500

    Fixing typo in travis.yml

commit 6cf589999c716ec66404eb0a2ae4310dd130a772
Author: Samuel Lee <[email protected]>
Date:   Wed Dec 6 11:13:58 2017 -0500

    editable, full path

commit d998f2d5c2b33dd41e291be9bfeaea72fe479b8a
Author: Samuel Lee <[email protected]>
Date:   Wed Dec 6 10:56:24 2017 -0500

    revert Dockerfile, change yml

commit 930d7486b7d2cf918fcb16dd03394bb9c9f0611b
Author: Samuel Lee <[email protected]>
Date:   Wed Dec 6 10:34:46 2017 -0500

    more Dockerfile

commit 94112131526b514ef254bcc2c50a239dbae35aa1
Author: Samuel Lee <[email protected]>
Date:   Wed Dec 6 10:25:13 2017 -0500

    more Dockerfile

commit 7d2646240a65f5c0f09f5f25f3e19e9d9bf004d9
Author: Samuel Lee <[email protected]>
Date:   Wed Dec 6 10:06:11 2017 -0500

    more Dockerfile

commit f1235c25aeba85570b5ce389a34975f1b7b5ec3c
Author: Samuel Lee <[email protected]>
Date:   Wed Dec 6 09:39:46 2017 -0500

    Dockerfile edit

commit 3df84dd4693f28e4e8b34fd33f877e99749dffce
Author: Samuel Lee <[email protected]>
Date:   Tue Dec 5 16:08:06 2017 -0500

    Update test PoNs

commit 2c3b20e62a1cba7af24c0b0846eb1629422f51e6
Author: Samuel Lee <[email protected]>
Date:   Tue Dec 5 15:49:38 2017 -0500

    Update test files

commit c65c6e9144ef396792364ab2e06b7b436bb97684
Author: Samuel Lee <[email protected]>
Date:   Tue Dec 5 15:30:59 2017 -0500

    Adding no-GC/do-GC WDL tests.

commit 56451843066a456d9cf8e6eac55ae4df2c518ec3
Author: Samuel Lee <[email protected]>
Date:   Tue Dec 5 12:51:17 2017 -0500

    Updates to handle SAM header changes from sl_wgs_acnv_headers and updates to mb_gcnv_python_kernel.

commit d02d04df684a2820308a1d1c2bfda4b7d1c5f05e
Author: Samuel Lee <[email protected]>
Date:   Mon Nov 13 12:52:33 2017 -0500

    Added CLIs and WDL for python gCNV pipeline.

commit 66ed74b68375d43514ef84658e7a6c771ed9053c
Author: Mehrtash Babadi <[email protected]>
Date:   Wed Nov 15 01:50:03 2017 -0500

    Polished code, ready for review
    
    gCNV computational kernel (initial release)
    
    renaming gammas_s to psi_s to uniformity (sample-specific unexplained variance)
    
    renamed determine_ploidy_and_depth.py to cohort_determine_ploidy_and_depth.py
    finite-temperature forward-backward algorithm
    in the ploidy model, replaced alpha_j (NB over-dispersion) with psi_j (unexplained variance) for uniformity. Also, added the possibility of sample-specific unexplained variance in the germline contig ploidy model
    
    updated I/O routines and CLIs according to team discussion
    
    updated I/O routines and CLIs according to team discussion
    
    changed the output layout of the ploidy determination tool
    refactored parts of io.py
    upped the version to 0.3 as it is not backwards compatible anymore
    
    case ploidy determination tool from a given ploidy model
    major code cleanup and refactoring of I/O module
    refactoring of common CLI script snippets
    
    removed all "targets"
    some code cleanup
    
    pad flat class bitmask w/ a given padding value in the hybrid q_c_expectation_mode
    option to disable annealing and keep the temperature fixed
    
    bugfix in finite-temperature forward-backward
    further refactoring of model I/O
    
    the option to take a previously trained model as starting point in cohort CLI
    the option to take previous calls as a starting point in cohort CLI
    
    option to save and load adamax moments
    
    import/export adamax bias correction tensor
    
    refactoring related to fancy opt I/O
    added average ploidy column to read depth
    updated docs of hybrid inference
    
    modeling intervals can span multiple contigs now; ploidy can change
    across contigs with no issue
    
    save/load adamax state to .npy instead of .tsv for speed
    
    part 1 of doc updates
    
    part 2 of doc updates
    
    part 3 of doc updates
    
    part 4 of doc updates
    
    bumped version to 0.5
    readme
    
    update readme
    
    last minute stylistic doc updates.

@samuelklee
Copy link
Contributor Author

Split into commits to make things easier to review. The first commit, which is from sl_wgs_acnv_headers_docs, can be ignored and should be removed after that branch is merged. This should be ready for review, but the tests may fail if they run too long; we can trim them more after review.

@samuelklee
Copy link
Contributor Author

@sooheelee There is a bit of Javadoc for you to review in the "Added Java CLIs for python gCNV" commit.

@sooheelee
Copy link
Contributor

Ok @samuelklee. I will take a look.

Copy link
Contributor Author

@samuelklee samuelklee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About halfway through the python, the following remain:

   244 src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/commons.py
    63 src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/dists.py
    73 src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/fancy_model.py
  1168 src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/model_denoising_calling.py
   364 src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/model_ploidy.py
   211 src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/theano_hmm.py
   757 src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/tasks/inference_task_base.py
  

@@ -0,0 +1,210 @@
import os
Copy link
Contributor Author

@samuelklee samuelklee Jan 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a strategy for syncing these scripts from src/main/python/org/broadinstitute/hellbender/tools/gcnv/bin to src/main/resources/org/broadinstitute/hellbender/tools/copynumber. Perhaps just move everything to the latter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The easiest solution is to let the scripts be treated as Java resources and not distributing them with gcnvkernel. I will come back to this in a bit.

- werkzeug==0.12.2

- "--editable=/gatk/src/main/python/org/broadinstitute/hellbender/tools/gcnv/"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmnbroad Could you take a look at these updates to the environment and let us know if you have any issues? Also, if you have any thoughts about more general python issues (e.g., is src/main/python a good place for the package, or should we put everything in src/main/resources?), please feel free to comment.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samuelklee The env changes mostly seem fine (@lucidtronix - are you using agparse ?), except for the last line. Rather than the conda .yml be dependent on the repo structure, we should remove that last line, and instead rely on the changes implemented in #3964. It basically bundles up everything under src/main/python/org/broadinstitute/hellbender/ into a zip file which is then pip installed as part of the conda env, along with all of the other dependencies.

The only reason that #3964 currently says "DO NOT MERGE" is because I included a commit with a dummy python package, and a corresponding test, so I could validate that it works on travis. Let me know if you want to cherry-pick the core change, or we could merge that PR as is if you want to rebase on top of it.

So, I think most python code should live under src. You could also have tool-specific code that is kept as a tool resource to make it easy to call from Java, but that would be optional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that all makes sense. I’m fine with merging that PR and rebasing this one on top of it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, its merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a copy of this branch that is rebased on master to https://github.com/broadinstitute/gatk/tree/sl_gcnv_ploidy_cli_rebased to make sure we can use the changes by @cmnbroad mentioned above. @mbabadi see the last commit in that branch for an example of how we might reorganize things.

I removed the duplicate copies of the gcnv/bin scripts (since these are already in src/main/resources), moved the gcnvkernel package to the same level as the gatkpython dummy package, and used find_packages() to install both as a single gatkpythonpackages package. I retained a (slightly modified) copy of @mbabadi's original setup.py for gcnvkernel, which one can use to install that package separately should one choose to.

Not sure if this is how we were planning to organize things (and I must admit that I'm not familiar with the subtleties and conventions of python packaging), but it seems to work---we'll see if tests pass in that copy of the branch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like tests run successfully in that branch, so what I did indeed works. (Unfortunately, the tests don't pass, but that's because they are still running long...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll come back to this in a bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quick note @cmnbroad: argparse is a core python3 package now and does not need to be installed separately.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok, got it.

return padded_interval

def overlaps_with(self, other):
assert isinstance(other, Interval), "{0} is not of Interval type".format(other)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why you perform this check here and not in other methods? If it's not absolutely necessary, I'd be fine with dispensing of checks of this kind.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed. That's a main caveat of dynamic typing. No checks and you may get weird errors; and no one forces you to check so the get codes inconsistent...

return "GC_CONTENT"


class NameAnnotation(IntervalAnnotation):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No harm, but I will drop it to make you happy :D

else:
return np.abs(self.get_midpoint() - other.get_midpoint())

def __eq__(self, other):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these comparison operators used? Does this imply that intervals should be lexicographically ordered by default? If so, let's remove if possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not assume well-ordered intervals in gcnvkernel (no sort op, etc.). In the evaluation suite, I rely on IntervalTree's own ordering. So I think I can safely drop the comparison ops.


assert sys.version_info >= (3, 4), "gcnvkernel requires Python 3.4.x or later"

VERSIONFILE="gcnvkernel/_version.py"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whitespace around =.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pycharm "Inspect code" also reveals some other PEP8 violations and unused local variables, which should be easy to clean up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I just took a template and edited. Fixed PEP8 violations.

@@ -0,0 +1,71 @@
import numpy as np
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in the name of this file.

type=str,
required=True,
default=argparse.SUPPRESS,
help="Output path to write the ploidy model for future single-sample ploidy determination use")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to specify "single-sample" here, since case mode works on multiple cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember to keep the copy in src/main/resources synced, if necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return self.copy_number_update_reduced


class CopyNumberEmissionSampler(Sampler):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this duplicated in the cohort task?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Removed.

gcnvkernel.HybridInferenceParameters.expose_args(parser)


def update_args_dict_from_exported_model(input_model_path: str,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general comment: Use of type hinting is great but somewhat inconsistent. Can you file an issue to go back and clean up?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do :) it is hard to be too consistent since using most external modules (even numpy) breaks type inference. One conservative (though, verbose) strategy is to hint everything, as if we are coding in a statically typed language.

Copy link
Contributor

@sooheelee sooheelee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the subset of changes on the "Added Java CLIs for python gCNV" commit. The two tools in the commit, DetermineGermlineContigPloidy and GermlineCNVCaller, are missing the BetaFeature tag and could provide usage examples that are more informative. For example, what is the rule of thumb for increasing Xmx memory per sample added to the cohort?

*
* <p>COHORT run-mode:</p>
* <pre>
* gatk-launch --javaOptions "-Xmx4g" DetermineGermlineContigPloidy \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--javaOptions "-Xmx4g" --> --javaOptions "-Xmx100g"?

Is this one of those cases where we should specify the Xmx parameter? Does the tool require a lot of compute? Meaning could the tool err without it being specified? If so, it would be great for the Xmx parameter to reflect a larger value. For example, I had to use -Xmx100g for 40 exome samples with GermlineCNVCaller. Can we provide a rule of thumb on what is sufficient memory?

If the tool no longer requires a lot of compute, then we omit Xmx.

import htsjdk.samtools.SAMSequenceDictionary;
import org.broadinstitute.barclay.argparser.Advanced;
import org.broadinstitute.barclay.argparser.Argument;
import org.broadinstitute.barclay.argparser.ArgumentCollection;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be a new tool. Does it need the BetaFeature tag?

import org.broadinstitute.barclay.argparser.BetaFeature;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

* path and must be not provided again.</dd>
* </dl>
*
* <h3>Examples</h3>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Examples --> Usage examples

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

*
* <p>CASE run-mode:</p>
* <pre>
* gatk-launch --javaOptions "-Xmx4g" DetermineGermlineContigPloidy \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment above on Xmx

* than 10000 consecutive intervals spanning at least 10 - 50 mb.</p></dd>
* </dl>
*
* <h3>Examples</h3>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--> Usage examples

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

*
* <p>COHORT run-mode:</p>
* <pre>
* gatk-launch --javaOptions "-Xmx4g" GermlineCNVCaller \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment on Xmx above. It would be great to have some rule of thumb on how much memory is needed for X number of WES/WGS samples.

Tool doc says --run-mode is also a required argument so we should include these in the example commands.

* </dl>
*
* <h3>Important Remarks</h3>
* <dl>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if the usage example commands showcased parameters we expect users to commonly tweak, given certain types of data. Can we that the default parameters are meant for whole-genome sequences? What would a WES command look like?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're still running internal evaluations to determine the best values of parameters, many of which determine priors for the gCNV model. We'll likely set tweak some of the default values to work well for data generated at the Broad, but as @mbabadi warns elsewhere in the documentation, there is no getting around the user having to do some experimentation to set these for their own data.

More generally, as our tools and models become more complex, we cannot expect that they'll run out of the box with one-size-fits-all parameters on all data. Hopefully we can avoid giving this impression in documentation as well.

Copy link
Contributor

@sooheelee sooheelee Jan 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empirical example commands of what we use in production, plus a general description of the data type, would be great to showcase in the usage examples. We can update these down the road when you are all settled with evals.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sooheelee @samuelklee While I agree with the complexity of the models and the inevitable non-universality of parameters, I am still hopeful that we can find decent defaults for WGS, WGS, and gene panel data. It requires extensive evaluation on Broad and non-Broad data and is beyond the scope of the first release IMO. At least, we will try to ship the first version with non-garbage default parameters :D

* -L intervals.interval_list \
* --model previous_model_path \
* --input normal_1.counts.hdf5 \
* ... \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And --run-mode.

import org.broadinstitute.barclay.argparser.Advanced;
import org.broadinstitute.barclay.argparser.Argument;
import org.broadinstitute.barclay.argparser.ArgumentCollection;
import org.broadinstitute.barclay.argparser.CommandLineProgramProperties;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this tool out of Beta status now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a complete python-based rewrite of the old Spark-based GermlineCNVCaller, so it should still be marked Beta.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@sooheelee sooheelee assigned sooheelee and unassigned sooheelee Jan 3, 2018
@sooheelee
Copy link
Contributor

I've put in my two cents for the two tool docs @samuelklee.

@samuelklee
Copy link
Contributor Author

Great, thanks! Also, I think I commented above that both tools will indeed be beta; @mbabadi will add these tags.

Copy link
Contributor Author

@samuelklee samuelklee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, done with my review. Great job, @mbabadi! The python code looks very clean, and the WDL looks to be in great shape as well.

I am not sure how far along @MartonKN and @asmirnov239 are with their reviews, but let's try to get this merged ASAP. We need to get sl_preprocess in with some time to spare on Friday (since we will have to make a few last-minute WDL changes on top of that before cutting the prerelease), and this branch needs to go in before that one. Most of my comments here are minor and can be addressed after release if necessary, just be sure to file issues for anything you punt.

import theano.tensor as tt


class PositiveFlatTop(PositiveContinuous):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used anywhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not anymore, but it is a useful distribution and I think I will end up using it at some point. I'll keep it here for now.

mu > 0, value >= 0, alpha > 0)


def negative_binomial_gaussian_approx_logp(mu, alpha, value):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used anywhere? Same for the next two methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More generally, I used vulture to find the following list of possibly unused methods. Please use your discretion to clean up:

src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/__init__.py:21: unused import 'IntervalListMask' (90% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/config.py:5: unused variable 'log_eps' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/inference/deterministic_annealing.py:24: unused function 'apply' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/inference/fancy_optimizers.py:78: unused attribute 'epsilon' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/inference/param_tracker.py:68: unused function 'clear' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/io/io_denoising_calling.py:42: unused function '_export_dict_to_json_file' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/io/io_intervals_and_counts.py:156: unused function 'write_interval_list_to_tsv_file' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/io/io_metadata.py:17: unused function 'write_sample_coverage_metadata' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/commons.py:38: unused function 'poisson_logp' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/commons.py:76: unused function 'negative_binomial_gaussian_approx_logp' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/commons.py:96: unused function 'negative_binomial_smart_approx_logp' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/commons.py:111: unused function 'centered_heavy_tail_logp' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/dists.py:31: unused variable 'point' (100% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/dists.py:31: unused variable 'repeat' (100% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/dists.py:40: unused function '_repr_latex_' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/dists.py:53: unused attribute '_default' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/dists.py:56: unused variable 'point' (100% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/dists.py:56: unused variable 'repeat' (100% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/dists.py:62: unused function '_repr_latex_' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/model_denoising_calling.py:617: unused function 'get_init_psi_t' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/model_denoising_calling.py:641: unused function 'get_init_psi_t' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/model_denoising_calling.py:822: unused attribute 'model_config' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/model_ploidy.py:52: unused attribute 'unordered_contig_list' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/theano_hmm.py:185: unused variable 'alpha_updates' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/theano_hmm.py:197: unused variable 'beta_updates' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/preprocess/interval_list_mask.py:13: unused class 'IntervalListMask' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/preprocess/interval_list_mask.py:32: unused function 'get_masked_view' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/preprocess/interval_list_mask.py:52: unused function 'keep_only_given_contigs' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/preprocess/interval_list_mask.py:66: unused function 'drop_blacklisted_intervals' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/preprocess/interval_list_mask.py:80: unused function 'drop_cohort_wide_uncovered_intervals' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/preprocess/interval_list_mask.py:96: unused function 'drop_intervals_with_anomalous_coverage' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/structs/metadata.py:74: unused function 'get_contig_total_count' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/structs/metadata.py:78: unused function 'get_total_count' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/structs/metadata.py:81: unused function 'generate_sample_coverage_metadata' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/structs/metadata.py:128: unused function 'get_contig_ploidy_genotyping_quality' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/structs/metadata.py:208: unused function 'get_average_ploidy' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/structs/metadata.py:283: unused function 'get_sample_read_depth_array' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/structs/metadata.py:287: unused function 'get_sample_contig_ploidy_array' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/tasks/inference_task_base.py:105: unused function 'flush' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/tasks/inference_task_base.py:315: unused attribute 'latest_caller_update_summary' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/tasks/inference_task_base.py:480: unused attribute 'latest_caller_update_summary' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/tasks/task_case_denoising_calling.py:52: unused attribute 'copy_number_update_s' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/tasks/task_case_denoising_calling.py:53: unused attribute 'copy_number_log_likelihoods_s' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/tasks/task_cohort_denoising_calling.py:52: unused attribute 'copy_number_update_s' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/tasks/task_cohort_denoising_calling.py:53: unused attribute 'copy_number_log_likelihoods_s' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/tasks/task_cohort_denoising_calling.py:55: unused attribute 'class_log_likelihood' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/types.py:26: unused variable 'TheanoScalar' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/utils/cli_commons.py:24: unused function '_get_default_metavar_for_optional' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/utils/cli_commons.py:27: unused function '_get_default_metavar_for_positional' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/utils/rls.py:18: unused attribute '_lambda' (60% confidence)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, this was pretty useful, thanks! the confidence levels are interesting -- it is impossible to know for sure whether an apparently unused attribute is truly unused. In fact, it has made a quite a few errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed poisson_logp but kept negative_binomial_gaussian_approx_logp (will make an issue about it -- there's a potential for 2~3x speedup if we can pull it off with the right hack).

@@ -0,0 +1,757 @@
import numpy as np
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #4043.

_logger.debug('The {0} for epoch {1} successfully finished in {2:.2f}s'.format(
task_name, i_epoch, self._t1 - self._t0))

def _log_interrupt(self, task_name: str, i_epoch: int):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, why do you take special precautions for KeyboardInterrupt? Is this a TQDM thing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is very useful for interactive mode. I don't think it would have any effect if the scripts are run via PythonScriptExecutor since stdin is not piped (is it @cmnbroad?)

Copy link
Collaborator

@cmnbroad cmnbroad Jan 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Technically I think you could write to stdin and the underlying process controller would honor it, but the PythonScriptExecutor doesn't rely on it (since you always run a script, module, or command line command). Thats what StreamingPythonScriptExecutor is for.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmnbroad Perfect, thank you!

@@ -0,0 +1,211 @@
import numpy as np
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #4043. Let's also add some tests for the HMM.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edited your issue.

return validated_contig_ploidy_prior_map, num_ploidy_states

@staticmethod
def get_contig_ploidy_prior_map_from_tsv_file(input_path: str,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it should be in io_ploidy.py.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return PloidyModelConfig(**relevant_kwargs)


class PloidyWorkspace:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought: if we split off the inference package, it would be useful to have a simple model as a pedagogical example for developers. Such a model could highlight tricks like the sharing of tensors here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely agree, and I think it is a good idea to do so. Will make an issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nevermind, you already made it.

return fact * np.ones((self.denoising_model_config.max_bias_factors,), dtype=types.floatX)


class DenoisingModel(GeneralizedContinuousModel):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to say, this looks great!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!! ;-)

@mbabadi
Copy link
Contributor

mbabadi commented Jan 5, 2018

@samuelklee Thanks for the PR review, I have implemented most changes and will make issues for the remaining momentarily. Waiting for the travis tests to pass.

@mbabadi
Copy link
Contributor

mbabadi commented Jan 5, 2018

@sooheelee Thank you for the docs review! I applied all of the changes. Will coordinate with you regarding the tutorial and memory allocation tips after the release.

@samuelklee
Copy link
Contributor Author

OK, just a few more minor doc comments from me. Also the issue about preemptible_attempts above. Otherwise good to rebase and merge! I will take care of trimming tests and reorganizing the packaging in another PR as we discussed.

@samuelklee
Copy link
Contributor Author

@asmirnov239 @MartonKN Comment now (or hold your peace until after release!) We will merge soon otherwise.

@droazen
Copy link
Contributor

droazen commented Jan 5, 2018

@samuelklee Note that --javaOptions is now --java-options

@mbabadi mbabadi force-pushed the sl_gcnv_ploidy_cli branch 2 times, most recently from 6c4594a to 3ee825d Compare January 5, 2018 17:47
@samuelklee
Copy link
Contributor Author

Thanks for catching that @droazen, but I think we removed those for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants