Added CLIs and WDL for python gCNV pipeline. #3925
Codecov Report
@@ Coverage Diff @@
## master #3925 +/- ##
===============================================
+ Coverage 78.769% 78.786% +0.016%
+ Complexity 16501 16490 -11
===============================================
Files 1065 1075 +10
Lines 58788 59026 +238
Branches 9578 9597 +19
===============================================
+ Hits 46307 46504 +197
- Misses 8754 8788 +34
- Partials 3727 3734 +7
@samuelklee awh hehe... will get rid of that column
@samuelklee gCNV integration tests and WDL tests pass locally. There are a number of issues:
I’ll take a look at the somatic tests. They should be OK, probably just something related to kebab case changes. EDIT: Or hmm, maybe they weren't passing before. Something to do with annotated-interval validation, I think. I think the WDL tests should be using the Docker, which has g++. Travis machines might be slower? Integration tests will need to be in the python test group. Take a look at the python tests.
OK, I think I figured out what was going on. In 9b194a6 I changed PreprocessIntervals to drop intervals with all Ns (if you remember, this was giving me NaNs in AnnotateIntervals, which gCNV didn't like). But I must not have rebuilt the somatic WGS PoNs and updated the copies in the large test resources. I could've sworn that I tested the somatic pipeline locally, but perhaps I forgot to update the jar at some point. Ideally, we should figure out some way to use the PoNs built by the panel WDL tests in the subsequent tests for the case/pair workflows.
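For readers following along, here is a minimal sketch of why an all-N interval can produce a NaN annotation and how dropping such intervals up front avoids it. This is not the PreprocessIntervals or AnnotateIntervals code; the `gc_content` helper, the interval names, and the choice to exclude N bases from the denominator are assumptions made for illustration.

```python
import math


def gc_content(sequence: str) -> float:
    """GC fraction over called bases (A, C, G, T); NaN when no base is called.

    Assumption for this sketch: N bases are excluded from the denominator.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    called = gc + at
    return gc / called if called > 0 else math.nan


# Hypothetical intervals keyed by name, with their reference sequence.
intervals = {"1:1-4": "ACGT", "1:5-8": "NNNN", "1:9-12": "GGNN"}

# Drop intervals whose annotation would be NaN (here: all-N intervals).
kept = {name: seq for name, seq in intervals.items()
        if not math.isnan(gc_content(seq))}
```

Filtering before annotation keeps NaNs out of anything downstream that expects finite values.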
@mbabadi I pushed new PoNs, I think that should get somatic tests to pass. Sorry about that, hope you didn't go too far down the rabbit hole trying to debug! If you need to make changes to the Docker or Travis environments, perhaps coordinate with @cmnbroad to make them directly in #3912, which is still open.
@samuelklee haha I just figured it out and was about to write to you. Hopefully the tests will pass now. The changes I made to travis yml were experimental -- I am reverting them.
(except for Miniconda2 -> Miniconda3)
@samuelklee darn; the wgs pon related tests are still failing and the germline WDL exceeds the time limit (it completes in ~ 10 mins on gsa5, which is already quite slow, and ~ 3 mins on my laptop!). Something strange is going on. I wish there was an easy way to inspect the cromwell logs; perhaps there's a way to set cromwell logging to stdout.
@samuelklee OK it is not a cromwell issue -- after your commit, the somatic denoising integration tests are failing locally too: I wouldn't worry about it now. It is most likely related to test resource files.
@mbabadi I think the somatic tests are fixed: https://travis-ci.org/broadinstitute/gatk/builds/313333659?utm_source=email&utm_medium=notification Did you pull my commit? Feel free to cut out some of the test cases if the WDL tests are too slow. Especially now that there is no real distinction between WGS/WES besides -L (except for in PreprocessIntervals, which integration tests should cover).
scripts/cnv_wdl/germline/README.md
Outdated
- ``CNVGermlineCohortWorkflow.cohort_entity_id`` -- Name of the cohort. Will be used as a prefix for output filenames.
- ``CNVGermlineCohortWorkflow.contig_ploidy_priors`` -- TSV file containing prior probabilities for the ploidy of each contig, with column headers: CONTIG_NAME, PLOIDY_PRIOR_0, PLOIDY_PRIOR_1, ...
- ``CNVGermlineCohortWorkflow.gatk_docker`` -- GATK Docker image (e.g., ``broadinstitute/gatk:x.beta.x``).
Update this.
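As a reading aid for the ``contig_ploidy_priors`` input described in the README excerpt, here is a sketch of parsing a table with that header layout. Only the column names CONTIG_NAME and PLOIDY_PRIOR_N come from the README; the `read_ploidy_priors` helper and the numeric values are made up for illustration and are not recommended priors.

```python
import csv
import io

# Illustrative table only; the probabilities are invented for the example.
example_tsv = (
    "CONTIG_NAME\tPLOIDY_PRIOR_0\tPLOIDY_PRIOR_1\tPLOIDY_PRIOR_2\n"
    "1\t0.01\t0.01\t0.98\n"
    "X\t0.01\t0.49\t0.50\n"
)


def read_ploidy_priors(handle):
    """Map each contig name to its list of priors, indexed by ploidy."""
    reader = csv.DictReader(handle, delimiter="\t")
    priors = {}
    for row in reader:
        contig = row.pop("CONTIG_NAME")
        # Order the remaining columns by the integer suffix of PLOIDY_PRIOR_<n>.
        columns = sorted(row, key=lambda name: int(name.rsplit("_", 1)[1]))
        priors[contig] = [float(row[column]) for column in columns]
    return priors


priors = read_ploidy_priors(io.StringIO(example_tsv))
```

Sorting by the numeric suffix keeps the list index equal to the ploidy even if the columns appear in a different order.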
92ae1bc to 3c1ca91
Rebased now that sl_wgs_acnv, sl_wgs_acnv_headers, and sl_delete_legacy_cnv code are all in. Phew!
63f9f8a to 84d937a
Rebased and squashed on top of sl_wgs_acnv_headers_docs. Here is the log of squashed commits, for reference:
84d937a to 6511ca3
Split into commits to make things easier to review. The first commit, which is from sl_wgs_acnv_headers_docs, can be ignored and should be removed after that branch is merged. This should be ready for review, but the tests may fail if they run too long; we can trim them more after review.
@sooheelee There is a bit of Javadoc for you to review in the "Added Java CLIs for python gCNV" commit.
Ok @samuelklee. I will take a look.
About halfway through the python, the following remain:
244 src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/commons.py
63 src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/dists.py
73 src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/fancy_model.py
1168 src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/model_denoising_calling.py
364 src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/model_ploidy.py
211 src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/theano_hmm.py
757 src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/tasks/inference_task_base.py
@@ -0,0 +1,210 @@
import os
We need a strategy for syncing these scripts from src/main/python/org/broadinstitute/hellbender/tools/gcnv/bin to src/main/resources/org/broadinstitute/hellbender/tools/copynumber. Perhaps just move everything to the latter.
The easiest solution is to treat the scripts as Java resources and not distribute them with gcnvkernel. I will come back to this in a bit.
scripts/gatkcondaenv.yml
Outdated
- werkzeug==0.12.2

- "--editable=/gatk/src/main/python/org/broadinstitute/hellbender/tools/gcnv/"
@cmnbroad Could you take a look at these updates to the environment and let us know if you have any issues? Also, if you have any thoughts about more general python issues (e.g., is src/main/python a good place for the package, or should we put everything in src/main/resources?), please feel free to comment.
@samuelklee The env changes mostly seem fine (@lucidtronix - are you using argparse?), except for the last line. Rather than making the conda .yml dependent on the repo structure, we should remove that last line and instead rely on the changes implemented in #3964. That PR basically bundles up everything under src/main/python/org/broadinstitute/hellbender/ into a zip file, which is then pip installed as part of the conda env, along with all of the other dependencies.
The only reason that #3964 currently says "DO NOT MERGE" is because I included a commit with a dummy python package, and a corresponding test, so I could validate that it works on travis. Let me know if you want to cherry-pick the core change, or we could merge that PR as is if you want to rebase on top of it.
So, I think most python code should live under src. You could also have tool-specific code that is kept as a tool resource to make it easy to call from Java, but that would be optional.
OK, that all makes sense. I’m fine with merging that PR and rebasing this one on top of it.
OK, it's merged.
I pushed a copy of this branch that is rebased on master to https://github.com/broadinstitute/gatk/tree/sl_gcnv_ploidy_cli_rebased to make sure we can use the changes by @cmnbroad mentioned above. @mbabadi see the last commit in that branch for an example of how we might reorganize things.
I removed the duplicate copies of the gcnv/bin scripts (since these are already in src/main/resources), moved the gcnvkernel package to the same level as the gatkpython dummy package, and used find_packages() to install both as a single gatkpythonpackages package. I retained a (slightly modified) copy of @mbabadi's original setup.py for gcnvkernel, which one can use to install that package separately should one choose to.
Not sure if this is how we were planning to organize things (and I must admit that I'm not familiar with the subtleties and conventions of python packaging), but it seems to work---we'll see if tests pass in that copy of the branch.
Looks like tests run successfully in that branch, so what I did indeed works. (Unfortunately, the tests don't pass, but that's because they are still running long...)
I'll come back to this in a bit.
Quick note @cmnbroad: argparse is a core python3 package now and does not need to be installed separately.
Oh ok, got it.
        return padded_interval

    def overlaps_with(self, other):
        assert isinstance(other, Interval), "{0} is not of Interval type".format(other)
Any reason why you perform this check here and not in other methods? If it's not absolutely necessary, I'd be fine with dispensing with checks of this kind.
Removed. That's one of the main caveats of dynamic typing: without checks you may get weird errors, and since no one forces you to check, the code gets inconsistent...
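One way to get consistency without scattering runtime asserts is to annotate the method and let a static checker flag misuse. The sketch below is not the gcnvkernel `Interval` class; the field names and the closed-interval overlap rule are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for a genomic interval with closed coordinates."""
    contig: str
    start: int
    end: int

    def overlaps_with(self, other: "Interval") -> bool:
        # The annotation documents the expected type; a static checker can
        # report wrong call sites, so no isinstance assert is needed here.
        return (self.contig == other.contig
                and self.start <= other.end
                and other.start <= self.end)


a = Interval("1", 1, 10)
b = Interval("1", 10, 20)
c = Interval("2", 1, 10)
```

The trade-off is that nothing is enforced at runtime, so this only helps if a type checker is actually run on the code.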
        return "GC_CONTENT"


class NameAnnotation(IntervalAnnotation):
Do we still need this?
No harm, but I will drop it to make you happy :D
        else:
            return np.abs(self.get_midpoint() - other.get_midpoint())

    def __eq__(self, other):
Are these comparison operators used? Does this imply that intervals should be lexicographically ordered by default? If so, let's remove if possible.
I do not assume well-ordered intervals in gcnvkernel (no sort op, etc.). In the evaluation suite, I rely on IntervalTree's own ordering. So I think I can safely drop the comparison ops.
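To illustrate the point about not implying a default ordering, here is a sketch in which intervals support equality and hashing but define no `<` or `>`, so any ordering has to be requested explicitly with a sort key. The class below is a hypothetical example, not the gcnvkernel implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # order=False is the default, so no __lt__/__gt__ exist
class Interval:
    contig: str
    start: int
    end: int


a = Interval("1", 100, 200)
b = Interval("1", 100, 200)
c = Interval("2", 50, 60)

same = (a == b)  # field-by-field equality is generated by dataclass

try:
    a < c
    ordering_defined = True
except TypeError:
    # No implicit (lexicographic) ordering: the comparison is rejected.
    ordering_defined = False

# Callers that need an order state it explicitly.
by_position = sorted([c, a], key=lambda iv: (iv.contig, iv.start, iv.end))
```

Note that sorting on the contig string is itself lexicographic ("10" sorts before "2"), which is one reason to keep the choice of ordering with the caller.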

assert sys.version_info >= (3, 4), "gcnvkernel requires Python 3.4.x or later"

VERSIONFILE="gcnvkernel/_version.py"
Whitespace around =.
PyCharm "Inspect code" also reveals some other PEP8 violations and unused local variables, which should be easy to clean up.
True, I just took a template and edited. Fixed PEP8 violations.
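For context, a common setup.py pattern combines an interpreter version guard with reading the version string out of a `_version.py` file. The sketch below follows that pattern with PEP8 spacing; the `parse_version` helper and the example file contents are assumptions, since the rest of the setup.py under review is not shown here.

```python
import re
import sys

assert sys.version_info >= (3, 4), "gcnvkernel requires Python 3.4.x or later"

VERSION_PATTERN = r"^__version__\s*=\s*['\"]([^'\"]+)['\"]"


def parse_version(version_file_text: str) -> str:
    """Extract the value assigned to __version__ in the given file contents."""
    match = re.search(VERSION_PATTERN, version_file_text, re.MULTILINE)
    if match is None:
        raise RuntimeError("Unable to find a version string")
    return match.group(1)


# Hypothetical contents of a _version.py file.
example_version_file = '__version__ = "0.1.0"\n'
version = parse_version(example_version_file)
```

Parsing the file as text avoids importing the package (and its heavy dependencies) at install time.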
@@ -0,0 +1,71 @@
import numpy as np
Typo in the name of this file.
type=str,
required=True,
default=argparse.SUPPRESS,
help="Output path to write the ploidy model for future single-sample ploidy determination use")
Do you need to specify "single-sample" here, since case mode works on multiple cases?
Remember to keep the copy in src/main/resources synced, if necessary.
Done.
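As background on the `required=True` / `default=argparse.SUPPRESS` combination in the excerpt: with SUPPRESS, an argument that is not given on the command line is left out of the parsed namespace entirely, while `required=True` makes the parser exit if the argument is missing. The flag names below are hypothetical and not necessarily those used by the scripts under review.

```python
import argparse

parser = argparse.ArgumentParser(description="Sketch of a ploidy CLI (hypothetical flags)")
parser.add_argument("--output_model_path",
                    type=str,
                    required=True,
                    default=argparse.SUPPRESS,
                    help="Output path to write the ploidy model")
parser.add_argument("--mapping_error_rate",
                    type=float,
                    required=False,
                    default=argparse.SUPPRESS,
                    help="Optional parameter; absent from the namespace unless given")

# Only the required argument is supplied.
args = parser.parse_args(["--output_model_path", "ploidy-model"])
args_dict = vars(args)
```

Because unset optional arguments never appear in `args_dict`, downstream code can tell "not provided" apart from "provided with the default value".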
        return self.copy_number_update_reduced


class CopyNumberEmissionSampler(Sampler):
Is this duplicated in the cohort task?
Yep. Removed.
    gcnvkernel.HybridInferenceParameters.expose_args(parser)


def update_args_dict_from_exported_model(input_model_path: str,
A general comment: Use of type hinting is great but somewhat inconsistent. Can you file an issue to go back and clean up?
Will do :) It is hard to be too consistent since using most external modules (even numpy) breaks type inference. One conservative (though verbose) strategy is to hint everything, as if we are coding in a statically typed language.
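A small sketch of what "hint everything" can look like in practice, with parameters, return type, and locals all annotated. The `merge_model_defaults` function and its behavior are hypothetical and only loosely inspired by the idea of filling command-line arguments from an exported model; it is not the function in this PR.

```python
from typing import Dict, List, Optional


def merge_model_defaults(args_dict: Dict[str, str],
                         model_config: Dict[str, str],
                         keys: Optional[List[str]] = None) -> Dict[str, str]:
    """Return a copy of args_dict with missing keys filled in from model_config."""
    selected_keys: List[str] = list(model_config) if keys is None else keys
    merged: Dict[str, str] = dict(args_dict)
    for key in selected_keys:
        # Values given explicitly by the user take precedence over the model's.
        if key not in merged and key in model_config:
            merged[key] = model_config[key]
    return merged


merged_args = merge_model_defaults({"run_mode": "CASE"},
                                   {"run_mode": "COHORT", "max_copy_number": "5"})
```

Annotating locals is redundant for a checker in simple cases like this, which is the verbosity cost mentioned above.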
Reviewed the subset of changes on the "Added Java CLIs for python gCNV" commit. The two tools in the commit, DetermineGermlineContigPloidy and GermlineCNVCaller, are missing the BetaFeature tag and could provide usage examples that are more informative. For example, what is the rule of thumb for increasing Xmx memory per sample added to the cohort?
 *
 * <p>COHORT run-mode:</p>
 * <pre>
 * gatk-launch --javaOptions "-Xmx4g" DetermineGermlineContigPloidy \
--javaOptions "-Xmx4g" --> --javaOptions "-Xmx100g"?
Is this one of those cases where we should specify the Xmx parameter? Does the tool require a lot of compute? Meaning could the tool err without it being specified? If so, it would be great for the Xmx parameter to reflect a larger value. For example, I had to use -Xmx100g for 40 exome samples with GermlineCNVCaller. Can we provide a rule of thumb on what is sufficient memory?
If the tool no longer requires a lot of compute, then we omit Xmx.
import htsjdk.samtools.SAMSequenceDictionary;
import org.broadinstitute.barclay.argparser.Advanced;
import org.broadinstitute.barclay.argparser.Argument;
import org.broadinstitute.barclay.argparser.ArgumentCollection;
This appears to be a new tool. Does it need the BetaFeature tag?
import org.broadinstitute.barclay.argparser.BetaFeature;
Added.
 * path and must be not provided again.</dd>
 * </dl>
 *
 * <h3>Examples</h3>
Examples --> Usage examples
Done.
 *
 * <p>CASE run-mode:</p>
 * <pre>
 * gatk-launch --javaOptions "-Xmx4g" DetermineGermlineContigPloidy \
See my comment above on Xmx
 * than 10000 consecutive intervals spanning at least 10 - 50 mb.</p></dd>
 * </dl>
 *
 * <h3>Examples</h3>
--> Usage examples
Done.
 *
 * <p>COHORT run-mode:</p>
 * <pre>
 * gatk-launch --javaOptions "-Xmx4g" GermlineCNVCaller \
See my comment on Xmx above. It would be great to have some rule of thumb on how much memory is needed for X number of WES/WGS samples.
Tool doc says --run-mode is also a required argument so we should include these in the example commands.
 * </dl>
 *
 * <h3>Important Remarks</h3>
 * <dl>
It would be great if the usage example commands showcased parameters we expect users to commonly tweak, given certain types of data. Can we say that the default parameters are meant for whole-genome sequences? What would a WES command look like?
We're still running internal evaluations to determine the best values of parameters, many of which determine priors for the gCNV model. We'll likely tweak some of the default values to work well for data generated at the Broad, but as @mbabadi warns elsewhere in the documentation, there is no getting around the user having to do some experimentation to set these for their own data.
More generally, as our tools and models become more complex, we cannot expect that they'll run out of the box with one-size-fits-all parameters on all data. Hopefully we can avoid giving this impression in documentation as well.
Empirical example commands of what we use in production, plus a general description of the data type, would be great to showcase in the usage examples. We can update these down the road when you are all settled with evals.
@sooheelee @samuelklee While I agree with the complexity of the models and the inevitable non-universality of parameters, I am still hopeful that we can find decent defaults for WGS, WES, and gene panel data. It requires extensive evaluation on Broad and non-Broad data and is beyond the scope of the first release IMO. At least, we will try to ship the first version with non-garbage default parameters :D
 * -L intervals.interval_list \
 * --model previous_model_path \
 * --input normal_1.counts.hdf5 \
 * ... \
And --run-mode.
import org.broadinstitute.barclay.argparser.Advanced;
import org.broadinstitute.barclay.argparser.Argument;
import org.broadinstitute.barclay.argparser.ArgumentCollection;
import org.broadinstitute.barclay.argparser.CommandLineProgramProperties;
Is this tool out of Beta status now?
This is a complete python-based rewrite of the old Spark-based GermlineCNVCaller, so it should still be marked Beta.
Done.
I've put in my two cents for the two tool docs @samuelklee.
Great, thanks! Also, I think I commented above that both tools will indeed be beta; @mbabadi will add these tags.
OK, done with my review. Great job, @mbabadi! The python code looks very clean, and the WDL looks to be in great shape as well.
I am not sure how far along @MartonKN and @asmirnov239 are with their reviews, but let's try to get this merged ASAP. We need to get sl_preprocess in with some time to spare on Friday (since we will have to make a few last-minute WDL changes on top of that before cutting the prerelease), and this branch needs to go in before that one. Most of my comments here are minor and can be addressed after release if necessary, just be sure to file issues for anything you punt.
import theano.tensor as tt


class PositiveFlatTop(PositiveContinuous):
Is this used anywhere?
Not anymore, but it is a useful distribution and I think I will end up using it at some point. I'll keep it here for now.
mu > 0, value >= 0, alpha > 0)


def negative_binomial_gaussian_approx_logp(mu, alpha, value):
Is this used anywhere? Same for the next two methods.
More generally, I used vulture to find the following list of possibly unused methods. Please use your discretion to clean up:
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/__init__.py:21: unused import 'IntervalListMask' (90% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/config.py:5: unused variable 'log_eps' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/inference/deterministic_annealing.py:24: unused function 'apply' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/inference/fancy_optimizers.py:78: unused attribute 'epsilon' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/inference/param_tracker.py:68: unused function 'clear' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/io/io_denoising_calling.py:42: unused function '_export_dict_to_json_file' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/io/io_intervals_and_counts.py:156: unused function 'write_interval_list_to_tsv_file' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/io/io_metadata.py:17: unused function 'write_sample_coverage_metadata' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/commons.py:38: unused function 'poisson_logp' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/commons.py:76: unused function 'negative_binomial_gaussian_approx_logp' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/commons.py:96: unused function 'negative_binomial_smart_approx_logp' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/commons.py:111: unused function 'centered_heavy_tail_logp' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/dists.py:31: unused variable 'point' (100% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/dists.py:31: unused variable 'repeat' (100% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/dists.py:40: unused function '_repr_latex_' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/dists.py:53: unused attribute '_default' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/dists.py:56: unused variable 'point' (100% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/dists.py:56: unused variable 'repeat' (100% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/dists.py:62: unused function '_repr_latex_' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/model_denoising_calling.py:617: unused function 'get_init_psi_t' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/model_denoising_calling.py:641: unused function 'get_init_psi_t' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/model_denoising_calling.py:822: unused attribute 'model_config' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/model_ploidy.py:52: unused attribute 'unordered_contig_list' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/theano_hmm.py:185: unused variable 'alpha_updates' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/models/theano_hmm.py:197: unused variable 'beta_updates' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/preprocess/interval_list_mask.py:13: unused class 'IntervalListMask' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/preprocess/interval_list_mask.py:32: unused function 'get_masked_view' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/preprocess/interval_list_mask.py:52: unused function 'keep_only_given_contigs' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/preprocess/interval_list_mask.py:66: unused function 'drop_blacklisted_intervals' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/preprocess/interval_list_mask.py:80: unused function 'drop_cohort_wide_uncovered_intervals' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/preprocess/interval_list_mask.py:96: unused function 'drop_intervals_with_anomalous_coverage' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/structs/metadata.py:74: unused function 'get_contig_total_count' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/structs/metadata.py:78: unused function 'get_total_count' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/structs/metadata.py:81: unused function 'generate_sample_coverage_metadata' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/structs/metadata.py:128: unused function 'get_contig_ploidy_genotyping_quality' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/structs/metadata.py:208: unused function 'get_average_ploidy' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/structs/metadata.py:283: unused function 'get_sample_read_depth_array' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/structs/metadata.py:287: unused function 'get_sample_contig_ploidy_array' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/tasks/inference_task_base.py:105: unused function 'flush' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/tasks/inference_task_base.py:315: unused attribute 'latest_caller_update_summary' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/tasks/inference_task_base.py:480: unused attribute 'latest_caller_update_summary' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/tasks/task_case_denoising_calling.py:52: unused attribute 'copy_number_update_s' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/tasks/task_case_denoising_calling.py:53: unused attribute 'copy_number_log_likelihoods_s' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/tasks/task_cohort_denoising_calling.py:52: unused attribute 'copy_number_update_s' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/tasks/task_cohort_denoising_calling.py:53: unused attribute 'copy_number_log_likelihoods_s' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/tasks/task_cohort_denoising_calling.py:55: unused attribute 'class_log_likelihood' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/types.py:26: unused variable 'TheanoScalar' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/utils/cli_commons.py:24: unused function '_get_default_metavar_for_optional' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/utils/cli_commons.py:27: unused function '_get_default_metavar_for_positional' (60% confidence)
src/main/python/org/broadinstitute/hellbender/tools/gcnv/gcnvkernel/utils/rls.py:18: unused attribute '_lambda' (60% confidence)
Huh, this was pretty useful, thanks! The confidence levels are interesting -- it is impossible to know for sure whether an apparently unused attribute is truly unused. In fact, the tool has made quite a few errors.
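To illustrate why such reports come with a confidence level rather than a verdict, here is a minimal sketch of an attribute that is only read through `getattr` with a name assembled at run time. The class, function, and attribute names are hypothetical and are not taken from gcnvkernel.

```python
class CallerSummary:
    """Hypothetical container; the names here are illustrative only."""

    def __init__(self):
        # A purely static scan sees this assignment but no direct read of it.
        self.latest_update_summary = "converged"


def read_field(obj, prefix: str, suffix: str):
    # The attribute name is built at run time, so this usage is invisible
    # to a tool that only matches identifiers appearing in the source.
    return getattr(obj, prefix + "_" + suffix)


value = read_field(CallerSummary(), "latest_update", "summary")
print(value)
```

An attribute used like this would look unused to a name-based scan, so each flagged item still needs a manual check before removal.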
I removed poisson_logp but kept negative_binomial_gaussian_approx_logp (will make an issue about it -- there's potential for a 2-3x speedup if we can pull it off with the right hack).
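For readers unfamiliar with the idea, the following is a rough sketch of what a Gaussian approximation to a negative binomial log-probability can look like. It is not the gcnvkernel implementation: the function names, the parameterization (mean `mu`, inverse overdispersion `alpha`, so that the variance is `mu + mu**2 / alpha`), and the moment-matching approach are all assumptions made for illustration.

```python
import math


def negative_binomial_logp(mu: float, alpha: float, k: int) -> float:
    """Exact NB log-pmf with mean mu and variance mu + mu**2 / alpha."""
    return (math.lgamma(k + alpha) - math.lgamma(alpha) - math.lgamma(k + 1)
            + alpha * math.log(alpha / (alpha + mu))
            + k * math.log(mu / (alpha + mu)))


def gaussian_approx_logp(mu: float, alpha: float, k: float) -> float:
    """Normal log-density matched to the NB mean and variance (assumed form)."""
    var = mu + mu ** 2 / alpha
    return -0.5 * math.log(2.0 * math.pi * var) - (k - mu) ** 2 / (2.0 * var)


# Compare the two near the mean, where the approximation should be closest.
exact = negative_binomial_logp(200.0, 50.0, 210)
approx = gaussian_approx_logp(200.0, 50.0, 210)
print(exact, approx)
```

One plausible reason an approximation of this kind could be cheaper is that it avoids the log-gamma evaluations of the exact form; whether that accounts for the speedup mentioned above is not stated in this thread.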
@@ -0,0 +1,757 @@
import numpy as np
Filed #4043.
_logger.debug('The {0} for epoch {1} successfully finished in {2:.2f}s'.format(
    task_name, i_epoch, self._t1 - self._t0))

def _log_interrupt(self, task_name: str, i_epoch: int):
Just curious, why do you take special precautions for KeyboardInterrupt? Is this a TQDM thing?
It is very useful for interactive mode. I don't think it would have any effect if the scripts are run via PythonScriptExecutor since stdin is not piped (is it @cmnbroad?)
Right. Technically I think you could write to stdin and the underlying process controller would honor it, but the PythonScriptExecutor doesn't rely on it (since you always run a script, module, or command-line command). That's what StreamingPythonScriptExecutor is for.
@cmnbroad Perfect, thank you!
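As a rough illustration of the interactive-mode handling discussed above, the sketch below catches `KeyboardInterrupt` around an epoch loop so that Ctrl-C stops the run cleanly and partial progress is kept. The function `run_epochs` and its arguments are hypothetical and do not reproduce the code under review.

```python
import logging

_logger = logging.getLogger(__name__)


def run_epochs(step, num_epochs: int, task_name: str = "task") -> int:
    """Call step(i_epoch) for each epoch; return the number completed."""
    completed = 0
    try:
        for i_epoch in range(num_epochs):
            step(i_epoch)
            completed += 1
    except KeyboardInterrupt:
        # Ctrl-C in an interactive session lands here; log it and return
        # what was finished instead of unwinding with a traceback.
        _logger.warning('The %s was interrupted in epoch %d', task_name, completed)
    return completed
```

In a notebook or REPL, this lets the caller inspect the state reached so far after interrupting a long run.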
@@ -0,0 +1,211 @@
import numpy as np
See #4043. Let's also add some tests for the HMM.
Edited your issue.
return validated_contig_ploidy_prior_map, num_ploidy_states

@staticmethod
def get_contig_ploidy_prior_map_from_tsv_file(input_path: str,
This seems like it should be in io_ploidy.py.
Done.
return PloidyModelConfig(**relevant_kwargs)


class PloidyWorkspace:
Just a thought: if we split off the inference package, it would be useful to have a simple model as a pedagogical example for developers. Such a model could highlight tricks like the sharing of tensors here.
I completely agree, and I think it is a good idea to do so. Will make an issue.
Oh, never mind, you already made it.
return fact * np.ones((self.denoising_model_config.max_bias_factors,), dtype=types.floatX)


class DenoisingModel(GeneralizedContinuousModel):
Just wanted to say, this looks great!
Thank you!! ;-)
@samuelklee Thanks for the PR review. I have implemented most changes and will make issues for the remaining ones momentarily. Waiting for the Travis tests to pass.
@sooheelee Thank you for the docs review! I applied all of the changes. Will coordinate with you regarding the tutorial and memory allocation tips after the release.
OK, just a few more minor doc comments from me. Also the issue about preemptible_attempts above. Otherwise good to rebase and merge! I will take care of trimming tests and reorganizing the packaging in another PR as we discussed.
@asmirnov239 @MartonKN Comment now (or hold your peace until after release!) We will merge soon otherwise.
@samuelklee Note that
6c4594a to 3ee825d
Thanks for catching that @droazen, but I think we removed those for now.
3ee825d to 99e5997