Port of nvscorevariants into GATK, with a basic tool frontend #8004
Conversation
Github actions tests reported job failures from actions build 2935907552
Thanks, @droazen! @asmirnov239 has been looking at PyMC3 updates for gCNV, which will help unlock the conda environment. I understand he has a working branch but needs to do more testing; perhaps he can comment further?
Thanks, @droazen! What data are you using to test the 2D model? And can we have access to your verification method?
Github actions tests reported job failures from actions build 3002176541
Github actions tests reported job failures from actions build 3092731818
@zamirai I've incorporated your patch from https://github.com/NVIDIA-Genomics-Research/nvscorevariants/commit/937ffafb78b0f3e7df9b1edc3b08d11e3ebee35a into this PR. With this change, the 2D tests now pass, even when I reduce the epsilon to 0.01. Thanks for the fix! @asmirnov239 is now working on merging the new conda environment into the GATK conda environment and making the necessary updates to existing tools. This will likely require at least another few weeks.
Github actions tests reported job failures from actions build 3092905417
Force-pushed from d7470f1 to 643fa70
Rebased onto latest master
Github actions tests reported job failures from actions build 3291375153
Force-pushed from 55d88f8 to 643fa70
Github actions tests reported job failures from actions build 3300297321
Github actions tests reported job failures from actions build 3300316784
Force-pushed from 643fa70 to 22902a1
Github actions tests reported job failures from actions build 11076165405
Github actions tests reported job failures from actions build 11244663091
Github actions tests reported job failures from actions build 11245220382
@lbergelson @KevinCLydon This branch now passes all tests, is rebased onto the latest master, and is (finally) using the official GATK Python environment rather than the custom NVIDIA-provided one. I had to add two additional Python dependencies to our environment and make some small modifications to the Python code to account for the newer version of pytorch-lightning that was required. The final outstanding issue in this PR is that I had to temporarily comment out the Jacoco coverage-report code in our build.gradle and dockertest.gradle files, due to a bizarre problem where Jacoco was attempting to read/parse the new Pytorch model files added in this branch. This will have to be resolved before we can merge (or we might have to permanently disable Jacoco if it can't be...). After this is merged, there will have to be a second PR that adds the CNN tools to the
I haven't looked through all the code, but I did find some stuff related to what we talked about earlier today
trainer = pl.Trainer(gradient_clip_val=1.0)

test_dataset = ReferenceDataset(tensor_reader)
test_loader = DataLoader(test_dataset, batch_size=64)
There's a batch size argument defined in the argument parser, which I would assume would be used here, but it doesn't look like it's wired up.
Is it used for something else instead? Let's wire it through.
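For reference, a minimal sketch (with illustrative names, mirroring the snippet above) of wiring a parser-level batch-size argument through to the DataLoader instead of hard-coding 64:

```python
import argparse

# Hypothetical sketch: expose --batch-size on the parser and thread it
# through to the DataLoader, rather than hard-coding batch_size=64.
parser = argparse.ArgumentParser()
parser.add_argument('--batch-size', type=int, default=64,
                    help='Batch size used when scoring variants')
args = parser.parse_args(['--batch-size', '128'])

# With torch available, the loader would then pick up the CLI value:
# test_loader = DataLoader(test_dataset, batch_size=args.batch_size)
print(args.batch_size)  # → 128
```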
parser.add_argument('--seed', type=int, default=724, help='Seed to initialize the random number generator')
parser.add_argument('--tmp-file', default='tmp.txt', help='The temporary VCF-like file where variants scores will be written')
parser.add_argument('--output-file', required=True, help='Output VCF file')
parser.add_argument('--gpus', type=int, nargs='+', help='Number of GPUs (int) or which GPUs (list)')
Okay, so I figured this one out. The gpus arg was deprecated in lightning 1.7 and removed in 2.0. Best I can tell from reading the lightning docs, the way this information is now meant to be supplied to the Trainer object is via two arguments: accelerator for picking the type of processor (cpu, gpu, etc.), and devices for specifying how many to use.
Probably worth wiring this up to the Java frontend, then!
Ah, makes sense. Should we expose that? If you don't specify but have a GPU available, what does it do by default? We don't want it to ignore available compute.
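One way to square this (a hedged sketch, not the PR's actual code; trainer_device_args is a hypothetical helper) is to map the old --gpus value onto the accelerator/devices pair expected by newer lightning, falling back to CPU only when nothing is requested:

```python
def trainer_device_args(gpus):
    """Translate the legacy --gpus value (an int count, a list of device
    indices, or None) into accelerator/devices kwargs for
    pytorch-lightning >= 2.0."""
    if not gpus:
        # Nothing requested: run on CPU. An alternative default would be
        # accelerator='auto', so an available GPU is not silently ignored.
        return {'accelerator': 'cpu', 'devices': 1}
    return {'accelerator': 'gpu', 'devices': gpus}

# Usage (assuming args comes from the argparse setup above):
# trainer = pl.Trainer(gradient_clip_val=1.0,
#                      **trainer_device_args(args.gpus))
print(trainer_device_args(None))    # CPU fallback
print(trainer_device_args([0, 1]))  # two specific GPU indices
```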
else:
    sys.exit('Unknown tensor type!')
model = get_model(args, model_file)
trainer = pl.Trainer(gradient_clip_val=1.0)
I think I figured out the add_argparse_args thing. If I understand the docs correctly, it's basically there to let you skip writing out an exhaustive ArgumentParser. If you, as a person running this tool, use some args that aren't explicitly added to the ArgumentParser but are in the list of args that can be passed to a Trainer, lightning says "Oh, I recognize these" and handles them as if you'd manually defined them. I think that's now the default behavior for the ArgumentParser, but we might have to explicitly pass them to the trainer? The doc section I linked doesn't include an example that makes that super clear.
This comment implies that perhaps we should pass **vars(args) in to the Trainer here, but only if we actually want to expose any actual Trainer arguments as CLI arguments: https://lightning.ai/forums/t/how-to-combine-ptl-arguments-with-argumentparser/2440/2
Oh, that's kind of neat. That's not how GATK arg parsing works, though, so it would be hard to match that behavior in the GATK front end. I guess we could have an "--additional-args" that just gets passed through?
In the new framework (as that comment explains), you need to explicitly add any arguments you want to expose to the Python arg parser.
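A minimal sketch of that explicit pattern (the argument names here are illustrative, not the PR's final set): every Trainer option we want on the CLI is declared by hand and then forwarded, since add_argparse_args is gone in lightning 2.x.

```python
import argparse

parser = argparse.ArgumentParser()
# Each Trainer option we want to expose must now be declared explicitly;
# add_argparse_args no longer fills these in for us in lightning 2.x.
parser.add_argument('--gradient-clip-val', type=float, default=1.0)
parser.add_argument('--max-epochs', type=int, default=1)
args = parser.parse_args([])

trainer_kwargs = {
    'gradient_clip_val': args.gradient_clip_val,
    'max_epochs': args.max_epochs,
}
# trainer = pl.Trainer(**trainer_kwargs)  # with pytorch-lightning installed
print(trainer_kwargs)
```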
@@ -0,0 +1,200 @@
from unittest import TestCase
@lbergelson @KevinCLydon There are a number of Python-based unit tests under src/main/python/org/broadinstitute/hellbender/scorevariants/tests -- ideally we should find a way to run these as well, perhaps in NVScoreVariantsIntegrationTest via a PythonScriptExecutor.
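As a sketch (assuming the tests follow standard unittest conventions), the suite could be invoked through unittest's discovery mode, which is the kind of command a PythonScriptExecutor-based integration test could shell out to:

```python
import sys

def unittest_discover_command(test_dir):
    # Build the command line for unittest's discovery mode over test_dir.
    return [sys.executable, '-m', 'unittest', 'discover', '-s', test_dir]

cmd = unittest_discover_command(
    'src/main/python/org/broadinstitute/hellbender/scorevariants/tests')
# subprocess.run(cmd, check=True)  # would execute the Python test suite
print(cmd)
```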
@KevinCLydon I addressed the comments that we had here. I didn't get the Python unit tests running yet, though (or fix Jacoco).
Github actions tests reported job failures from actions build 11333175464
Github actions tests reported job failures from actions build 11335027058
@lbergelson Remember that after this is merged, before we can release we need a follow-up PR to add the CNN tools to the
@droazen Good point, thank you.
@lbergelson I've added a paragraph to the tool docs providing attribution to the original NVIDIA authors and collaborators.
@lbergelson I think this is ready to go in, provided you're ok with the state of the Jacoco build targets.
@droazen Did you take a look at my minor wiring changes? If you don't see any issue with them then 👍 to merge!
@droazen 👍
Minimal GATK port of nvscorevariants from https://github.com/NVIDIA-Genomics-Research/nvscorevariants
The tool runs successfully in both 1D and 2D modes, and a strict integration test passes for the 1D model. However, this PR has a number of outstanding issues that need to be resolved before it can be merged and replace the legacy CNNScoreVariants tool:
- The conda environment in scripts/nvscorevariants_environment.yml needs to be incorporated into the main GATK conda environment.
- The integration test for the 2D model does not currently pass, despite using a much higher epsilon than the 1D test. Some of the scores differ significantly from the CNNScoreVariants 2D output; we need to investigate why.
- There is currently no training tool to train a new model, as there is for the legacy CNN tool.
@samuelklee and @mwalker174, could you please comment on what it would take to incorporate the scripts/nvscorevariants_environment.yml conda environment into the main GATK conda environment, assuming we are free to remove/retire the CNN tool?

@lbergelson and @zamirai, please do a general code review when you get a chance.