Port of nvscorevariants into GATK, with a basic tool frontend #8004

droazen · 2022-08-26T18:52:17Z

Minimal GATK port of nvscorevariants from https://github.com/NVIDIA-Genomics-Research/nvscorevariants

The tool runs successfully in both 1D and 2D modes, and a strict integration test passes for the 1D model. However, this PR has a number of outstanding issues that need to be resolved before it can be merged and replace the legacy CNNScoreVariants tool:

The conda environment in scripts/nvscorevariants_environment.yml needs to be incorporated into the main GATK conda environment
The integration test for the 2D model does not currently pass, despite using a much higher epsilon than the 1D test. Some of the scores differ by significant amounts vs. the CNNScoreVariants 2D output. We need to investigate why this is.
There is currently no training tool to train a new model, like there is for the legacy CNN tool.

@samuelklee and @mwalker174, could you please comment on what it would take to incorporate the scripts/nvscorevariants_environment.yml conda environment into the main GATK conda environment, assuming we are free to remove/retire the CNN tool?

@lbergelson and @zamirai, please do a general code review when you get a chance.

gatk-bot · 2022-08-26T19:13:25Z

Github actions tests reported job failures from actions build 2935907552
Failures in the following jobs:

Test Type	JDK	Job ID	Logs
cloud	11	2935907552.11	logs
cloud	8	2935907552.10	logs
unit	11	2935907552.13	logs
integration	11	2935907552.12	logs
conda	8	2935907552.3	logs
unit	8	2935907552.1	logs
variantcalling	8	2935907552.2	logs
integration	8	2935907552.0	logs

samuelklee · 2022-08-26T19:38:17Z

Thanks, @droazen! @asmirnov239 has been looking at PyMC3 updates for gCNV, which will help unlock the conda environment. I understand he has a working branch, but needs to do more testing—perhaps he can comment further?

zamirai · 2022-08-29T13:56:37Z

Thanks @droazen! What data are you using to test the 2D model? And can we have access to your verification method?

gatk-bot · 2022-09-06T17:49:53Z

Github actions tests reported job failures from actions build 3002176541
Failures in the following jobs:

Test Type	JDK	Job ID	Logs
cloud	11	3002176541.11	logs
cloud	8	3002176541.10	logs
unit	11	3002176541.13	logs
integration	11	3002176541.12	logs
unit	8	3002176541.1	logs
integration	8	3002176541.0	logs
variantcalling	8	3002176541.2	logs
conda	8	3002176541.3	logs

gatk-bot · 2022-09-20T19:20:27Z

Github actions tests reported job failures from actions build 3092731818
Failures in the following jobs:

Test Type	JDK	Job ID	Logs
cloud	8	3092731818.10	logs
cloud	11	3092731818.11	logs
unit	11	3092731818.13	logs
integration	11	3092731818.12	logs
conda	8	3092731818.3	logs
unit	8	3092731818.1	logs
integration	8	3092731818.0	logs
variantcalling	8	3092731818.2	logs

droazen · 2022-09-20T19:37:21Z

@zamirai I've incorporated your patch from https://github.com/NVIDIA-Genomics-Research/nvscorevariants/commit/937ffafb78b0f3e7df9b1edc3b08d11e3ebee35a into this PR. With this change, the 2D tests now pass, even when I reduce the epsilon to 0.01. Thanks for the fix!
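For context on what "passing with an epsilon of 0.01" means, here is a minimal sketch of a tolerance-based comparison between expected (CNNScoreVariants) and observed (NVScoreVariants) scores. It assumes the scores have already been pulled out of the VCFs into dicts keyed by a hypothetical (contig, position, ref, alt) tuple. The function names are illustrative only; the real check lives in the Java integration test.

```python
from typing import Mapping


def max_abs_score_diff(expected: Mapping[tuple, float],
                       actual: Mapping[tuple, float]) -> float:
    """Largest absolute difference between matching scores.

    The variant sets must be identical: a missing or extra variant is a
    failure in its own right, not something a tolerance should hide.
    """
    if expected.keys() != actual.keys():
        missing = expected.keys() - actual.keys()
        extra = actual.keys() - expected.keys()
        raise ValueError(f"variant sets differ: missing={missing}, extra={extra}")
    return max((abs(expected[k] - actual[k]) for k in expected), default=0.0)


def scores_match(expected: Mapping[tuple, float],
                 actual: Mapping[tuple, float],
                 epsilon: float) -> bool:
    """True if every observed score is within epsilon of its expected value."""
    return max_abs_score_diff(expected, actual) <= epsilon
```

With expected scores {2.31, -1.20} and observed {2.305, -1.22}, `scores_match(..., epsilon=0.01)` returns False, because the second score is off by about 0.02.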

@asmirnov239 is now working on merging the new conda environment into the GATK conda environment and making the necessary updates to existing tools. This will likely require at least another few weeks.

gatk-bot · 2022-09-20T19:50:09Z

Github actions tests reported job failures from actions build 3092905417
Failures in the following jobs:

Test Type	JDK	Job ID	Logs
cloud	8	3092905417.10	logs
cloud	11	3092905417.11	logs
unit	11	3092905417.13	logs
integration	11	3092905417.12	logs
unit	8	3092905417.1	logs
conda	8	3092905417.3	logs
variantcalling	8	3092905417.2	logs
integration	8	3092905417.0	logs

droazen · 2022-10-20T16:39:27Z

Rebased onto latest master

gatk-bot · 2022-10-20T16:55:16Z

Github actions tests reported job failures from actions build 3291375153
Failures in the following jobs:

Test Type	JDK	Job ID	Logs
cloud	8	3291375153.10	logs
cloud	11	3291375153.11	logs
unit	11	3291375153.13	logs
integration	11	3291375153.12	logs
unit	8	3291375153.1	logs
conda	8	3291375153.3	logs
variantcalling	8	3291375153.2	logs
integration	8	3291375153.0	logs

gatk-bot · 2022-10-21T20:51:45Z

Github actions tests reported job failures from actions build 3300297321
Failures in the following jobs:

Test Type	JDK	Job ID	Logs
cloud	8	3300297321.10	logs
unit	11	3300297321.13	logs
cloud	11	3300297321.11	logs
conda	8	3300297321.3	logs
integration	11	3300297321.12	logs
unit	8	3300297321.1	logs
variantcalling	8	3300297321.2	logs
integration	8	3300297321.0	logs

gatk-bot · 2022-10-21T20:56:41Z

Github actions tests reported job failures from actions build 3300316784
Failures in the following jobs:

Test Type	JDK	Job ID	Logs
cloud	8	3300316784.10	logs
cloud	11	3300316784.11	logs
unit	11	3300316784.13	logs
integration	11	3300316784.12	logs
conda	8	3300316784.3	logs
unit	8	3300316784.1	logs
variantcalling	8	3300316784.2	logs
integration	8	3300316784.0	logs

Port the patch from NVIDIA-Genomics-Research/nvscorevariants@937ffaf

gatk-bot · 2024-09-27T19:11:42Z

Github actions tests reported job failures from actions build 11076165405
Failures in the following jobs:

Test Type	JDK	Job ID	Logs
cloud	17.0.6+10	11076165405.10	logs
unit	17.0.6+10	11076165405.12	logs
integration	17.0.6+10	11076165405.11	logs
unit	17.0.6+10	11076165405.1	logs
conda	17.0.6+10	11076165405.3	logs
variantcalling	17.0.6+10	11076165405.2	logs
integration	17.0.6+10	11076165405.0	logs

gatk-bot · 2024-10-08T23:07:11Z

Github actions tests reported job failures from actions build 11244663091
Failures in the following jobs:

Test Type	JDK	Job ID	Logs
conda	17.0.6+10	11244663091.3	logs

gatk-bot · 2024-10-08T23:58:43Z

Github actions tests reported job failures from actions build 11245220382
Failures in the following jobs:

Test Type	JDK	Job ID	Logs
conda	17.0.6+10	11245220382.3	logs

droazen · 2024-10-09T15:30:14Z

@lbergelson @KevinCLydon This branch now passes all tests, is rebased onto the latest master, and is (finally) using the official GATK Python environment rather than the custom NVIDIA-provided one. I had to add two additional Python dependencies to our environment, and make some small modifications to the Python code to account for a newer version of pytorch-lightning that was required.

The final outstanding issue in this PR is that I had to temporarily comment out the Jacoco coverage report code in our build.gradle and dockertest.gradle files, due to a bizarre problem where Jacoco was attempting to read/parse the new PyTorch model files added in this branch. This will have to be resolved before we can merge (or we might have to permanently disable Jacoco if it can't be...).

After this is merged, there will have to be a second PR that adds the CNN tools to the DeprecatedToolsRegistry, and actually removes the legacy tools. When we do this, we need to be careful not to remove the expected CNN output files used by the new NVScoreVariants integration tests.

KevinCLydon

I haven't looked through all the code, but I did find some stuff related to what we talked about earlier today.

KevinCLydon · 2024-10-09T20:10:32Z

src/main/resources/org/broadinstitute/hellbender/tools/walkers/vqsr/nvscorevariants.py

+    trainer = pl.Trainer(gradient_clip_val=1.0)
+
+    test_dataset = ReferenceDataset(tensor_reader)
+    test_loader = DataLoader(test_dataset, batch_size=64)


There's a batch size argument defined in the argument parser, which I would assume would be used here, but it doesn't look like it's wired up.

Is it used for something else instead? Let's wire it through.
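A minimal sketch of that wiring follows. It assumes the existing parser option is spelled `--batch-size` (the real option name isn't shown in the quoted diff) and keeps 64 as the default, so behavior only changes when a user passes a different value. Only the argparse side runs in isolation; the trailing comment shows where the value would reach the `DataLoader`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Batch-size wiring sketch")
    # Hypothetical option name; the default matches the value hard-coded in the quoted diff.
    parser.add_argument("--batch-size", type=int, default=64,
                        help="Number of variant tensors per inference batch")
    return parser


def loader_kwargs(args: argparse.Namespace) -> dict:
    """Keyword arguments for torch.utils.data.DataLoader taken from the CLI."""
    if args.batch_size < 1:
        raise ValueError("--batch-size must be a positive integer")
    return {"batch_size": args.batch_size}


# In the scoring entry point, instead of DataLoader(test_dataset, batch_size=64):
#     test_loader = DataLoader(test_dataset, **loader_kwargs(args))
```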

KevinCLydon · 2024-10-09T20:14:49Z

src/main/resources/org/broadinstitute/hellbender/tools/walkers/vqsr/nvscorevariants.py

+    parser.add_argument('--seed', type=int, default=724, help='Seed to initialize the random number generator')
+    parser.add_argument('--tmp-file', default='tmp.txt', help='The temporary VCF-like file where variants scores will be written')
+    parser.add_argument('--output-file', required=True, help='Output VCF file')
+    parser.add_argument('--gpus', type=int, nargs='+', help='Number of GPUs (int) or which GPUs (list)')


Okay, so I figured this one out. The gpus arg was deprecated in Lightning 1.7 and removed in 2.0. Best I can tell from reading the Lightning docs, the way this information is now meant to be supplied to the Trainer object is using two arguments: accelerator for picking the type of processor (cpu, gpu, etc.), and devices for specifying how many to use.

Probably worth wiring this up to the Java frontend, then!

Ah, makes sense. Should we expose that? If you don't specify but you have a GPU available what does it do by default? We don't want it to ignore available compute.
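A sketch of translating the removed `gpus` setting into the Lightning 2.x `accelerator`/`devices` pair, assuming the existing `--gpus` option (parsed with nargs='+') is kept for compatibility. On the default question: in Lightning 2.x both arguments default to "auto", so an unconfigured Trainer should use an available GPU, but `devices="auto"` would also use every visible GPU, so this sketch pins `devices` to 1 when nothing is requested. The helper name and the "one-element list means a count" convention are assumptions drawn from the option's help text. Note also that a later commit in this PR unwired devices because values other than 1 corrupted the output, so the multi-device branches here would need that fixed first.

```python
from typing import List, Optional


def trainer_device_kwargs(gpus: Optional[List[int]]) -> dict:
    """Map a legacy --gpus value (nargs='+') to Lightning 2.x Trainer kwargs.

    Assumed convention, following the option's help text:
      * not given   -> let Lightning pick the accelerator, single device
      * [0]         -> CPU only (legacy gpus=0 meant "no GPUs")
      * [n]         -> n GPUs
      * [i, j, ...] -> the listed GPU indices
    """
    if not gpus:
        return {"accelerator": "auto", "devices": 1}
    if len(gpus) == 1:
        count = gpus[0]
        if count == 0:
            return {"accelerator": "cpu", "devices": 1}
        return {"accelerator": "gpu", "devices": count}
    return {"accelerator": "gpu", "devices": list(gpus)}


# Usage inside the script (pytorch-lightning >= 2.0):
#     trainer = pl.Trainer(gradient_clip_val=1.0, **trainer_device_kwargs(args.gpus))
```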

KevinCLydon · 2024-10-09T20:27:58Z

src/main/resources/org/broadinstitute/hellbender/tools/walkers/vqsr/nvscorevariants.py

+    else:
+        sys.exit('Unknown tensor type!')
+    model = get_model(args, model_file)
+    trainer = pl.Trainer(gradient_clip_val=1.0)


I think I figured out the add_argparse_args thing. If I understand the docs correctly, it's basically there to enable you to skip writing out an exhaustive ArgumentParser. Basically, it makes it so if you, as a person running this tool, use some args that aren't explicitly added to the ArgumentParser, but they are in the list of args that can be passed to a Trainer, lightning says "Oh, I recognize these," and then handles them as if you'd manually defined them. I think now, again if I'm reading the docs correctly, that's the default behavior for the ArgumentParser, but we might have to explicitly pass them to the trainer? That doc section I linked doesn't include an example that makes that super clear.

This comment implies that perhaps we should pass **vars(args) in to the Trainer here, but only if we actually want to expose any actual Trainer arguments as CLI arguments: https://lightning.ai/forums/t/how-to-combine-ptl-arguments-with-argumentparser/2440/2

Oh, that's kind of neat. Not how GATK arg parsing works, so it would be hard to match that behavior in the GATK frontend. I guess we could have an "--additional-args" option whose values just get passed through?

In the new framework (as that comment explains), you need to explicitly add any arguments you want to expose to the Python arg parser.
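Under the explicit approach, here is a minimal sketch of forwarding only selected options to the Trainer. Passing all of `**vars(args)` would also hand the Trainer script-only options such as `seed` or `output_file`, which its constructor would reject, so the sketch filters through an allow-list. `--gradient-clip-val` and `TRAINER_ARG_NAMES` are hypothetical; the script as quoted hard-codes `gradient_clip_val=1.0`.

```python
import argparse

# Hypothetical allow-list of parsed options that are forwarded to pl.Trainer.
TRAINER_ARG_NAMES = ("gradient_clip_val",)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Trainer-argument forwarding sketch")
    # Script-only option, as in the nvscorevariants.py diff quoted earlier in this thread.
    parser.add_argument("--seed", type=int, default=724)
    # Hypothetical Trainer option exposed explicitly; argparse stores it as gradient_clip_val.
    parser.add_argument("--gradient-clip-val", type=float, default=1.0)
    return parser


def trainer_kwargs(args: argparse.Namespace) -> dict:
    """Select only the parsed options that should be passed to pl.Trainer."""
    parsed = vars(args)
    return {name: parsed[name] for name in TRAINER_ARG_NAMES if parsed.get(name) is not None}


# trainer = pl.Trainer(**trainer_kwargs(args))
```

Each newly exposed Trainer option then costs one `add_argument` call plus one allow-list entry, which is also the point where a matching argument would be added to the GATK frontend.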

droazen · 2024-10-10T13:05:18Z

src/main/python/org/broadinstitute/hellbender/scorevariants/tests/test_dataset.py

@@ -0,0 +1,200 @@
+from unittest import TestCase


@lbergelson @KevinCLydon There are a number of Python-based unit tests under src/main/python/org/broadinstitute/hellbender/scorevariants/tests -- ideally we should find a way to run these as well, perhaps in NVScoreVariantsIntegrationTest via a PythonScriptExecutor.

lbergelson · 2024-10-14T18:37:40Z

@KevinCLydon I addressed the comments that we had here. I didn't get the Python unit tests running yet, though (or fix Jacoco).

gatk-bot · 2024-10-14T19:19:49Z

Github actions tests reported job failures from actions build 11333175464
Failures in the following jobs:

Test Type	JDK	Job ID	Logs
conda	17.0.6+10	11333175464.3	logs

gatk-bot · 2024-10-14T21:50:04Z

Github actions tests reported job failures from actions build 11335027058
Failures in the following jobs:

Test Type	JDK	Job ID	Logs
conda	17.0.6+10	11335027058.3	logs

droazen · 2024-10-15T14:08:47Z

@lbergelson Remember that after this is merged, before we can release we need a follow-up PR to add the CNN tools to the DeprecatedToolsRegistry, and actually remove the CNN tools (minus the expected CNN output files required by NVScoreVariantsIntegrationTest).

lbergelson · 2024-10-15T14:26:11Z

@droazen Good point, thank you.

droazen · 2024-10-15T17:38:32Z

@lbergelson I've added a paragraph to the tool docs providing attribution to the original NVIDIA authors and collaborators.

droazen · 2024-10-15T19:04:50Z

@lbergelson I think this is ready to go in, provided you're ok with the state of the jacoco build targets.

lbergelson · 2024-10-15T20:22:30Z

@droazen Did you take a look at my minor wiring changes? If you don't see any issue with them then 👍 to merge!

lbergelson

@droazen 👍

…upts your data.

droazen assigned lbergelson, samuelklee and mwalker174 and unassigned lbergelson, mwalker174 and samuelklee Aug 26, 2022

droazen requested review from lbergelson, samuelklee and mwalker174 August 26, 2022 18:53

droazen self-assigned this Aug 26, 2022

droazen requested a review from asmirnov239 August 26, 2022 19:50

droazen force-pushed the dr_nvscorevariants branch from d7470f1 to 643fa70 October 20, 2022 16:38

droazen force-pushed the dr_nvscorevariants branch from 55d88f8 to 643fa70 October 21, 2022 20:39

droazen and others added 4 commits September 27, 2024 14:53

Port of nvscorevariants into GATK, with a basic tool frontend

ee25ca2

Fixing and improving test

ae44f57

Incorporate 2D model fix from NVIDIA

a203d87

Port the patch from NVIDIA-Genomics-Research/nvscorevariants@937ffaf

Reduce epsilon for 2D test to 0.01

22902a1

droazen force-pushed the dr_nvscorevariants branch from 643fa70 to 22902a1 September 27, 2024 18:53

switch to lightning_fabric.utilities.seed.seed_everything

4a035f0

droazen added 2 commits October 8, 2024 23:37

Remove obsolete call to Trainer.from_argparse_args

55398eb

Cleanup & documentation

4f114fb

droazen changed the title ~~(Do not merge) Port of nvscorevariants into GATK, with a basic tool frontend~~ Port of nvscorevariants into GATK, with a basic tool frontend Oct 9, 2024

droazen requested a review from KevinCLydon October 9, 2024 15:30

droazen assigned KevinCLydon and lbergelson and unassigned droazen Oct 9, 2024

KevinCLydon reviewed Oct 9, 2024

View reviewed changes

droazen commented Oct 10, 2024

View reviewed changes

Responding to comments

2847975

turns out None isn't a valid value here

034cc80

fix

8847960

Add attribution to NVIDIA contributors in tool documentation

c8ba99f

lbergelson approved these changes Oct 15, 2024

View reviewed changes

lbergelson and others added 2 commits October 15, 2024 18:08

Unwire devices because if you set it to anything other than 1 it corrupts your data.

c18b0c3

The --random-seed argument should be explicitly marked optional

9481393

droazen merged commit a377b07 into master Oct 17, 2024
20 checks passed

droazen deleted the dr_nvscorevariants branch October 17, 2024 17:48

Port of nvscorevariants into GATK, with a basic tool frontend #8004

Port of nvscorevariants into GATK, with a basic tool frontend #8004

Conversation

droazen commented Aug 26, 2022

gatk-bot commented Aug 26, 2022 • edited Loading

samuelklee commented Aug 26, 2022

zamirai commented Aug 29, 2022

gatk-bot commented Sep 6, 2022 • edited Loading

gatk-bot commented Sep 20, 2022 • edited Loading

droazen commented Sep 20, 2022

gatk-bot commented Sep 20, 2022 • edited Loading

droazen commented Oct 20, 2022

gatk-bot commented Oct 20, 2022 • edited Loading

gatk-bot commented Oct 21, 2022 • edited Loading

gatk-bot commented Oct 21, 2022 • edited Loading

gatk-bot commented Sep 27, 2024 • edited Loading

gatk-bot commented Oct 8, 2024

gatk-bot commented Oct 8, 2024

droazen commented Oct 9, 2024 • edited Loading

KevinCLydon left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

droazen Oct 9, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

droazen Oct 10, 2024 • edited Loading

Choose a reason for hiding this comment

lbergelson commented Oct 14, 2024

gatk-bot commented Oct 14, 2024

gatk-bot commented Oct 14, 2024

droazen commented Oct 15, 2024

lbergelson commented Oct 15, 2024

droazen commented Oct 15, 2024

droazen commented Oct 15, 2024

lbergelson commented Oct 15, 2024

lbergelson left a comment

Choose a reason for hiding this comment

gatk-bot commented Aug 26, 2022 •

edited

Loading

gatk-bot commented Sep 6, 2022 •

edited

Loading

gatk-bot commented Sep 20, 2022 •

edited

Loading

gatk-bot commented Sep 20, 2022 •

edited

Loading

gatk-bot commented Oct 20, 2022 •

edited

Loading

gatk-bot commented Oct 21, 2022 •

edited

Loading

gatk-bot commented Oct 21, 2022 •

edited

Loading

gatk-bot commented Sep 27, 2024 •

edited

Loading

droazen commented Oct 9, 2024 •

edited

Loading

droazen Oct 9, 2024 •

edited

Loading

droazen Oct 10, 2024 •

edited

Loading