Fixed a bug in AlleleFiltering that ignored more than a single sample #8841

ilyasoifer (Collaborator) commented May 18, 2024

This PR fixes an issue raised by one of our customers: allele filtering did not work correctly when variant calling was run on multiple samples.

@@ -57,14 +60,14 @@ int getAlleleLikelihoodVsInverse(final AlleleLikelihoods<GATKRead, Allele> allel

final GenotypingLikelihoods<Allele> genotypingLikelihoods = genotypesModel.calculateLikelihoods(alleleList,
genotypingData, null, 0, null);
AFCalculationResult af = afCalc.fastCalculateDiploidBasedOnGLs(genotypingLikelihoods, genotypingEngine.getPloidyModel().totalPloidy());
Collaborator Author

Also removed some lines that were never used.

Contributor

@meganshand meganshand left a comment

I'm not super familiar with this part of the code, but this change seems to make sense in that it conservatively uses the lowest PL across all samples. Since I'm not fully confident, I wanted to double-check that I understand how the test here works.

for (int i = 0; i < genotypingLikelihoods.numberOfSamples(); i++) {
final int[] pls = genotypingLikelihoods.sampleLikelihoods(i).getAsPLs();
perSamplePLs.add(Math.min(pls[1] - pls[0], pls[2] - pls[0]));
logger.debug(() -> String.format("GAL:: %s: %d %d %d", allele.toString(), pls[0], pls[1], pls[2]));
Contributor

Now that you have multiple samples here, you might want to add the sample name to the debug logger.

Collaborator Author

Addressed, good idea
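
For readers following this thread, here is a minimal sketch of the per-sample loop with the sample name included in the debug message. It is plain Java with no GATK dependencies, so it is not the code that was merged. The class `PerSamplePlSketch`, the helpers `sampleScore`, `debugLine` and `minAcrossSamples`, the sample names, and the exact log format are all hypothetical. Combining the per-sample values with a minimum is an assumption based on the review discussion, since the diff excerpt does not show that step.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PerSamplePlSketch {
    // Per-sample score as in the diff excerpt: min(PL[1] - PL[0], PL[2] - PL[0]).
    static int sampleScore(final int[] pls) {
        return Math.min(pls[1] - pls[0], pls[2] - pls[0]);
    }

    // Hypothetical log format: the sample name is added in front of the allele.
    static String debugLine(final String sampleName, final String allele, final int[] pls) {
        return String.format("GAL:: %s: %s: %d %d %d", sampleName, allele, pls[0], pls[1], pls[2]);
    }

    // Assumption: per-sample scores are combined by taking the minimum across samples.
    static int minAcrossSamples(final List<String> sampleNames,
                                final List<int[]> plsPerSample,
                                final String allele) {
        final List<Integer> perSamplePLs = new ArrayList<>();
        for (int i = 0; i < plsPerSample.size(); i++) {
            final int[] pls = plsPerSample.get(i);
            perSamplePLs.add(sampleScore(pls));
            System.out.println(debugLine(sampleNames.get(i), allele, pls));
        }
        return Collections.min(perSamplePLs);
    }

    public static void main(final String[] args) {
        final List<String> names = List.of("sample1", "sample2");
        final List<int[]> pls = List.of(new int[]{0, 30, 60}, new int[]{40, 0, 25});
        System.out.println(minAcrossSamples(names, pls, "A"));
    }
}
```

With one log line per sample, the sample name is what lets a reader tell which PL triple belongs to which sample.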


AlleleFiltering alleleFiltering = new AlleleFilteringHC(hcArgs, null, genotypingEngine);
AlleleLikelihoods<GATKRead, Haplotype> filtered_lks = alleleFiltering.filterAlleles(lks, 0, new HashSet<>());
Assert.assertEquals(filtered_lks.alleles(), lks.alleles());
Contributor

It looks like this checks that no alleles were filtered, but I'm confused about how this tests the change you added. Since it now takes the minimum PL across samples (rather than the minimum PL from the first sample), isn't it only possible to filter more alleles with the new version than with the old one? Did this test fail with the original code before your change?

Collaborator Author

See the comment below. Yes, this test failed in the old version.
I think I am actually filtering alleles more conservatively now (an allele needs to be weak in both samples to be removed).

Collaborator Author

ilyasoifer commented Jun 5, 2024

@meganshand - I apologize, I may have confused the semantics.
I am using "PL" the way it is written in the VCF (-10*log10(likelihood)), so a high PL means a low-quality allele.
Taking the minimum across the two samples means that we keep the allele if it is sufficiently strongly supported in either sample.
Could you suggest how to clarify this?
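
To make the PL convention above concrete, here is a small sketch of the Phred scaling used for VCF PL values: each genotype likelihood is converted with -10*log10, rounded, and shifted so that the most likely genotype has PL 0. The class `PhredSketch`, its methods, and the example likelihoods are illustrative assumptions, not GATK code.

```java
public class PhredSketch {
    // Phred-scale a single likelihood: higher PL means a less likely genotype.
    static int toPl(final double likelihood) {
        return (int) Math.round(-10.0 * Math.log10(likelihood));
    }

    // Convert likelihoods to PLs and shift them so the best genotype has PL 0.
    static int[] normalizedPls(final double[] likelihoods) {
        final int[] pls = new int[likelihoods.length];
        int best = Integer.MAX_VALUE;
        for (int i = 0; i < likelihoods.length; i++) {
            pls[i] = toPl(likelihoods[i]);
            best = Math.min(best, pls[i]);
        }
        for (int i = 0; i < pls.length; i++) {
            pls[i] -= best;
        }
        return pls;
    }

    public static void main(final String[] args) {
        // Made-up likelihoods for three genotypes of one sample.
        final int[] pls = normalizedPls(new double[]{0.001, 0.1, 0.9});
        System.out.println(java.util.Arrays.toString(pls));
    }
}
```

On this scale a likelihood ten times smaller adds 10 to the PL, which is why a high PL indicates weak support.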

@meganshand
Contributor

Ah of course. Sorry about that, I flipped PLs in my head. This looks good, I don't think you need to clarify further!

@ilyasoifer
Collaborator Author

@meganshand - addressed your comment, please take a look, thanks!

@ilyasoifer ilyasoifer merged commit ab98a5d into broadinstitute:master Jun 13, 2024
16 of 17 checks passed
@ilyasoifer ilyasoifer deleted the ilyasoifer/BIOIN-1630-multiple-samples branch June 13, 2024 13:57