Updates to VcfComparator #8973

Merged
merged 2 commits into master from ms_vcf_comparator_updates
Sep 27, 2024

meganshand
Contributor

These changes add arguments, specific to the update from GATK 4.5.0.0 to 4.6.0.0, that special-case sites previously flagged in the WARP tests. Most of the sites that can now be skipped stem from the no-call changes that were expected with this update to JointCalling and ReblockGVCFs. There are also some small changes to HaplotypeCaller at low-quality sites that are then dropped by ReblockGVCFs.

Additionally, there were some expected changes to the Ultima pipelines in HaplotypeCaller and JointCalling, which can now be skipped by the VcfComparator tool.

Finally, if AD is 0 for non-ref reads (which can happen with DRAGEN input), AS_QD has jitter added, which is now accounted for.

changes for gq0 comparison

more changes

some vcfs have no gqs

clean up

change for NPE

clean up

whitespace
Collaborator

gbggrant left a comment

Looks good to me. Some minor suggestions.

private boolean ignoreGq0 = false;

@Argument(fullName = "ignore-multi-allelics", optional = true, doc="Ignore sites where the AC length in the actual matches the actual number of alleles, but doesn't match the expected VC.")
private boolean ignoreSomeMultiAllelics = false;
Collaborator

Nit: should this be ignoreMultiAllelics to be consistent with the fullName (or the other way around)?

@@ -530,7 +590,7 @@ private void checkAttributes(final Map<String, Object> actual, final Map<String,
throw makeVariantExceptionFromDifference(key, Double.toString(actualPerAlleleValue), Double.toString(expectedPerAlleleValue));
}
}
- if (key.contains("AS_") && key.contains("RankSum")) {
+ if (key.contains("AS_") && key.contains("RankSum") && !key.contains("RAW")) {
Collaborator

I think

Suggested change
- if (key.contains("AS_") && key.contains("RankSum") && !key.contains("RAW")) {
+ if (key.startsWith("AS_") && key.contains("RankSum") && !key.contains("RAW")) {

is a little more correct?
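
To illustrate the difference, here is a minimal, self-contained sketch. `AS_ReadPosRankSum` and `AS_RAW_ReadPosRankSum` are used as example allele-specific keys; `BIAS_RankSum` is a made-up key (an assumption, not a real GATK annotation) chosen only because it happens to contain the substring `AS_` without starting with it. The class and method names are hypothetical and are not part of VcfComparator.

```java
// Sketch only: compares the two predicates discussed in this review thread.
class AsKeyMatch {
    // Original form: matches "AS_" anywhere in the key.
    static boolean matchesWithContains(final String key) {
        return key.contains("AS_") && key.contains("RankSum") && !key.contains("RAW");
    }

    // Suggested form: requires the key to begin with the "AS_" prefix.
    static boolean matchesWithStartsWith(final String key) {
        return key.startsWith("AS_") && key.contains("RankSum") && !key.contains("RAW");
    }

    public static void main(final String[] args) {
        // "BIAS_RankSum" is a hypothetical key used only for illustration.
        final String[] keys = {"AS_ReadPosRankSum", "AS_RAW_ReadPosRankSum", "BIAS_RankSum"};
        for (final String key : keys) {
            System.out.println(key
                    + " contains=" + matchesWithContains(key)
                    + " startsWith=" + matchesWithStartsWith(key));
        }
    }
}
```

Both predicates agree on the two `AS_` keys; only the made-up key is treated differently, which is the kind of accidental match `startsWith` rules out.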

@@ -546,11 +606,25 @@ private void checkAttributes(final Map<String, Object> actual, final Map<String,
logger.warn("GATK version-specific NaN versus empty AS_RAW annotation discrepancy");
}
}
} else if (key.contains("AS_") && key.contains("RAW") && key.contains("RankSum")) {
if (((String) actualList.get(i)).isEmpty() && ((String) expectedList.get(i)).isEmpty()) {
Collaborator

Same comment about using startsWith here.

@@ -515,7 +568,14 @@ private void checkAttributes(final Map<String, Object> actual, final Map<String,
}
}
//we've already gotten rid of stars... I think -- do we have to sort the alts? Maybe... I should be using my own AlleleSpecificAnnotationData!!!
- final int iterationEnd = ignoreNonRefData && actualAlts.contains(Allele.NON_REF_ALLELE) ? expectedAlts.size()-1 : expectedAlts.size(); //exclusive
+ int iterationEnd = ignoreNonRefData && actualAlts.contains(Allele.NON_REF_ALLELE) ? expectedAlts.size()-1 : expectedAlts.size(); //exclusive
Collaborator

Could keep the final here by doing something like:
final iterationEnd = if (key.equals(...) {
3
} else if (key.equals(...) {
5
} else {
...
}
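
Java has no `if` expression, so the sketch above is pseudocode. One way to get the same effect in valid Java is to move the branching into a small helper and assign its result once. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, the two string constants are placeholders standing in for `GATKVCFConstants.RAW_GENOTYPE_COUNT_KEY` and `GATKVCFConstants.RAW_MAPPING_QUALITY_WITH_DEPTH_KEY`, and the value 5 is copied from the pseudocode above rather than from the merged code.

```java
// Sketch only: keeps the local variable final by computing it in one place.
class IterationEndSketch {
    // Placeholder values standing in for the GATKVCFConstants keys (assumption).
    static final String RAW_GENOTYPE_COUNT_KEY = "RAW_GT_COUNT";
    static final String RAW_MAPPING_QUALITY_WITH_DEPTH_KEY = "RAW_MQandDP";

    // Returns the exclusive end index for the given key, falling back to
    // defaultEnd when the key has no special case. The branches are mutually
    // exclusive, so they are written as an if / else-if chain.
    static int iterationEndFor(final String key, final int defaultEnd) {
        if (key.equals(RAW_GENOTYPE_COUNT_KEY)) {
            return 3;
        } else if (key.equals(RAW_MAPPING_QUALITY_WITH_DEPTH_KEY)) {
            return 5; // value taken from the reviewer's pseudocode, not verified
        } else {
            return defaultEnd;
        }
    }

    public static void main(final String[] args) {
        final int defaultEnd = 7; // stands in for the expectedAlts-based value
        final int iterationEnd = iterationEndFor(RAW_GENOTYPE_COUNT_KEY, defaultEnd);
        System.out.println(iterationEnd);
    }
}
```

A nested ternary would also keep the variable `final`, but a helper method tends to stay readable as more special-cased keys are added.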

if (key.equals(GATKVCFConstants.RAW_GENOTYPE_COUNT_KEY)) {
iterationEnd = 3;
}
if (key.equals(GATKVCFConstants.RAW_MAPPING_QUALITY_WITH_DEPTH_KEY)) {
Collaborator

else if ?

meganshand merged commit 2e459a5 into master Sep 27, 2024
20 checks passed
meganshand deleted the ms_vcf_comparator_updates branch September 27, 2024 15:59