-
Notifications
You must be signed in to change notification settings - Fork 592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Experimental tool that aligns sequencing reads to set of haplotypes / templates #8305
Experimental tool that aligns sequencing reads to set of haplotypes / templates #8305
Conversation
* FlowPairHMM and FlowBasedAlignment (FBA), but can be easily extended. | ||
* At present, there are two output formats that can be specified using parameter --output-format: extended and concise. | ||
* The extended format contains a readxhaplotype matrix that shows alignment score of each read versus each haplotype. | ||
* Condensed format will following columns (not nessarily in this order) for each processed read: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Condensed format will following columns (not nessarily in this order) for each processed read: | |
* Condensed format contains the following columns (not nessarily in this order) for each processed read: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean the columns are in any order? It looks like they're in a fixed order from the writer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
* | ||
* <h3>Usage examples</h3> | ||
* <pre> | ||
* java -Xms3000m -jar ~{gitc_path}/GATK_ultima.jar FlowPairHMMAlignReadsToHaplotypes \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* java -Xms3000m -jar ~{gitc_path}/GATK_ultima.jar FlowPairHMMAlignReadsToHaplotypes \ | |
* gatk FlowPairHMMAlignReadsToHaplotypes \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
* <pre> | ||
* java -Xms3000m -jar ~{gitc_path}/GATK_ultima.jar FlowPairHMMAlignReadsToHaplotypes \ | ||
* -H ~{haplotype_list} -O ~{base_file_name}.matches.tsv \ | ||
* -I ~{input_bam} "--flow-use-t0-tag -E FBA \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* -I ~{input_bam} "--flow-use-t0-tag -E FBA \ | |
* -I ~{input_bam} --flow-use-t0-tag -E FBA \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
* -H ~{haplotype_list} -O ~{base_file_name}.matches.tsv \ | ||
* -I ~{input_bam} "--flow-use-t0-tag -E FBA \ | ||
* --flow-fill-empty-bins-value 0.00001 --flow-probability-threshold 0.00001 \ | ||
* --flow-likelihood-optimized-comp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* --flow-likelihood-optimized-comp" | |
* --flow-likelihood-optimized-comp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
private static final Logger logger = LogManager.getLogger(FlowPairHMMAlignReadsToHaplotypes.class); | ||
|
||
@Argument(fullName = "haplotypes", shortName = "H", doc="Fasta file with haplotypes") | ||
public GATKPath haplotypesFa; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These can all be public static final
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
java does not like that - it says that the variable may not have been initialized
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
java does not like that - it says that the variable may not have been initialized
@Argument(fullName = "output-format", doc="concise or expanded output format: " + | ||
"expanded - output full read x haplotype, concise - output for each read best haplotype and " + | ||
"score differences from the next best and the reference haplotype", optional=true) | ||
public String outputFormat = "expanded"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be an enum. Or since there's only two options you could make it a boolean with "expanded" as the default (eg. Boolean conciseOutputFormat = false
)
Same comment below for the aligner
argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, fixed!
public FlowBasedAlignmentArgumentCollection fbargs = new FlowBasedAlignmentArgumentCollection(); | ||
|
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ArgumentCollection |
|
||
|
||
LikelihoodEngineArgumentCollection likelihoodArgs = new LikelihoodEngineArgumentCollection(); | ||
SAMFileHeader sequenceHeader; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can make these private static final
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
...adinstitute/hellbender/tools/walkers/haplotypecaller/FlowBasedAlignmentLikelihoodEngine.java
Outdated
Show resolved
Hide resolved
//in general reads are already trimmed to the haplotype starts and ends so diff_left <= 0 and diff_right <= 0 | ||
fbr.applyBaseClipping(Math.max(0, diffLeft), Math.max(diffRight, 0), false); | ||
} | ||
// final int haplotypeStart = processedHaplotypes.get(0).getStart(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be uncommented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually is not necessary in the context of the GATK
@meganshand - all fixed, PTAL |
This tool is useful for alignment of the reads to a large set of templates that are very close in sequence. In this case the alignment is close to the problem that the HaplotypeCaller is solving - of calculating the likelihood of the read compared to a set of close haplotypes.
The tool outputs read x haplotype matrix for a set of reads and haplotypes using either FlowPairHMM or FlowBasedAlignmentEngine