Funcotator Exception: String index out of range #6651
Comments
The warnings the user is seeing are due to spanning deletion alleles, which Funcotator does not currently annotate. The bug here is what is causing the stack trace. It's in the protein sequence prediction code and I suspect that it has to do with the position of the variant relative to the exon/transcript boundaries. I have not been able to look at it yet, but thanks to the user posting the variants that are causing issues, it should be straightforward to track down. |
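To see which records are likely to trigger the "due to alternate allele: *" warnings, here is a minimal sketch that lists VCF records whose ALT field contains the spanning deletion allele. The function name is hypothetical, and it assumes a plain-text, tab-separated VCF whose lines have already been read (a `.vcf.gz` would need to be opened with `gzip.open(path, "rt")` first). It only reports records and changes nothing.

```python
def spanning_deletion_records(vcf_lines):
    """Yield (chrom, pos, alt) for records whose ALT field lists the `*` allele.

    Hypothetical helper: assumes tab-separated VCF text lines.
    """
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        # ALT may hold several comma-separated alleles; match `*` exactly
        # so that symbolic alleles such as <*> are not reported.
        if "*" in alt.split(","):
            yield chrom, pos, alt
```

Typical use would be `list(spanning_deletion_records(open("input.vcf")))`, where `input.vcf` is a placeholder path.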
Was this issue ever resolved, or was the problem clearly identified? I am currently experiencing this error, but any help would be appreciated. |
@twood1 This is still an open issue, but I know where in the code it's happening and what is going on. I just haven't had time to debug it. For now a workaround is to remove the variant causing the failure from your file. You can find this by looking at the variants that Funcotator outputs - the variant after the final output entry will be the one causing this failure. |
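The workaround above can be sketched roughly as follows: take the last complete record in the Funcotator output and return the input record that follows it. This is only a sketch under assumptions. The function names are hypothetical, both files are assumed to be plain-text VCFs in the same order, and the output is assumed to keep CHROM, POS, REF and ALT unchanged. Because output can be buffered or cut off mid-line, treat the result as a candidate to check, not a definitive answer.

```python
def _records(lines):
    """Keep only complete data lines (at least CHROM..ALT), split into fields."""
    out = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 5:  # drops a truncated final line
            out.append(fields)
    return out


def _key(fields):
    # Identify a record by CHROM, POS, REF, ALT.
    return (fields[0], fields[1], fields[3], fields[4])


def first_unannotated_variant(input_lines, output_lines):
    """Return the fields of the input record after the last annotated one.

    Returns None if every input record was annotated or no match is found.
    """
    inputs = _records(input_lines)
    outputs = _records(output_lines)
    if not outputs:
        return inputs[0] if inputs else None
    last = _key(outputs[-1])
    for i, fields in enumerate(inputs):
        if _key(fields) == last:
            return inputs[i + 1] if i + 1 < len(inputs) else None
    return None
```

The returned record can then be removed from the input VCF before rerunning Funcotator.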
@jonn-smith Thanks for the prompt response jonn - is the code for the surrounding issue(s) open source? If so, could you point me towards the file? |
@twood1 No prob. Yup - it's all open source, but this particular part of the code may be a bit tricky to debug (which is why I haven't gotten to it yet). The issue is happening in Feel free to take a look, but this is one of my top priorities for bugs to fix next. |
So the issue you are describing is essentially completely independent from input parameters/options, minus the reference fasta and the input VCF. Is that correct? |
Correct - though it also depends on the Gencode data source which is tied to the reference. It really pulls the protein change info from the gencode transcript sequence, which is at the core of the issue. |
I have a similar issue: java.lang.StringIndexOutOfBoundsException: String index out of range: -2 The annotation stops at chr11 34357581, and the output is also truncated after this position. chr11 34357581 . C CGGGACGTACAGCTCGACTCTGAAGACGCTGGAGGACTTGACCTTGGACTCCGGGT . At first I thought it might be due to the length of the indel, but Funcotator seems to work fine before that position (on some indels even longer than the one at chr11 34357581). Two weeks ago, I had another sample stop at chr 7 with |
@xmzhuo Interesting. Is this hg19 or hg38 data? I can add this to our tests. For everyone else - thanks for your patience. I'm starting to work on this issue this week so we should have a fix relatively soon (1-2 weeks). |
Hg38 |
Hi, everyone~ Is this problem solved now? It seems that I've encountered similar problems. I'm using GATK4.2 and hg38 data. 11:43:25.661 ERROR GencodeFuncotationFactory - Problem creating a GencodeFuncotation on transcript ENST00000441716.2 for variant: chr6:167976552-167976594(ACAGTGGGGGTCATTCCCCCTGCAGTGTGTTGGGAGGAGGAGG* -> A): Variant overlaps transcript but is not completely contained within it. Funcotator cannot currently handle this case. Transcript: ENST00000441716.2 Variant: [VC Unknown @ chr6:167976552-167976594 Q. of type=INDEL alleles=[ACAGTGGGGGTCATTCCCCCTGCAGTGTGTTGGGAGGAGGAGG*, A] attr={AS_FilterStatus=SITE, AS_SB_TABLE=[43, 26|2, 2], DP=94, ECNT=1, GERMQ=93, MBQ=[31, 20], MFRL=[288, 110], MMQ=[60, 60], MPOS=56, NALOD=1.37, NLOD=6.17, POPAF=4.6, ROQ=93, TLOD=10.97} GT=GT:AD:AF:DP:F1R2:F2R1:SB 0/1:46,4:0.07:50:14,3:10,0:28,18,2,2 0/0:23,0:0.041:23:8,0:5,0:15,8,0,0 filters= |
@daisyyr Thanks for posting your example here, this issue is still open so it has not been fixed yet. |
@jkobject This problem has to do with indels and predicted protein change sequences. I'm starting a refactor of how the predicted protein changes get created. When that's complete, this issue will be fixed. In the meantime, can you post the stack trace and share the example workspace you mention in #6289 ? |
I can. This only happens in 10 of our 2000 samples (WES only); none of our 600 WGS samples seem to have the same issue. It is always on some small contig (you can see here the range is 544, but all cases are small ranges like this one). Everything uses the default Mutect2 pipeline and parameters (e.g. gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta), except the interval file: gs://ccleparams/region_file_wgs.list Here is the VCF file to annotate Here is the stacktrace:
|
@jkobject OK, thanks! |
My quick fix was to reduce the intervals to the target regions of my WES (instead of using the full genome region) and pass them to Funcotator. Remark: the GATK Mutect2 WDL does not pass the default intervals to Funcotator, only to Mutect2. |
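For anyone wanting to try the same thing, here is a rough sketch of a Funcotator invocation restricted to target regions. It reuses the arguments from the command in the original report and adds the standard GATK `-L` intervals argument. The function name and every path in the usage note are placeholders, not values taken from this thread.

```python
def funcotator_command(reference, variants, output, data_sources,
                       ref_version, intervals):
    """Build a Funcotator argument list restricted to `intervals`.

    Hypothetical helper: all arguments are paths or values supplied by the
    caller; nothing here is specific to any dataset in this issue.
    """
    return [
        "gatk", "Funcotator",
        "-R", reference,
        "-V", variants,
        "-O", output,
        "--output-file-format", "VCF",
        "--data-sources-path", data_sources,
        "--ref-version", ref_version,
        "-L", intervals,  # only traverse variants inside these regions
    ]

# Example (placeholder paths); run with subprocess.run(cmd, check=True):
# cmd = funcotator_command("ref.fasta", "in.vcf.gz", "out.vcf",
#                          "funcotator_dataSources", "hg38",
#                          "wes_targets.interval_list")
```

Restricting the intervals only skips variants outside the targets, so a failing variant inside a target region would still crash the run.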
After running it on all my samples it actually only solved half of them... I will look into the try/catch fix |
Hi everyone! I partially solved the problem "WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr__:: due to alternate allele: ".
At the moment this works perfectly for me. If anyone has a better solution please upload it. Regards |
This request was created from a contribution made by Mark Godek on May 28, 2020 12:43 UTC.
Link: https://gatk.broadinstitute.org/hc/en-us/community/posts/360067471451-Funcotator-cannot-complete-funcotaion-for-variant-due-to-alternate-allele
--
I'm attempting to annotate germline variants after VQSR with Funcotator using GATK 4.1.4.1.
GATK command is:
gatk Funcotator \
-R ${REFERENCE_GENOME} \
-V ${OUT}/germline.filtered.vcf.gz \
-O ${OUT}/annotated.germline.vcf \
--output-file-format VCF \
--data-sources-path /mnt/data/rbueno/analysis_files/MedGenome_FamilialMPMs/Annotation_data_sources/funcotator_dataSources.v1.6.20190124s \
--ref-version hg19
I get many warnings and it terminates with a String index out of range error. Any help is appreciated.
The tail end of the output follows:
07:33:14.569 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:69756762-69756762 due to alternate allele: *
07:33:14.575 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:69756763-69756763 due to alternate allele: *
07:33:14.575 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:69756763-69756763 due to alternate allele: *
07:33:14.580 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:69756764-69756764 due to alternate allele: *
07:33:14.580 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:69756764-69756764 due to alternate allele: *
07:33:16.681 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:70289137-70289137 due to alternate allele: *
07:33:16.681 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:70289137-70289137 due to alternate allele: *
07:33:17.957 INFO VcfFuncotationFactory - dbSNP 9606_b150 cache hits/total: 521/453691
07:33:18.138 INFO Funcotator - Shutting down engine
[May 28, 2020 7:33:18 AM EDT] org.broadinstitute.hellbender.tools.funcotator.Funcotator done. Elapsed time: 34.35 minutes.
Runtime.totalMemory()=3822059520
java.lang.StringIndexOutOfBoundsException: String index out of range: 545
at java.lang.String.substring(String.java:1963)
at org.broadinstitute.hellbender.tools.funcotator.ProteinChangeInfo.initializeForInsertion(ProteinChangeInfo.java:256)
at org.broadinstitute.hellbender.tools.funcotator.ProteinChangeInfo.<init>(ProteinChangeInfo.java:93)
at org.broadinstitute.hellbender.tools.funcotator.ProteinChangeInfo.create(ProteinChangeInfo.java:371)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createSequenceComparison(GencodeFuncotationFactory.java:2003)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createCodingRegionFuncotationForProteinCodingFeature(GencodeFuncotationFactory.java:1193)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createExonFuncotation(GencodeFuncotationFactory.java:1044)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createGencodeFuncotationOnSingleTranscript(GencodeFuncotationFactory.java:978)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsHelper(GencodeFuncotationFactory.java:805)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsHelper(GencodeFuncotationFactory.java:789)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.lambda$createGencodeFuncotationsByAllTranscripts$0(GencodeFuncotationFactory.java:474)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createGencodeFuncotationsByAllTranscripts(GencodeFuncotationFactory.java:475)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsOnVariant(GencodeFuncotationFactory.java:530)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.determineFuncotations(DataSourceFuncotationFactory.java:233)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:201)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:172)
at org.broadinstitute.hellbender.tools.funcotator.FuncotatorEngine.lambda$createFuncotationMapForVariant$0(FuncotatorEngine.java:147)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.broadinstitute.hellbender.tools.funcotator.FuncotatorEngine.createFuncotationMapForVariant(FuncotatorEngine.java:157)
at org.broadinstitute.hellbender.tools.funcotator.Funcotator.enqueueAndHandleVariant(Funcotator.java:903)
at org.broadinstitute.hellbender.tools.funcotator.Funcotator.apply(Funcotator.java:857)
at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
(created from Zendesk ticket #5792)
gz#5792