Funcotator Exception: String index out of range #6651

Open
GATKSupportTeam opened this issue Jun 8, 2020 · 21 comments
@GATKSupportTeam
Collaborator

This request was created from a contribution made by Mark Godek on May 28, 2020 12:43 UTC.

Link: https://gatk.broadinstitute.org/hc/en-us/community/posts/360067471451-Funcotator-cannot-complete-funcotaion-for-variant-due-to-alternate-allele

--

I'm attempting to annotate germline variants after VQSR with Funcotator using GATK 4.1.4.1.

GATK command is:

gatk Funcotator \
-R ${REFERENCE_GENOME} \
-V ${OUT}/germline.filtered.vcf.gz \
-O ${OUT}/annotated.germline.vcf \
--output-file-format VCF \
--data-sources-path /mnt/data/rbueno/analysis_files/MedGenome_FamilialMPMs/Annotation_data_sources/funcotator_dataSources.v1.6.20190124s \
--ref-version hg19

I get many warnings and it terminates with a String index out of range error. Any help is appreciated.

 

The tail end of the output follows:

07:33:14.569 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:69756762-69756762 due to alternate allele: *
07:33:14.575 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:69756763-69756763 due to alternate allele: *
07:33:14.575 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:69756763-69756763 due to alternate allele: *
07:33:14.580 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:69756764-69756764 due to alternate allele: *
07:33:14.580 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:69756764-69756764 due to alternate allele: *
07:33:16.681 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:70289137-70289137 due to alternate allele: *
07:33:16.681 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:70289137-70289137 due to alternate allele: *
07:33:17.957 INFO VcfFuncotationFactory - dbSNP 9606_b150 cache hits/total: 521/453691
07:33:18.138 INFO Funcotator - Shutting down engine
[May 28, 2020 7:33:18 AM EDT] org.broadinstitute.hellbender.tools.funcotator.Funcotator done. Elapsed time: 34.35 minutes.
Runtime.totalMemory()=3822059520
java.lang.StringIndexOutOfBoundsException: String index out of range: 545
at java.lang.String.substring(String.java:1963)
at org.broadinstitute.hellbender.tools.funcotator.ProteinChangeInfo.initializeForInsertion(ProteinChangeInfo.java:256)
at org.broadinstitute.hellbender.tools.funcotator.ProteinChangeInfo.<init>(ProteinChangeInfo.java:93)
at org.broadinstitute.hellbender.tools.funcotator.ProteinChangeInfo.create(ProteinChangeInfo.java:371)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createSequenceComparison(GencodeFuncotationFactory.java:2003)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createCodingRegionFuncotationForProteinCodingFeature(GencodeFuncotationFactory.java:1193)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createExonFuncotation(GencodeFuncotationFactory.java:1044)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createGencodeFuncotationOnSingleTranscript(GencodeFuncotationFactory.java:978)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsHelper(GencodeFuncotationFactory.java:805)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsHelper(GencodeFuncotationFactory.java:789)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.lambda$createGencodeFuncotationsByAllTranscripts$0(GencodeFuncotationFactory.java:474)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createGencodeFuncotationsByAllTranscripts(GencodeFuncotationFactory.java:475)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsOnVariant(GencodeFuncotationFactory.java:530)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.determineFuncotations(DataSourceFuncotationFactory.java:233)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:201)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:172)
at org.broadinstitute.hellbender.tools.funcotator.FuncotatorEngine.lambda$createFuncotationMapForVariant$0(FuncotatorEngine.java:147)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.broadinstitute.hellbender.tools.funcotator.FuncotatorEngine.createFuncotationMapForVariant(FuncotatorEngine.java:157)
at org.broadinstitute.hellbender.tools.funcotator.Funcotator.enqueueAndHandleVariant(Funcotator.java:903)
at org.broadinstitute.hellbender.tools.funcotator.Funcotator.apply(Funcotator.java:857)
at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)


(created from Zendesk ticket #5792)
gz#5792

@bhanugandham bhanugandham added this to the GATK-Priority-Backlog milestone Jun 8, 2020
@jonn-smith
Collaborator

The warnings the user is seeing are due to spanning deletion alleles which are currently not annotated with Funcotator. The bug here is what is causing the stack trace.

It's in the protein sequence prediction code and I suspect that it has to do with the position of the variant relative to the exon/transcript boundaries.

I have not been able to look at it yet, but thanks to the user posting the variants that are causing issues, it should be straightforward to track down.

@droazen droazen removed this from the GATK-Priority-Backlog milestone Jun 22, 2020
@twood1

twood1 commented Dec 30, 2020

Was this issue ever resolved, or was the problem clearly identified? I am currently experiencing this error, and any help would be appreciated.

@jonn-smith
Collaborator

@twood1 This is still an open issue, but I know where in the code it's happening and what is going on. I just haven't had time to debug it. For now a workaround is to remove the variant causing the failure from your file. You can find this by looking at the variants that Funcotator outputs - the variant after the final output entry will be the one causing this failure.
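
A minimal sketch of that lookup in Python. It assumes the truncated output is a plain-text VCF whose records appear in the same order as the input and keep the same CHROM, POS, REF and ALT columns. The function names are hypothetical and not part of GATK. Because the output may be buffered when the tool crashes, treat the result as a candidate to inspect rather than a definitive answer.

```python
import gzip
from typing import Iterable, Iterator, Optional, Tuple

Key = Tuple[str, str, str, str]  # (CHROM, POS, REF, ALT)


def open_vcf(path: str):
    """Open a plain or gzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def record_keys(lines: Iterable[str]) -> Iterator[Key]:
    """Yield (CHROM, POS, REF, ALT) for every data line, skipping headers."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # probably a truncated final line in the crashed output
        yield fields[0], fields[1], fields[3], fields[4]


def first_unannotated(input_lines: Iterable[str],
                      output_lines: Iterable[str]) -> Optional[Key]:
    """Return the input record that follows the last record in the output."""
    last_written = None
    for key in record_keys(output_lines):
        last_written = key
    # With an empty output, the very first input record is the candidate.
    seen_last = last_written is None
    for key in record_keys(input_lines):
        if seen_last:
            return key
        if key == last_written:
            seen_last = True
    return None
```

With real files, the two arguments would be `open_vcf(...)` handles for the input VCF and the partial Funcotator output. The returned record can then be excluded from the input before re-running.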

@twood1

twood1 commented Dec 31, 2020

@jonn-smith Thanks for the prompt response jonn - is the code for the surrounding issue(s) open source? If so, could you point me towards the file?

@jonn-smith
Collaborator

@twood1 No prob. Yup - it's all open source, but this particular part of the code may be a bit tricky to debug (which is why I haven't gotten to it yet).

The issue is happening in org.broadinstitute.hellbender.tools.funcotator.ProteinChangeInfo but the problem is upstream of that when I'm extracting the sequence information from the reference to create the protein change strings.

Feel free to take a look, but this is one of my top priorities for bugs to fix next.

@twood1

twood1 commented Dec 31, 2020

So the issue you are describing is essentially completely independent from input parameters/options, minus the reference fasta and the input VCF. Is that correct?

@jonn-smith
Collaborator

Correct - though it also depends on the Gencode data source which is tied to the reference.

It really pulls the protein change info from the gencode transcript sequence, which is at the core of the issue.

@xmzhuo

xmzhuo commented Mar 31, 2021

I have a similar issue.

java.lang.StringIndexOutOfBoundsException: String index out of range: -2
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createAndFilterGencodeFuncotationsByTranscript(GencodeFuncotationFactory.java:281)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsOnVariant(GencodeFuncotationFactory.java:338)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:138)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:113)
at org.broadinstitute.hellbender.tools.funcotator.Funcotator.lambda$enqueueAndHandleVariant$0(Funcotator.java:502)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.broadinstitute.hellbender.tools.funcotator.Funcotator.enqueueAndHandleVariant(Funcotator.java:504)
at org.broadinstitute.hellbender.tools.funcotator.Funcotator.apply(Funcotator.java:399)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.lambda$traverse$0(VariantWalkerBase.java:109)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.traverse(VariantWalkerBase.java:107)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:994)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:135)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:180)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:199)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)

The annotation stops at chr11 34357581. The output is also truncated after this position.

chr11 34357581 . C CGGGACGTACAGCTCGACTCTGAAGACGCTGGAGGACTTGACCTTGGACTCCGGGT . PASS DP=208;ECNT=2;NLOD=8.8;N_ART_LOD=-1.486;POP_AF=2.5e-06;P_CONTAM=2.202e-10;P_GERMLINE=-51.18;TLOD=11.12 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:OBAM:OBAMRC:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:171,5:0.033:94,3:77,2:37:329,212:60:5:false:false:0|1:34357577_C_CCAT:0.02,0.02,0.028:0.0054,0.004127,0.99 0/0:29,0:0.014:13,0:16,0:0:340,0:0:0:false:false:0|1:34357577_C_CCAT:.:.

At first I thought it might be due to the length of the indel, but Funcotator seems to work fine on the positions before it, some of which have even longer insertions than the one at chr11 34357581, such as:
chr10 123715082 . A ATCACTGCTGCCACTCACTCGGGTCACCTGCTGCTCCACGTGGCCCAGAGCTTCTGT . PASS DP=196;ECNT=2;NLOD=7.6;N_ART_LOD=-1.425;POP_AF=2.5e-06;P_CONTAM=1.663e-10;P_GERMLINE=-47.68;TLOD=11.24 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:OBAM:OBAMRC:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:163,5:0.034:81,3:82,2:37:326,367:60:6:false:false:0|1:123715081_A_C:0.01,0.03,0.03:0.02,0.002271,0.978 0/0:25,0:0.016:13,0:12,0:0:292,0:0:0:false:false:0|1:123715081_A_C:.:.
chr11 707740 . C CGAAGGCCAGGAACCTGGCCTTCCCCTGGGGGCACGCAAACATGGAGGGCTGTGACACGCGACCCCCCTGGG . PASS DP=181;ECNT=1;NLOD=7.17;N_ART_LOD=-1.413;POP_AF=2.5e-06;P_CONTAM=1.459e-05;P_GERMLINE=-33.07;TLOD=5.79 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:OBAM:OBAMRC:SA_MAP_AF:SA_POST_PROB 0/1:115,3:0.115:50,1:65,2:37:314,321:60:5:false:false:0.02,0.02,0.025:0.004925,0.006118,0.989 0/0:23,0:0.046:12,0:11,0:0:288,0:0:0:false:false:.:.

Two weeks ago, I had another sample stop at chr7 with
java.lang.StringIndexOutOfBoundsException: String index out of range: 1383
I guess these are related.

@jonn-smith
Collaborator

@xmzhuo Interesting. Is this hg19 or hg38 data? I can add this to our tests.

For everyone else - thanks for your patience. I'm starting to work on this issue this week so we should have a fix relatively soon (1-2 weeks).

@xmzhuo

xmzhuo commented Mar 31, 2021 via email

@daisyyr

daisyyr commented Jul 28, 2021

Hi everyone, is this problem solved now? It seems that I've encountered similar problems. I'm using GATK 4.2 and hg38 data.

11:43:25.661 ERROR GencodeFuncotationFactory - Problem creating a GencodeFuncotation on transcript ENST00000441716.2 for variant: chr6:167976552-167976594(ACAGTGGGGGTCATTCCCCCTGCAGTGTGTTGGGAGGAGGAGG* -> A): Variant overlaps transcript but is not completely contained within it. Funcotator cannot currently handle this case. Transcript: ENST00000441716.2 Variant: [VC Unknown @ chr6:167976552-167976594 Q. of type=INDEL alleles=[ACAGTGGGGGTCATTCCCCCTGCAGTGTGTTGGGAGGAGGAGG*, A] attr={AS_FilterStatus=SITE, AS_SB_TABLE=[43, 26|2, 2], DP=94, ECNT=1, GERMQ=93, MBQ=[31, 20], MFRL=[288, 110], MMQ=[60, 60], MPOS=56, NALOD=1.37, NLOD=6.17, POPAF=4.6, ROQ=93, TLOD=10.97} GT=GT:AD:AF:DP:F1R2:F2R1:SB 0/1:46,4:0.07:50:14,3:10,0:28,18,2,2 0/0:23,0:0.041:23:8,0:5,0:15,8,0,0 filters=
11:43:25.661 WARN GencodeFuncotationFactory - Creating default GencodeFuncotation on transcript ENST00000441716.2 for problem variant: chr6:167976552-167976594(ACAGTGGGGGTCATTCCCCCTGCAGTGTGTTGGGAGGAGGAGG* -> A)
11:44:04.904 INFO ProgressMeter - chr8:677091 4.5 3000 666.0
11:45:35.226 INFO ProgressMeter - chr11:62279639 6.0 4000 665.6
11:46:54.284 INFO ProgressMeter - chr15:19905537 7.3 5000 682.4
11:48:12.767 WARN FuncotatorUtils - createAminoAcidSequence given a coding sequence of length not divisible by 3. Dropping bases from the end: 2 (size=293, ref allele: G)
11:48:16.949 ERROR GencodeFuncotationFactory - Problem creating a GencodeFuncotation on transcript ENST00000379751.5 for variant: chr20:3786474-3786537(TGGGGCCCATCCCGGCGCGCCCCCCGCCCCGGGGCCCGGCGCCGCCGCCGCCGCCCCGGGGCGG* -> T): Cannot yet handle indels starting outside an exon and ending within an exon.
11:48:16.949 WARN GencodeFuncotationFactory - Creating default GencodeFuncotation on transcript ENST00000379751.5 for problem variant: chr20:3786474-3786537(TGGGGCCCATCCCGGCGCGCCCCCCGCCCCGGGGCCCGGCGCCGCCGCCGCCGCCCCGGGGCGG* -> T)
11:48:31.506 INFO ProgressMeter - chr21:18282114 8.9 6000 670.6
11:49:08.210 INFO ProgressMeter - chr21:18282114 9.6 6888 720.6
11:49:08.210 INFO ProgressMeter - Traversal complete. Processed 6888 total variants in 9.6 minutes.
11:49:08.210 INFO VcfFuncotationFactory - ClinVar_VCF 20180429_hg38 cache hits/total: 0/2
11:49:08.211 INFO VcfFuncotationFactory - dbSNP 9606_b151 cache hits/total: 0/4781
11:49:08.230 INFO Funcotator - Shutting down engine
[July 7, 2021 11:49:08 AM GMT] org.broadinstitute.hellbender.tools.funcotator.Funcotator done. Elapsed time: 9.72 minutes.
Runtime.totalMemory()=4879548416
Tool returned:
true

@gbrandt6
Contributor

@daisyyr Thanks for posting your example here. This issue is still open, so it has not been fixed yet.

@gbrandt6
Contributor

gbrandt6 commented Nov 9, 2021

@xmzhuo @twood1 We have released a fix for a very similar bug in Funcotator (#6289). Could you test the newest GATK version, 4.2.3.0, and let us know if it also solves this bug?

@jkobject

jkobject commented Jul 7, 2022

@gbrandt6 This is the same as #6289. As per my comment there, I still see the bug in GATK 4.2.6.1. It occurs rarely but breaks the pipelines.

@jkobject

jkobject commented Jul 12, 2022

Just to help with associating issues, here is the list of issues that seem to be about the same problem: #6651, #7523, #6345, #4307, #6546, #3749, #4804, #6289. It seems to have existed since 2018.

@jonn-smith
Collaborator

@jkobject This problem has to do with indels and predicted protein change sequences. I'm starting a refactor of how the predicted protein changes get created. When that's complete, this issue will be fixed.

In the meantime, can you post the stack trace and share the example workspace you mention in #6289?

@jkobject

jkobject commented Jul 12, 2022

I can. This only happens on 10 of our 2000 samples (only in WES); none of our 600 WGS samples seem to have the same issue. It is always on some small contig (you can see here that the range is 544, and all cases are small ranges like this one).

Everything uses the default Mutect2 pipeline and parameters (e.g. gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta), except the interval file: gs://ccleparams/region_file_wgs.list
GATK 4.2.6.1.

Here is the VCF file to annotate gs://ccleparams/test/CDS-2jucw0.hg38-filtered.vcf.gz

Here is the stacktrace:

....
10:53:39.044 INFO VcfFuncotationFactory - ClinVar_VCF 20180429_hg38 cache hits/total: 0/2145
10:53:39.249 INFO VcfFuncotationFactory - dbSNP 9606_b151 cache hits/total: 0/1069225
10:53:39.520 INFO Funcotator - Shutting down engine
[July 12, 2022 10:53:39 AM GMT] org.broadinstitute.hellbender.tools.funcotator.Funcotator done. Elapsed time: 115.46 minutes.
Runtime.totalMemory()=2050490368
java.lang.StringIndexOutOfBoundsException: String index out of range: 544
at java.lang.String.substring(String.java:1963)
at org.broadinstitute.hellbender.tools.funcotator.ProteinChangeInfo.initializeForInsertion(ProteinChangeInfo.java:293)
at org.broadinstitute.hellbender.tools.funcotator.ProteinChangeInfo.<init>(ProteinChangeInfo.java:101)
at org.broadinstitute.hellbender.tools.funcotator.ProteinChangeInfo.create(ProteinChangeInfo.java:399)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createSequenceComparison(GencodeFuncotationFactory.java:2054)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createCodingRegionFuncotationForProteinCodingFeature(GencodeFuncotationFactory.java:1235)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createExonFuncotation(GencodeFuncotationFactory.java:1083)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createGencodeFuncotationOnSingleTranscript(GencodeFuncotationFactory.java:1020)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsHelper(GencodeFuncotationFactory.java:847)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsHelper(GencodeFuncotationFactory.java:831)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.lambda$createGencodeFuncotationsByAllTranscripts$0(GencodeFuncotationFactory.java:508)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createGencodeFuncotationsByAllTranscripts(GencodeFuncotationFactory.java:509)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsOnVariant(GencodeFuncotationFactory.java:564)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.determineFuncotations(DataSourceFuncotationFactory.java:243)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:211)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:182)
at org.broadinstitute.hellbender.tools.funcotator.FuncotatorEngine.lambda$createFuncotationMapForVariant$0(FuncotatorEngine.java:152)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
at org.broadinstitute.hellbender.tools.funcotator.FuncotatorEngine.createFuncotationMapForVariant(FuncotatorEngine.java:162)
at org.broadinstitute.hellbender.tools.funcotator.Funcotator.enqueueAndHandleVariant(Funcotator.java:924)
at org.broadinstitute.hellbender.tools.funcotator.Funcotator.apply(Funcotator.java:878)
at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1085)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Using GATK jar /root/gatk.jar defined in environment variable GATK_LOCAL_JAR
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx3500m -jar /root/gatk.jar Funcotator --data-sources-path /cromwell_root/datasources_dir --ref-version hg38 --output-file-format VCF -R gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta -V gs://fc-secure-d2a2d895-a7af-4117-bdc7-652d7d268324/94e769a1-28e1-4bd7-b09f-9e47fb7d8352/omics_mutect2/14fe5685-740c-4e09-9d1a-8c8d14c0ae5b/call-mutect2/Mutect2/2de52f4f-eea0-4ec7-acc1-f47b1a2d1e6c/call-Filter/attempt-2/CDS-2jucw0.hg38-filtered.vcf.gz -O CDS-2jucw0.hg38-filtered.vcf.gz.annotated.vcf.gz -L /cromwell_root/ccleparams/region_file_wgs.list --annotation-default normal_barcode: --annotation-default tumor_barcode:NP5 --annotation-default Center:DEPMAP --annotation-default source:Unknown

@jonn-smith
Collaborator

@jkobject OK, thanks!

@jkobject

My quick fix was to reduce the intervals to the target regions of my WES (instead of using the full genome region) and pass them to Funcotator. Note: the GATK Mutect2 WDL does not pass the default intervals to Funcotator, only to Mutect2.
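
For illustration only, the same idea can be sketched as a pre-filter in Python that subsets the VCF to the target regions before annotation. The interval format (`contig:start-end`, 1-based and inclusive, or a bare contig name) and both function names are assumptions here, and contig names that themselves contain `:` are not handled. With GATK itself, passing the interval list through `-L` does this job.

```python
import sys
from typing import Dict, Iterable, Iterator, List, Tuple

Intervals = Dict[str, List[Tuple[int, int]]]


def parse_intervals(lines: Iterable[str]) -> Intervals:
    """Parse 'contig:start-end', 'contig:pos' or bare 'contig' entries."""
    intervals: Intervals = {}
    for raw in lines:
        text = raw.strip()
        if not text:
            continue
        if ":" not in text:
            # A bare contig name is treated as covering the whole contig.
            intervals.setdefault(text, []).append((1, sys.maxsize))
            continue
        contig, _, span = text.rpartition(":")
        start_text, _, end_text = span.partition("-")
        start = int(start_text)
        end = int(end_text) if end_text else start
        intervals.setdefault(contig, []).append((start, end))
    return intervals


def keep_overlapping(vcf_lines: Iterable[str],
                     intervals: Intervals) -> Iterator[str]:
    """Yield header lines plus records whose REF span overlaps an interval."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip malformed or truncated lines
        start = int(fields[1])
        end = start + len(fields[3]) - 1  # REF allele length gives the span
        if any(start <= hi and end >= lo
               for lo, hi in intervals.get(fields[0], [])):
            yield line
```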

@jkobject

After running it on all my samples it actually only solved half of them... I will look into the try/catch fix

@fmarce753

Hi everyone! I partially solved the problem "WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr__:: due to alternate allele: *".
The origin of the problem is that we have complex datasets that contain more than one sample. Across the set of samples, more than one alternate allele is detected at some sites, including "*". The idea is to have one line for each variant because, apparently, Funcotator then reads it properly. I applied the following commands and it worked perfectly:

  1. Normalize:
    bcftools norm -m - cohort.vcf > cohort_norm.vcf

  2. Select SNPs (I haven't tried it for indels yet)
    gatk SelectVariants -R hg38.fa -V "cohort_norm.vcf" --select-type SNP -O "cohort_snp.vcf.gz"

  3. Remove the * variants remaining (the output of step 2 is gzip-compressed, so decompress it first):
    gzip -dc cohort_snp.vcf.gz | awk -F'\t' '$5 != "*"' > filtered_cohort_snp.vcf

  4. Apply Funcotator.
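
Step 3 can also be sketched in Python, which may be convenient when the VCF is already being processed in a script. This assumes multiallelic records have already been split (step 1), so the ALT column holds a single allele. The function name is hypothetical.

```python
from typing import Iterable, Iterator


def drop_spanning_deletions(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF lines, omitting records whose ALT column is exactly '*'."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep all header lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 5 (index 4) is ALT; '*' marks a spanning deletion allele.
        if len(fields) > 4 and fields[4] == "*":
            continue
        yield line
```

Like the awk command, this keeps every line whose fifth tab-separated column is not `*`, so header lines pass through untouched.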

At the moment this works perfectly for me. If anyone has a better solution please upload it.

Regards
