Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

<INS> sequence missing in ALT field and represented as C<INS>, T<INS>, A<INS>, G<INS> #501

Open
leone93 opened this issue Jul 31, 2024 · 2 comments
Assignees
Milestone

Comments

@leone93
Copy link

leone93 commented Jul 31, 2024

As the title says. Running sniffles 2.4 with this setting on my sample:

sniffles -t "$thr" --minsupport 3 --max-del-seq-len 1000000 --reference "$REF.fasta" --input "$ALIGN_DIR/$i/$i.sorted.bam" \
--output-rnames --sample-id "$i" --vcf "$SV_DIR/samples/$i/$i.$REF_NAME.sniffles.vcf.gz" --snf "$SV_DIR/samples/$i/$i.$REF_NAME.sniffles.snf"
echo -e "$SV_DIR/samples/$i/$i.$REF_NAME.sniffles.snf\t$i" >> "$SV_DIR/samples_list_4_multisample.tsv"

produces files with this value in the ALT field of many INS. Moreover, this error is propagated to to the multisample files created with:

sniffles --threads "$thr" \
--reference "$REF.fasta" \
--input "$SV_DIR/samples_list_4_multisample.tsv" \
--vcf multisample.$REF_NAME.sniffles.vcf.gz

More details. It does not depend on SVLEN (it is random from that point of view).
This error is not present on sniffles 2.2 (same settings), although I have some INS also there. Also, the quantity of INS in the multisample files is way lower in sniffles 2.2 compare to 2.4 on the same dataset (20 entries vs. 420).
Is there a way to have the proper sequence on the alt field for INS?

@hermannromanek
Copy link
Collaborator

Hi @leone93

I'm looking into this issue - is it possible for you to share a couple of occurences from the generated VCF? This would help a lot in identifying whats going on.

As for the quantity of inserts, this is most likely a consequence of the newer version of sniffles also doing alignments, and possible separating calls that are sufficiently different. This is controlled by the merge param --combine-pctseq which defaults to 0.7, i.e. only calls whose sequence are above 70% similarity are reported as a single variant.

This is a new feature - did you check the variants to see if those get reported wrongly? Setting the mentioned parameter to 0 should emulate the 2.2 behaviour (i.e., no alignment checks being done on variants).

Thanks,
Hermann

@leone93
Copy link
Author

leone93 commented Aug 20, 2024

Hi @hermannromanek and @fritzsedlazeck !
Sorry for the late response!
here you are a couple of these INS lines from a single sample vcf:
Chr1 41126819 Sniffles2.INS.4E8S0 T T<INS> 60 PASS IMPRECISE;SVTYPE=INS;SVLEN=6432;END=41126819;SUPPORT=34;RNAMES=m64313e_220704_023609/102760552/ccs,m64313e_220704_023609/96338895/ccs,m64313e_220704_023609/30867756/ccs,m64313e_220704_023609/151258310/ccs,m64313e_220704_023609/113640328/ccs,m64313e_220704_023609/179110627/ccs,m64313e_220704_023609/165546595/ccs,m64313e_220704_023609/27789126/ccs,m64313e_220704_023609/119799858/ccs,m64313e_220704_023609/121766126/ccs,m64313e_220704_023609/24578579/ccs,m64313e_220704_023609/61540535/ccs,m64313e_220704_023609/154601259/ccs,m64313e_220704_023609/67109779/ccs,m64313e_220704_023609/78250968/ccs,m64313e_220704_023609/23791966/ccs,m64313e_220704_023609/92146483/ccs,m64313e_220704_023609/25364332/ccs,m64313e_220704_023609/170985260/ccs,m64313e_220704_023609/160367877/ccs,m64313e_220704_023609/3541029/ccs,m64313e_220704_023609/6817328/ccs,m64313e_220704_023609/149949202/ccs,m64313e_220704_023609/42730503/ccs,m64313e_220704_023609/111085705/ccs,m64313e_220704_023609/56427088/ccs,m64313e_220704_023609/136971812/ccs,m64313e_220704_023609/104465954/ccs,m64313e_220704_023609/27134070/ccs,m64313e_220704_023609/68616675/ccs,m64313e_220704_023609/67240813/ccs,m64313e_220704_023609/27394082/ccs,m64313e_220704_023609/30213806/ccs,m64313e_220704_023609/3148415/ccs;COVERAGE=18,1,1,1,24;STRAND=+;AF=1.000;STDEV_LEN=38.532;STDEV_POS=0.000;SUPPORT_LONG=27 GT:GQ:DR:DV 1/1:60:0:85 Chr4 8607072 Sniffles2.INS.184S3 A A<INS> 45 PASS PRECISE;SVTYPE=INS;SVLEN=63;END=8607072;SUPPORT=5;RNAMES=m64313e_220704_023609/173344440/ccs,m64313e_220704_023609/99287661/ccs,m64313e_220704_023609/170591313/ccs,m64313e_220704_023609/14877803/ccs,m64313e_220704_023609/117442152/ccs;COVERAGE=22,21,21,21,16;STRAND=+-;AF=0.238;STDEV_LEN=0.000;STDEV_POS=0.000;SUPPORT_LONG=0 GT:GQ:DR:DV 0/1:5:16:5 Chr4 31659553 Sniffles2.INS.3E7S3 T T<INS> 55 GT PRECISE;SVTYPE=INS;SVLEN=163;END=31659553;SUPPORT=3;RNAMES=m64313e_220704_023609/132055311/ccs,m64313e_220704_023609/147130830/ccs,m64313e_220704_023609/45484245/ccs;COVERAGE=14,16,16,16,16;STRAND=+-;AF=0.188;STDEV_LEN=0.577;STDEV_POS=0.000;SUPPORT_LONG=0 GT:GQ:DR:DV 0/0:6:13:3 Chr5 16387275 Sniffles2.INS.20CS4 T T<INS> 60 PASS PRECISE;SVTYPE=INS;SVLEN=1276;END=16387275;SUPPORT=5;RNAMES=m64313e_220704_023609/114034422/ccs,m64313e_220704_023609/74908093/ccs,m64313e_220704_023609/134154239/ccs,m64313e_220704_023609/19267721/ccs,m64313e_220704_023609/174262772/ccs;COVERAGE=11,12,12,13,29;STRAND=+-;AF=0.417;STDEV_LEN=3.000;STDEV_POS=0.000;SUPPORT_LONG=0 GT:GQ:DR:DV 0/1:30:7:5 Chr6 3652857 Sniffles2.INS.5BS5 A A<INS> 60 PASS PRECISE;SVTYPE=INS;SVLEN=5839;END=3652857;SUPPORT=4;RNAMES=m64313e_220704_023609/112591763/ccs,m64313e_220704_023609/10094382/ccs,m64313e_220704_023609/5768041/ccs,m64313e_220704_023609/39780805/ccs;COVERAGE=20,1,1,1,11;STRAND=+;AF=1.000;STDEV_LEN=0;STDEV_POS=0;SUPPORT_LONG=3 GT:GQ:DR:DV 1/1:27:0:10 Chr6 11141004 Sniffles2.INS.180S5 T T<INS> 60 PASS PRECISE;SVTYPE=INS;SVLEN=7472;END=11141004;SUPPORT=22;RNAMES=m64313e_220704_023609/17826740/ccs,m64313e_220704_023609/76547106/ccs,m64313e_220704_023609/85984472/ccs,m64313e_220704_023609/11667759/ccs,m64313e_220704_023609/134283470/ccs,m64313e_220704_023609/170591116/ccs,m64313e_220704_023609/23922660/ccs,m64313e_220704_023609/148966122/ccs,m64313e_220704_023609/170001124/ccs,m64313e_220704_023609/75695761/ccs,m64313e_220704_023609/84740071/ccs,m64313e_220704_023609/66913999/ccs,m64313e_220704_023609/8717987/ccs,m64313e_220704_023609/46268658/ccs,m64313e_220704_023609/72417931/ccs,m64313e_220704_023609/55249368/ccs,m64313e_220704_023609/60360010/ccs,m64313e_220704_023609/115018033/ccs,m64313e_220704_023609/22151892/ccs,m64313e_220704_023609/136382408/ccs,m64313e_220704_023609/16843737/ccs,m64313e_220704_023609/123994259/ccs;COVERAGE=19,2,2,2,8;STRAND=+-;AF=1.000;STDEV_LEN=2.828;STDEV_POS=0.000;SUPPORT_LONG=18 GT:GQ:DR:DV 1/1:60:0:58 Chr7 22465817 Sniffles2.INS.26CS6 A A<INS> 49 GT PRECISE;SVTYPE=INS;SVLEN=266;END=22465817;SUPPORT=4;RNAMES=m64313e_220704_023609/13895232/ccs,m64313e_220704_023609/11012018/ccs,m64313e_220704_023609/88342596/ccs,m64313e_220704_023609/44894366/ccs;COVERAGE=51,50,50,50,45;STRAND=+-;AF=0.080;STDEV_LEN=0.000;STDEV_POS=0.000;SUPPORT_LONG=0 GT:GQ:DR:DV 0/0:60:46:4 Chr8 6662224 Sniffles2.INS.A9S7 A A<INS> 28 GT IMPRECISE;SVTYPE=INS;SVLEN=445;END=6662224;SUPPORT=7;RNAMES=m64313e_220704_023609/31916765/ccs,m64313e_220704_023609/179505146/ccs,m64313e_220704_023609/137628233/ccs,m64313e_220704_023609/24313897/ccs,m64313e_220704_023609/37685523/ccs,m64313e_220704_023609/55378596/ccs,m64313e_220704_023609/37880543/ccs;COVERAGE=50,50,50,57,58;STRAND=+-;AF=0.140;STDEV_LEN=40.696;STDEV_POS=49.641;SUPPORT_LONG=0 GT:GQ:DR:DV 0/0:49:43:7 Chr11 12181444 Sniffles2.INS.17FSA A A<INS> 60 PASS IMPRECISE;SVTYPE=INS;SVLEN=3640;END=12181444;SUPPORT=5;RNAMES=m64313e_220704_023609/79301286/ccs,m64313e_220704_023609/53478128/ccs,m64313e_220704_023609/79955941/ccs,m64313e_220704_023609/88016459/ccs,m64313e_220704_023609/139329790/ccs;COVERAGE=67,76,76,77,91;STRAND=+-;AF=0.145;STDEV_LEN=27.465;STDEV_POS=0.000;SUPPORT_LONG=2 GT:GQ:DR:DV 0/0:60:65:11 Chr11 27841081 Sniffles2.INS.480SA T T<INS> 60 PASS PRECISE;SVTYPE=INS;SVLEN=394;END=27841081;SUPPORT=13;RNAMES=m64313e_220704_023609/93586025/ccs,m64313e_220704_023609/4326981/ccs,m64313e_220704_023609/123404550/ccs,m64313e_220704_023609/131007122/ccs,m64313e_220704_023609/69796052/ccs,m64313e_220704_023609/53674667/ccs,m64313e_220704_023609/19597916/ccs,m64313e_220704_023609/91620639/ccs,m64313e_220704_023609/91031592/ccs,m64313e_220704_023609/111217044/ccs,m64313e_220704_023609/107806724/ccs,m64313e_220704_023609/71764623/ccs,m64313e_220704_023609/40501307/ccs;COVERAGE=30,30,30,30,34;STRAND=+-;AF=0.433;STDEV_LEN=0.000;STDEV_POS=0.000;SUPPORT_LONG=0 GT:GQ:DR:DV 0/1:60:17:13 Chr12 8902607 Sniffles2.INS.10ESB T T<INS> 60 PASS PRECISE;SVTYPE=INS;SVLEN=9019;END=8902607;SUPPORT=17;RNAMES=m64313e_220704_023609/90637006/ccs,m64313e_220704_023609/112984302/ccs,m64313e_220704_023609/160236139/ccs,m64313e_220704_023609/13239138/ccs,m64313e_220704_023609/105382991/ccs,m64313e_220704_023609/114426616/ccs,m64313e_220704_023609/90703249/ccs,m64313e_220704_023609/137823960/ccs,m64313e_220704_023609/13960686/ccs,m64313e_220704_023609/129827041/ccs,m64313e_220704_023609/78577958/ccs,m64313e_220704_023609/163513184/ccs,m64313e_220704_023609/43649010/ccs,m64313e_220704_023609/104203260/ccs,m64313e_220704_023609/29821465/ccs,m64313e_220704_023609/133564333/ccs,m64313e_220704_023609/32047747/ccs;COVERAGE=22,1,1,1,23;STRAND=+-;AF=1.000;STDEV_LEN=2.828;STDEV_POS=0.000;SUPPORT_LONG=15 GT:GQ:DR:DV 1/1:60:0:48 Chr12 9270541 Sniffles2.INS.113SB A A<INS> 60 PASS PRECISE;SVTYPE=INS;SVLEN=149;END=9270541;SUPPORT=6;RNAMES=m64313e_220704_023609/35523086/ccs,m64313e_220704_023609/37683345/ccs,m64313e_220704_023609/27918818/ccs,m64313e_220704_023609/56821920/ccs,m64313e_220704_023609/15206024/ccs,m64313e_220704_023609/127600358/ccs;COVERAGE=20,15,15,15,23;STRAND=+-;AF=0.400;STDEV_LEN=0.500;STDEV_POS=22.500;SUPPORT_LONG=0 GT:GQ:DR:DV 0/1:34:9:6 Chr12 11791587 Sniffles2.INS.18FSB A A<INS> 60 GT PRECISE;SVTYPE=INS;SVLEN=142;END=11791587;SUPPORT=3;RNAMES=m64313e_220704_023609/11929890/ccs,m64313e_220704_023609/3345120/ccs,m64313e_220704_023609/655939/ccs;COVERAGE=17,14,14,14,16;STRAND=+-;AF=0.214;STDEV_LEN=0.000;STDEV_POS=0.000;SUPPORT_LONG=0 GT:GQ:DR:DV 0/0:0:11:3 Chr12 14771624 Sniffles2.INS.274SB T T<INS> 60 PASS PRECISE;SVTYPE=INS;SVLEN=4084;END=14771624;SUPPORT=4;RNAMES=m64313e_220704_023609/27592328/ccs,m64313e_220704_023609/153356836/ccs,m64313e_220704_023609/173081666/ccs,m64313e_220704_023609/84084865/ccs;COVERAGE=10,10,10,11,20;STRAND=+-;AF=0.900;STDEV_LEN=0.000;STDEV_POS=0.000;SUPPORT_LONG=2 GT:GQ:DR:DV 1/1:15:1:9

For the multisample, I understand the --combine-pctseq paramter (it's the same of truvari), and maybe I will down it to 0.60 for my dataset.
Another question about that: did you also compare the length for merging?
I notice that even with a low limit of 50 bp for calling events, in the multisample files, some samples present a genotype, maybe in bigger variants, even though that sample genotype is below the minimum size.
To make it more clear. In the multisample vcf, I have an INS of 60 bp that comes from a population of samples. Checking every sample in the aligned bam, I noticed 2 or 3 samples (out of 184) show that is true that there is an INS there, but that is below 50 bp (around 45-47) for these samples (and confirming that this event is not present in the single samples vcf of these samples).
I think it is correct to say that there is an event there, but I'm not sure if it is grouped with the 60 bp event. Looking at in length, it is 30-35% more. How does sniffles merging logic work? Thanks for the clarification
Leo

@hermannromanek hermannromanek added this to the 2.5 milestone Sep 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants