Position mistake in warning? [warning] found non ACGT ALT character 'N', encoding as 'T' for position: 1547942 #23

Closed
liserjrqlxue opened this issue Jun 17, 2022 · 1 comment

Comments

@liserjrqlxue

I use echtvar to encode dbSNP RS IDs and get many warnings like [warning] found non ACGT ALT character 'N', encoding as 'T' for position: 1547942.
However, I can't find any 'N' at position 1547942.
While preparing this issue, I found that there is an N at position 1547943, which is a bit confusing.

1       1547921 .       CTTTTCTTTTTTTTTTTTTGT   C       .       .       RS=1298430233
1       1547941 .       T       TTC     .       .       RS=1274674348
1       1547942 .       T       TC      .       .       RS=67112328
1       1547942 .       T       C       .       .       RS=140478415
1       1547943 .       T       C       .       .       RS=367618803
1       1547943 .       T       TTN     .       .       RS=1553137706

reproduction

command

echtvar encode debug.zip dbsnp.json debug.vcf.gz

log

[echtvar] adding VCF:debug.vcf.gz
[echtvar] on chromosome "1"
[warning] found non ACGT ALT character 'N', encoding as 'T' for position: 1547942
[echtvar] wrote 28 total variants and 8 long variants (28.57%)

input

debug.zip

dbsnp.json

[
        {"field": "RS",  "alias": "rsID"    }
]

debug.vcf.gz

(unused contig lines deleted from the header)

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=20200501
##source=dbSNP
##dbSNP_BUILD_ID=154
##reference=GRCh37.p13
##phasing=partial
##INFO=<ID=RS,Number=1,Type=Integer,Description="dbSNP ID (i.e. rs number)">
##contig=<ID=1>
##bcftools_normVersion=1.10.2+htslib-1.10.2
##bcftools_normCommand=norm -f GRCh37.p13.fasta.gz -m- -w 10000 -O v GCF_000001405.25.gz; Date=Fri Sep 11 14:51:11 2020
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       1547921 .       CTTTTCTTTTTTTTTTTTTGT   C       .       .       RS=1298430233
1       1547926 .       CTTTTTTTTTT     C       .       .       RS=375755672
1       1547926 .       CTTTTTTT        C       .       .       RS=375755672
1       1547931 .       TTTTTTTTTG      T       .       .       RS=1332647484
1       1547932 .       T       C       .       .       RS=1570432881
1       1547934 .       T       G       .       .       RS=1056143574
1       1547935 .       T       G       .       .       RS=1449414995
1       1547936 .       TTTTG   T       .       .       RS=1305172649
1       1547937 .       T       G       .       .       RS=1375323149
1       1547938 .       T       G       .       .       RS=570290048
1       1547938 .       TTG     T       .       .       RS=1307834628
1       1547939 .       T       C       .       .       RS=140292274
1       1547939 .       T       G       .       .       RS=140292274
1       1547939 .       TG      T       .       .       RS=1285958813
1       1547940 .       G       T       .       .       RS=148775059
1       1547940 .       G       GT      .       .       RS=201873628
1       1547940 .       G       GTT     .       .       RS=201873628
1       1547940 .       G       GTTTTTTT        .       .       RS=201873628
1       1547940 .       G       GTGT    .       .       RS=1557529600
1       1547940 .       G       GTTC    .       .       RS=1557529612
1       1547941 .       T       TTC     .       .       RS=1274674348
1       1547942 .       T       TC      .       .       RS=67112328
1       1547942 .       T       C       .       .       RS=140478415
1       1547943 .       T       C       .       .       RS=367618803
1       1547943 .       T       TTN     .       .       RS=1553137706
1       1547948 .       T       G       .       .       RS=6677572
1       1547948 .       T       TG      .       .       RS=1553137708
1       1547949 .       G       T       .       .       RS=372045450
@brentp
Owner

brentp commented Jun 17, 2022

Hi, here is your problematic variant:

1       1547943 .       T       TTN     .       .       RS=1553137706

Note the N in the ALT field. echtvar is reporting the 0-based position (1547942), while the VCF lists the 1-based position (1547943). I'll update it to report the 1-based position so there is less confusion.
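
For anyone who wants to locate such records before encoding, here is a minimal Python sketch. It is not echtvar's code: the function name find_non_acgt_alts and the tuple layout are hypothetical, chosen only for illustration. It scans VCF data lines for ALT alleles containing anything other than A, C, G or T, and reports both the 1-based position as written in the VCF and the 0-based position (POS - 1) that the warning printed at the time of this issue.

```python
VALID_BASES = set("ACGT")


def find_non_acgt_alts(vcf_lines):
    """Yield (chrom, pos_1based, pos_0based, alt) for each ALT allele
    that contains a character other than A, C, G or T.

    Hypothetical helper for illustration; symbolic alleles such as
    <DEL> or * would also be flagged by this simple check.
    """
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        # Only the first five columns are needed: CHROM POS ID REF ALT.
        fields = line.split(None, 5)
        chrom, pos, alt_field = fields[0], int(fields[1]), fields[4]
        # A record may list several comma-separated ALT alleles.
        for alt in alt_field.split(","):
            if set(alt.upper()) - VALID_BASES:
                # VCF POS is 1-based; subtract 1 for the 0-based coordinate.
                yield chrom, pos, pos - 1, alt


# Two records taken from the VCF in this issue.
records = [
    "1\t1547942\t.\tT\tC\t.\t.\tRS=140478415",
    "1\t1547943\t.\tT\tTTN\t.\t.\tRS=1553137706",
]
hits = list(find_non_acgt_alts(records))
for chrom, pos1, pos0, alt in hits:
    print(f"{chrom}:{pos1} (0-based {pos0}) ALT={alt}")
```

For a compressed file, the lines could be read with gzip.open(path, "rt") from the standard library and passed to the same function.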

@brentp brentp closed this as completed in 035d170 Jun 17, 2022
brentp added a commit that referenced this issue Jun 17, 2022