mykrobe calls the wrong amino acid change for first drug resistance SNP on a bottom strand gene #127

tw1nk0i · 2021-05-01T01:08:25Z

I ran Mykrobe on the FASTQ data for BioProject PRJEB5280 Run ERR751367 from NCBI. Mykrobe identifies a SNP conferring isoniazid resistance, katG_S315G-GCT2155167GGT. The resulting amino acid, in this case the G in S315G, is incorrect. The last part of the SNP, GCT2155167GGT, identifies the codon that was changed and the position within the genome. These nucleotides are always the top strand nucleotides. Therefore, for genes on the bottom strand, we must take the reverse complement. GCT becomes AGC, which is serine, the S in S315G. So far so good. However, the reverse complement of GGT is ACC, which is threonine, but the output S315G says the resulting amino acid is glycine. I believe the code is neglecting to use the reverse complement, so it is incorrectly translating GGT to glycine.

I have also tried running Mykrobe on .bam files that I generated by running snippy (version 3.2) on the same FASTQ data downloaded from NCBI. I see the same error, and looking manually at the bam file in IGV confirms that S315G is incorrect and should be S315T.

I suspect that this bug only affects the first drug resistant SNP listed in the json output that is in a bottom strand gene. katG_S315G-GCT2155167GGT just happens to be an isoniazid SNP, so it is near the top of the file and thus often the SNP affected.

Identical cases for katG_S315G-GCT2155167GGT occur with the following NCBI BioProject and Run pairs:
PRJEB5280, ERR779909
PRJEB10385, ERR1034759
PRJEB11653, ERR1213902

I have ~200 more identical cases. I can provide them if they would be helpful for debugging.

martinghunt · 2021-05-04T08:36:59Z

Thanks for spotting this. Yes, it is indeed a bug. Will investigate/fix some time this week. I expect the bug applies only to the "any amino acid change" entries in the mutation catalog - eg S315X. The "X" is not getting converted to the correct amino acid, as you've described.

martinghunt · 2021-05-10T17:13:54Z

This is now fixed on the master branch. I tested on ERR751367 and got katG_S315T-GCT2155167GGT in the resistance call for Isoniazid.

martinghunt mentioned this issue May 10, 2021

Fix bug reporting new amino acid X mutations on reverse strand #129

Merged

martinghunt closed this as completed in #129 May 10, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mykrobe calls the wrong amino acid change for first drug resistance SNP on a bottom strand gene #127

mykrobe calls the wrong amino acid change for first drug resistance SNP on a bottom strand gene #127

tw1nk0i commented May 1, 2021

martinghunt commented May 4, 2021

martinghunt commented May 10, 2021

mykrobe calls the wrong amino acid change for first drug resistance SNP on a bottom strand gene #127

mykrobe calls the wrong amino acid change for first drug resistance SNP on a bottom strand gene #127

Comments

tw1nk0i commented May 1, 2021

martinghunt commented May 4, 2021

martinghunt commented May 10, 2021