Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mykrobe calls the wrong amino acid change for first drug resistance SNP on a bottom strand gene #127

Closed
tw1nk0i opened this issue May 1, 2021 · 2 comments · Fixed by #129

Comments

@tw1nk0i
Copy link

tw1nk0i commented May 1, 2021

I ran Mykrobe on the FASTQ data for BioProject PRJEB5280 Run ERR751367 from NCBI. Mykrobe identifies a SNP conferring isoniazid resistance, katG_S315G-GCT2155167GGT. The resulting amino acid, in this case the G in S315G, is incorrect. The last part of the SNP, GCT2155167GGT, identifies the codon that was changed and the position within the genome. These nucleotides are always the top strand nucleotides. Therefore, for genes on the bottom strand, we must take the reverse complement. GCT becomes AGC, which is serine, the S in S315G. So far so good. However, the reverse complement of GGT is ACC, which is threonine, but the output S315G says the resulting amino acid is glycine. I believe the code is neglecting to use the reverse complement, so it is incorrectly translating GGT to glycine.

I have also tried running Mykrobe on .bam files that I generated by running snippy (version 3.2) on the same FASTQ data downloaded from NCBI. I see the same error, and looking manually at the bam file in IGV confirms that S315G is incorrect and should be S315T.

I suspect that this bug only affects the first drug resistant SNP listed in the json output that is in a bottom strand gene. katG_S315G-GCT2155167GGT just happens to be an isoniazid SNP, so it is near the top of the file and thus often the SNP affected.

Identical cases for katG_S315G-GCT2155167GGT occur with the following NCBI BioProject and Run pairs:
PRJEB5280, ERR779909
PRJEB10385, ERR1034759
PRJEB11653, ERR1213902

I have ~200 more identical cases. I can provide them if they would be helpful for debugging.

@martinghunt
Copy link
Member

Thanks for spotting this. Yes, it is indeed a bug. Will investigate/fix some time this week. I expect the bug applies only to the "any amino acid change" entries in the mutation catalog - eg S315X. The "X" is not getting converted to the correct amino acid, as you've described.

@martinghunt
Copy link
Member

This is now fixed on the master branch. I tested on ERR751367 and got katG_S315T-GCT2155167GGT in the resistance call for Isoniazid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants