- Download Files
- Introduction
- CARD Website and Antibiotic Resistance Ontology
- RGI for Genome Analysis
- RGI at the Command Line
- Microreact files
If you are doing this demo live, you can download all the files we will be viewing here: https://github.com/agmcarthur/vtec2023-amr/tree/main/downloads_for_demo
You can also download the lecture slides here: https://github.com/agmcarthur/vtec2023-amr/tree/main/lecture_slides
This module gives an introduction to prediction of the antimicrobial resistome and resistance phenotype based on comparison of genomic DNA sequencing data to reference sequence information. While there is a large diversity of reference databases and software, this tutorial is focused on the Comprehensive Antibiotic Resistance Database (CARD) for genomic AMR prediction.
There are several databases (see here for a list) that organize information about AMR and help with interpretation of resistome results. Many of these specialize in a specific type of resistance gene (e.g., beta-lactamases) or organism (e.g., Mycobacterium tuberculosis), or are automated amalgamations of other databases (e.g., MEGARes). There are also many tools for detecting AMR genes, each with its own strengths and weaknesses (see this paper for a non-comprehensive list of tools!).
The "Big 3" databases that are comprehensive (involving many organisms, genes, and types of resistance), regularly updated, have their own gene identification tool(s), and are carefully maintained and curated are:
- Comprehensive Antibiotic Resistance Database (CARD) with the Resistance Gene Identifier (RGI).
- National Center for Biotechnology Information's National Database of Antibiotic Resistant Organisms (NDARO) with AMRFinderPlus.
- ResFinder database with its associated ResFinder tool.
In this practical we are going to focus on CARD and the associated RGI tool because:
- The Antibiotic Resistance Ontology it is built upon is a great way to organize information about AMR.
- CARD is the most heavily used database internationally, with over 5000 citations.
- We are biased. CARD is Canadian and pretty much all the workshop faculty collaborate with or are part of the group that develops CARD! See Alcock et al. 2023. CARD 2023: expanded curation, support for machine learning, and resistome prediction at the Comprehensive Antibiotic Resistance Database. Nucleic Acids Research, 51, D690-D699.
The relationship between AMR genotype and AMR phenotype is complicated, and no tools for complete prediction of phenotype from genotype exist. Instead, analyses focus on prediction or cataloguing of the AMR resistome: the collection of AMR genes and mutations in the sequenced sample. While BLAST and other sequence similarity tools can be used to catalog the resistance determinants in a sample via comparison to a reference sequence database, interpretation and phenotypic prediction are often the largest challenge. To start the tutorial, we will use the Comprehensive Antibiotic Resistance Database (CARD) website to examine the diversity of resistance mechanisms, how they influence bioinformatics analysis approaches, and how CARD’s Antibiotic Resistance Ontology (ARO) can provide an organizing principle for interpretation of bioinformatics results.
CARD’s website provides the ability to:
- Browse the Antibiotic Resistance Ontology (ARO) and associated knowledgebase.
- Browse the underlying AMR detection models, reference sequences, and SNP matrices.
- Download the ARO, reference sequence data, and indices in a number of formats for custom analyses.
- Perform integrated genome analysis using the Resistance Gene Identifier (RGI).
In this part of the tutorial, your instructor will walk you through the following uses of the CARD website to familiarize you with its resources:
- What are the mechanisms of resistance described in the Antibiotic Resistance Ontology?
- Examine the NDM-1 beta-lactamase protein, its mechanism of action, conferred antibiotic resistance, prevalence, and detection model.
- Examine the AAC(6')-Iaa aminoglycoside acetyltransferase, its mechanism of action, conferred antibiotic resistance, prevalence, and detection model.
- Examine the fluoroquinolone-resistant gyrB of M. tuberculosis, its mechanism of action, conferred antibiotic resistance, and detection model.
- Examine the MexAB-OprM efflux complex with MexR mutations, its mechanism of action, conferred antibiotic resistance, prevalence, and detection model(s).
Answers:
- Mechanisms of resistance in the ARO:
  - antibiotic target alteration
  - antibiotic target replacement
  - antibiotic target protection
  - antibiotic inactivation
  - antibiotic efflux
  - reduced permeability to antibiotic
  - resistance by absence
  - modification to cell morphology
  - resistance by host-dependent nutrient acquisition
- NDM-1: antibiotic inactivation; beta-lactams (penam, cephamycin, carbapenem, cephalosporin); over 40 pathogens (lots of ESKAPE pathogens) - note strong association with plasmids; protein homolog model
- AAC(6')-Iaa: antibiotic inactivation; aminoglycosides; Salmonella enterica; protein homolog model
- gyrB: antibiotic target alteration; fluoroquinolones; Mycobacterium; protein variant model
- MexAB-OprM with MexR mutations: antibiotic efflux; broad range of drug classes; looking at MexA sub-unit: Pseudomonas; efflux meta-model
As illustrated by the exercise above, the diversity of antimicrobial resistance mechanisms requires a diversity of detection algorithms and a diversity of detection limits. CARD’s Resistance Gene Identifier (RGI) currently integrates four CARD detection models: Protein Homolog Model, Protein Variant Model, rRNA Variant Model, and Protein Overexpression Model. Unlike naïve analyses, CARD detection models use curated cut-offs, currently based on BLAST/DIAMOND bitscores. Many other available tools rely on BLASTN or BLASTP without defined cut-offs and ignore resistance by mutation entirely.
In this part of the tutorial, your instructor will walk you through the following uses of CARD’s Resistance Gene Identifier with the default settings "Perfect and Strict hits only", "Exclude nudge", and "High quality/coverage":
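To make the idea of curated cut-offs concrete, here is a simplified sketch of how a single alignment might be sorted into RGI's Perfect, Strict, and Loose categories. It is not RGI's code. The function `classify_hit` and its argument order are hypothetical. The rule that a Perfect hit requires 100% identity over 100% of the reference length is also an assumption for illustration. Real detection models add model-specific checks, such as SNP screening for variant models.

```shell
#!/usr/bin/env bash
# Hypothetical helper: simplified Perfect/Strict/Loose call for one hit.
# Assumptions (illustration only, not RGI's actual implementation):
#   - Perfect = 100% identity over 100% of the reference length
#   - Strict  = bitscore at or above the model's curated cut-off
#   - Loose   = everything else (below the curated cut-off)
# Usage: classify_hit IDENTITY PCT_REF_LENGTH BITSCORE CUTOFF
classify_hit() {
  # awk handles the decimal comparisons that plain [ ] tests cannot
  awk -v id="$1" -v len="$2" -v bs="$3" -v cut="$4" 'BEGIN {
    if (id == 100 && len == 100) print "Perfect"
    else if (bs >= cut)          print "Strict"
    else                         print "Loose"
  }'
}

classify_hit 100 100 950 700    # prints Perfect
classify_hit 98.5 100 900 700   # prints Strict
classify_hit 62.0 85.3 310 700  # prints Loose
```

The example calls use made-up numbers chosen only to exercise each branch. They are not CARD cut-offs.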
- Resistome prediction for the multidrug resistant Acinetobacter baumannii MDR-TJ, complete genome (NC_017847).
- Resistome prediction for a plasmid from Escherichia coli strain MRSN388634 (KX276657).
- Explain the difference in fluoroquinolone resistance MIC between two clinical strains of Pseudomonas aeruginosa that appear clonal based on identical MLST (Pseudomonas1.fasta, Pseudomonas2.fasta - these files can be found in this GitHub repo). Hint: look at SNPs.
Answers:
The first two examples list the predicted resistome of the analyzed genome and plasmid, while the third example illustrates that Pseudomonas2.fasta contains an extra T83I mutation in gyrA conferring resistance to fluoroquinolones, beyond that provided by background efflux.
RGI is also a command-line tool, so we’ll use it to analyze the 39 E. coli genome assemblies included in the Integrated Assignment. We’ll also try RGI’s heat map tool to compare genomes.
Log in to your course account, go to your working directory, and make a module5 directory:
cd ~/workspace
mkdir module5
cd module5
Take a peek at the list of E. coli samples:
ls /home/ubuntu/workspace/CourseData/module5/ecoli
RGI has already been installed using Conda. List all the available Conda environments, activate the rgi environment, and then review the RGI help screen:
conda env list
conda activate rgi
rgi -h
First we need to acquire the latest AMR reference data from the CARD website:
rgi load -h
wget https://card.mcmaster.ca/latest/data
tar -xvf data ./card.json
less card.json
rgi load --card_json ./card.json --local
ls
We don’t have time to analyze all 39 samples, so let’s analyze one as an example (the course GitHub repo contains an Excel version of the resulting ED010.txt file). When analyzing FASTA files we use the main sub-command, here with the default settings "Perfect and Strict hits only", "Exclude nudge", and "High quality/coverage":
rgi main -h
rgi main -i /home/ubuntu/workspace/CourseData/module5/ecoli/ED010.fasta -o ED010 -t contig -a DIAMOND -n 4 --local --clean
ls
less ED010.json
less ED010.txt
column -t -s $'\t' ED010.txt | less -S
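Rather than counting rows by eye in `less`, you can tally the report by category. The sketch below defines a hypothetical helper, `tally_column`, that finds a column by its header name and counts each value. It assumes the tab-delimited report has header fields named `Cut_Off` and `Resistance Mechanism`, as in recent RGI versions. Check the first line of your ED010.txt if the names differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the values of one named column in an RGI
# tab-delimited report (e.g. ED010.txt). The column is located by its
# header name, so the script does not depend on column order.
# Usage: tally_column FILE COLUMN_NAME
tally_column() {
  awk -F'\t' -v col="$2" '
    NR == 1 {                                   # header row: find the column index
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      next
    }
    { count[$idx]++ }                           # data rows: tally each value
    END { for (v in count) printf "%s\t%d\n", v, count[v] }
  ' "$1" | LC_ALL=C sort                        # stable, locale-independent order
}

# Examples against the ED010 report produced above:
# tally_column ED010.txt Cut_Off
# tally_column ED010.txt "Resistance Mechanism"
```

Counting on `Cut_Off` gives the Perfect/Strict split. Counting on `Resistance Mechanism` shows how much of the resistome is efflux.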
Discussion Points:
Default RGI main analysis of ED010 lists 12 Perfect annotations and 39 Strict annotations. Yet 43 of these 51 annotations are efflux components common in E. coli that may or may not lead to clinical levels of AMR. Outside of efflux there are some antibiotic inactivation and target alteration genes, but only the EC beta-lactamase is notable. Based on these annotations, this isolate is predicted to be primarily resistant to fluoroquinolone, aminocoumarin, macrolide, and tetracycline antibiotics, although the acrD gene can also contribute to aminoglycoside resistance.
What if these results did not explain our observed phenotype? We might want to explore the RGI Loose hits (the course GitHub repo contains an Excel version of the resulting ED010_IncludeLoose.txt file), shown here with the settings "Perfect, Strict, and Loose hits", "Include nudge", and "High quality/coverage":
rgi main -h
rgi main -i /home/ubuntu/workspace/CourseData/module5/ecoli/ED010.fasta -o ED010_IncludeLoose -t contig -a DIAMOND -n 4 --local --clean --include_nudge --include_loose
ls
column -t -s $'\t' ED010_IncludeLoose.txt | less -S
Discussion Points:
An additional 11 nudged Strict annotations (possible partial genes for Escherichia coli emrE, EF-Tu mutants conferring resistance to pulvomycin, and AcrF) and 394 Loose annotations have been added, which can be investigated for leads that could explain the observed phenotype. Note that this scenario (Perfect and Strict hits failing to explain the phenotype) is unlikely for clinical isolates given CARD's reference data, but is possible for environmental isolates. The multiple putative gene fragments found via the nudge may suggest genome assembly problems.
We have pre-compiled results for all 39 samples under "Perfect and Strict hits only", "Exclude nudge", and "High quality/coverage", so let’s try RGI’s heat map tool (pre-compiled images can be downloaded or viewed from the course GitHub repo):
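When chasing leads among several hundred Loose hits, one pragmatic starting point is to rank them by percent identity. The hypothetical helper `top_loose` below does this with awk and sort. It assumes the report contains `Cut_Off`, `Best_Hit_ARO`, `Best_Identities`, and `Percentage Length of Reference Sequence` columns, so check your file's header. Ranking by identity is only a heuristic, not an RGI feature.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the N Loose hits with the highest Best_Identities
# from an RGI tab-delimited report, printing ARO name, percent identity, and
# percentage of the reference length covered. Columns are found by header name.
# Usage: top_loose FILE [N]   (N defaults to 10)
top_loose() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map header -> index
    $(col["Cut_Off"]) == "Loose" {
      print $(col["Best_Hit_ARO"]), $(col["Best_Identities"]),
            $(col["Percentage Length of Reference Sequence"])
    }
  ' "$1" | LC_ALL=C sort -t$'\t' -k2,2gr | head -n "${2:-10}"  # highest identity first
}

# Example against the report produced above:
# top_loose ED010_IncludeLoose.txt 20
```

High-identity Loose hits with low reference coverage are often gene fragments. This is consistent with the assembly caveat above.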
ls /home/ubuntu/workspace/CourseData/module5/ecoli_json
rgi heatmap -h
rgi heatmap -i /home/ubuntu/workspace/CourseData/module5/ecoli_json -o heatmap
rgi heatmap -i /home/ubuntu/workspace/CourseData/module5/ecoli_json -o cluster_both --cluster both
rgi heatmap -i /home/ubuntu/workspace/CourseData/module5/ecoli_json -o cluster_both_frequency --frequency --cluster both
ls
Yellow represents a Perfect hit, teal represents a Strict hit, purple represents no hit.
Discussion Points:
The last analysis is the most informative, showing that many of these isolates share the same complement of efflux variants (bottom of heatmap) and several isolates share the same overall resistome. Yet most isolates are unique in their resistome, with a subset sharing TEM-1, sul1, and other higher risk genes. Placing these results in phylogenetic and epidemiological context will be helpful.
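The pre-compiled JSON results for all 39 samples could, in principle, be regenerated by repeating the single-sample `rgi main` command over every assembly. The sketch below assumes the assemblies end in `.fasta` and reuses the default settings from the ED010 run. The function name `run_rgi_all` and the output directory name are hypothetical. Running all 39 genomes will take far longer than the class period allows.

```shell
#!/usr/bin/env bash
# Hypothetical batch wrapper: run "rgi main" with the same default settings
# used for ED010 on every *.fasta file in an input directory, writing one
# <sample>.txt / <sample>.json report pair per assembly into an output directory.
# Usage: run_rgi_all INPUT_DIR OUTPUT_DIR
run_rgi_all() {
  local in_dir=$1 out_dir=$2 fasta sample
  mkdir -p "$out_dir"
  for fasta in "$in_dir"/*.fasta; do
    [ -e "$fasta" ] || continue             # no matches: the glob stays literal
    sample=$(basename "$fasta" .fasta)      # e.g. ED010.fasta -> ED010
    rgi main -i "$fasta" -o "$out_dir/$sample" \
      -t contig -a DIAMOND -n 4 --local --clean
  done
}

# Example (run from ~/workspace/module5 after "rgi load ... --local"):
# run_rgi_all /home/ubuntu/workspace/CourseData/module5/ecoli ecoli_rgi
```

Each sample yields its own `.txt` and `.json` report, named after its FASTA file.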
In the course Integrated Assignment you can use the following annotation file to view all of the RGI results alongside the Microreact visualizations: RGI microreact results plus earlier derived whole genome SNP tree.
Notes on the metadata:
- We include RGI Perfect and Strict annotations, but ignore Loose annotations
- We are ignoring all efflux results
- We are ignoring the one vancomycin resistance gene annotated, as it was a false positive (i.e., not all of the genes in the van cluster were found)
- We did not run RGI on the reference genome
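The first two metadata notes amount to a simple filter on each RGI report. We don't know exactly how the Microreact annotation file was built, so treat the hypothetical `non_efflux_hits` function below as one possible way to apply those two rules. It assumes `Cut_Off`, `Best_Hit_ARO`, and `Resistance Mechanism` header columns and leaves the manual vancomycin review to you.

```shell
#!/usr/bin/env bash
# Hypothetical filter mirroring the first two metadata notes: keep only
# Perfect and Strict rows and drop any row whose Resistance Mechanism
# mentions efflux. Prints one "sample<TAB>Best_Hit_ARO" line per remaining hit.
# The vancomycin false positive was removed by manual review and is NOT handled here.
# Usage: non_efflux_hits SAMPLE_NAME RGI_TXT_FILE
non_efflux_hits() {
  awk -F'\t' -v OFS='\t' -v sample="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map header -> index
    ($(col["Cut_Off"]) == "Perfect" || $(col["Cut_Off"]) == "Strict") &&
    $(col["Resistance Mechanism"]) !~ /efflux/ {
      print sample, $(col["Best_Hit_ARO"])
    }
  ' "$2"
}

# Example: non_efflux_hits ED010 ED010.txt
```

Collecting this output for every sample gives a long table of sample/gene pairs. You could then pivot it into presence/absence metadata columns.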
Do you think there is evidence of lateral gene transfer?