Customising your inputs

Hands-free updates (sort of)

tbtAMR was designed to allow flexible updates to the mutational catalogue and/or to the interpretative criteria for that catalogue. The codebase reads these files and applies the user-defined criteria to the mutations detected in the VCF file. In this way, there is no need for a bioinformatician or developer to modify the interpretative logic; a wet-lab or medical microbiologist can update the criteria as required. Obviously a bioinformatician is exceedingly useful in running the tool, but the technical overhead of updating and maintaining the criteria, which are so important to effective and appropriate reporting of DST in M. tuberculosis, is much lower (freeing us all up for other fun stuff!!).

The way this works is that the files described below allow you to customise:

what the result you report looks like, for example you might want to report resistant and not resistant
how that result is determined; it may be based on categorical values or single variant values
how the overall profile of the sequence should be reported

Important - These criteria can be thought of like this: tbtAMR will report Susceptible for all drugs UNLESS you supply a criterion that changes that behaviour.

Inputs required

VCF file - obviously!

The VCF file must be generated using an appropriate reference genome. Appropriateness is based on your catalogue of mutations: whatever reference genome positions are used in the catalogue must be able to be linked to the correct positions in the VCF file. Note that tbtAMR will not fall over if your VCF file was not generated with the correct reference genome - BUT you may get unexpected results. It is up to you to know which is the correct reference genome.

The VCF file can be generated by tbtAMR or be supplied by the user (see How to run)

Catalogue of mutations

This is a csv file containing information on the mutations that you are interested in detecting in your sequence. Every mutation that you are interested in needs to have its own line (see here for examples). The following information is REQUIRED:

A column with the names of drugs you are trying to infer resistance to.
A column with the gene name
A column with the actual variants you are looking for (example gene_x.change)
The confidence that each variant causes resistance. This does not actually need to be a confidence level or metric, but it is used in the next file (interpretative criteria) to provide a result based on a category (rather than on a variant-by-variant basis).

It does not really matter what you call these columns; you will supply this information in the catalogue config file (see below). You are also welcome to have additional columns that you may use in the interpretative criteria file to provide more control.
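As an illustration, a minimal catalogue could look like the sketch below. The column names (`drug`, `gene`, `variant`, `confidence`) and every row are hypothetical placeholders, not values from a real catalogue. The helper `load_catalogue` is not part of tbtAMR; it only checks that the four required pieces of information are present on every line.

```python
import csv
import io

# Hypothetical catalogue content; column names and rows are placeholders only.
CATALOGUE_CSV = """drug,gene,variant,confidence
drug_a,gene_x,gene_x.change_1,high
drug_a,gene_x,gene_x.change_2,moderate
drug_b,gene_y,gene_y.change_1,uncertain
"""

# These names must match whatever you declare in the catalogue config file.
REQUIRED_COLUMNS = ["drug", "gene", "variant", "confidence"]


def load_catalogue(text, required=REQUIRED_COLUMNS):
    """Read catalogue rows and fail early if a required value is missing."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for line_number, row in enumerate(rows, start=2):  # line 1 is the header
        for column in required:
            if not (row.get(column) or "").strip():
                raise ValueError(f"line {line_number}: missing value for '{column}'")
    return rows


catalogue = load_catalogue(CATALOGUE_CSV)
print(len(catalogue), "catalogue rows loaded")
```

Each mutation sits on its own line, and any extra columns you add would simply appear as additional keys in each row.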

Interpretative criteria

This is perhaps the most challenging file to construct/understand. This is where the 'magic' happens. This is also a csv file and in this file you will describe how variants should be interpreted.

Criteria are hierarchical.

default criteria are applied first. These should be your categorical rules, typically based on the 'confidence' column, where you can apply a single category to a large block of variants.
override_simple criteria are used where a single condition (or column) will change the default behaviour.
override_complex criteria are used where more than one condition (or column) will change the default behaviour.

The table below describes the columns in the interpretation criteria file. Note that it does not matter what order these columns are in.

Column name	Column Required	Description
drug	Yes	This is the name of the drug that this criterion is applied to. This needs to be a SINGLE drug and must be exactly the same as in your catalogue. If you have the same rule for multiple drugs, you will need an extra line for each drug. Drugs cannot be grouped together.
rule_type	Yes	`default`, `override_simple`, `override_complex`. As mentioned above, rules are hierarchical: `default` rules are applied first and any `override` rule will modify the result of that `default` rule.
number_conditions	Yes	This column states how many columns or conditions are used in the criterion. `default` and `override_simple` criteria will have 1 condition; more complex criteria, i.e. those where more than one column in the catalogue is involved, will have more.
column_1	Yes	This is the condition to check and MUST reflect a column in your catalogue
values_1	Yes	Possible values of `column_1` that define your criteria. If there is more than one possibility, these should be separated with ';'
comparator_1	Yes	This is the relationship between `values_1` and `column_1`, typically it is the word `in`
target_1	Yes	This is also required to link the relationships between columns and values, for all criteria this will be `values`
shape	Yes	This can be left empty and should only be used where there is a length aspect to the criteria, for example where a single variant is detected. Possible values are `== X`, `> X`, `>= X`, where `X` is the shape/size
column_X	No	Where you have `override_complex` rules X will reflect integers (2, 3 and so on). This is another column in your catalogue that should be checked to test the criteria
values_X	No	Where you have `override_complex` rules X will reflect integers (2, 3 and so on). Possible values of `column_X` that define your criteria, if there are more than one possibility these should be separated with ';'
comparator_X	No	Where you have `override_complex` rules X will reflect integers (2, 3 and so on). This is the relationship between `values_X` and `column_X`, typically it is the word `in`
target_X	No	Where you have `override_complex` rules X will reflect integers (2, 3 and so on). This is also required to link the relationships between columns and values, for all criteria this will be `values_X`
join	Yes	Where you have more than a single condition to be met - this defines how they are related. Possibilities are `&` or `
interpretation	Yes	What value should be reported as a consequence of this criteria being evaluated to `True`
description	Yes	This can be left blank, but may be useful for information purposes to supply a reason why this interpretation has been reported
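The hierarchy described above can be sketched as follows. This is an illustrative reading of the rules, not tbtAMR's actual implementation: the function names, the placeholder drugs, genes and interpretations are all hypothetical, only the `in` comparator and the `&` join are handled, and `shape` is ignored. Every drug starts as Susceptible, `default` rules are applied first, and any matching override then replaces the result.

```python
# Hypothetical criteria rows using the column names from the table above.
CRITERIA = [
    {"drug": "drug_a", "rule_type": "default", "number_conditions": 1,
     "column_1": "confidence", "values_1": "high;moderate",
     "comparator_1": "in", "target_1": "values", "shape": "", "join": "",
     "interpretation": "Resistant", "description": ""},
    {"drug": "drug_a", "rule_type": "override_simple", "number_conditions": 1,
     "column_1": "variant", "values_1": "gene_x.change_2",
     "comparator_1": "in", "target_1": "values", "shape": "", "join": "",
     "interpretation": "Low-level resistant", "description": ""},
    {"drug": "drug_a", "rule_type": "override_complex", "number_conditions": 2,
     "column_1": "gene", "values_1": "gene_y",
     "comparator_1": "in", "target_1": "values",
     "column_2": "confidence", "values_2": "moderate",
     "comparator_2": "in", "target_2": "values", "shape": "", "join": "&",
     "interpretation": "Susceptible", "description": ""},
]

RULE_ORDER = ["default", "override_simple", "override_complex"]


def condition_met(rule, index, variant_row):
    """Evaluate condition number `index` of a rule against one catalogue row."""
    column = rule[f"column_{index}"]
    allowed = [value.strip() for value in rule[f"values_{index}"].split(";")]
    comparator = rule[f"comparator_{index}"]
    if comparator != "in":
        raise ValueError(f"comparator not handled in this sketch: {comparator}")
    return variant_row.get(column) in allowed


def rule_matches(rule, variant_row):
    """All conditions must hold (this sketch treats every join as '&')."""
    count = int(rule["number_conditions"])
    return all(condition_met(rule, i, variant_row) for i in range(1, count + 1))


def interpret(drug, detected_rows, criteria):
    """Start from Susceptible, apply defaults, then let overrides replace them."""
    result = "Susceptible"
    for rule_type in RULE_ORDER:
        for rule in criteria:
            if rule["drug"] != drug or rule["rule_type"] != rule_type:
                continue
            rows = [row for row in detected_rows if row["drug"] == drug]
            if any(rule_matches(rule, row) for row in rows):
                result = rule["interpretation"]
    return result


detected = [{"drug": "drug_a", "gene": "gene_x",
             "variant": "gene_x.change_2", "confidence": "high"}]
print(interpret("drug_a", detected, CRITERIA))
```

In this example the variant first satisfies the `default` rule and is then re-interpreted by the `override_simple` rule, which is why later rule types are evaluated last.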

Classification criteria

For reporting M. tuberculosis DST, the WHO classification is often also supplied. tbtAMR has default criteria that follow the latest WHO criteria, but if you would like to define your own, or have them displayed differently in the outputs, you can supply a classification criteria file (also a csv file). These criteria are applied after the interpretation criteria have been used to establish which drugs a sequence exhibits resistance to. Therefore these criteria are based on drug resistances, NOT on genes, mechanisms or variants.

Column name	Description
classification	These are the words you want displayed for each classification
shape	This is where your criterion depends on the number of drug classes in which non-susceptibility is observed. Possible values are `== X`, `> X`, `>= X`, where `X` is the shape/size
drug_class_condition	The drug class that the drug or drugs fit into: `first-line`, or leave blank
required_condition	This is the drug (or drugs) where resistance has been detected. Where multiple drugs are part of the criteria, they should be separated by `&` or `
comparator	`tbtAMR` will extract a list of drugs to which the sequence exhibits resistance, this field reflects how the `required_condition` should be assessed against this list. Typically it should be set to `in`
exclusion_comparator	`tbtAMR` will extract a list of drugs to which the sequence exhibits resistance, this field reflects how the `exlusionary_condition` should be assessed against this list. Typically it should be set to `not in`
exlusionary_condition	This is the drug (or drugs) assessed, using `exclusion_comparator`, against the list of drugs the sequence exhibits resistance to.
optional_condition	Where a condition is optional, it may be present
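To make the `shape` and `required_condition` fields more concrete, here is a minimal sketch of how such values could be evaluated. It is not tbtAMR's implementation: the function names are hypothetical, only the three documented `shape` forms, the `&` separator and the `in` comparator are handled, and the drug names are placeholders.

```python
import operator
import re

# The three documented shape forms: `== X`, `> X`, `>= X`.
SHAPE_OPERATORS = {"==": operator.eq, ">": operator.gt, ">=": operator.ge}
SHAPE_PATTERN = re.compile(r"^\s*(==|>=|>)\s*(\d+)\s*$")


def shape_satisfied(shape, observed):
    """Compare an observed count with a shape string; an empty shape always passes."""
    if not shape.strip():
        return True
    match = SHAPE_PATTERN.match(shape)
    if match is None:
        raise ValueError(f"unrecognised shape: {shape!r}")
    symbol, size = match.groups()
    return SHAPE_OPERATORS[symbol](observed, int(size))


def required_present(required_condition, resistant_drugs):
    """Check that every '&'-separated drug is `in` the list of resistances."""
    required = [drug.strip() for drug in required_condition.split("&") if drug.strip()]
    return all(drug in resistant_drugs for drug in required)


# Hypothetical inputs: the resistances found and a count supplied by the caller.
resistant = ["drug_a", "drug_b"]
observed_count = 2

print(shape_satisfied(">= 2", observed_count))
print(required_present("drug_a & drug_b", resistant))
```

How the observed count is derived (for example, the number of drug classes with non-susceptibility) is left to the caller in this sketch.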

Catalogue config file

This is a .json file that ties everything together. It is recommended that you use the example file here as a template to ensure that all the relevant fields are supplied. Below is a description of each important key and examples of possible values.

Describe the catalogue

Key	Description	Example
gene_name_col	The column in your catalogue where the gene names are listed	`gene`
variant_col	The column in your catalogue where the variants are listed	`variant`
drug_name_col	The column in your catalogue where the drug names are listed	`drug`
confidence_column	The column in your catalogue where confidence levels are listed. As stated above, these don't NEED to be confidences as such, but they do need to be categories that can group variants	`FINAL CONFIDENCE GRADING`

Describe the confidence category and resistance levels

Key	Description
confidence_levels	This is itself a dictionary, with each key being a confidence (or category) and each value being its order of precedence (the lower the number, the higher the precedence), i.e. the value that indicates the highest confidence should have the lowest number, and so on. Only include levels which will be reported.
confidence_key	The confidence key should describe what values you want to appear in the output of tbtAMR. If you do not want these to change, just repeat the value.
resistance_levels	The levels of resistance that should be reported. These should be consistent with the values in your rules and classification rules files. Where more than one value is supplied, the lowest number indicates the highest resistance level.

Describe the drugs

Key	Description
drugs_to_infer	A list of drugs you wish to infer on - these must be contained in your catalogue
drugs_to_report	A list of drugs you wish to report in the finalised output of tbtamr split into `first-line` and `other` - these must be contained in your catalogue
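The keys described so far could be assembled as in the sketch below. Only the key names come from the tables above; every value, and the exact nesting of `confidence_key`, `resistance_levels` and `drugs_to_report`, is an assumption made for illustration, so the example file remains the authoritative template. The helper `drugs_missing_from_catalogue` is hypothetical and simply checks the stated requirement that listed drugs are contained in the catalogue.

```python
import json

# Hypothetical values throughout; only the key names come from the tables above.
config = {
    "gene_name_col": "gene",
    "variant_col": "variant",
    "drug_name_col": "drug",
    "confidence_column": "confidence",
    # Lower number = higher precedence.
    "confidence_levels": {"high": 1, "moderate": 2},
    # Catalogue value -> value shown in the output (repeat it to leave unchanged).
    "confidence_key": {"high": "High confidence", "moderate": "moderate"},
    # Lower number = higher resistance level.
    "resistance_levels": {"Resistant": 1, "Low-level resistant": 2},
    "drugs_to_infer": ["drug_a", "drug_b", "drug_c"],
    "drugs_to_report": {"first-line": ["drug_a", "drug_b"], "other": ["drug_c"]},
}


def drugs_missing_from_catalogue(config, catalogue_drugs):
    """Return configured drugs that do not appear in the catalogue."""
    wanted = set(config["drugs_to_infer"])
    for group in config["drugs_to_report"].values():
        wanted.update(group)
    return sorted(wanted - set(catalogue_drugs))


print(json.dumps(config, indent=2))
missing = drugs_missing_from_catalogue(config, ["drug_a", "drug_b", "drug_c"])
print("missing from catalogue:", missing)
```

Running a check like this before a real analysis may help catch spelling differences between the config and the catalogue.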

Cascade reporting

Where you have selected to use a cascade reporting structure, you will define here the default drugs and then increasing levels.

Key	Description
cascade_reporting	Group drugs for a cascade reporting structure. Drugs should be grouped in order of reporting; this grouping does not need to have any relationship to drug classification or WHO criteria and only represents the order in which the user wishes to report. There must be a `default` set of drugs that will appear in all reports, followed by increasing levels as `levelX`. Within each `levelX` key two keys are required: `resistance_to_any`, which determines what resistances will trigger this level, and `report`, which is what genes to report at this level
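One possible reading of this structure is sketched below. The nesting, the placeholder names and the function `entries_to_report` are assumptions for illustration, not tbtAMR's implementation. In particular, the sketch assumes levels are checked in numeric order and that each level is triggered independently by resistance to any of its `resistance_to_any` entries.

```python
# Hypothetical cascade; only `default`, `levelX`, `resistance_to_any`
# and `report` are names taken from the description above.
cascade_reporting = {
    "default": ["drug_a", "drug_b"],
    "level1": {"resistance_to_any": ["drug_a"], "report": ["drug_c"]},
    "level2": {"resistance_to_any": ["drug_c"], "report": ["drug_d"]},
}


def entries_to_report(cascade, resistant):
    """Return the default entries plus those of every triggered level."""
    shown = list(cascade["default"])
    levels = sorted(
        (key for key in cascade if key.startswith("level")),
        key=lambda key: int(key[len("level"):]),
    )
    for level in levels:
        triggers = cascade[level]["resistance_to_any"]
        if any(drug in resistant for drug in triggers):
            for entry in cascade[level]["report"]:
                if entry not in shown:  # avoid reporting the same entry twice
                    shown.append(entry)
    return shown


print(entries_to_report(cascade_reporting, resistant=["drug_a"]))
```

With no resistance detected only the `default` group is returned; resistance to `drug_a` additionally brings in the `level1` entries.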

Variant format

There may be times when you use a different catalogue, where variants are expressed in a format not consistent with the WHO v2 catalogue. If this is the case, you need to ensure that the annotation format of your VCF is consistent with the pattern in your catalogue. If you do decide to go down this route, you will need to supply the format of your variants so that it can be checked against your catalogue. If these do not match, tbtAMR will error and not report DST.

Key	Description
catalogue_variant	This is a list of regular expressions that reflect the values in your VCF file annotation and should be consistent with your catalogue
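Before running an analysis it can be useful to confirm that your patterns actually describe the variants in your catalogue. The sketch below is one way to do that with Python's standard `re` module. The two patterns and the variant strings are hypothetical examples of a custom format, and `unmatched_variants` is an illustrative helper, not part of tbtAMR.

```python
import re

# Hypothetical patterns for a custom variant format.
catalogue_variant = [
    r"[A-Za-z0-9]+_p\.[A-Z][a-z]{2}\d+[A-Z][a-z]{2}",  # e.g. geneA_p.Ser10Leu
    r"[A-Za-z0-9]+_c\.-?\d+[ACGT]>[ACGT]",             # e.g. geneA_c.-15C>T
]


def unmatched_variants(variants, patterns):
    """Return the variants that none of the patterns match in full."""
    compiled = [re.compile(pattern) for pattern in patterns]
    return [
        variant for variant in variants
        if not any(regex.fullmatch(variant) for regex in compiled)
    ]


# Placeholder variants; the last one deliberately uses a different format.
variants = ["geneA_p.Ser10Leu", "geneA_c.-15C>T", "geneA:S10L"]
print(unmatched_variants(variants, catalogue_variant))
```

Using `fullmatch` rather than `search` means a variant only passes when the whole string fits a pattern, which makes format mismatches easier to spot.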

Examples

You can find examples for all of these criteria files here. It is highly recommended that you modify these example files to fit your needs.
