Customising your inputs

Kristy Horan edited this page Oct 4, 2024 · 6 revisions

Hands-free updates (sort of)

tbtAMR was designed to allow flexible updates to the mutational catalogue and/or to the interpretative criteria for that catalogue. The codebase reads these files and applies the user-defined criteria to the mutations detected in the VCF file. In this way, there is no need for a bioinformatician or developer to modify the interpretative logic; a wet-lab or medical microbiologist can update the criteria as required. Obviously a bioinformatician is exceedingly useful in running the tool, but the technical overhead of updating and maintaining the criteria, which are so important to effective and appropriate reporting of DST in M. tuberculosis, is much lower (freeing us all up for other fun stuff!!).

The way this works is that the files described below allow you to customise:

  • what the result you report looks like, for example you might want to report resistant and not resistant
  • how that result is determined: it may be based on categorical values or on single variant values
  • how the overall profile of the sequence should be reported

Important - These criteria can be thought of like this: tbtAMR will report Susceptible for all drugs UNLESS you supply a criterion that changes that behaviour.

Inputs required

VCF file - obviously!

The VCF file must be generated using an appropriate reference genome. Appropriateness depends on your catalogue of mutations: whatever reference genome positions are used in the catalogue must be linkable to the correct positions in the VCF file. Note that tbtAMR will not fall over if your VCF file was not generated with the correct reference genome - BUT you may get unexpected results. It is up to you to know which is the correct reference genome.

  • This can be generated by tbtAMR or be supplied by the user (see How to run)

Catalogue of mutations

This is a csv file containing information on the mutations that you are interested in detecting in your sequence. Every mutation that you are interested in needs to have its own line (see here for examples). The following information is REQUIRED:

  • A column with the names of the drugs you are trying to infer resistance to.
  • A column with the gene name.
  • A column with the actual variants you are looking for (example gene_x.change).
  • The confidence of each variant to cause resistance. This does not actually need to be a confidence level or metric, but it is used in the next file (interpretative criteria) to provide a result based on a category (rather than on a variant-by-variant basis).

It does not really matter what you call these columns; you will supply their names in the catalogue config file (see below). You are also welcome to have additional columns that you may use in the interpretative criteria file to provide more control.
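
As a rough illustration of the "one mutation per line, four required pieces of information" layout, the sketch below reads a small catalogue and flags rows where a required cell is empty. The column names, the variant strings and the confidence labels are all made up for this example, and the function `incomplete_rows` is hypothetical; it is not part of tbtAMR.

```python
import csv
import io

# Hypothetical catalogue: column names and values are illustrative only.
CATALOGUE_CSV = """drug,gene,variant,confidence
rifampicin,rpoB,rpoB_p.Ser450Leu,Assoc w R
isoniazid,katG,katG_p.Ser315Thr,Assoc w R
isoniazid,inhA,,Uncertain
"""

# The four pieces of information the catalogue must carry (names assumed).
REQUIRED_COLUMNS = ["drug", "gene", "variant", "confidence"]


def incomplete_rows(text, required):
    """Return the file line numbers of rows with an empty required cell."""
    reader = csv.DictReader(io.StringIO(text))
    bad = []
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        if any(not (row.get(col) or "").strip() for col in required):
            bad.append(line_number)
    return bad


print(incomplete_rows(CATALOGUE_CSV, REQUIRED_COLUMNS))
```

Here the last row has no variant, so its line number is reported. A check like this before running the tool may save some confusion later.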

Interpretative criteria

This is perhaps the most challenging file to construct/understand. This is where the 'magic' happens. This is also a csv file and in this file you will describe how variants should be interpreted.

Criteria are hierarchical.

  • default criteria are applied first. These should be your categorical rules, typically based on the 'confidence' column, where you can apply a single category to a large block of variants.
  • override_simple criteria are used where a single condition (or column) changes the default behaviour.
  • override_complex criteria are used where more than one condition (or column) changes the default behaviour.
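
The hierarchy can be pictured with the minimal sketch below. It is not tbtAMR's implementation: the `applies` callables, the `interpret` function and the assumption that override_complex is evaluated after override_simple are all hypothetical, chosen only to show a default result being replaced by an override.

```python
# Assumed evaluation order; a later rule type overwrites an earlier result.
RULE_ORDER = {"default": 0, "override_simple": 1, "override_complex": 2}


def interpret(variant, rules, fallback="Susceptible"):
    """Apply rules in hierarchical order and return the final interpretation."""
    result = fallback  # reported when no rule applies
    for rule in sorted(rules, key=lambda r: RULE_ORDER[r["rule_type"]]):
        if rule["applies"](variant):
            result = rule["interpretation"]
    return result


# Hypothetical rules; the order in which they are listed does not matter.
rules = [
    {"rule_type": "override_simple",
     "applies": lambda v: v["variant"] == "gene_x.change",
     "interpretation": "Low-level resistant"},
    {"rule_type": "default",
     "applies": lambda v: v["confidence"] == "high",
     "interpretation": "Resistant"},
]

print(interpret({"variant": "gene_x.change", "confidence": "high"}, rules))
print(interpret({"variant": "gene_x.other", "confidence": "high"}, rules))
print(interpret({"variant": "gene_x.other", "confidence": "low"}, rules))
```

The first variant matches both rules, so the override wins; the second only matches the default rule; the third matches nothing and stays at the fallback.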

The table below describes the columns in the interpretation criteria file. Note that it does not matter what order these columns are in.

| Column name | Required | Description |
| --- | --- | --- |
| drug | Yes | This is the name of the drug that this criterion is applied to. This needs to be a SINGLE drug and must be exactly the same as in your catalogue. If you have the same rule for multiple drugs, you will need an extra line for each drug. Drugs cannot be grouped together. |
| rule_type | Yes | One of default, override_simple or override_complex. As mentioned above, rules are hierarchical: default rules are applied first and any override rule will modify the result of that default rule. |
| number_conditions | Yes | How many columns or conditions are used in the criterion. default and override_simple criteria will have 1 condition; more complex criteria, i.e. those involving more than one column in the catalogue, will have more. |
| column_1 | Yes | This is the condition to check and MUST reflect a column in your catalogue. |
| values_1 | Yes | Possible values of column_1 that define your criterion. If there is more than one possibility, these should be separated with ';'. |
| comparator_1 | Yes | This is the relationship between values_1 and column_1; typically it is the word in. |
| target_1 | Yes | This is also required to link the relationships between columns and values; for all criteria this will be values. |
| shape | Yes | This can be left empty and should only be used where there is a length aspect to the criterion, for example where a single variant is detected. Possible values are == X, > X, >= X, where X is the shape/size. |
| column_X | No | Where you have override_complex rules, X will reflect integers (2, 3 and so on). This is another column in your catalogue that should be checked to test the criterion. |
| values_X | No | Where you have override_complex rules, X will reflect integers (2, 3 and so on). Possible values of column_X that define your criterion. If there is more than one possibility, these should be separated with ';'. |
| comparator_X | No | Where you have override_complex rules, X will reflect integers (2, 3 and so on). This is the relationship between values_X and column_X; typically it is the word in. |
| target_X | No | Where you have override_complex rules, X will reflect integers (2, 3 and so on). This is also required to link the relationships between columns and values; for all criteria this will be values_X. |
| join | Yes | Where you have more than a single condition to be met, this defines how they are related. Possibilities are & or \|. |
| interpretation | Yes | What value should be reported as a consequence of this criterion being evaluated to True. |
| description | Yes | This can be left blank, but may be useful for information purposes to supply a reason why this interpretation has been reported. |
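
To make the columns more concrete, here is a small sketch of how one criteria row might be checked against one catalogue row. It is only an illustration under several assumptions: `condition_holds` and `rule_matches` are hypothetical names, only the comparator in is handled, the second join value is assumed to be the pipe character, and the shape and target_X columns are not modelled at all.

```python
def condition_holds(rule, n, catalogue_row):
    """Check condition n (column_n / values_n / comparator_n) of a rule."""
    column = rule[f"column_{n}"]
    allowed = rule[f"values_{n}"].split(";")  # multiple values separated by ';'
    comparator = rule[f"comparator_{n}"]
    if comparator != "in":
        raise ValueError(f"comparator not handled in this sketch: {comparator}")
    return catalogue_row[column] in allowed


def rule_matches(rule, catalogue_row):
    """Combine all conditions of a rule using its join value."""
    count = int(rule["number_conditions"])
    outcomes = [condition_holds(rule, n, catalogue_row) for n in range(1, count + 1)]
    if rule.get("join") == "|":  # assumed meaning: any condition is enough
        return any(outcomes)
    return all(outcomes)         # '&' or a single condition


# Hypothetical override_complex row, as csv.DictReader would return it.
rule = {
    "drug": "drug_a", "rule_type": "override_complex", "number_conditions": "2",
    "column_1": "confidence", "values_1": "high;moderate", "comparator_1": "in",
    "column_2": "gene", "values_2": "gene_x", "comparator_2": "in",
    "join": "&", "interpretation": "Resistant", "description": "",
}

print(rule_matches(rule, {"confidence": "moderate", "gene": "gene_x"}))
print(rule_matches(rule, {"confidence": "moderate", "gene": "gene_y"}))
```

With join set to &, both conditions must hold, so only the first catalogue row matches.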

Classification criteria

For reporting M. tuberculosis DST the WHO classification is often also supplied. tbtAMR has default criteria that follow the latest WHO criteria, but if you would like to define your own, or have them displayed differently in the outputs, you can supply a classification criteria file (also a csv file). These criteria are applied after the interpretation criteria have been used to establish which drugs a sequence exhibits resistance to. Therefore these criteria are based on drug resistances, NOT on genes, mechanisms or variants.

| Column name | Description |
| --- | --- |
| classification | These are the words you want displayed for each classification. |
| shape | This is where your criterion is dependent upon the number of drug classes in which non-susceptibility is observed. Possible values are == X, > X, >= X, where X is the shape/size. |
| drug_class_condition | The drug class that the drug or drugs fit into: first-line, or leave blank. |
| required_condition | This is the drug (or drugs) where resistance has been detected. Where multiple drugs are part of the criterion, they should be separated by & or \|. |
| comparator | tbtAMR will extract a list of drugs to which the sequence exhibits resistance; this field reflects how the required_condition should be assessed against this list. Typically it should be set to in. |
| exclusion_comparator | tbtAMR will extract a list of drugs to which the sequence exhibits resistance; this field reflects how the exlusionary_condition should be assessed against this list. Typically it should be set to not in. |
| exlusionary_condition | The drug (or drugs) to be assessed, using exclusion_comparator, against the list of drugs to which the sequence exhibits resistance. |
| optional_condition | Where a condition is optional, it may be present. |
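
The sketch below shows one way such rows could be read once the list of resistant drugs is known. It assumes comparator is in and exclusion_comparator is not in, splits drug lists on & only, takes the first matching row, and ignores shape, drug_class_condition and optional_condition. The function names and the classification labels are invented for illustration and are not WHO definitions or tbtAMR defaults.

```python
def split_drugs(cell):
    """Split an '&'-separated cell into drug names, dropping empty entries."""
    return [d.strip() for d in cell.split("&") if d.strip()]


def classify(resistant_drugs, criteria, fallback=None):
    """Return the classification of the first row whose conditions are met."""
    for row in criteria:
        required = split_drugs(row.get("required_condition", ""))
        excluded = split_drugs(row.get("exlusionary_condition", ""))
        if not all(d in resistant_drugs for d in required):
            continue  # a required drug is missing from the resistance list
        if any(d in resistant_drugs for d in excluded):
            continue  # an excluded drug is present in the resistance list
        return row["classification"]
    return fallback


# Hypothetical criteria rows; labels are illustrative only.
CRITERIA = [
    {"classification": "Rifampicin and isoniazid resistant",
     "required_condition": "rifampicin&isoniazid", "exlusionary_condition": ""},
    {"classification": "Rifampicin resistant only",
     "required_condition": "rifampicin", "exlusionary_condition": "isoniazid"},
]

print(classify({"rifampicin", "isoniazid"}, CRITERIA))
print(classify({"rifampicin"}, CRITERIA))
```

Note that the input is a set of drugs, not genes or variants, which mirrors the point that classification works on drug resistances only.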

Catalogue config file

This is a .json file and this is the file that ties everything together. It is recommended that you use the example file here as a template to ensure that all the relevant fields are supplied. Below is a description of each important key and examples of possible values.

Describe the catalogue

| Key | Description | Example |
| --- | --- | --- |
| gene_name_col | The column in your catalogue where the gene names are listed | gene |
| variant_col | The column in your catalogue where the variants are listed | variant |
| drug_name_col | The column in your catalogue where the drug names are listed | drug |
| confidence_column | The column in your catalogue where confidence levels are listed. As stated above, these don't NEED to be confidences as such, but they do need to be categories that can group variants | FINAL CONFIDENCE GRADING |
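
A quick sanity check of this part of the config could look like the sketch below, which uses the four keys and example values from the table. Whether these keys sit at the top level of the real config file is an assumption here (the example file remains the authoritative template), and `missing_columns` is a hypothetical helper, not a tbtAMR function.

```python
import json

# Fragment of a config, using the example values from the table above.
CONFIG_JSON = """
{
  "gene_name_col": "gene",
  "variant_col": "variant",
  "drug_name_col": "drug",
  "confidence_column": "FINAL CONFIDENCE GRADING"
}
"""

COLUMN_KEYS = ["gene_name_col", "variant_col", "drug_name_col", "confidence_column"]


def missing_columns(config, catalogue_header):
    """Return config keys whose column name is not in the catalogue header."""
    return {
        key: config.get(key)
        for key in COLUMN_KEYS
        if config.get(key) not in catalogue_header
    }


config = json.loads(CONFIG_JSON)
header = ["drug", "gene", "variant", "confidence"]  # hypothetical catalogue header
print(missing_columns(config, header))
```

In this example the catalogue header uses confidence while the config names FINAL CONFIDENCE GRADING, so that key is reported as unmatched.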

Describe the confidence category and resistance levels

| Key | Description |
| --- | --- |
| confidence_levels | This is itself a dictionary, with each key being a confidence (or category) and each value being its order of precedence: the lower the number, the higher the precedence, i.e. the value that indicates the highest confidence should have the lowest number, and so on. Only include levels which will be reported. |
| confidence_key | The confidence key should describe what values you want to appear in the output of tbtamr. If you do not want these to change, just repeat the value. |
| resistance_levels | The levels of resistance that should be reported. These should be consistent with values in your rules and classification rules files. Where more than one value is supplied, the lowest number indicates the highest resistance level. |
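
The "lowest number wins" convention can be sketched as follows. The category names, the display labels and the exact shape of resistance_levels are assumptions made for the example, and `top_ranked` is a hypothetical helper rather than anything tbtAMR exposes.

```python
# Hypothetical config values; lower numbers mean higher precedence.
config = {
    "confidence_levels": {"Assoc w R": 1, "Assoc w R - Interim": 2},
    "confidence_key": {"Assoc w R": "High", "Assoc w R - Interim": "Moderate"},
    "resistance_levels": {"Resistant": 1, "Low-level resistant": 2},
}


def top_ranked(observed, levels):
    """Return the observed value with the lowest rank, ignoring unlisted ones."""
    ranked = [value for value in observed if value in levels]
    return min(ranked, key=levels.get) if ranked else None


observed_confidences = ["Assoc w R - Interim", "Uncertain", "Assoc w R"]
best = top_ranked(observed_confidences, config["confidence_levels"])
print(best, "->", config["confidence_key"][best])

print(top_ranked(["Low-level resistant", "Resistant"], config["resistance_levels"]))
```

Values that are not listed in the levels dictionary (Uncertain above) are simply skipped, which is consistent with only including levels that will be reported.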

Describe the drugs

| Key | Description |
| --- | --- |
| drugs_to_infer | A list of drugs you wish to infer on - these must be contained in your catalogue |
| drugs_to_report | A list of drugs you wish to report in the finalised output of tbtamr split into first-line and other - these must be contained in your catalogue |

Cascade reporting

Where you have selected to use a cascade reporting structure, you will define here the default drugs and then increasing levels.

| Key | Description |
| --- | --- |
| cascade_reporting | Groups drugs for a cascade reporting structure. Drugs should be grouped in order of reporting; this grouping does not need to have any relationship to drug classification or WHO criteria and only represents the order in which the user wishes to report. There must be a default set of drugs that will appear in all reports, followed by increasing levels as levelX. Within each levelX key two keys are required: resistance_to_any, which determines what resistances will trigger this level, and report, which is what genes to report at this level. |
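
A cascade of this kind might be evaluated roughly as in the sketch below. The key names default, levelX, resistance_to_any and report come from the description above; everything else is assumed, including the drug groupings, the use of drug names as the entries under report, and the choice to evaluate each level independently of the others. `cascade_report` is a hypothetical function.

```python
# Hypothetical cascade_reporting value; the drug groupings are illustrative.
cascade = {
    "default": ["rifampicin", "isoniazid"],
    "level1": {
        "resistance_to_any": ["rifampicin", "isoniazid"],
        "report": ["levofloxacin", "moxifloxacin"],
    },
    "level2": {
        "resistance_to_any": ["levofloxacin", "moxifloxacin"],
        "report": ["bedaquiline", "linezolid"],
    },
}


def level_number(key):
    """Extract the integer from a 'levelX' key so level10 sorts after level2."""
    return int(key[len("level"):])


def cascade_report(cascade, resistant_drugs):
    """Return the default entries plus those of every triggered level."""
    reported = list(cascade["default"])
    level_keys = sorted((k for k in cascade if k.startswith("level")), key=level_number)
    for key in level_keys:
        level = cascade[key]
        if any(drug in resistant_drugs for drug in level["resistance_to_any"]):
            reported.extend(level["report"])
    return reported


print(cascade_report(cascade, set()))
print(cascade_report(cascade, {"rifampicin"}))
print(cascade_report(cascade, {"rifampicin", "moxifloxacin"}))
```

With no resistance only the default set is shown; each detected resistance then opens up the next group.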

Variant format

There may be times when you use a different catalogue, where variants are expressed in a format not consistent with the WHO v2 catalogue. If this is the case, you need to ensure that the annotation format of your VCF is consistent with the pattern in your catalogue. If you do decide to go down this route, you will need to supply the format of your variant so that it can be checked against your catalogue. If these do not match, tbtAMR will error and not report DST.

| Key | Description |
| --- | --- |
| catalogue_variant | This is a list of regular expressions that reflect the values in your VCF file annotation and should be consistent with your catalogue |
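
Before running tbtAMR with a custom format, it may help to test your patterns against the variants in your catalogue. The sketch below does this with Python's standard `re` module; the pattern and the variant strings are made-up examples, `unmatched_variants` is a hypothetical helper, and requiring a full match is an assumption rather than a statement of how tbtAMR applies the expressions.

```python
import re

# Hypothetical pattern list, as it might appear under catalogue_variant.
patterns = [r"[A-Za-z0-9]+_[pcn]\.\S+"]


def unmatched_variants(variants, patterns):
    """Return variants that do not fully match any of the supplied patterns."""
    compiled = [re.compile(p) for p in patterns]
    return [v for v in variants if not any(rx.fullmatch(v) for rx in compiled)]


catalogue_variants = ["rpoB_p.Ser450Leu", "katG_p.Ser315Thr", "rpoB S450L"]
print(unmatched_variants(catalogue_variants, patterns))
```

Any variant listed in the output is written in a format the patterns do not cover, so either the catalogue entry or the pattern list needs adjusting.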

Examples

You can find examples for all of these criteria files here. It is highly recommended that you modify these example files to fit your needs.