Customising your inputs
`tbtAMR` was designed to allow flexible updates to the mutational catalogue and/or to the interpretative criteria for that catalogue. The codebase of `tbtAMR` reads these files and applies the user-defined criteria to the mutations detected in the VCF file. As a result, no bioinformatician or developer needs to modify the interpretative logic; a wet-lab or medical microbiologist can update the criteria as required. A bioinformatician is still exceedingly useful for running the tool, but the technical overhead of updating and maintaining the criteria that are so important to effective and appropriate reporting of DST in M. tuberculosis is much lower (freeing us all up for other fun stuff!!).
The files described below allow you to customise:
- what the result you report looks like, for example you might want to report `resistant` and `not resistant`
- how that result is determined; it may be based on categorical values or on single variant values
- how the overall profile of the sequence should be reported
Important - These criteria can be thought of like this: `tbtAMR` will report `Susceptible` for all drugs UNLESS you supply a criterion that changes that behaviour.
The VCF file must be generated using an appropriate reference genome. Appropriateness depends on your catalogue of mutations: whatever reference genome positions are used in the catalogue must be linkable to the correct positions in the VCF file. Note that `tbtAMR` will not fall over if your VCF file was not generated with the correct reference genome, BUT you may get unexpected results. It is up to you to know which reference genome is the correct one.
- This can be generated by `tbtAMR` or be supplied by the user (see How to run)
This is a csv file containing information on the mutations that you are interested in detecting in your sequence. Every mutation that you are interested in needs to have its own line (see here for examples). The following information is REQUIRED:
- A column with the names of the drugs you are trying to infer resistance to.
- A column with the gene name.
- A column with the actual variants you are looking for (example `gene_x.change`).
- The confidence of each variant to cause resistance. This does not really need to be a confidence level or metric, but it is used in the next file (interpretative criteria) to provide a result based on a category (rather than on a variant-by-variant basis).

It does not really matter what you call these columns; you will supply this information in the catalogue config file (see below). Furthermore, you are welcome to have additional columns that you may use in the interpretative criteria file to provide more control.
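
As a rough illustration of the requirements above, the following sketch loads a tiny catalogue and checks that the four kinds of information are present. The column names (`drug`, `gene`, `variant`, `confidence`), the rows and the `load_catalogue` helper are all hypothetical and are not part of `tbtAMR`; your own catalogue can name its columns differently.

```python
import csv
import io

# Hypothetical catalogue content: names and values are illustrative only.
CATALOGUE_CSV = """drug,gene,variant,confidence
rifampicin,gene_x,gene_x.change_1,high
rifampicin,gene_x,gene_x.change_2,uncertain
isoniazid,gene_y,gene_y.change_1,high
"""

# The four pieces of information described above, under assumed column names.
REQUIRED_COLUMNS = {"drug", "gene", "variant", "confidence"}


def load_catalogue(text):
    """Parse catalogue text and fail early if a required column is missing."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"catalogue is missing columns: {sorted(missing)}")
    return list(reader)  # one dict per mutation, i.e. one per line


rows = load_catalogue(CATALOGUE_CSV)
```

Each mutation ends up as one dictionary, which mirrors the "one line per mutation" rule.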
This is perhaps the most challenging file to construct/understand. This is where the 'magic' happens. This is also a csv file and in this file you will describe how variants should be interpreted.
Criteria are hierarchical.
- `default` criteria will be applied first; these should be your categorical rules, typically based on the 'confidence' column. These are criteria where you can apply a single category to a large block of variants.
- `override_simple` criteria are used where a single condition (or column) will change the default behaviour.
- `override_complex` criteria are used where more than one condition (or column) will change the default behaviour.
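
The hierarchy above can be sketched roughly as follows. This is not the `tbtAMR` implementation: the rule shape (a `conditions` dictionary mapping a column to its accepted values), the `interpret` function and the example values are assumptions made purely to show default rules being applied first and overrides modifying their result.

```python
# Order in which rule types are applied; later types override earlier ones.
RULE_ORDER = ["default", "override_simple", "override_complex"]


def interpret(variant, rules):
    """Return the interpretation for one catalogue row (a dict)."""
    result = "Susceptible"  # reported unless a criterion changes it
    for rule_type in RULE_ORDER:
        for rule in rules:
            if rule["rule_type"] != rule_type:
                continue
            # Hypothetical rule shape: {column: set of accepted values}.
            if all(variant.get(column) in accepted
                   for column, accepted in rule["conditions"].items()):
                result = rule["interpretation"]
    return result


rules = [
    # Single-variant exception that changes the default behaviour.
    {"rule_type": "override_simple",
     "conditions": {"variant": {"gene_x.change_2"}},
     "interpretation": "Susceptible"},
    # Categorical rule covering a large block of variants.
    {"rule_type": "default",
     "conditions": {"confidence": {"high"}},
     "interpretation": "Resistant"},
]

high_variant = {"variant": "gene_x.change_1", "confidence": "high"}
exception = {"variant": "gene_x.change_2", "confidence": "high"}
```

Here `interpret(high_variant, rules)` gives `Resistant` through the default rule, while `interpret(exception, rules)` ends as `Susceptible` because the override is applied afterwards, regardless of the order in which the rules were listed.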
The table below describes the columns in the interpretation criteria file. Note that it does not matter what order these columns are in.
Column name | Column Required | Description |
---|---|---|
drug | Yes | The name of the drug that this criterion applies to. This needs to be a SINGLE drug and must be exactly the same as in your catalogue. If you have the same rule for multiple drugs, you will need an extra line for each drug. Drugs cannot be grouped together. |
rule_type | Yes | `default`, `override_simple`, `override_complex`. As mentioned above, rules are hierarchical: default rules are applied first and any override rule will modify the result of that default rule. |
number_conditions | Yes | How many columns or conditions are used in the criterion. `default` and `override_simple` criteria will have 1 condition; more complex criteria will have more, i.e. where more than one column in the catalogue is involved. |
column_1 | Yes | The condition to check; this MUST reflect a column in your catalogue |
values_1 | Yes | Possible values of `column_1` that define your criterion. If there is more than one possibility, these should be separated with ';' |
comparator_1 | Yes | The relationship between `values_1` and `column_1`; typically it is the word `in` |
target_1 | Yes | This is also required to link the relationships between columns and values; for all criteria this will be `values` |
shape | Yes | This can be left empty and should only be used where there is a length aspect to the criterion, for example where a single variant is detected. Possible values are `== X`, `> X`, `>= X`, where X is the shape/size |
column_X | No | Where you have `override_complex` rules, X will reflect integers (2, 3 and so on). This is another column in your catalogue that should be checked to test the criterion |
values_X | No | Where you have `override_complex` rules, X will reflect integers (2, 3 and so on). Possible values of `column_X` that define your criterion. If there is more than one possibility, these should be separated with ';' |
comparator_X | No | Where you have `override_complex` rules, X will reflect integers (2, 3 and so on). The relationship between `values_X` and `column_X`; typically it is the word `in` |
target_X | No | Where you have `override_complex` rules, X will reflect integers (2, 3 and so on). This is also required to link the relationships between columns and values; for all criteria this will be `values_X` |
join | Yes | Where you have more than a single condition to be met, this defines how they are related. Possibilities are `&` or `\|` |
interpretation | Yes | The value that should be reported as a consequence of this criterion evaluating to `True` |
description | Yes | This can be left blank, but may be useful for information purposes to supply a reason why this interpretation has been reported |
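
To make the column semantics more concrete, here is a minimal sketch of how one criteria row might be evaluated against one catalogue row. It is an assumption-laden illustration rather than the tool's own logic: `evaluate_rule` is a hypothetical helper, only the `in` comparator is handled, `shape` and `target_X` are ignored, and `&` and `|` are assumed to mean "all conditions" and "any condition" respectively.

```python
def evaluate_rule(rule, catalogue_row):
    """Return True when a criteria row (dict) matches a catalogue row (dict)."""
    outcomes = []
    for i in range(1, int(rule["number_conditions"]) + 1):
        column = rule[f"column_{i}"]              # must be a catalogue column
        accepted = rule[f"values_{i}"].split(";")  # ';' separates alternatives
        comparator = rule[f"comparator_{i}"]
        if comparator != "in":
            raise ValueError(f"comparator not covered by this sketch: {comparator}")
        outcomes.append(catalogue_row[column] in accepted)
    # Assumed meaning of join: '|' means any condition, otherwise all conditions.
    if rule.get("join") == "|":
        return any(outcomes)
    return all(outcomes)


# Hypothetical override_complex row using two catalogue columns.
rule = {
    "drug": "rifampicin",
    "rule_type": "override_complex",
    "number_conditions": "2",
    "column_1": "confidence", "values_1": "high;moderate", "comparator_1": "in",
    "column_2": "gene", "values_2": "gene_x", "comparator_2": "in",
    "join": "&",
    "interpretation": "Resistant",
}

matching_row = {"confidence": "moderate", "gene": "gene_x"}
other_gene_row = {"confidence": "moderate", "gene": "gene_y"}
```

With `join` set to `&`, only `matching_row` satisfies the rule; switching `join` to `|` would let `other_gene_row` satisfy it through the first condition alone.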
For reporting M. tuberculosis DST, the WHO classification is often also supplied. `tbtAMR` has default criteria that follow the latest WHO criteria, but if you would like to define your own, or have them displayed differently in the outputs, you can supply a classification criteria file (also a csv file). These criteria are applied after the interpretation criteria have been used to establish which drugs a sequence exhibits resistance to. Therefore these criteria are based on drug resistances, NOT on genes, mechanisms or variants.
Column name | Description |
---|---|
classification | The words you want displayed for each classification |
shape | Used where your criterion depends on the number of drug classes in which non-susceptibility is observed. Possible values are `== X`, `> X`, `>= X`, where X is the shape/size |
drug_class_condition | The drug class that the drug or drugs fit into: `first-line`, or leave blank |
required_condition | The drug (or drugs) where resistance has been detected. Where multiple drugs are part of the criterion, they should be separated by `&` or `\|` |
comparator | `tbtAMR` will extract a list of drugs to which the sequence exhibits resistance; this field reflects how the `required_condition` should be assessed against this list. Typically it should be set to `in` |
exclusion_comparator | `tbtAMR` will extract a list of drugs to which the sequence exhibits resistance; this field reflects how the `exlusionary_condition` should be assessed against this list. Typically it should be set to `not in` |
exlusionary_condition | The drug (or drugs) assessed with `exclusion_comparator` against the list of drugs to which the sequence exhibits resistance |
optional_condition | A condition that is optional, i.e. it may be present |
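
A simplified sketch of how such classification criteria could be applied to the list of resistant drugs is shown below. Several things here are assumptions and not confirmed `tbtAMR` behaviour: criteria are checked in order and the first match wins, multiple drugs are joined with `&` only, the comparators are fixed at `in` and `not in`, and `shape`, `drug_class_condition` and `optional_condition` are ignored. The classification labels and the `classify` helper are illustrative.

```python
def split_drugs(field):
    """Split an '&'-separated field into drug names, dropping blanks."""
    return [drug.strip() for drug in field.split("&") if drug.strip()]


def classify(resistant_drugs, criteria, fallback="No classification"):
    """Return the first classification whose conditions are satisfied."""
    for criterion in criteria:
        required = split_drugs(criterion["required_condition"])
        excluded = split_drugs(criterion["exlusionary_condition"])
        # comparator 'in': every required drug is in the resistance list.
        has_required = all(drug in resistant_drugs for drug in required)
        # exclusion_comparator 'not in': no excluded drug is in the list.
        has_excluded = any(drug in resistant_drugs for drug in excluded)
        if has_required and not has_excluded:
            return criterion["classification"]
    return fallback


# Hypothetical criteria rows; labels are examples, not tbtAMR defaults.
criteria = [
    {"classification": "Multi-drug resistant",
     "required_condition": "rifampicin&isoniazid",
     "exlusionary_condition": ""},
    {"classification": "Rifampicin resistant",
     "required_condition": "rifampicin",
     "exlusionary_condition": "isoniazid"},
]
```

For example, `classify({"rifampicin"}, criteria)` falls through the first row and matches the second, because isoniazid is not in the resistance list.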
This is a `.json` file, and it is the file that ties everything together. It is recommended that you use the example file here as a template to ensure that all the relevant fields are supplied. Below is a description of each important key and examples of possible values.
Key | Description | Example |
---|---|---|
gene_name_col | The column in your catalogue where the gene names are listed | gene |
variant_col | The column in your catalogue where the variants are listed | variant |
drug_name_col | The column in your catalogue where the drug names are listed | drug |
confidence_column | The column in your catalogue where confidence levels are listed. As stated above, these don't NEED to be confidences as such, but they do need to be categories that can group variants | FINAL CONFIDENCE GRADING |
Key | Description |
---|---|
confidence_levels | This is itself a dictionary, with each key being a confidence (or category) and the value being its order of precedence (the lower the number, the higher the precedence), i.e. the value that indicates the highest confidence should have the lowest number, and so on. Only include levels which will be reported. |
confidence_key | The confidence key should describe what values you want to appear in the output of `tbtAMR`. If you do not want these to change, just repeat the value. |
resistance_levels | The levels of resistance that should be reported. These should be consistent with the values in your rules and classification rules files. Where more than one value is supplied, the lowest number indicates the highest resistance level. |
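
The precedence idea behind `confidence_levels` can be sketched as below. This assumes `confidence_key` is a dictionary mapping each catalogue value to the value displayed in the output, which the description suggests but does not state outright; the level names, numbers and the `reported_confidence` helper are hypothetical.

```python
# Hypothetical fragment of the config; values are illustrative only.
config = {
    # Lower number = higher precedence; unlisted levels are not reported.
    "confidence_levels": {"high": 1, "moderate": 2},
    # Assumed mapping from catalogue value to displayed value.
    "confidence_key": {"high": "High confidence", "moderate": "moderate"},
}


def reported_confidence(observed, config):
    """Pick the highest-precedence reportable level among observed ones."""
    levels = config["confidence_levels"]
    reportable = [level for level in observed if level in levels]
    if not reportable:
        return None  # nothing in the observed levels is reportable
    best = min(reportable, key=lambda level: levels[level])
    return config["confidence_key"].get(best, best)


display = reported_confidence(["moderate", "high", "uncertain"], config)
```

In this example `uncertain` is dropped because it is not listed in `confidence_levels`, and `high` wins over `moderate` because it carries the lower number.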
Key | Description |
---|---|
drugs_to_infer | A list of drugs you wish to infer on - these must be contained in your catalogue |
drugs_to_report | A list of drugs you wish to report in the finalised output of tbtamr split into first-line and other - these must be contained in your catalogue |
Where you have selected to use a cascade reporting structure, you will define here the default drugs and then increasing levels.
Key | Description |
---|---|
cascade_reporting | Groups drugs for a cascade reporting structure. Drugs should be grouped in order of reporting; this grouping does not need to have any relationship to drug classification or WHO criteria and only represents the order in which the user wishes to report. There must be a `default` set of drugs that will appear in all reports, followed by increasing levels as `levelX`. Within each `levelX` key two keys are required: `resistance_to_any`, which determines what resistances will trigger this level, and `report`, which is what genes to report at this level |
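
One possible reading of the cascade structure is sketched here. The key names `default`, `levelX`, `resistance_to_any` and `report` come from the description above, but everything else is an assumption: the placeholder names, the `entries_to_report` helper, treating the `report` entries as opaque labels, and stopping at the first level that is not triggered.

```python
# Hypothetical cascade_reporting value; names are placeholders.
cascade_reporting = {
    "default": ["drug_a", "drug_b"],
    "level1": {"resistance_to_any": ["drug_a", "drug_b"],
               "report": ["drug_c", "drug_d"]},
    "level2": {"resistance_to_any": ["drug_c"],
               "report": ["drug_e"]},
}


def entries_to_report(resistant, cascade):
    """Return the default entries plus those of every triggered level."""
    reported = list(cascade["default"])  # always reported
    levels = sorted(
        (key for key in cascade if key.startswith("level")),
        key=lambda key: int(key[len("level"):]),
    )
    for level in levels:
        triggers = cascade[level]["resistance_to_any"]
        if not any(drug in resistant for drug in triggers):
            break  # assumed: an untriggered level stops the cascade
        reported.extend(cascade[level]["report"])
    return reported


first_line_only = entries_to_report(set(), cascade_reporting)
escalated = entries_to_report({"drug_a"}, cascade_reporting)
```

With no resistance detected only the default entries are returned; resistance to `drug_a` triggers `level1` and adds its `report` entries.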
There may be times when you use a different catalogue in which variants are expressed in a format not consistent with the WHO v2 catalogue. If this is the case, you need to ensure that the annotation format of your VCF is consistent with the pattern in your catalogue. If you do decide to go down this route, you will need to supply the format of your variants so that it can be checked against your catalogue. If these do not match, `tbtAMR` will error and not report DST.
Key | Description |
---|---|
catalogue_variant | A list of regular expressions that reflect the values in your VCF file annotation and should be consistent with your catalogue |
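
Before running the tool, it may be worth checking your own patterns against your catalogue. The sketch below does that with Python's `re` module; the pattern, the variant strings and the `unmatched_variants` helper are hypothetical, and whether `tbtAMR` requires a full match or a partial match is not stated here, so a full match is assumed.

```python
import re

# Hypothetical pattern: a gene name, a dot, then the change (no whitespace).
catalogue_variant = [r"[A-Za-z0-9_]+\.\S+"]


def unmatched_variants(variants, patterns):
    """Return the variants that do not fully match any supplied pattern."""
    compiled = [re.compile(pattern) for pattern in patterns]
    return [
        variant for variant in variants
        if not any(regex.fullmatch(variant) for regex in compiled)
    ]


problems = unmatched_variants(
    ["gene_x.change", "gene_x change"],  # second entry lacks the dot
    catalogue_variant,
)
```

An empty result would suggest the catalogue and the patterns are consistent; anything returned is a variant whose format needs attention.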
You can find examples for all of these criteria files here; it is highly recommended that you modify these example files to fit your needs.