Home
The following has been altered in the code forked from LSHTM:
- Parameters have been identified and extracted into external files which are read into the model; these can be found in `configuration/parameters.ini`. Most parameters are applied as factors to a vector which is the same length as the number of age bins.
- A Dockerfile for easy running across all systems.
- A settings file where other options which do not fall under the "parameters" category are set.
- Command line arguments for pointing the model to locations such as the main root directory, the parameters file, the contact matrices file etc.
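To illustrate what "applied as factors to a vector" could look like, here is a minimal Python sketch (the model itself is R/C++). The function name `apply_factor` is hypothetical; the 16 age bins and the factor `0.5` are borrowed from the `fIa` entry of the `UKParameters1` structure shown later on this page, and the real code may assemble these vectors differently.

```python
N_AGE_BINS = 16  # assumed: matches the 16 age groups in UKParameters1


def apply_factor(factor, n_age_bins=N_AGE_BINS):
    """Scale a unit vector with one entry per age bin by a single factor."""
    base = [1.0] * n_age_bins
    return [factor * value for value in base]


# Hypothetical example: relative infectiousness of subclinical cases (fIa)
fIa = apply_factor(0.5)
print(len(fIa), fIa[:3])
```

A single number in the parameters file therefore ends up as one value per age bin in the structure handed to the model.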
The latest publication of the paper by LSHTM can be found here. Supplementary material can be found here.
This section contains details relating to the final standard of the model with which cross-validation will be performed.
- The model is now set to run for a single sample, using data for that sample and for the overall region being observed; in the case of SCRC this will likely be a particular health board and the data for the whole of Scotland. The model now runs within 10 seconds because there is only one loop iteration for variables such as `lockdown`, and only one sample to deal with (not 186).
- Outputs from the model are `output/*-dynamics.qs` and `output/*-totals.qs`. From a first look it seems we will be mainly interested in `totals`, as this contains the populations within each compartment with/without lockdown implemented.
- A `SCRC/R/BuildStructures.R` script is responsible for the bulk of the work in assembling the parameters into the form needed by the model. This script has been structured so as to be usable for both local and API cases. The assembly of the local data included with the model into a form readable by `BuildStructures.R` is handled by `SCRC/R/localdata.R`, and an analogue will be written for the API, both creating an arguments object which is later interpreted and added to.
- As it is not fully known to what level the model will be run, choice of the names given to the two datasets has been difficult. At the moment:
  - `region` refers to the UK as a whole. This parameter set is used only in the calculation of the R0 correction and so (it is assumed) is not the modelled dataset. The name was chosen because this could be either the UK as a whole (as the model is by default) or Scotland.
  - `subset` refers to the data/parameter set which is carried further and (I believe) is the set passed into the C++ model itself. The name is deliberately vague because this could be a UK subregion (e.g. Glasgow, West Midlands etc.) or a health board.

  Note that in the case of the HDF5 contact matrix files read as inputs, the names for the above are `national` and `subregion` respectively.
- Before running, make sure you have installed all the requirements, including those for R found in `SCRC/R/requirements.R`.
The model has three modes of running:
- Local mode: Reads the data within the existing folders. This is mostly for demonstration purposes:
Rscript run_model.R <n-realisations> --local
- Remote mode: Reads the data and parameters from the pipeline.
Rscript run_model.R <n-realisations>
- Test mode: Runs the model and just dumps the parameters to files to be tested by the included test scripts.
Rscript run_model.R 1 --local --dump
When running with the pipeline, additional output files are produced within the `output` folder; these are two HDF5 files containing numerical results. Within the final `push_data` statement in `run_model.R` an additional boolean argument is available if the user wishes to output as CSV also (set to `FALSE` by default).
Plotting is performed by default when running via the API. Warnings may appear during the plotting stage; these simply indicate that some data are not available due to the particular choice of parameters.
Because the model is now designed to run on a single region at a time, the plots display results for just that region. If the output files from a run are labelled `run-<label>-<n-realisations>`, then the plotting script is run from the repo root as:
Rscript plot_results.R run-<label>-<n-realisations>
with a PDF and CSV file being produced within the `outputs` folder.
An important difference between API running and local running is the existence of a script that handles fetching the relevant data and then calls the model above once for each contact matrix fetched. Note this differs from the vanilla model, which does allow several matrices to be imported but does not distinguish between the various regions; that is why these need to be run separately.
The model has been proven to run via a Docker container built using the provided `Dockerfile`.
To build the container, run within the repository root:
docker build -t covid-uk .
then run the container:
docker run --label coviduk -ti covid-uk
Drafts for processing scripts can be found in `SCRC/data_uploading/processing_scripts`. These are an attempt to fetch and format data directly from sources, as opposed to using the hard-coded sources that came with the model itself.
Parameters for the API interpretation of the model are read from TOML files, which also organise them into groups.
`R0` is the distribution from which R0 values are drawn. It is a normal distribution with parameters for scale and location.
Location after data download (assuming `config.yml` in `SCRC/pipeline_data`):
`SCRC/pipeline_data/R0/<version>.toml`
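As a rough illustration of drawing one R0 value per realisation from a normal distribution with location and scale parameters, here is a Python sketch (the model itself is R/C++). The numbers `2.7` and `0.5`, the function name `sample_r0` and the seeding approach are all assumptions made for the example; they are not read from the TOML file.

```python
import random

# Hypothetical location (mean) and scale (standard deviation); the real
# values live in SCRC/pipeline_data/R0/<version>.toml.
R0_LOCATION = 2.7
R0_SCALE = 0.5


def sample_r0(n_realisations, seed):
    """Draw one R0 value per realisation from Normal(location, scale)."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same draws
    return [rng.gauss(R0_LOCATION, R0_SCALE) for _ in range(n_realisations)]


r0_values = sample_r0(n_realisations=5, seed=42)
print(r0_values)
```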
Seeding includes parameters used to seed the model. These include:
- `seed`: the seed itself.
- `min_age`: the minimum age for infection seeding.
- `max_age`: the maximum age for infection seeding.
- `seeding_min_start_day`: the earliest day on which infection seeding could start.
- `seeding_max_start_day`: the latest day on which infection seeding could start.

Location after data download (assuming `config.yml` in `SCRC/pipeline_data`):
`SCRC/pipeline_data/seeding/<version>.toml`
Simulation duration parameters:
- `start_day`: starting day for observation (usually `0`).
- `end_day`: end day for observation (usually `365`).
- `start_date_posix`: starting date in POSIX form.

Location after data download (assuming `config.yml` in `SCRC/pipeline_data`):
`SCRC/pipeline_data/time/<version>.toml`
The following is a description of the structure for each dataset required by the model:
- Two sets of four matrices: "Home", "Work", "School", "Other". One set is for the region (Scotland) and the other for the specific sample (e.g. health board/subregion). "Other" refers to (within the POLYMOD data used) "transport", "leisure" and "otherplace".
- Each matrix has columns/rows in 5-year age bins, starting at `0-4` and ending at `75+`.
- These matrices should ultimately be in a list form:

matrices = list(`Scotland` = list(other = ..., home = ..., ...), `HealthBoardX` = list(other = ..., home = ..., ...))

The model accesses these matrices using the place name as a key, set as `region_name = "Scotland"` and `sample_name = "HealthBoardX"` within the constructed arguments list.
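The nested layout described above can be sketched in Python as a dictionary keyed first by place and then by setting (the model itself builds an R list). The helper names `empty_matrix` and `build_matrices` are hypothetical, and the zero-filled matrices are placeholders standing in for real contact data.

```python
# Age bins of 5 years from 0-4 up to 70-74, then an open-ended 75+ bin.
AGE_GROUPS = [f"{lower}-{lower + 4}" for lower in range(0, 75, 5)] + ["75+"]
SETTINGS = ["home", "work", "school", "other"]


def empty_matrix(n):
    """Return an n x n matrix of zeros as a list of rows."""
    return [[0.0] * n for _ in range(n)]


def build_matrices(region_name, sample_name):
    """Nest one matrix per setting under each place name."""
    n = len(AGE_GROUPS)
    return {
        place: {setting: empty_matrix(n) for setting in SETTINGS}
        for place in (region_name, sample_name)
    }


matrices = build_matrices("Scotland", "HealthBoardX")
home = matrices["HealthBoardX"]["home"]
print(len(AGE_GROUPS), len(home), len(home[0]))
```

Looking a matrix up by `region_name` or `sample_name` then mirrors how the arguments list is described as being used.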
A single value `size` is given for each of the region and sample populations, which is the combined number of male and female members. As such only two numbers are actually needed.
The model includes a premade dataframe which contains age-varying symptomatic rates, with columns:
"trial" "lp" "chain" "ll" "f_00" "f_10" "f_20" "f_30" "f_40" "f_50" "f_60" "f_70" "size"
The meaning and definitions behind these are currently unclear to me.
Data relating to the proportion of people within each category. This exists as a dataframe with the following columns:
"Age" "Prop_symptomatic" "IFR" "Prop_inf_hosp" "Prop_inf_critical" "Prop_critical_fatal" "Prop_noncritical_fatal" "Prop_symp_hospitalised" "Prop_hospitalised_critical"
and the data are arranged in columns of multiples of 10 years, 10-100 inclusive.
Through examination of the contained files within the repository the following data sources have been identified:
File | Description | Source |
---|---|---|
`covidm/data/structure_UK.rds` | Dataframe is created from the UK Mid Year Estimates 2019 (2020 LAD Codes) spreadsheet (sheet: 'MYE2-Persons'), Office for National Statistics | .xls |
`covidm/data/wpp2019_pop2020.rds` | World population data taken from the World Population Prospects 2019 | Female .xlsx, Male .xlsx |
`data/Global_Mobility_Report.csv` | Mobility of Populations in Different Countries* | .csv |
`uk_survey` | Social Contact Data: POLYMOD social contact data, K. Auranen et al. | dataset |
`MUestimates_all_locations_[1,2].xlsx` | Contact Matrices for 152 countries: Projecting social contact matrices in 152 countries using contact surveys and demographic data, A. R. Cook et al. | datasets |
* Only appears to be used in plotting script
Actual confirmed and "usable" parameters can be identified within the `configuration/parameters.ini` file on the `kzscisoft-dev` branch of this repository.
- The paper states that the amount of time a given individual spends in each of the latent, preclinical, clinical, or subclinical states is drawn from a corresponding distribution for that state. However, the code refers to variables `dE`, `dIp`, `dIa` and `dIs`. `dIa` is unknown and not described in the paper; also, the definition of these variables appears not to match the paper.
- There is a high probability that the paper does not describe the current state of the model.
Addendum - 28/5/20
- dIa corresponds to dIs from the pre-print. dIs corresponds to dIc from the pre-print. The model is coded as in the pre-print if those two substitutions are made.
Likely Out of Date
| Parameter | Description | Value | Source | Comments |
|---|---|---|---|---|
| | Latent period | | [2][3][4] | Stated in Table S1 in Paper [1] |
| | Pre-Clinical Infectiousness Duration | | [5] | Stated in Table S1 in Paper [1] |
| | Clinical Infectiousness Duration | | [2][3][4] | Stated in Table S1 in Paper [1] |
| | Subclinical Infectiousness Duration | | [1] | "Assumed to be the same duration as total infectious period for clinical cases, including preclinical transmission" |
| | Hospitalization | 1 | - | Hospitalization ignored |
| - | Incubation period | | [1] | Derived |
| | Relative infectiousness of subclinical cases | 50% | [1] | Assumed |
| | UK Contact Matrices | `covidm/data/all_matrices.rds` | [6] | Number of age-j individuals contacted by age-i individual per day |
| | Number of age-i individuals | - | See above | Demographic data (Office for National Statistics) |
| - | Proportion of hospitalised cases requiring critical care | 30% | [7] | Stated in Table S1 in Paper [1] |
| | Serial Interval | 6.5 days | [2][3][4] | Derived. Stated in Table S1 in Paper [1] |
| - | Delay from onset to hospitalization | | [7][8] | Stated in Table S1 of Paper [1] |
| - | Duration of hospitalization | | [7] | Stated in Table S1 of Paper [1] |
| - | Proportion of hospitalized cases requiring critical care | 30% | [7] | Stated in Table S1 of Paper [1] |
| - | Delay from onset to death | | [7][8] | Stated in Table S1 of Paper [1] |
Ages are binned in bins of 5 years up to 84 (e.g. `0-4`, `5-9` etc.), with the rest falling in a bin of 85+.
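The binning rule above can be written as a small function. This is a Python sketch for illustration only; the function name `age_bin` and its arguments are hypothetical and do not come from the model code.

```python
def age_bin(age, width=5, last_lower=85):
    """Return the label of the age bin that a whole-year age falls into."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age >= last_lower:
        # Everyone at or above the final lower bound shares one open-ended bin.
        return f"{last_lower}+"
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


print([age_bin(age) for age in (0, 9, 84, 85, 101)])
```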
Existing data is loaded and assembled when building the variable `UKParameters1`, which contains:
type : chr "SEI3R"
dE : num [1:241] 9.21e-06 6.02e-04 3.27e-03 8.38e-03 1.54e-02 ...
dIp : num [1:241] 0.000395 0.018594 0.069279 0.119197 0.145304 ...
dIa : num [1:241] 3.85e-06 2.62e-04 1.49e-03 4.00e-03 7.71e-03 ...
dIs : num [1:241] 1.55e-05 9.85e-04 5.17e-03 1.28e-02 2.27e-02 ...
dH : num 1
dC : num 1
size : num [1:16] 3914028 4138524 3858894 3669250 4184575 ...
matrices :List of 4
contact : num [1:4] 1 1 1 1
contact_mult : num(0)
contact_lowerto: num(0)
u : num [1:16] 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08 ...
y : num [1:16] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
fIp : num [1:16] 1 1 1 1 1 1 1 1 1 1 ...
fIs : num [1:16] 1 1 1 1 1 1 1 1 1 1 ...
fIa : num [1:16] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
rho : num [1:16] 1 1 1 1 1 1 1 1 1 1 ...
tau : num [1:16] 1 1 1 1 1 1 1 1 1 1 ...
seed_times : num 1
dist_seed_ages : num [1:16] 1 1 1 1 1 1 1 1 1 1 ...
schedule : list()
observer : NULL
name : chr "UK | UNITED KINGDOM"
group_names : chr [1:16] "0-4" "5-9" "10-14" "15-19" ...
`covid_scenario` contains age-specific clinical fractions as estimated by MCMC. It has the raw MCMC draws, so each row corresponds to a draw from the posterior. To quote from the pre-print: "The age-specific clinical fraction was adopted from an estimate based on case data from 6 countries [11], and the relative infectiousness of subclinical cases [...] was assumed to be 50% relative to clinical cases, as we assumed in a previous study [11]." This clinical fraction is required for the calculation of R0 through the next generation matrix, as there are clinical-specific parameters.
However, there is an inconsistency in the code as defined. They fix both the parameters and R0. This creates an issue, because the parameters fully determine the model R0 through the greatest eigenvalue of the next generation matrix, so the model functionally has two R0s. They therefore have to make a correction, which is calculated in `u_adj` as the ratio of the R0 sampled from the normal distribution to the "empirical" R0 defined by the parameters. Based on the definitions in the paper, I believe this is an adjustment to individual susceptibility to infection. They then use this adjusted susceptibility to correct things downstream.
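The correction can be illustrated with a toy example in Python (the model itself is R/C++). The 2x2 matrix entries, the sampled R0 of `2.7` and the power-iteration helper are all made up for the sketch. It also assumes that the next generation matrix scales linearly with susceptibility, so that multiplying by `u_adj` rescales the dominant eigenvalue; the real model builds its matrix from the contact matrices and duration parameters.

```python
def dominant_eigenvalue(matrix, iterations=200):
    """Estimate the largest eigenvalue of a positive square matrix by power iteration."""
    n = len(matrix)
    vec = [1.0] * n
    value = 0.0
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        value = max(abs(x) for x in nxt)  # growth factor of the current iterate
        vec = [x / value for x in nxt]    # renormalise so the largest entry is 1
    return value


# Toy next generation matrix; the entries are invented for illustration.
ngm = [[1.2, 0.4],
       [0.3, 0.9]]

empirical_r0 = dominant_eigenvalue(ngm)  # R0 implied by the fixed parameters
sampled_r0 = 2.7                         # hypothetical draw from the R0 distribution
u_adj = sampled_r0 / empirical_r0        # ratio used as the correction

# Scaling every entry by u_adj makes the dominant eigenvalue match sampled_r0.
rescaled = [[u_adj * x for x in row] for row in ngm]
print(empirical_r0, u_adj, dominant_eigenvalue(rescaled))
```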
Three levels of population grouping:
Level 0:
[1] "UK | UNITED KINGDOM"
Level 1:
[1] "UK | ENGLAND" "UK | WALES" "UK | SCOTLAND" "UK | NORTHERN IRELAND"
Level 2:
UK | NORTH EAST
UK | NORTH WEST
UK | YORKSHIRE AND THE HUMBER
UK | EAST MIDLANDS
UK | WEST MIDLANDS
UK | EAST
UK | LONDON
UK | SOUTH EAST
UK | SOUTH WEST
UK | WALES
UK | SCOTLAND
UK | NORTHERN IRELAND
Level 3 & Level 4 have higher granularity still.
1. The effect of non-pharmaceutical interventions on COVID-19 cases, deaths and demand for hospital services in the UK: a modelling study, N. G. Davies et al., 2020, Table S1, pg. 23.
2. Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia, Q. Li, X. Guan et al., N Engl J Med. 2020;382: 1199-1207.
3. Epidemiology and Transmission of COVID-19 in Shenzhen China: Analysis of 391 cases and 1,286 of their close contacts, medRxiv. 2020; 2020.03.03.20028423.
4. Serial interval of novel coronavirus (2019-nCoV) infections, medRxiv. 2020; 2020.02.03.20019497.
5. The contribution of pre-symptomatic infection to the transmission dynamics of COVID-2019, Y. Liu et al., Wellcome Open Research. 2020;5: 58.
6. Social contacts and mixing patterns relevant to the spread of infectious diseases, J. Mossong et al., PLoS Med. 2008;5: e74.
7. A Trial of Lopinavir-Ritonavir in Adults Hospitalized with Severe Covid-19, B. Cao et al., N Engl J Med. 2020. doi:10.1056/NEJMoa2001282.
8. Incubation Period and Other Epidemiological Characteristics of 2019 Novel Coronavirus Infections with Right Truncation: A Statistical Analysis of Publicly Available Case Data, N. M. Linton et al., J Clin Med. 2020;9. doi:10.3390/jcm9020538.