Kristian Zarębski edited this page Sep 18, 2020 · 61 revisions


Development

The following has been altered in the code forked from LSHTM:

  • Parameters have been identified and extracted into external files which are read into the model; these can be found in configuration/parameters.ini. Most parameters are applied as factors to a vector which is the same length as the number of age bins.

  • A Dockerfile has been added for easy running across all systems.

  • A settings file has been added, in which options that do not fall under the "parameters" category are set.

  • Command line arguments have been added for pointing the model to locations such as the main root directory, the parameters file, the contact matrices file etc.
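
The first point above (scalar parameters applied across a vector with one entry per age bin) can be sketched as follows. This is an illustration in Python rather than the repository's R code, and the section and key names in the INI text are hypothetical; the values 0.08 and 0.5 are the `u` and `fIa` values shown in the UK Parameters appendix.

```python
import configparser

N_AGE_BINS = 16  # 5-year bins from 0-4 up to 75+, as used by the contact matrices

# Hypothetical INI layout; the real keys live in configuration/parameters.ini
EXAMPLE_INI = """
[susceptibility]
u = 0.08

[infectiousness]
fIa = 0.5
"""


def expand_to_age_bins(value, n_bins=N_AGE_BINS):
    """Repeat a scalar factor once per age bin."""
    return [float(value)] * n_bins


parser = configparser.ConfigParser()
parser.optionxform = str  # keep the case of keys such as fIa
parser.read_string(EXAMPLE_INI)

u = expand_to_age_bins(parser["susceptibility"]["u"])
fIa = expand_to_age_bins(parser["infectiousness"]["fIa"])
print(len(u), u[0], fIa[0])
```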

Latest Paper Publication

The latest publication of the paper by LSHTM can be found here. Supplementary material can be found here.

SCRC Implementation

This section contains details relating to the final standard of the model with which cross-validation will be performed.

Outline

  • The model is now set to run for a single sample, using data for that sample and for the overall region being observed. In the case of SCRC the sample will likely be a particular health board, and the region the whole of Scotland. The model now runs within 10 seconds because there is only one loop iteration for variables such as lockdown, and only one sample to deal with (not 186).

  • Outputs from the model are output/*-dynamics.qs and output/*-totals.qs. From a first look it seems we will be mainly interested in totals, as this contains the populations within each compartment with and without lockdown implemented.

  • A SCRC/R/BuildStructures.R script is responsible for the bulk of the work in assembling the parameters into the form needed by the model. This script has been structured so that it is usable for both local and API cases. The assembly of the local data included with the model into a form readable by BuildStructures.R is handled by SCRC/R/localdata.R, and an analogue will be written for the API; both create an arguments object which is later interpreted and added to.

  • As it is not fully known to what level the model will be run, choice of the names given to the two datasets has been difficult. At the moment:

    • region refers to the larger area: by default the UK as a whole (as in the original model), but potentially Scotland. This parameter set is used only in the calculation of the R0 correction and so (it is assumed) is not the modelled dataset.
    • subset refers to the data/parameter set which is carried further and (I believe) is the set passed into the C++ model itself. The name is deliberately vague because this could be a UK subregion (e.g. Glasgow, West Midlands) or a health board.

    Note in the case of the HDF5 contact matrix files read as inputs, the names for the above are national and subregion respectively.

Running

Before running, make sure you have installed all the requirements, including those for R found in SCRC/R/requirements.R.

The model has three modes of running:

  1. Local mode: Reads the data within the existing folders. This is mostly for demonstration purposes:
     Rscript run_model.R <n-realisations> --local
  2. Remote mode: Reads the data and parameters from the pipeline.
     Rscript run_model.R <n-realisations>
  3. Test mode: Runs the model and just dumps the parameters to files to be tested by the included test scripts.
     Rscript run_model.R 1 --local --dump

When running with the pipeline, additional output files are produced within the output folder: two HDF5 files containing numerical results. The final push_data statement in run_model.R accepts an additional boolean argument for users who also wish to output CSV (set to FALSE by default).

Plotting

Plotting is performed by default when running via the API.

Warnings may appear during the plotting stage. These simply indicate that some data are not available due to the particular choice of parameters.

Because the model is now designed to run on a single region at a time, the plots display results for just that region. If the output files from a run are labelled run-<label>-<n-realisations>, then the plotting script is run from the repo root as:

Rscript plot_results.R run-<label>-<n-realisations>

with a PDF and CSV file being produced within the outputs folder.

Pipeline Model Automation

An important difference between API running and local running is the existence of a script that fetches the relevant data and then calls the model above once for each contact matrix fetched. Note that this differs from the vanilla model, which does allow import of several matrices but does not distinguish between the various regions; that is why these need to be run separately.

Docker

The model has been proven to run via a Docker container built using the provided Dockerfile. To build the container, run within the repository root:

docker build -t covid-uk .

then run the container:

docker run --label coviduk -ti covid-uk

Processing Scripts

Drafts for processing scripts can be found in SCRC/data_uploading/processing_scripts. These are an attempt to fetch and format data directly from sources, as opposed to using the hard-coded sources that came with the model itself.

Parameter Structure

Parameters for the API interpretation of the model are read from TOML files, which also organise them into the groups described below.

R0

R0 is the distribution from which R0 values are drawn. It is a normal distribution with parameters for scale and location.

Location after data download (assuming config.yml in SCRC/pipeline_data):

SCRC/pipeline_data/R0/<version>.toml
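
As a rough illustration of how such a distribution might be used, the Python sketch below draws R0 values from a normal distribution given a location (mean) and scale (standard deviation). The dictionary stands in for the parsed TOML file; its key names and the numbers 2.7 and 0.5 are hypothetical and not taken from the pipeline.

```python
import random

# Assumed shape of the parsed R0 TOML file (hypothetical keys and values)
r0_params = {"distribution": "normal", "location": 2.7, "scale": 0.5}


def draw_r0(params, rng):
    """Draw one R0 value: location is the mean, scale the standard deviation."""
    return rng.gauss(params["location"], params["scale"])


rng = random.Random(42)  # fixed seed so repeated runs give the same draws
samples = [draw_r0(r0_params, rng) for _ in range(5)]
print(samples)
```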

Seeding

Seeding contains the parameters used to seed the model:

  • seed: the seed itself.
  • min_age: the minimum age for infection seeding.
  • max_age: the maximum age for infection seeding.
  • seeding_min_start_day: the earliest day on which infection seeding could start.
  • seeding_max_start_day: the latest day on which infection seeding could start.

Location after data download (assuming config.yml in SCRC/pipeline_data):

SCRC/pipeline_data/seeding/<version>.toml
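
How the model consumes these values is not described here, so the following Python sketch is only one plausible reading: the age limits select which 5-year bins receive seed infections, and a start day is drawn uniformly between the two day limits. All values, and the use of `seed` as a random number generator seed, are assumptions made for illustration.

```python
import random

N_AGE_BINS = 16  # 0-4, 5-9, ..., 75+
BIN_WIDTH = 5

# Hypothetical values for the seeding parameters listed above
seeding = {
    "seed": 1,
    "min_age": 25,
    "max_age": 50,
    "seeding_min_start_day": 0,
    "seeding_max_start_day": 20,
}


def seed_age_mask(min_age, max_age):
    """Return 1 for each 5-year bin lying fully inside [min_age, max_age), else 0."""
    mask = []
    for b in range(N_AGE_BINS):
        lower = b * BIN_WIDTH
        upper = lower + BIN_WIDTH
        mask.append(1 if lower >= min_age and upper <= max_age else 0)
    return mask


rng = random.Random(seeding["seed"])  # assumption: seed initialises the RNG
start_day = rng.randint(seeding["seeding_min_start_day"],
                        seeding["seeding_max_start_day"])
mask = seed_age_mask(seeding["min_age"], seeding["max_age"])
print(start_day, mask)
```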

Time

Simulation duration parameters:

  • start_day: starting day for observation (usually 0).
  • end_day: end day for observation (usually 365).
  • start_date_posix: starting date in POSIX form.

Location after data download (assuming config.yml in SCRC/pipeline_data):

SCRC/pipeline_data/time/<version>.toml
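
To show how a day index relates to a calendar date, the Python sketch below converts start_day and end_day into dates. It assumes start_date_posix is a count of seconds since the Unix epoch (it could equally be a day count in the R code), and the example values are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical values; 1577836800 is 2020-01-01 00:00:00 UTC
time_params = {"start_day": 0, "end_day": 365, "start_date_posix": 1577836800}


def day_to_date(day, start_date_posix):
    """Map a simulation day offset onto a calendar date (UTC)."""
    start = datetime.fromtimestamp(start_date_posix, tz=timezone.utc).date()
    return start + timedelta(days=day)


first = day_to_date(time_params["start_day"], time_params["start_date_posix"])
last = day_to_date(time_params["end_day"], time_params["start_date_posix"])
print(first.isoformat(), last.isoformat())
```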

Data File Structure

The following is a description of the structure for each dataset required by the model:

Contact Matrices

  • Two sets of four matrices: "Home", "Work", "School", "Other". One for the region (Scotland) and another for the specific sample (e.g. Health board/subregion). "Other" refers to (within the used POLYMOD data) "transport", "leisure" and "otherplace".

  • Each matrix has columns/rows in 5-year age bins, starting at 0-4 and ending at 75+.

  • These matrices should ultimately be in a list form:

matrices = list(`Scotland` = list(other=..., home = ..., ...), `HealthBoardX` = list(other=..., home = ..., ...))

The model accesses these matrices using the place name as a key, given as region_name = "Scotland" and sample_name = "HealthBoardX" within the constructed arguments list.
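
The same nested, keyed layout can be sketched in Python as below. The matrices are filled with placeholder constants rather than POLYMOD data, and summing the four location matrices with unit weights is an assumption (suggested by the `contact: 1 1 1 1` entry in the UK Parameters appendix), not a statement of what the model does.

```python
N = 16  # age bins 0-4 ... 75+
LOCATIONS = ("home", "work", "school", "other")


def placeholder_matrix(value):
    """Build an N x N matrix filled with a constant (stand-in for real data)."""
    return [[value] * N for _ in range(N)]


matrices = {
    "Scotland": {loc: placeholder_matrix(1.0) for loc in LOCATIONS},
    "HealthBoardX": {loc: placeholder_matrix(0.5) for loc in LOCATIONS},
}

# Keys stored in the arguments object select which place is read
arguments = {"region_name": "Scotland", "sample_name": "HealthBoardX"}


def total_contacts(all_matrices, place):
    """Element-wise sum of the four location matrices for one place."""
    total = placeholder_matrix(0.0)
    for loc in LOCATIONS:
        for i in range(N):
            for j in range(N):
                total[i][j] += all_matrices[place][loc][i][j]
    return total


region_total = total_contacts(matrices, arguments["region_name"])
sample_total = total_contacts(matrices, arguments["sample_name"])
print(region_total[0][0], sample_total[0][0])
```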

Population Structure

A single population size is given for each of the region and the sample, being the combined number of male and female members. As such, only two numbers are actually needed.

Age Varying Symptomatic Rates

Model includes a premade dataframe which contains age varying symptomatic rates with columns:

"trial" "lp"    "chain" "ll"    "f_00"  "f_10"  "f_20"  "f_30"  "f_40"  "f_50"  "f_60"  "f_70"  "size"

The meaning and definitions of these columns are currently unclear to me.

Health Burden Processes Data

Data relating to the proportion of people within each category. This exists as a dataframe with the following columns:

"Age" "Prop_symptomatic" "IFR" "Prop_inf_hosp" "Prop_inf_critical" "Prop_critical_fatal" "Prop_noncritical_fatal" "Prop_symp_hospitalised" "Prop_hospitalised_critical"

and the data are arranged in columns of multiples of 10 years 10-100 inclusive.

Data Sources

By examining the files contained within the repository, the following data sources have been identified:

| File | Description | Source |
|---|---|---|
| covidm/data/structure_UK.rds | Dataframe is created from the UK Mid Year Estimates 2019 (2020 LAD Codes) spreadsheet (sheet: 'MYE2-Persons'), Office for National Statistics | .xls |
| covidm/data/wpp2019_pop2020.rds | World population data taken from the World Population Prospects 2019 | Female .xlsx, Male .xlsx |
| data/Global_Mobility_Report.csv | Mobility of Populations in Different Countries* | .csv |
| uk_survey | Social Contact Data: POLYMOD social contact data, K. Auranen et al. | dataset |
| MUestimates_all_locations_[1,2].xlsx | Contact Matrices for 152 countries: Projecting social contact matrices in 152 countries using contact surveys and demographic data, A. R. Cook et al. | datasets |

* Only appears to be used in plotting script

Parameters

Actual confirmed and "usable" parameters can be identified within the configuration/parameters.ini file on the kzscisoft-dev branch of this repository.

Issues

  • According to the paper, the amount of time a given individual spends in states $E$, $I_P$, $I_C$, or $I_S$ is drawn from distributions $d_E$, $d_P$, $d_C$, or $d_S$, respectively. However, the code refers to variables dE, dIp, dIa and dIs. dIa is unknown and not described in the paper, and the definition of these variables does not appear to match the paper.

  • There is a high probability that the paper does not describe the current state of the model.

Addendum - 28/5/20

  • dIa corresponds to dIs from the pre-print. dIs corresponds to dIc from the pre-print. The model is coded as in the pre-print if those two substitutions are made.

Parameter Table

Likely Out of Date

| Parameter | Description | Value | Source | Comments |
|---|---|---|---|---|
| $d_E$ | Latent period | $\Gamma(\mu=4.0,k=4)$ | [2][3][4] | Stated in Table S1 in Paper [1] |
| $d_P$ | Pre-clinical infectiousness duration | $\Gamma(\mu=1.5,k=4)$ | [5] | Stated in Table S1 in Paper [1] |
| $d_C$ | Clinical infectiousness duration | $\Gamma(\mu=3.5,k=4)$ | [2][3][4] | Stated in Table S1 in Paper [1] |
| $d_S$ | Subclinical infectiousness duration | $\Gamma(\mu=5.0,k=4)$ | [1] | "Assumed to be the same duration as total infectious period for clinical cases, including preclinical transmission" |
| | Hospitalization | 1 | - | Hospitalization ignored |
| - | Incubation period | $d_E+d_P$; $\mu=5.5$ | [1] | Derived |
| $f$ | Relative infectiousness of subclinical cases | 50% | [1] | Assumed |
| $c_{ij}$ | UK contact matrices | covidm/data/all_matrices.rds | [6] | Number of age-$j$ individuals contacted by an age-$i$ individual per day |
| $N_{i}$ | Number of age-$i$ individuals | - | See above | Demographic data (Office for National Statistics) |
| - | Proportion of hospitalised cases requiring critical care | 30% | [7] | Stated in Table S1 in Paper [1] |
| $d_E+(y_i(d_P+d_C)+(1-y_i)d_S)/2$ | Serial interval | 6.5 days | [2][3][4] | Derived; stated in Table S1 in Paper [1] |
| - | Delay from onset to hospitalization | $\Gamma(\mu=7,k=7)$ | [7][8] | Stated in Table S1 of Paper [1] |
| - | Duration of hospitalization | $\Gamma(\mu=10,k=10)$ | [7] | Stated in Table S1 of Paper [1] |
| - | Proportion of hospitalized cases requiring critical care | 30% | [7] | Stated in Table S1 of Paper [1] |
| - | Delay from onset to death | $\Gamma(\mu=22,k=22)$ | [7][8] | Stated in Table S1 of Paper [1] |
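
The derived rows of the table can be checked with a little arithmetic. The Python sketch below assumes that $\Gamma(\mu, k)$ denotes a gamma distribution with mean $\mu$ and shape $k$ (so scale $\mu/k$), and that d_E, d_P, d_C and d_S are the latent, pre-clinical, clinical and subclinical durations respectively. It reproduces the 5.5 day incubation period and the 6.5 day serial interval from the means.

```python
import random

# (mean, shape) pairs: latent, pre-clinical, clinical, subclinical durations
DURATIONS = {"d_E": (4.0, 4), "d_P": (1.5, 4), "d_C": (3.5, 4), "d_S": (5.0, 4)}


def draw(mu, k, rng):
    """Sample a duration; gammavariate takes (shape, scale), scale = mean / shape."""
    return rng.gammavariate(k, mu / k)


means = {name: mu for name, (mu, _k) in DURATIONS.items()}

# Incubation period: d_E + d_P
incubation = means["d_E"] + means["d_P"]


def serial_interval(y):
    """d_E + (y (d_P + d_C) + (1 - y) d_S) / 2 for clinical fraction y."""
    return means["d_E"] + (y * (means["d_P"] + means["d_C"])
                           + (1 - y) * means["d_S"]) / 2


# Empirical check that the sampled latent period has mean close to 4 days
rng = random.Random(0)
n = 20000
sample_mean = sum(draw(*DURATIONS["d_E"], rng) for _ in range(n)) / n

print(incubation, serial_interval(0.5), round(sample_mean, 2))
```

Because d_P + d_C and d_S both have mean 5.0, the serial interval evaluates to 6.5 days whatever the clinical fraction y.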

Binning

Ages are binned in 5-year bins up to 84 (e.g. 0-4, 5-9 etc.); all remaining ages fall into a single 85+ bin.
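
A minimal Python sketch of this binning rule is given below; the function name and its defaults are illustrative and do not come from the repository.

```python
def age_bin(age, width=5, top=85):
    """Return the label of the age bin containing `age` (e.g. '0-4', '85+')."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age >= top:
        return f"{top}+"
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


print([age_bin(a) for a in (0, 9, 84, 85, 101)])
```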

Appendices

UK Parameters

Existing data is loaded and assembled when building the variable UKParameters1 which contains:

type           : chr "SEI3R"
  dE             : num [1:241] 9.21e-06 6.02e-04 3.27e-03 8.38e-03 1.54e-02 ...
  dIp            : num [1:241] 0.000395 0.018594 0.069279 0.119197 0.145304 ...
  dIa            : num [1:241] 3.85e-06 2.62e-04 1.49e-03 4.00e-03 7.71e-03 ...
  dIs            : num [1:241] 1.55e-05 9.85e-04 5.17e-03 1.28e-02 2.27e-02 ...
  dH             : num 1
  dC             : num 1
  size           : num [1:16] 3914028 4138524 3858894 3669250 4184575 ...
  matrices       :List of 4
  contact        : num [1:4] 1 1 1 1
  contact_mult   : num(0)
  contact_lowerto: num(0)
  u              : num [1:16] 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08 ...
  y              : num [1:16] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
  fIp            : num [1:16] 1 1 1 1 1 1 1 1 1 1 ...
  fIs            : num [1:16] 1 1 1 1 1 1 1 1 1 1 ...
  fIa            : num [1:16] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
  rho            : num [1:16] 1 1 1 1 1 1 1 1 1 1 ...
  tau            : num [1:16] 1 1 1 1 1 1 1 1 1 1 ...
  seed_times     : num 1
  dist_seed_ages : num [1:16] 1 1 1 1 1 1 1 1 1 1 ...
  schedule       : list()
  observer       : NULL
  name           : chr "UK | UNITED KINGDOM"
  group_names    : chr [1:16] "0-4" "5-9" "10-14" "15-19" ...

covidy Parameter

covid_scenario contains age-specific clinical fractions as estimated by MCMC. It holds the raw MCMC draws, so each row corresponds to a draw from the posterior. To quote from the pre-print: "The age-specific clinical fraction was adopted from an estimate based on case data from 6 countries [11], and the relative infectiousness of subclinical cases, f, was assumed to be 50% relative to clinical cases, as we assumed in a previous study [11]." This clinical fraction is required for the calculation of R0 through the next generation matrix, as there are parameters specific to clinical cases.

However, there is an inconsistency in the code as defined: it fixes both the parameters and R0. This creates an issue, because the parameters fully determine the model R0 through the greatest eigenvalue of the next generation matrix, so the model functionally has two R0 values. A correction is therefore made, calculated in u_adj as the ratio of the R0 sampled from the normal distribution to the "empirical" R0 defined by the parameters. Based on the definitions in the paper, I believe this is an adjustment to individual susceptibility to infection, which is then used to correct things downstream.
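
The correction can be illustrated with a toy example. The Python sketch below is a simplified reading, not the model's code: it builds a next generation matrix for two hypothetical age groups as susceptibility times contacts times an infectiousness-weighted infectious duration, finds its dominant eigenvalue by power iteration, and rescales susceptibility by the ratio of a sampled R0 to that "empirical" R0. The contact values and the sampled R0 of 2.7 are made up; the susceptibility, clinical fraction and duration means are the values quoted elsewhere on this page.

```python
def dominant_eigenvalue(matrix, iterations=200):
    """Power iteration for the largest eigenvalue of a positive matrix."""
    n = len(matrix)
    vec = [1.0] * n
    value = 0.0
    for _ in range(iterations):
        new = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        value = max(new)            # eigenvalue estimate (max-norm scaling)
        vec = [x / value for x in new]
    return value


def next_generation_matrix(u, contacts, infectious_weight):
    """NGM[i][j] = u[i] * contacts[i][j] * infectious_weight[j] (simplified)."""
    n = len(u)
    return [[u[i] * contacts[i][j] * infectious_weight[j] for j in range(n)]
            for i in range(n)]


u = [0.08, 0.08]                        # susceptibility per age group
contacts = [[8.0, 3.0], [3.0, 5.0]]     # hypothetical daily contacts
y, f = 0.5, 0.5                         # clinical fraction, subclinical infectiousness
# y * (d_P + d_C) + (1 - y) * f * d_S, using the mean durations in days
infectious_weight = [y * (1.5 + 3.5) + (1 - y) * f * 5.0] * 2

empirical_r0 = dominant_eigenvalue(
    next_generation_matrix(u, contacts, infectious_weight))

sampled_r0 = 2.7                        # hypothetical draw from the R0 distribution
u_adj = sampled_r0 / empirical_r0       # correction factor
u_corrected = [x * u_adj for x in u]

corrected_r0 = dominant_eigenvalue(
    next_generation_matrix(u_corrected, contacts, infectious_weight))
print(round(empirical_r0, 4), round(u_adj, 4), round(corrected_r0, 4))
```

Because the dominant eigenvalue scales linearly with a uniform factor on susceptibility, multiplying u by u_adj makes the model's implied R0 equal to the sampled value.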

Levels

Three levels of population grouping are listed below.

Level 0:

[1] "UK | UNITED KINGDOM"

Level 1:

[1] "UK | ENGLAND"          "UK | WALES"            "UK | SCOTLAND"         "UK | NORTHERN IRELAND"

Level 2:

UK | NORTH EAST
UK | NORTH WEST
UK | YORKSHIRE AND THE HUMBER
UK | EAST MIDLANDS
UK | WEST MIDLANDS
UK | EAST
UK | LONDON
UK | SOUTH EAST
UK | SOUTH WEST
UK | WALES
UK | SCOTLAND
UK | NORTHERN IRELAND

Level 3 & Level 4 have higher granularity still.

References

  1. The effect of non-pharmaceutical interventions on COVID-19 cases, deaths and demand for hospital services in the UK: a modelling study, N. G. Davies et al., 2020, Table S1, pg. 23.

  2. Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia, Q. Li et al., N Engl J Med. 2020;382:1199-1207.

  3. Epidemiology and Transmission of COVID-19 in Shenzhen China: Analysis of 391 cases and 1,286 of their close contacts, medRxiv. 2020;2020.03.03.20028423.

  4. Serial interval of novel coronavirus (2019-nCoV) infections, medRxiv. 2020;2020.02.03.20019497.

  5. The contribution of pre-symptomatic infection to the transmission dynamics of COVID-2019, Y. Liu et al., Wellcome Open Research. 2020;5:58.

  6. Social contacts and mixing patterns relevant to the spread of infectious diseases, J. Mossong et al., PLoS Med. 2008;5:e74.

  7. A Trial of Lopinavir-Ritonavir in Adults Hospitalized with Severe Covid-19, B. Cao et al., N Engl J Med. 2020. doi:10.1056/NEJMoa2001282.

  8. Incubation Period and Other Epidemiological Characteristics of 2019 Novel Coronavirus Infections with Right Truncation: A Statistical Analysis of Publicly Available Case Data, N. M. Linton et al., J Clin Med. 2020;9. doi:10.3390/jcm9020538.