Simulating ground motion fields conditioned on station data #8317

raoanirudh · 2023-01-03T10:00:46Z

Simulating ground motion fields conditioned on station data

Rationale

Given an earthquake rupture model, a site model, and a set of ground shaking intensity models (GSIMs), the OpenQuake scenario calculator simulates a set of ground motion fields (GMFs) at the target sites for the requested set of intensity measure types (IMTs). This set of GMFs can then be used in the scenario_damage and scenario_risk calculators to estimate the distribution of potential damage, economic losses, fatalities, and other consequences. The scenario calculators are useful for simulating both historical and hypothetical earthquakes.

For historical events, ground motion recordings and macroseismic intensity data ("ground truth") can provide additional constraints on the ground shaking estimates in the region affected by the earthquake. The U.S. Geological Survey ShakeMap system (Worden et al., 2020¹, Wald et al., 2022²) provides near-real time estimates of ground shaking for earthquakes, conditioned on station and macroseismic data where available. Support to read USGS ShakeMaps and simulate GMFs based on the mean and standard deviation values estimated by the ShakeMap — for peak ground acceleration (PGA) and spectral acceleration (SA) at 0.3s, 1.0s, and 3.0s — was added to the OpenQuake-engine in #3545 and #3606, and a link to the scenario_damage and scenario_risk calculators was added in #3608, based on the methodology described by Silva & Horspool (2019)³. Further support for reading ShakeMaps from sources other than the USGS was added as discussed in #6572. Support for MMI was added in #6660, and some performance improvements were introduced in #6624.

Starting from USGS ShakeMap v4.1, the conditioning of the ground shaking is based on the methodology developed by Engler et al. (2022)⁴, which preserves the partition of the (conditioned) within-event and between-event residuals, thus enabling more accurate propagation of the uncertainties through a Monte Carlo simulation approach.

The ability to run OpenQuake scenario damage and loss calculations starting from ShakeMaps from multiple providers has proved to be a useful feature for the assessment of damage and losses for past earthquakes, development and calibration of fragility and vulnerability models, and testing of probabilistic seismic risk models⁵⁶. However, there are still many instances where the user may wish to consider a different set of assumptions in the ground motion conditioning process compared to the ShakeMap service providers. For instance, users may be interested in:

testing the impact of using different rupture geometries and source parameters from published literature for an event⁷
including additional recording station data that they have obtained, or excluding certain outliers based on criteria they deem relevant
employing a different Ground-Motion–Intensity Conversion Equation (GMICE) to convert macroseismic data to ground motion intensities or choosing to ignore the macroseismic data altogether
testing the behavior of individual GMPEs compared to the station data or using a weighted average GMPE for the tectonic region based on a probabilistic hazard model
incorporating a custom vs30 grid or using custom site-amplification functions if microzonation studies are available for the affected area
obtaining estimates for IMT(s) that are not directly available from the ShakeMap provider(s)
employing different models for the spatial cross-correlation of the within-event residuals and cross-correlation of the IMs for the between-event residuals

For the reasons mentioned above, it would be beneficial to include within the OpenQuake-engine, a Conditioned GMF calculator tightly coupled with the existing scenario calculator⁸, that would:

accept user-provided station data in addition to the usual inputs for a scenario calculation including an earthquake rupture model, a site model, and a single GSIM or a GSIM logic tree
calculate the conditioned mean and conditioned between-event and within-event covariances for the target sites for the requested IMTs, for each GSIM given the station data, based on the procedure outlined in Engler et al. (2022)⁴
simulate the requested number of GMFs at the target sites
proceed to damage / loss / consequence calculations if desired

New inputs

Station data csv file

In addition to the usual inputs for a scenario calculation including an earthquake rupture model, a site model, and a single GSIM or a GSIM logic tree, the user would also need to provide a csv file containing the observed intensity values for one or more intensity measure types at a set of ground motion recording stations.

The path to this input file could be specified in the job file using a new parameter station_data_file, eg.:

[station_data]
station_data_file = stationlist_all.csv

The csv file snippet below illustrates the information that would be contained within such a station data file:

STATION_ID	STATION_NAME	LONGITUDE	LATITUDE	STATION_TYPE	PGA_VALUE	PGA_LN_SIGMA	SA(0.3)_VALUE	SA(0.3)_LN_SIGMA	SA(1.0)_VALUE	SA(1.0)_LN_SIGMA
VIGA	LAS VIGAS	-99.23326	16.75870	seismic	0.3550	0.0	0.5262	0.0	0.1012	0.0
VNTA	LA VENTA	-99.81885	16.91426	seismic	0.2061	0.0	0.3415	0.0	0.1051	0.0
COYC	COYUCA	-100.08996	16.99778	seismic	0.1676	0.0	0.2643	0.0	0.0872	0.0
⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮
UTM_14Q_041_186	NA	-99.79820	16.86687	macroseismic	0.6512	0.8059	0.9535	1.0131	0.4794	1.0822
UTM_14Q_041_185	NA	-99.79761	16.77656	macroseismic	0.5797	0.8059	0.8766	1.0131	0.4577	1.0822
UTM_14Q_040_186	NA	-99.89182	16.86655	macroseismic	0.4770	0.8059	0.7220	1.0131	0.3223	1.0822
⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮

Mandatory fields

STATION_ID: string; subject to the same validity checks as the id fields in other input files
LONGITUDE, LATITUDE; or LON, LAT: floats; valid longitude and latitude values
STATION_TYPE: string; currently the only two valid options are 'seismic' and 'macroseismic'
<IMT>_VALUE, <IMT>_LN_SIGMA / <IMT>_STDDEV: floats; for each IMT observed at the recording stations, two values should be provided – for IMTs that are assumed to be lognormally distributed (eg. PGV, PGA, SA), these would be the median and lognormal standard deviation using the column headers <IMT>_VALUE, <IMT>_LN_SIGMA respectively; and for other IMTs (eg., MMI), these would simply be the mean and standard deviation using the column headers <IMT>_VALUE, <IMT>_STDDEV respectively

Optional fields

STATION_NAME: string; free form and not subject to the same constraints as the STATION_ID field, the optional STATION_NAME field can contain information that aids in identifying a particular station
Other fields: could contain notes about the station, flags indicating outlier status for the values reported by the station, site information, etc., but these optional fields will not be read by the station_data_file parser

Site model file for the station sites

All of the site parameters required by the GMMs used in the conditioned scenario calculation will also need to be provided for the set of sites in the station_data_file. This could be accomplished in various ways –

A: Use the existing site model csv file parser to read the site model for the station locations at the same time as the site model file for the hazard / exposure sites, eg:

[site_params]
site_model_file = site_model.csv  site_model_stations.csv

The advantage in this case would be that the already existing site model parser can be used directly, without needing to manage two separate site models under the hood. The drawback is that the two site models will get merged and it could become difficult to separate the stations from the hazard/exposure sites in downstream parts of the calculation.

B: Read the site model for the station locations separately from the site model file for the hazard / exposure sites, eg:

[site_params]
site_model_file = site_model.csv
station_site_model_file = site_model_stations.csv

The advantage in this case would be that the site model for the stations can be kept separate from the the site model file for the hazard / exposure sites. The site model for the stations will be used only during the conditioning process, whereas the site model file for the hazard / exposure sites will be used for the ground motion simulation. The drawback is the requirement of adding a new input parameter station_site_model_file and separate management of two site model files, which might be a non-trivial task.

Ground motion models

Users can choose to specify one or more GSIMs for the conditioned scenario calculation using any of the following options:

A single GMM, eg. gsim = ChiouYoungs2014
A GSIM logic tree file, eg. gsim_logic_tree_file = gsim_logic_tree.xml
A weighted average GSIM, eg. gsim_logic_tree_file = gsim_weighted_avg.xml, where the file gsim_weighted_avg.xml can be constructed using the modifiable GMPE structure for AvgGMPE as shown in the example below:

<?xml version="1.0" encoding="UTF-8"?>
<nrml xmlns:gml="http://www.opengis.net/gml" 
      xmlns="http://openquake.org/xmlns/nrml/0.4">
  <logicTree logicTreeID='lt1'>
    <logicTreeBranchingLevel branchingLevelID="bl1">
      <logicTreeBranchSet 
        branchSetID="bs1" 
        uncertaintyType="gmpeModel" 
        applyToTectonicRegionType="Active Shallow Crust">
        <logicTreeBranch branchID="br1">
          <uncertaintyModel>
            [AvgGMPE]
              b1.AbrahamsonEtAl2014.weight=0.22
              b2.BooreEtAl2014.weight=0.22
              b3.CampbellBozorgnia2014.weight=0.22
              b4.ChiouYoungs2014.weight=0.22
              b5.Idriss2014.weight=0.12
          </uncertaintyModel>
          <uncertaintyWeight>
              1.0
          </uncertaintyWeight>
        </logicTreeBranch>
      </logicTreeBranchSet>
    </logicTreeBranchingLevel>
  </logicTree>
</nrml>

For a regular GSIM logic tree, the conditioning steps will be undertaken for each of the GSIMs in the logic tree file separately, and GMFs will be generated for each of the GSIMs separately as well. This can be useful when the modeller is evaluating or comparing the predictions from different GSIMs for a scenario.

If a weighted average GSIM is provided, the conditioning and GMF simulation will occur only for the blended GSIM. This can be useful when the modeller is evaluating or trying to calibrate the performance of the set of GSIMs for the tectonic region as a whole against one or more scenarios, perhaps looking at the GSIM logic tree from a probabilistic hazard model for the region.

Note: Each of the GSIMs specified for a conditioned GMF calculation must provide the within-event and between-event standard deviations separately. If a GSIM of interest provides only the total standard deviation, a (non-ideal) workaround might be for the user to specify the ratio between the within-event and between-event standard deviations (example), which the engine will use to add the between and within standard deviations to the GSIM.

Calculation steps

The broad calculation steps are as follows, with more detailed descriptions of each step provided subsequently:

Read the usual scenario/scenario_damage/scenario_risk inputs
If the station_data_file is present in the job file,
1. Read the station data input file
2. Trigger the ConditionedGmfComputer instead of the regular GmfComputer
3. Calculate and store the conditioned between-event residual mean at every station site
4. Calculate and store the nominal event bias
5. Calculate the conditioned mean and covariance of the ground motion at the target sites
6. Simulate and store the requested number of ground motion fields at the target sites
Proceed to damage / loss / consequence calculations if desired

Reading the station data input file

This would require adding a function get_station_data(..) to commonlib/readinput.py, to extract the following three pieces of information from the station data csv file:

station_data: a dataframe containing the mean and standard deviation values, in lognormal space or otherwise, for the set of valid IMTs in the csv file
station_sites: a dataframe containing {station_id, lon, lat} for the list of stations in the csv file
imts: the set of valid intensity measure types for which intensity observations are available in the csv file

The ConditionedGmfComputer

The conditoned GMF calculator should essentially follow the conditioning steps outlined in Appendix B of Engler et al. (2022)⁴, see attached pdf of Appendix B for the equations. In short, the main steps are for each target IMT:

From station_data, get the mean observations (recorded values at the stations, "yD") and any additional sigma ("var_addon_D") for the observations that are uncertain, which might arise if the values for this particular IMT were not directly recorded, but obtained by conversion equations or cross-correlation functions
Compute the predicted mean intensity ("mu_yD") and uncertainty components ("phi_D" and "tau_D") at the observation points, from the specified GMM(s)
Compute the raw residuals at the observation points, by subtracting the predicted intensity from the observed intensity (zeta_D = yD - mu_yD)
Compute the station data within-event covariance matrix ("cov_WD_WD"), and add on the additional variance of the residuals for the cases where the station data is uncertain
Compute the (pseudo)-inverse of the station data within-event covariance matrix ("cov_WD_WD_inv")
Compute the cross-intensity measure correlations for the observed intensity measures ("corr_HD_HD")
Using Bayes rule, compute the posterior distribution (i.e., mean "mu_HD_yD" and covariance "cov_HD_HD_yD") of the normalized between-event residual, employing Engler et al. (2022), eqns B8 and B9 (also eqns B18 and B19)
Compute the distribution of the conditional between-event residual ("mu_BD_yD" and "cov_BD_BD_yD")
Compute the nominal bias and its variance as the means of the conditional between-event residual mean and covariance, display this information in the calculation log, and also store it as one of the outputs of the calculation
From the GMMs, compute the predicted mean ("mu_Y") and stddevs ("phi_Y" and "tau_Y") of the intensity at the target sites
Compute the mean of the conditional between-event residual for the target sites ("mu_BY_yD"), eqn. B18 and B19a
Compute the within-event covariance matrices for the target sites and observation sites ("cov_WY_WD" and "cov_WD_WY")
Compute the within-event covariance matrix for the target sites (apriori, "cov_WY_WY")
Compute the regression coefficient matrix ("RC" = cov_WY_WD × cov_WD_WD_inv)
Compute the scaling matrix "C" for the conditioned between-event covariance matrix, eqn. B22
Compute the conditioned within-event covariance matrix for the target sites ("cov_WY_WY_wD"), eqn. B21
Compute the "conditioned between-event" covariance matrix for the target sites ("cov_BY_BY_yD"), last term of eqn. B17
Compute the conditioned mean of the ground motion at the target sites ("mu_Y_yD"), eqn. B16
Compute the conditional covariance of the ground motion at the target sites ("cov_Y_Y_yD"), eqn. B17
Use the conditioned mean vector (mu_Y_yD) and standard deviation matrices (cov_WY_WY_wD and cov_BY_BY_yD) computed in the preceding two steps to simulate the requested number of ground motion fields at the target sites

Outputs

Nominal event bias (from step 9 above): one bias value for each IMT, for every GSIM used in the calculation
Conditioned between-event residual mean (from step 8 above): one bias value for each station site, for each IMT, for every GSIM used in the calculation (useful if the between-event uncertainties are heteroscedastic, depending upon either Vs30, the median estimated ground motions, or both)
Simulated ground motion fields
All other existing outputs of the scenario calculator

Verification tests

Since the implementation is expected to closely follow Engler et al. (2022), the results should match for the set of eleven simplified one-dimensional verification tests devised by Worden et al. (2020) to check the USGS ShakeMap implementation of the Engler et al. (2022) equations.

Future improvements

The conditioning process requires the specification of a spatial correlation model of the within-event residuals between two points for the intensity measures involved in the calculation, a model for the cross-measure correlation of the within-event residuals, and a model for the cross-measure correlation of the between-event residuals. Assuming a conditional independence of the cross-measure correlation and the spatial correlation of a given intensity measure, the spatial cross-correlation of two different IMs at two different sites can be obtained as the product of the cross-correlation of two IMs at the same location and the spatial correlation due to the distance between sites. Given the limited set of correlation models available in the engine currently, the choice of the three aforementioned correlation models could be hardcoded in the initial implementation of the conditioned GMF calculator, using JB2009CorrelationModel as the spatial correlation model of the within-event residuals, BakerJayaram2008 as the cross-measure correlation model for the within-event residuals, and GodaAtkinson2009 as the cross-measure correlation model for the between-event residuals.

Ideally, the choice of the different correlation models should be exposed to the user through parameters in the job file. Direct spatial cross-correlation models for the within-event residuals⁹¹⁰ could also be used instead of separate models for the spatial correlation and cross-measure correlation. Supporting these choices would entail a substantial refactoring of the existing correlation.py and cross_correlation.py modules of the engine, and is thus left for a future issue.

References

Worden, C. B., Thompson, E. M., Hearne, M. G., & Wald, D. J. (2020). ShakeMap Manual Online: technical manual, user’s guide, and software guide, U. S. Geological Survey. URL: http://usgs.github.io/shakemap/. DOI: https://doi.org/10.5066/F7D21VPQ. ↩
Wald, D. J., Worden, C. B., Thompson, E. M., & Hearne, M. G. (2022). ShakeMap operations, policies, and procedures. Earthquake Spectra, 38(1), 756–777. DOI: https://doi.org/10.1177/87552930211030298 ↩
Silva, V., & Horspool, N. (2019). Combining USGS ShakeMaps and the OpenQuake-engine for damage and loss assessment. Earthquake Engineering and Structural Dynamics, 48(6), 634–652. DOI: https://doi.org/10.1002/eqe.3154 ↩
Engler, D. T., Worden, C. B., Thompson, E. M., & Jaiswal, K. S. (2022). Partitioning Ground Motion Uncertainty When Conditioned on Station Data. Bulletin of the Seismological Society of America, 112(2), 1060–1079. DOI: https://doi.org/10.1785/0120210177 ↩ ↩² ↩³ ↩⁴
Riga, E., Karatzetzou, A., Apostolaki, S., Crowley, H., & Pitilakis, K. D. (2021). Verification of seismic risk models using observed damages from past earthquake events. Bulletin of Earthquake Engineering (Vol. 19). DOI: https://doi.org/10.1007/s10518-020-01017-5 ↩
Crowley, H., Silva, V., Kalakonas, P., Martins, L., Weatherill, G. A., Pitilakis, K. D., Riga, E., Borzi, B., & Faravelli, M. (2020). Verification of the European Seismic Risk Model (ESRM20). In 17th World Conference on Earthquake Engineering. Sendai, Japan. ↩
de Pro‐Díaz, Y., Vilanova, S., & Canora, C. (2022). Ranking Earthquake Sources Using Spatial Residuals of Seismic Scenarios: Methodology Application to the 1909 Benavente Earthquake. Bulletin of the Seismological Society of America. DOI: https://doi.org/10.1785/0120220067 ↩
The GMPE Strong Motion Modeller's Toolkit (gmpe-smtk) includes an example of conditional simulation of ground motion fields using hazardlib in conditional_simulation.py, though the current issue envisages a much tighter coupling with the engine's scenario calculator architecture, using the formulation from Appendix B of Engler et al. (2022)⁴ to compute the conditioned mean and within-event and between-event residuals, and permitting users to input station data directly. ↩
Loth, C., & Baker, J. W. (2013). A spatial cross‐correlation model of spectral accelerations at multiple periods. Earthquake Engineering & Structural Dynamics, 42(3), 397–417. DOI: https://doi.org/10.1002/eqe2212 ↩
Du, W., & Ning, C. L. (2021). Modeling spatial cross-correlation of multiple ground motion intensity measures (SAs, PGA, PGV, Ia, CAV, and significant durations) based on principal component and geostatistical analyses. Earthquake Spectra, 37(1), 486–504. DOI: https://doi.org/10.1177/8755293020952442 ↩

The text was updated successfully, but these errors were encountered:

See issue #8317.

raoanirudh added the proposal label Jan 3, 2023

raoanirudh assigned VSilva, raoanirudh, micheles and CatalinaYepes Jan 3, 2023

raoanirudh mentioned this issue Jan 3, 2023

Conditioned ground motion calculator #8318

Merged

raoanirudh linked a pull request Jan 3, 2023 that will close this issue

Conditioned ground motion calculator #8318

Merged

micheles added this to the Engine 3.16.0 milestone Jan 10, 2023

micheles closed this as completed in #8318 Jan 18, 2023

raoanirudh mentioned this issue Jan 27, 2023

Address pending review comments from #8318 #8369

Merged

CatalinaYepes added a commit that referenced this issue Feb 9, 2023

Add documentation for the conditioning of gmfs in scenario calculations.

bb2e63e

See issue #8317.

CatalinaYepes mentioned this issue Feb 9, 2023

Add documentation for the conditioning of gmfs in scenario calculations #8399

Merged

raoanirudh mentioned this issue Mar 17, 2023

Verification test suite for the conditioned GMF calculator #8542

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simulating ground motion fields conditioned on station data #8317

Simulating ground motion fields conditioned on station data #8317

raoanirudh commented Jan 3, 2023

Simulating ground motion fields conditioned on station data #8317

Simulating ground motion fields conditioned on station data #8317

Comments

raoanirudh commented Jan 3, 2023

Simulating ground motion fields conditioned on station data

Rationale

New inputs

Station data csv file

Mandatory fields

Optional fields

Site model file for the station sites

Ground motion models

Calculation steps

Reading the station data input file

The ConditionedGmfComputer

Outputs

Verification tests

Future improvements

References

Footnotes