---
title: "Lab 14 - Spatial Models"
author: "GE 509"
output: html_document
---
The objective of this week's lab is to familiarize yourself with the tools and techniques for exploratory analyses of both time series and spatial data sets. These analyses help you understand the data and how it is structured, which in turn informs the design of more advanced models; they should not be done in place of more careful modeling. We'll start with a spatial case study, since it is likely to generate the most questions, and then move on to a time series example.
## Case Study: Ozone Concentrations
For the spatial component of this week's lab we will be looking at concentrations of ozone across the US based on the EPA's Air Quality System (AQS), a network of towers measuring ambient air quality (http://www.epa.gov/ttn/airs/airsaqs/). Data are collected on a number of pollutants; today we will focus on ozone (O3) because it is strongly related not just to human health but also to ecosystem health: elevated ozone has significant impacts on crop productivity as well as the growth and survival of native flora and fauna. Specifically, we will be looking at the annual means of hourly ozone concentration data for 2008. Units are parts per billion (ppb); for reference, the current health standard is 75 ppb. The file Ozone.RData contains a data frame dat that records the annual mean, standard deviation, and median ozone at 1225 stations, along with each station's ID number and its latitude/longitude coordinates.

The goal of today's analysis is to perform some basic exploratory analyses on the data: fit a trend surface, plot and fit the correlogram and variogram, and produce a kriged map of national ozone levels as well as the uncertainty associated with that map.
Let's start by loading the data and making a simple map of the observations. You will probably need to install one or both of the libraries indicated.
```{r}
library(spatial) ## spatial stats library
library(maps) ## map of US
load("data/Ozone.RData") ## data
plot(dat$Lon,dat$Lat,xlim=c(-125,-65),ylim=c(25,50),asp=70/50,pch="+",col=dat$Mean/10) ## color-code stations by mean ozone
map("state",add=TRUE) ## add the states and then a legend
legend("bottomright",legend=2:8*10,col=2:8,pch="+",cex=1.25)
```
In the above, most commands should be familiar or obvious. The one new bit worth mentioning is the asp argument set in plot. This locks the y/x aspect ratio so that the US doesn't get stretched or squished when you resize the window. 70/50 is the approximate ratio in km for 1 degree of latitude vs. longitude within the study region (the exact ratio varies with latitude).
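Before going further, it can also help to confirm the structure of the data frame. Below is a minimal sketch; it only references the Mean, Lat, and Lon columns, since those are the ones used in the code in this lab.

```{r}
str(dat)                         ## data frame structure: station info, coordinates, ozone summaries
summary(dat$Mean)                ## distribution of annual mean ozone (ppb) across stations
range(dat$Lon); range(dat$Lat)   ## spatial extent of the monitoring network
```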
Recall that spatial analysis, like time series analysis, assumes second-order stationarity, so next we'll want to fit a trend surface to the data to see to what extent it can be detrended.
```{r}
surf0 <- surf.ls(0,dat$Lon,dat$Lat,dat$Mean) ## 0-degree polynomial surface
summary(surf0) ## ANOVA stats & AIC
tr0 <- trmat(surf0,-125,-65,25,50,50) ## project a 50x50 matrix within region
image(tr0,asp=70/50) ## make a color image map of the surface
map("state",add=TRUE)
points(dat$Lon,dat$Lat,pch="+",col=dat$Mean/10)
```
The **surf.ls** command fits a 2D polynomial trend to the data, with the degree of the polynomial specified in the first argument. The output of surf.ls provides two things that we care about: the parameters for the polynomial trend, and the detrended residuals that we'll use to assess spatial autocorrelation.
The **trmat** (trend matrix) command takes the output of the surf.ls command and creates a list, here called tr0, which contains the x- and y-coordinates of the projection grid, and a matrix z that is the smoothed ozone value in each of the grid cells. Within this block of code remember from lecture that the 2nd through 5th terms in the trmat command define the boundary rectangle in terms of the latitude and longitude. The last argument, 50, gives the size of the matrix, meaning that the boundary rectangle of our map will be divided into a 50x50 grid.
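If you want to work with the trend surface directly rather than just plotting it, the list returned by **trmat** can be indexed by hand. Here is a minimal sketch; the grid indexing follows how `image()` interprets the list, and the example coordinates (roughly Denver) are purely illustrative.

```{r}
str(tr0)                          ## list with x and y coordinate vectors and a z matrix
## look up the trend value for the grid cell nearest a chosen point
ix <- which.min(abs(tr0$x - (-105)))   ## nearest grid column in longitude
iy <- which.min(abs(tr0$y - 40))       ## nearest grid row in latitude
tr0$z[ix, iy]                          ## smoothed ozone (ppb) at that grid cell
```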
### Lab Report Task 1
1. Calculate the trend surface for higher order polynomials (at least 1st and 2nd) and include the plots in the lab report. Do higher order polynomials capture large scale trends or are all the trends finer scale? **What order polynomial would you choose to detrend your data?**
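As a starting point, here is a minimal sketch of how the 1st- and 2nd-order fits might look, mirroring the 0th-order code above; which order you ultimately choose, and how you compare the fits, is up to you.

```{r}
surf1 <- surf.ls(1, dat$Lon, dat$Lat, dat$Mean)  ## planar (1st-order) trend
surf2 <- surf.ls(2, dat$Lon, dat$Lat, dat$Mean)  ## quadratic (2nd-order) trend
summary(surf1)
summary(surf2)
tr2 <- trmat(surf2, -125, -65, 25, 50, 50)       ## project the 2nd-order surface onto a 50x50 grid
image(tr2, asp = 70/50)
map("state", add = TRUE)
points(dat$Lon, dat$Lat, pch = "+", col = dat$Mean/10)
```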
Now that we've calculated a trend surface let’s calculate the correlogram and variogram in order to estimate the scale of spatial autocorrelation. *In the code below, make sure to substitute **surf0** with the polynomial surface you chose from Task 1*:
```{r}
vg <- variogram(surf0,300,xlim=c(0,10),ylim=c(0,1.1*var(dat$Mean))) ## empirical variogram of the detrended residuals
abline(h=var(dat$Mean)) ## asymptotic variance
cg<- correlogram(surf0,300,xlim=c(0,10)) ## limit the range of radii to the initial decay
```
The variogram and correlogram both take the object output from surf.ls as their first argument, from which they are able to access the detrended residuals. The second argument in both functions is the number of bins to use in calculating the variogram or correlogram. Be aware that a larger number of bins is not always better, because the code will drop bins that have too few observations (<6). The third argument just restricts our plots to the shorter distance lags where there is significant autocorrelation. The units on the x-axis are degrees, so here "short" distance lags are still 100s of miles.

The units on the y-axis of the variogram are in terms of variance, and the abline command adds the overall variance of the data, which the variogram should asymptote to at large lags. The units on the y-axis of the correlogram are in terms of correlation (ranging from -1 to 1). Both the variogram and correlogram functions produce a plot by default. We are additionally assigning the output of these functions to a variable name so that we can access the data directly.
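If you want to look at the binned values behind these plots, the returned lists can be inspected directly. A minimal sketch is below; my reading of the spatial package documentation is that the cnt element gives the number of point pairs averaged in each bin, so treat that part as an assumption.

```{r}
str(cg)   ## x = bin centers (degrees), y = correlation per bin, cnt = pairs per bin
head(cbind(distance = cg$x, correlation = cg$y, n_pairs = cg$cnt))
head(cbind(distance = vg$x, semivariance = vg$y))
```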
In order to interpolate the ozone values using kriging we need to fit a parametric model for the autocorrelation. Let's construct a maximum likelihood estimator for the exponential correlation function **expcov**. expcov is a built-in R function and you can see the details of the computation by typing expcov at the command line. In the following code we'll fit the expcov function to the x and y values from the correlogram (cg$x and cg$y), assuming a Normal likelihood. Recall that x and y are not actually data but are the bin centers and correlation coefficients calculated from the binned data. Thus, while the graphs may help suggest what functional form is best, we would ultimately want to fit the full spatial model directly to the data (which is beyond today's lab).
```{r}
rng <- 1:10 ## define the distance bins we want to use for the fit
expfit <- function(theta){ ## negative log likelihood function
  d = theta[1]       ## range parameter of the exponential covariance
  alpha = theta[2]   ## proportion of nugget effect
  se = theta[3]      ## standard error (scale) at distance zero
  sigma = theta[4]   ## residual SD of the Normal likelihood
  -sum(dnorm(cg$y[rng],expcov(cg$x[rng],d,alpha,se),sigma,log=TRUE)) ## negative log likelihood
}
efit <- optim(c(2,0.1,1,0.01),expfit) ## numerical optimization
efit
```
In the above example we restricted the range of the values from the correlogram to just the first 10 bins. We did this because we are really only interested in the initial decline phase, and the number of bins needed would change depending on how long it takes the correlogram to converge to zero.
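One way to pick that cutoff less arbitrarily is sketched below: find the first bin where the correlogram drops to or below zero and fit only the bins before it. This is just one heuristic under the assumption that the correlogram does cross zero within the plotted range, not the only reasonable choice.

```{r}
first_zero <- which(cg$y <= 0)[1]                          ## first bin at or below zero correlation
if (!is.na(first_zero) && first_zero > 1) rng <- 1:(first_zero - 1)
rng                                                        ## bins used for the fit
```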
Next, let’s plot the correlation functions over the correlogram to visually inspect the fits. You should extend this code to include any additional models you fit.
```{r}
par(lwd=2,cex=1.1)
cg <- correlogram(surf0,300,xlim=c(0,10))
lines(cg$x,expcov(cg$x,efit$par[1],efit$par[2],efit$par[3]),lwd=3)
#lines(cg$x,gaucov(cg$x,gfit$par[1],gfit$par[2],gfit$par[3]),col=2,lwd=3)
#lines(cg$x,sphercov(cg$x,sfit$par[1],sfit$par[2],sfit$par[3],sfit$par[4]),col=3,lwd=3)
legend("topright",legend=c("Exponential","Gaussian","Spherical"),col=1:3,lwd=2)
```
### Lab Report Task 2
2. **EXTRA CREDIT**: There are two other correlation functions built into the spatial package, the Gaussian covariance *gaucov* and the spherical covariance *sphercov*. Fit MLEs for these two functions via numerical optimization. Turn in an AIC table for the models you fit and identify the best-fitting model that you will use for Kriging (one way to compute AIC from an optim fit is sketched after this task list).
3. Explain the figure generated from the previous box of code. What is being plotted (axes, data, model)? How would you interpret this figure? How would you interpret the parameters of the best-fit model? What is the range? Note: you do not need to include the Gaussian and spherical lines if you did not fit them in the previous step.
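For the AIC table in the extra credit, one possible way to compute AIC from an `optim` fit is sketched below: `efit$value` is the minimized negative log likelihood, and the parameter count reflects the four parameters fit in `expfit` above. The gfit/sfit objects are hypothetical and assume you construct them analogously to `efit`.

```{r}
k.exp <- 4                          ## number of fitted parameters (d, alpha, se, sigma)
AIC.exp <- 2*efit$value + 2*k.exp   ## AIC = 2*negLL + 2*k
AIC.exp
## repeat with the analogous gfit / sfit objects, adjusting k to match each model
```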
Now that we have a covariance function we are ready to proceed with Kriging. The code below is based on the expcov covariance function and a 0th order polynomial, but **you should use the polynomial you selected previously (and the optimal covariance function, if you completed the extra credit task)**.
```{r}
KrExp <- surf.gls(0,expcov,dat$Lon,dat$Lat,dat$Mean,
d=efit$par[1],alpha=efit$par[2],se=efit$par[3]) ##construct surface
pr3 <- prmat(KrExp,-125,-65,25,50,100) ## predict to a matrix
image(pr3,main="Krige",cex.main=2,asp=70/50) ## image of predictions
contour(pr3,add=TRUE,lwd=1) ## add contours of the prediction
map("state",add=TRUE)
points(dat$Lon,dat$Lat,pch="+",col=dat$Mean/10,cex=0.5)
se3 <- semat(KrExp,-125,-65,25,50,150) ## error matrix
```
The *surf.gls* function is similar to the *surf.ls* function, but it differs in that it takes the spatial autocorrelation into account. The arguments are the same as for *surf.ls*, with the addition that we specify the name of the autocorrelation function in the second slot and the parameters to that function in the last few slots. The *prmat* function is analogous to the *trmat* function, but instead of just providing the trend it provides a matrix that includes both the trend and the autocorrelation; in other words, *prmat* is the function that actually does the kriging. The function *semat* is also very similar to *prmat* and *trmat*, except that while *prmat* returns the mean of the prediction, *semat* returns the standard error of the predictions.
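As with **trmat**, the lists returned by *prmat* and *semat* can be indexed directly if you want a kriged value at a particular location rather than a map. A minimal sketch follows, using hypothetical coordinates near Boston (about 71.06 W, 42.36 N); note that the prediction and error grids above have different resolutions, so each needs its own indices.

```{r}
ix <- which.min(abs(pr3$x - (-71.06)))   ## nearest prediction-grid column in longitude
iy <- which.min(abs(pr3$y - 42.36))      ## nearest prediction-grid row in latitude
pr3$z[ix, iy]                            ## kriged mean ozone (ppb) at that grid cell
jx <- which.min(abs(se3$x - (-71.06)))   ## nearest error-grid column
jy <- which.min(abs(se3$y - 42.36))      ## nearest error-grid row
se3$z[jx, jy]                            ## kriging standard error at that grid cell
```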
### Lab Report Task 3
4. Include your kriged map of mean US ozone for 2008 in your lab report. Where in the country has the highest ozone levels?
5. Generate a second map of mean US ozone but this time include contours based on the error matrix rather than the prediction matrix. Where are ozone levels most uncertain?
6. There are a few reasons that one might be skeptical of this initial exploratory map. Name at least two and describe ways the model could be improved (without collecting additional data).