2021 BU EE585 team project: EFI/NEON terrestrial carbon challenge
Nia Bartolucci [email protected]
Cameron Reimer [email protected]
Kangjoon Cho [email protected]
Zhenpeng Zuo [email protected]
When run on any given date, the R script "Data_download.R" pulls NEON measurements (NEE, LE, and soil moisture) and NOAA weather forecasts (NOAA’s Global Ensemble Forecast System, GEFS) for the four NEON sites, and plots time series of the NEON history and the NOAA projections.
Before running, manually set the variable "base_dir" in Data_download.R. It defines the working directory, where the data are temporarily stored and the output graphs are saved. The user should also manually create the directories on the local machine: the working directory itself, plus "data", "graph", and "drives" under the working directory. To schedule the code to run daily, add the following cron table entries from the Terminal (cron is required and is available only on Unix-like operating systems):
# [In the terminal] run crontab -e, press i to enter insert mode, then paste the lines below
# set up the terrestrial data script to run at 5:00 AM every day
MAILTO="[email protected],[email protected],[email protected],[email protected]"
00 05 * * * /usr/local/bin/Rscript "PATH/Data_download.R"
Of the data being pulled, the NEON measurements are updated monthly, with each update releasing new daily data for the past month. Therefore, in the daily runs, the plotted NEON historical time series include data only up to the latest NEON release. The NOAA weather forecasts are 35-day ensemble projections made up of 31 ensemble members (separate forecast runs), released once every six hours at a 1-hour forecast resolution.
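As an illustration of how a 31-member, 35-day hourly ensemble can be reduced to a mean and an interval for plotting, here is a minimal sketch. It uses synthetic numbers, not real GEFS output, and the variable names, the diurnal signal, and the growth of spread with lead time are all assumptions made for the example. It is not taken from Data_download.R.

```r
# Synthetic stand-in for one forecast variable at one site (NOT real NOAA data).
set.seed(1)
n_members <- 31        # ensemble members, as described above
n_steps   <- 35 * 24   # 35 days at hourly resolution

lead_hours <- seq_len(n_steps)
signal <- 15 + 5 * sin(2 * pi * lead_hours / 24)  # hypothetical diurnal cycle
spread <- sqrt(lead_hours / 24)                   # hypothetical spread growth

# Rows are ensemble members, columns are forecast lead times.
noise <- matrix(rnorm(n_members * n_steps), nrow = n_members)
ens <- sweep(noise, 2, spread, `*`)   # scale each lead time by its spread
ens <- sweep(ens, 2, signal, `+`)     # add the shared signal

# Summarize across members at each lead time.
ens_mean <- colMeans(ens)
ens_ci   <- apply(ens, 2, quantile, probs = c(0.025, 0.975))
```

The rows of `ens_ci` hold the lower and upper bounds, which can be drawn as a ribbon around `ens_mean`.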
The time series plots will be exported to the "graph" sub-directory under the main directory.
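The directory layout described above (a working directory containing "data", "graph", and "drives") could also be created from R rather than by hand. The sketch below is one way to do it. The path assigned to `base_dir` here is a hypothetical placeholder and should be replaced with the value set in Data_download.R.

```r
# Hypothetical location; replace with the path used for base_dir
# in Data_download.R.
base_dir <- file.path(tempdir(), "terrestrial_forecast")

# Sub-directories named in this README.
sub_dirs <- c("data", "graph", "drives")

for (d in file.path(base_dir, sub_dirs)) {
  # recursive = TRUE also creates base_dir itself if it is missing;
  # showWarnings = FALSE keeps reruns quiet when a folder already exists.
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# TRUE when every expected sub-directory is present.
dirs_ok <- all(dir.exists(file.path(base_dir, sub_dirs)))
```

Running this once before the first scheduled job avoids a failed cron run caused by a missing output folder.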
Before generating forecasts, use the scripts named "XXX" to fit the historical data. For this historical fit of NEE, LE, and soil moisture, we created a joint, dynamic linear state-space model, which includes data models and a process model. The data models are based on simple Gaussian distributions,
where NEE, LE, and SM (soil moisture) are the targets of our forecasts, t represents time, and the error terms (given by normal distributions, see below) represent the uncertainties in observation and/or data collection. A subscript marks the observed value of each variable.
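To make the Gaussian data model concrete, the sketch below simulates observations of a single variable as the latent state plus normally distributed observation error. The latent trajectory, the observation standard deviation `sd_obs`, and all variable names are assumptions chosen for illustration. They are not the fitted values or the notation of this project.

```r
set.seed(42)
n_t <- 100

# Hypothetical latent (true) NEE trajectory; synthetic, not NEON data.
nee_latent <- cumsum(rnorm(n_t, mean = 0, sd = 0.3))

# Assumed observation-error standard deviation.
sd_obs <- 0.5

# Gaussian data model: each observation is centered on the latent state.
nee_obs <- rnorm(n_t, mean = nee_latent, sd = sd_obs)

# The spread of the residuals should be close to sd_obs.
resid_sd <- sd(nee_obs - nee_latent)
```

The same pattern applies to LE and soil moisture, each with its own observation error.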
The process model includes incoming shortwave radiation, incoming longwave radiation, air temperature, and precipitation as covariates. It also couples NEE, LE, and soil moisture, so that the three variables are intercorrelated.
where the means of the normal distributions and the terms that define their uncertainties are indexed by the model's time step t. Each mean is a linear combination of the previous-step value of the variable itself, the previous-step values of the other two variables, an intercept, and the corresponding covariates, each weighted by its own coefficient. Incoming shortwave radiation and air temperature are selected as covariates for NEE, incoming longwave radiation for LE, and precipitation for SM.
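The structure of such a coupled process model can be sketched as a forward simulation. Everything numeric below is hypothetical: the coefficient matrix `A`, the intercepts, the covariate effects, the process standard deviations, and the synthetic covariates are placeholders, not fitted values. Whether covariates enter at the current or the previous time step is also an assumption here (current step).

```r
set.seed(7)
n_t <- 200
vars <- c("NEE", "LE", "SM")

# Synthetic, standardized covariates (NOT NOAA data).
sw <- rnorm(n_t); temp <- rnorm(n_t); lw <- rnorm(n_t); precip <- rnorm(n_t)

# Hypothetical coefficients on the previous state: diagonal entries are each
# variable's own persistence, off-diagonal entries couple the variables.
A <- matrix(c(0.60, 0.10, 0.05,
              0.10, 0.60, 0.05,
              0.05, 0.05, 0.70),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))
intercept <- c(NEE = 0.00, LE = 0.10, SM = 0.05)
b_sw <- -0.30; b_temp <- 0.20; b_lw <- 0.25; b_precip <- 0.40
sd_proc <- c(NEE = 0.10, LE = 0.10, SM = 0.05)

x <- matrix(NA_real_, nrow = n_t, ncol = 3, dimnames = list(NULL, vars))
x[1, ] <- 0  # arbitrary initial state

for (t in 2:n_t) {
  # Mean: intercept plus linear combination of all three previous states.
  mu <- as.vector(A %*% x[t - 1, ]) + intercept
  # Variable-specific covariates, as listed in the text above.
  mu["NEE"] <- mu["NEE"] + b_sw * sw[t] + b_temp * temp[t]
  mu["LE"]  <- mu["LE"]  + b_lw * lw[t]
  mu["SM"]  <- mu["SM"]  + b_precip * precip[t]
  # Process error: normal draw around the mean.
  x[t, ] <- rnorm(3, mean = mu, sd = sd_proc)
}
```

Because the rows of `A` sum to less than one, the simulated states stay bounded. A fitted model would estimate these coefficients rather than fix them.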
Priors used for the data models and the process model are
The model was run with JAGS (Just Another Gibbs Sampler), a statistical software package for Bayesian analysis using Markov chain Monte Carlo (MCMC) simulation, for 20,000 iterations with three chains. The first 500 iterations were treated as the burn-in period and removed from subsequent analyses.
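The burn-in step described above amounts to dropping the first 500 draws of each chain before pooling. The sketch below shows this with base R on synthetic draws standing in for sampler output. The parameter names are hypothetical, and no JAGS run is performed. With the coda package, `window()` on the sampler output serves the same purpose.

```r
set.seed(3)
n_iter   <- 20000  # iterations per chain
n_chains <- 3
n_burn   <- 500    # burn-in iterations to discard

# Stand-in for sampler output: one matrix per chain, rows are iterations,
# columns are parameters. Synthetic draws, NOT real JAGS samples.
chains <- lapply(seq_len(n_chains), function(k) {
  matrix(rnorm(n_iter * 2), ncol = 2,
         dimnames = list(NULL, c("beta_hypothetical", "tau_hypothetical")))
})

# Keep only post-burn-in iterations in every chain.
keep <- (n_burn + 1):n_iter
post <- lapply(chains, function(m) m[keep, , drop = FALSE])

# Pool the chains for posterior summaries.
pooled <- do.call(rbind, post)
post_summary <- apply(pooled, 2, quantile, probs = c(0.025, 0.5, 0.975))
```

Each chain retains 19,500 draws, so `pooled` has 58,500 rows.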