-
-
Notifications
You must be signed in to change notification settings - Fork 584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Complete rewrite of the Ecology chapter #771
Merged
Merged
Changes from all commits
Commits
Show all changes
81 commits
Select commit
Hold shift + click to select a range
aa1a728
start to write mlr3 code
d313e4b
add 2 remarks
6e7907d
add svm autotuner and mlr3 advantages
jannes-m 5aa0871
rewrite case study section
jannes-m 2dbdc49
update pat-s citation
jannes-m 8f66590
add mlr3 book reference
jannes-m 20f9d57
evaluate spDataLarge data code chunk
jannes-m 9f82d04
update adv-r ref
jannes-m 5d223b6
create a mlr3 taskclassifst
jannes-m a5533b7
delete superfluous following
jannes-m 29e4b93
add to-dos
jannes-m e30f2f0
add missing comma
jannes-m a7edbfa
create log_reg learner and run resampling
jannes-m c74ef1f
start writing the tuning section
jannes-m 019fd19
use lsl study area
jannes-m dcd952b
replace all instances of study_area by study_mask
jannes-m 665565b
add future parallelization
jannes-m a83cf3b
replace mlr by mlr3 output
jannes-m 36bf6c2
use benchmark(_grid) for comparing sp with conventional cv, rewrite p…
jannes-m 162cb19
replace old boxplot, rewrite parallel stuff
jannes-m ee9a95b
update contents
jannes-m fb5cc34
rewrite svm stuff using mlr3
jannes-m aab55e3
delete old readRDS commands
jannes-m d7a633d
run glm and svm sp and nsp cv using benchmark
jannes-m f2ce68d
use only one script for glm and svm cvs
jannes-m c5f2c7e
update svm resampling and exercises
jannes-m 845bd3a
mlr3 code for predicting floristic composition
jannes-m 8a68e8d
update 14-rf_mlr3.R
jannes-m d245799
harmonize with main
jannes-m 3b57594
rename code and output files
jannes-m 9eac3cd
add exercise file
jannes-m f289ff4
rename code chunks and add new exercise style
jannes-m d83926f
use updated spDataLarge data and replace raster by terra
f9fb429
add encapsulate explanation
e94c8a8
update figures
6d61dde
use terra, update figures, use bm_agg instead of agg in boxplot
db23472
Merge branch 'main' into rewrite_cv
1eccbee
update and add references
jannes-m af6072b
sync with main
jannes-m f2d8de9
delete unnecessary ta_2
jannes-m f50d09c
rename 14-rf_mlr3.R to 15-rf_mlr3.R
jannes-m 14c8d77
use terra instead of raster
jannes-m 04fae2d
rename intermediat results
jannes-m e0cd03b
read in geotiffs with terra
jannes-m 58973d2
update 12-cv.R
75786a3
update extdata/.gitignore
718bedd
harmonize
cf7d55c
Merge branch 'main' into rewrite_cv
jannes-m 18c11a6
Merge branch 'rewrite_cv' into rewrite_eco
jannes-m 9978537
update eco text until the start of the dimensionality reduction
jannes-m fd71259
only rename two layers and add a comment
jannes-m e42fe33
save AutoTuner output
jannes-m 22560b7
rewrite the modeling stuff
jannes-m 6fc35f8
rewrite predictive mapping part
jannes-m 852792c
add spatial prediction
jannes-m 57903c5
updating figures of c15
jannes-m c2ea0c8
update code chunk name
jannes-m e3c0ca3
start working on exercises for c15
jannes-m dcd8efc
complete exercices for c15
jannes-m e12dfbc
polish
jannes-m 2c2f788
add missing libraries to ex0
jannes-m ef941e4
delete unnecessary and outdated files
jannes-m 7256a47
add structure to 15-rf_mlr3.R
jannes-m 200b9ee
replace mlr by mlr3
jannes-m 7b3e236
add reference to benchmark() and benchmark_grid()
jannes-m bba8caa
add benchmark and benchmark_grid() hint to _12_exercises
jannes-m b174dca
replace raster by terra again
jannes-m ce69f05
use system.file to attach ta
jannes-m 564f4c3
replace raster by terra in _12-ex.Rmd
jannes-m 60c0596
Merge branch 'rewrite_cv' into rewrite_eco
jannes-m 062c439
replace raster by terra and use new dplyr syntax
jannes-m 20e32dd
use :: style to make clear which packages are used when
jannes-m 127273a
fix typo
jannes-m 13b7b65
delete old, no longer necessary sentence from comments
jannes-m dd58c32
delete merge conflict HEAD
jannes-m 60d14f8
Merge branch 'rewrite_cv' into rewrite_eco
jannes-m 2e6c789
further polish 12-spatial-cv.Rmd
jannes-m b305dc2
delete no longer required mlr3 file
jannes-m 57358be
check if not storing the models decreases substantially runtime and f…
jannes-m 5f691a8
use only score output
jannes-m 92c3492
Merge branch 'rewrite_cv' into rewrite_eco
jannes-m File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Large diffs are not rendered by default.
Oops, something went wrong.
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,230 @@ | ||
The solutions assume the following packages are attached (other packages will be attached when needed): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👍 |
||
|
||
```{r 12-ex-e0, message=FALSE, warning=FALSE} | ||
library(dplyr) | ||
# library(kernlab) | ||
library(mlr3) | ||
library(mlr3learners) | ||
library(mlr3extralearners) | ||
library(mlr3spatiotempcv) | ||
library(mlr3tuning) | ||
library(qgisprocess) | ||
library(raster) | ||
# library(rlang) | ||
library(sf) | ||
library(tmap) | ||
``` | ||
|
||
E1. Compute the following terrain attributes from the `elev` dataset loaded with `terra::rast(system.file("raster/ta.tif", package = "spDataLarge"))$elev` with the help of R-GIS bridges (see this [Chapter](https://geocompr.robinlovelace.net/gis.html#gis)): | ||
- Slope | ||
- Plan curvature | ||
- Profile curvature | ||
- Catchment area | ||
|
||
```{r, eval=FALSE} | ||
# attach data | ||
dem = terra::rast(system.file("raster/ta.tif", package = "spDataLarge"))$elev | ||
|
||
algs = qgisprocess::qgis_algorithms() | ||
dplyr::filter(algs, grepl("curvature", algorithm)) | ||
alg = "saga:slopeaspectcurvature" | ||
qgisprocess::qgis_show_help(alg) | ||
qgisprocess::qgis_arguments(alg) | ||
# terrain attributes (ta) | ||
out_nms = paste0(tempdir(), "/", c("slope", "cplan", "cprof"), | ||
".sdat") | ||
args = rlang::set_names(out_nms, c("SLOPE", "C_PLAN", "C_PROF")) | ||
out = qgis_run_algorithm(alg, ELEVATION = dem, METHOD = 6, | ||
UNIT_SLOPE = "[1] degree", | ||
!!!args, | ||
.quiet = TRUE | ||
) | ||
ta = out[names(args)] |> unlist() |> terra::rast() | ||
names(ta) = c("slope", "cplan", "cprof") | ||
# catchment area | ||
dplyr::filter(algs, grepl("[Cc]atchment", algorithm)) | ||
alg = "saga:catchmentarea" | ||
qgis_show_help(alg) | ||
qgis_arguments(alg) | ||
carea = qgis_run_algorithm(alg, | ||
ELEVATION = dem, | ||
METHOD = 4, | ||
FLOW = file.path(tempdir(), "carea.sdat")) | ||
# transform carea | ||
carea = terra::rast(carea$FLOW[1]) | ||
log10_carea = log10(carea) | ||
names(log10_carea) = "log10_carea" | ||
# add log_carea and dem to the terrain attributes | ||
ta = c(ta, dem, log10_carea) | ||
``` | ||
|
||
E2. Extract the values from the corresponding output rasters to the `lsl` data frame (`data("lsl", package = "spDataLarge"`) by adding new variables called `slope`, `cplan`, `cprof`, `elev` and `log_carea` (see this [section](https://geocompr.robinlovelace.net/spatial-cv.html#case-landslide) for details). | ||
|
||
```{r, eval=FALSE} | ||
# attach terrain attribute raster stack (in case you have skipped the previous | ||
# exercise) | ||
data("lsl", package = "spDataLarge") | ||
ta = terra::rast(system.file("raster/ta.tif", package = "spDataLarge")) | ||
lsl = dplyr::select(lsl, x, y, lslpts) | ||
# extract values to points, i.e., create predictors | ||
lsl[, names(ta)] = terra::extract(ta, lsl[, c("x", "y")]) |> | ||
dplyr::select(-ID) | ||
``` | ||
|
||
E3. Use the derived terrain attribute rasters in combination with a GLM to make a spatial prediction map similar to that shown in this [Figure](https://geocompr.robinlovelace.net/spatial-cv.html#fig:lsl-susc). | ||
Running `data("study_mask", package = "spDataLarge")` attaches a mask of the study area. | ||
|
||
```{r, eval=FALSE} | ||
# attach data (in case you have skipped exercises 1) and 2) | ||
# landslide points with terrain attributes and terrain attribute raster stack | ||
data("lsl", "study_mask", package = "spDataLarge") | ||
ta = terra::rast(system.file("raster/ta.tif", package = "spDataLarge")) | ||
|
||
# fit the model | ||
fit = glm(lslpts ~ slope + cplan + cprof + elev + log10_carea, | ||
data = lsl, family = binomial()) | ||
|
||
# make the prediction | ||
pred = terra::predict(object = ta, model = fit, type = "response") | ||
|
||
# make the map | ||
lsl_sf = sf::st_as_sf(lsl, coords = c("x", "y"), crs = 32717) | ||
study_mask = terra::vect(study_mask) | ||
lsl_sf = sf::st_as_sf(lsl, coords = c("x", "y"), crs = 32717) | ||
hs = terra::shade(ta$slope * pi / 180, | ||
terra::terrain(ta$elev, v = "aspect", unit = "radians")) | ||
rect = tmaptools::bb_poly(raster::raster(hs)) | ||
bbx = tmaptools::bb(raster::raster(hs), xlim = c(-0.00001, 1), | ||
ylim = c(-0.00001, 1), relative = TRUE) | ||
# white raster to only plot the axis ticks, otherwise gridlines would be visible | ||
tm_shape(hs, bbox = bbx) + | ||
tm_grid(col = "black", n.x = 1, n.y = 1, labels.inside.frame = FALSE, | ||
labels.rot = c(0, 90)) + | ||
tm_raster(palette = "white", legend.show = FALSE) + | ||
# hillshade | ||
tm_shape(terra::mask(hs, study_mask), bbox = bbx) + | ||
tm_raster(palette = gray(0:100 / 100), n = 100, | ||
legend.show = FALSE) + | ||
# prediction raster | ||
tm_shape(terra::mask(pred, study_mask)) + | ||
tm_raster(alpha = 0.5, palette = "Reds", n = 6, legend.show = TRUE, | ||
title = "Susceptibility") + | ||
# rectangle and outer margins | ||
qtm(rect, fill = NULL) + | ||
tm_layout(outer.margins = c(0.04, 0.04, 0.02, 0.02), frame = FALSE, | ||
legend.position = c("left", "bottom"), | ||
legend.title.size = 0.9) | ||
``` | ||
|
||
E4. Compute a 100-repeated 5-fold non-spatial cross-validation and spatial CV based on the GLM learner and compare the AUROC values from both resampling strategies with the help of boxplots (see this [Figure](https://geocompr.robinlovelace.net/spatial-cv.html#fig:boxplot-cv). | ||
Hint: You need to specify a non-spatial resampling strategy. | ||
Another hint: You might want to Excercises 4 to 6 in one go with the help of `mlr3::benchmark()` and `mlr3::benchmark_grid()` (for more information, please refer to https://mlr3book.mlr-org.com/perf-eval-cmp.html#benchmarking). | ||
When doing so, keep in mind that the computation can take very long, probably several days. | ||
This, of course, depends on your system. | ||
Computation time will be shorter the more RAM and cores you have at your disposal. | ||
|
||
```{r, eval=FALSE} | ||
# attach data (in case you have skipped exercises 1) and 2) | ||
data("lsl", package = "spDataLarge") # landslide points with terrain attributes | ||
|
||
# create task | ||
task = TaskClassifST$new( | ||
id = "lsl_ecuador", | ||
backend = mlr3::as_data_backend(lsl), target = "lslpts", positive = "TRUE", | ||
extra_args = list( | ||
coordinate_names = c("x", "y"), | ||
coords_as_features = FALSE, | ||
crs = 32717) | ||
) | ||
|
||
# construct learners (for all subsequent exercises) | ||
# GLM | ||
lrn_glm = lrn("classif.log_reg", predict_type = "prob") | ||
|
||
# SVM | ||
# construct SVM learner (using ksvm function from the kernlab package) | ||
lrn_ksvm = lrn("classif.ksvm", predict_type = "prob", kernel = "rbfdot", | ||
type = "C-svc") | ||
# specify nested resampling and adjust learner accordingly | ||
# five spatially disjoint partitions | ||
tune_level = rsmp("spcv_coords", folds = 5) | ||
# use 50 randomly selected hyperparameters | ||
terminator = trm("evals", n_evals = 50) | ||
tuner = tnr("random_search") | ||
# define the outer limits of the randomly selected hyperparameters | ||
ps = ps( | ||
C = p_dbl(lower = -12, upper = 15, trafo = function(x) 2^x), | ||
sigma = p_dbl(lower = -15, upper = 6, trafo = function(x) 2^x) | ||
) | ||
at_ksvm = AutoTuner$new( | ||
learner = lrn_ksvm, | ||
resampling = tune_level, | ||
measure = msr("classif.auc"), | ||
search_space = ps, | ||
terminator = terminator, | ||
tuner = tuner | ||
) | ||
|
||
# QDA | ||
lrn_qda = lrn("classif.qda", predict_type = "prob") | ||
|
||
# SVM without tuning hyperparameters | ||
vals = lrn_ksvm$param_set$values | ||
lrn_ksvm_notune = lrn_ksvm$clone() | ||
lrn_ksvm_notune$param_set$values = c(vals, C = 1, sigma = 1) | ||
|
||
# define resampling strategies | ||
# specify the reampling method, i.e. spatial CV with 100 repetitions and 5 folds | ||
# -> in each repetition dataset will be splitted into five folds | ||
# method: repeated_spcv_coords -> spatial partioning | ||
rsmp_sp = rsmp("repeated_spcv_coords", folds = 5, repeats = 100) | ||
# method: repeated_cv -> non-spatial partitioning | ||
rsmp_nsp = rsmp("repeated_cv", folds = 5, repeats = 100) | ||
|
||
# (spatial) cross-validataion | ||
#**************************** | ||
# create your design | ||
grid = benchmark_grid(tasks = task, | ||
learners = list(lrn_glm, at_ksvm, lrn_qda, | ||
lrn_ksvm_notune), | ||
resamplings = list(rsmp_sp, rsmp_nsp)) | ||
# execute the cross-validation | ||
library(future) | ||
# execute the outer loop sequentially and parallelize the inner loop | ||
future::plan(list("sequential", "multisession"), | ||
workers = floor(availableCores() / 2)) | ||
set.seed(021522) | ||
bmr = benchmark(grid, store_backends = FALSE) | ||
# stop parallelization | ||
future:::ClusterRegistry("stop") | ||
# save your result, e.g. to | ||
# saveRDS(bmr, file = "extdata/12-bmr.rds") | ||
|
||
# plot your result | ||
autoplot(bmr, measure = msr("classif.auc")) | ||
``` | ||
|
||
E5. Model landslide susceptibility using a quadratic discriminant analysis (QDA). | ||
Assess the predictive performance of the QDA. | ||
What is the a difference between the spatially cross-validated mean AUROC value of the QDA and the GLM? | ||
|
||
```{r, eval=FALSE} | ||
# attach data (in case you have skipped exercise 4) | ||
bmr = readRDS("extdata/12-bmr.rds") | ||
|
||
# plot your result | ||
autoplot(bmr, measure = msr("classif.auc")) | ||
# QDA has higher AUROC values on average than GLM which indicates moderately | ||
# non-linear boundaries | ||
``` | ||
|
||
E6. Run the SVM without tuning the hyperparameters. | ||
Use the `rbfdot` kernel with $\sigma$ = 1 and *C* = 1. | ||
Leaving the hyperparameters unspecified in **kernlab**'s `ksvm()` would otherwise initialize an automatic non-spatial hyperparameter tuning. | ||
|
||
```{r, eval=FALSE} | ||
# attach data (in case you have skipped exercise 4) | ||
bmr = readRDS("extdata/12-bmr.rds") | ||
# plot your result | ||
autoplot(bmr, measure = msr("classif.auc")) | ||
``` |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍