Issue with `project()` when used with integrated model #106

cwaldock1 · 2024-03-18T07:43:43Z

I would like to use the project function with integrated SDMs, and found in my own code that I get the error:

Error: Model predictors are missing from the scenario predictor!

This error is also reproduced by adding a two types of data to the Biodiversity distribution model object. The model contains a new predictor when integrated that is not passed along to the scenario object, and then is not found in the project() call.

Below is a modified version of your tutorial that reproduces the issue, but perhaps I am also inputing something wrong in the calls?

Many thanks for any assistance,
Conor

# Load the packages
library(ibis.iSDM)
library(stars)
library(xgboost)
library(terra)
library(igraph)
library(ggplot2)
library(ncdf4)
library(assertthat)

# Don't print out as many messages
options("ibis.setupmessages" = FALSE)

# Background and biodiversity data
background <- terra::rast(system.file('extdata/europegrid_50km.tif', package='ibis.iSDM'))
virtual_points <- sf::st_read(system.file('extdata/input_data.gpkg', package='ibis.iSDM'), 'points', quiet = TRUE)

# In addition we will use the species data to generate a presence-absence dataset with pseudo-absence points.
# Here we first specify the settings to use:
ass <- pseudoabs_settings(background = background, nrpoints = 200,
                          method =  "random")
virtual_pseudoabs <- add_pseudoabsence(df = virtual_points, 
                                       field_occurrence = "Observed",
                                       settings = ass)

# Note we are loading different predictors than in previous examples
# These are in netcdf4 format, a format specific for storing spatial-temporal data including metadata.
ll <- list.files(system.file("extdata/predictors_presfuture/", package = "ibis.iSDM", mustWork = TRUE), "*.nc",full.names = TRUE)

# From those list of predictors are first loading the current ones as raster data
# We are loading only data from the very first, contemporary time step for model fitting
pred_current <- terra::rast()
for(i in ll) suppressWarnings( pred_current <- c(pred_current, terra::rast(i, lyrs = 1) ) )
names(pred_current) <- tools::file_path_sans_ext(basename(ll))

# Get future predictors
# These we will load in using the stars package and also ignoring the first time step
pred_future <- stars::read_stars(ll) |> stars:::slice.stars('Time', 2:86)
st_crs(pred_future) <- st_crs(4326) # Set projection
# Rename future predictors to those of current
names(pred_future) <- names(pred_current)

# Plot the test data
plot(pred_current['secdf'],
     col = colorRampPalette(c("grey20", "orange", "lightgreen", "green"))(10),
     main = "Share of secondary vegetation")



#### train model with the above data ----

# Train model adding the data loaded above
x <- distribution(background) |> 
  add_biodiversity_poipo(virtual_points, field_occurrence = 'Observed', name = 'Virtual points') |> 

#### HERE IS THE ONLY CHANGE
  add_biodiversity_poipa(virtual_pseudoabs, field_occurrence = "Observed") |>

  # Note that we scale the predictors here
  add_predictors(pred_current, transform = 'scale',derivates = 'none') |> 
  engine_glmnet(alpha = 0) 

# Train the model
modf <- train(x, runname = 'Simple PPM', inference_only = T, verbose = FALSE)
modf$model
# Add a threshold to this model by getting 05 percentile of values
# modf <- threshold(modf, method = 'percentile', value = 0.05)

# Now lets create a scenarios object via scenarios
sc <- scenario(modf) |> 
  # Apply the same variable transformations as above. 
  add_predictors(pred_future, transform = 'scale') |>
  project() #### ERROR OCCURS HERE

The text was updated successfully, but these errors were encountered:

cwaldock1 · 2024-03-18T08:04:06Z

I also found the same error to occur when using quadratic terms in specified in:
add_predictors(pred_current, transform = 'scale',derivates = 'quad')

Martin-Jung · 2024-03-18T09:30:06Z

Hi,
you are using the engine_glmnet() which does not support joint integration or similar. Instead the default, as set by the parameter method_integration in train(), is to sequentially add previous predictions (e.g. from the first to second last dataset) to the next prediction as covariate. See the help file for other options of integration.

However, when making a future prediction you are only providing a set of predetermined variables (pred_future) which does not have the prediction made before. See summary(modf) for the variable names.
Any projection needs to have the same variables included (or at least variables named the same) that were used to train the model!
So you would need to think of a way how the predicted distribution can be added to pred_future.

I also found the same error to occur when using quadratic terms in specified in:
add_predictors(pred_current, transform = 'scale',derivates = 'quad')

Same as above, you simply forgot in this case to also create derivates for the future variables?
e.g. add_predictors(pred_future, derivates = 'quad')

cwaldock1 · 2024-03-18T09:42:54Z

Hi Martin, Thanks for the quick response.

For the first case, with the intergration through addition of covariates from a first model the add_predictors is missing the "poipo_X_simulated_mean" or "poipo_X_mean" objects. Although possible to extract from the trained model objects created, it would perhaps be more intuitive if this was automated for the user.

In the second case, I have added your suggestion but it still does not work. The piping of add_predictors to the scenario objects do not work with the derivates in the example below with your solution.

Thanks again!
Conor

# https://iiasa.github.io/ibis.iSDM/articles/04_biodiversity_projections.html

# Load the packages
library(ibis.iSDM)
library(stars)
library(xgboost)
library(terra)
library(igraph)
library(ggplot2)
library(ncdf4)
library(assertthat)

# Don't print out as many messages
options("ibis.setupmessages" = FALSE)

# Background and biodiversity data
background <- terra::rast(system.file('extdata/europegrid_50km.tif', package='ibis.iSDM'))
virtual_points <- sf::st_read(system.file('extdata/input_data.gpkg', package='ibis.iSDM'), 'points', quiet = TRUE)

# In addition we will use the species data to generate a presence-absence dataset with pseudo-absence points.
# Here we first specify the settings to use:
ass <- pseudoabs_settings(background = background, nrpoints = 200,
                          method =  "random")
virtual_pseudoabs <- add_pseudoabsence(df = virtual_points, 
                                       field_occurrence = "Observed",
                                       settings = ass)

# Note we are loading different predictors than in previous examples
# These are in netcdf4 format, a format specific for storing spatial-temporal data including metadata.
ll <- list.files(system.file("extdata/predictors_presfuture/", package = "ibis.iSDM", mustWork = TRUE), "*.nc",full.names = TRUE)

# From those list of predictors are first loading the current ones as raster data
# We are loading only data from the very first, contemporary time step for model fitting
pred_current <- terra::rast()
for(i in ll) suppressWarnings( pred_current <- c(pred_current, terra::rast(i, lyrs = 1) ) )
names(pred_current) <- tools::file_path_sans_ext(basename(ll))

# Get future predictors
# These we will load in using the stars package and also ignoring the first time step
pred_future <- stars::read_stars(ll) |> stars:::slice.stars('Time', 2:86)
st_crs(pred_future) <- st_crs(4326) # Set projection
# Rename future predictors to those of current
names(pred_future) <- names(pred_current)

# Plot the test data
plot(pred_current['secdf'],
     col = colorRampPalette(c("grey20", "orange", "lightgreen", "green"))(10),
     main = "Share of secondary vegetation")



#### train model with the above data ----

# Train model adding the data loaded above
x <- distribution(background) |> 
  add_biodiversity_poipo(virtual_points, field_occurrence = 'Observed', name = 'Virtual points') |> 
#   add_biodiversity_poipa(virtual_pseudoabs, field_occurrence = "Observed")|>
  # Note that we scale the predictors here
  add_predictors(pred_current, transform = 'scale', derivates = 'quad') |> 
  engine_glmnet(alpha = 0) 

# Train the model
modf <- train(x, runname = 'glmnet test', inference_only = T, verbose = FALSE)
modf$model

# Now lets create a scenarios object via scenarios
sc <- scenario(modf) |> 
  # Apply the same variable transformations as above. 
  add_predictors(pred_future, transform = 'scale', derivates = 'quad') |>
  project()

> sessionInfo()
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

time zone: Europe/Zurich
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] glmnetUtils_1.1.9       glmnet_4.1-8            lubridate_1.9.3        
 [4] rnaturalearthdata_1.0.0 rnaturalearth_1.0.1     assertthat_0.2.1       
 [7] ncdf4_1.22              ggplot2_3.5.0           igraph_2.0.2           
[10] terra_1.7-71            xgboost_1.7.7.1         stars_0.6-4            
[13] sf_1.0-15               abind_1.4-5             ibis.iSDM_0.1.3        

loaded via a namespace (and not attached):
 [1] gtable_0.3.4                  shape_1.4.6.1                 remotes_2.4.2.1              
 [4] lattice_0.22-5                numDeriv_2016.8-1.1           tzdb_0.4.0                   
 [7] vctrs_0.6.5                   tools_4.3.3                   generics_0.1.3               
[10] curl_5.2.1                    stats4_4.3.3                  parallel_4.3.3               
[13] tibble_3.2.1                  proxy_0.4-27                  fansi_1.0.6                  
[16] pkgconfig_2.0.3               Matrix_1.6-5                  KernSmooth_2.23-22           
[19] data.table_1.15.2             uuid_1.2-0                    lifecycle_1.0.4              
[22] fmesher_0.1.5                 compiler_4.3.3                MatrixModels_0.5-3           
[25] progress_1.2.3                mnormt_2.1.1                  munsell_0.5.0                
[28] codetools_0.2-19              class_7.3-22                  pillar_1.9.0                 
[31] crayon_1.5.2                  classInt_0.4-10               lwgeom_0.2-14                
[34] wk_0.9.1                      iterators_1.0.14              foreach_1.5.2                
[37] inlabru_2.10.1                tidyselect_1.2.1              dplyr_1.1.4                  
[40] splines_4.3.3                 grid_4.3.3                    colorspace_2.1-0             
[43] cli_3.6.2                     magrittr_2.0.3                survival_3.5-8               
[46] utf8_1.2.4                    e1071_1.7-14                  sn_2.1.1                     
[49] readr_2.1.5                   withr_3.0.0                   prettyunits_1.2.0            
[52] scales_1.3.0                  sp_2.1-3                      bit64_4.0.5                  
[55] timechange_0.3.0              httr_1.4.7                    bit_4.0.5                    
[58] rnaturalearthhires_1.0.0.9000 hms_1.1.3                     s2_1.1.6                     
[61] rlang_1.1.3                   Rcpp_1.0.12                   INLA_24.02.09                
[64] glue_1.7.0                    DBI_1.2.2                     rstudioapi_0.15.0            
[67] vroom_1.6.5                   jsonlite_1.8.8                plyr_1.8.9                   
[70] R6_2.5.1                      units_0.8-5

Martin-Jung · 2024-03-18T12:19:22Z

For the first case, with the intergration through addition of covariates from a first model the add_predictors is missing the "poipo_X_simulated_mean" or "poipo_X_mean" objects. Although possible to extract from the trained model objects created, it would perhaps be more intuitive if this was automated for the user.

Sure, but this might only work for your case, not for one where someone for example one fits 2 iterated models with different formula or predictors (like a sequenced climate and land-use model). In this case it would be better to pass the entire model over to support these predictions. See also issue #78 for the same point made. Pull requests welcome :)

In the second case, I have added your suggestion but it still does not work. The piping of add_predictors to the scenario objects do not work with the derivates in the example below with your solution.

You are right, this has worked in the past. I will take a look or maybe @mhesselbarth can?
(Note: I checked the respective script and there is bunch of code additions that i did not make and do not work for stars?)

Essentially these should be identical after creating 'quad' transformed derivates, so I suspect somewhere a bug in the stars add_predictor() script.

x$get_predictor_names()
sc$get_predictor_names()

cwaldock1 · 2024-03-18T15:12:27Z

Sure, but this might only work for your case, not for one where someone for example one fits 2 iterated models with different formula or predictors (like a sequenced climate and land-use model).

I see, yes. So one would have to iteratively fit and predict. And then for scenarios follow the additions through with updated suitabilities from the first models.

Thanks for time invested in explanations and updates.

…transforms #106

mhesselbarth · 2024-04-10T13:40:57Z

FYI I started to work on this on a development branch (https://github.com/iiasa/ibis.iSDM/tree/project). Please be aware that at this point, this is highly experimental!

Martin-Jung · 2024-04-10T21:53:30Z

FYI I started to work on this on a development branch (https://github.com/iiasa/ibis.iSDM/tree/project). Please be aware that at this point, this is highly experimental!

Made some fixes f79b88d in dev to projection extent mismatches when using project. Make sure to merge to your branch.

mhesselbarth · 2024-04-11T05:27:24Z

I can already smell the merge conflict 😄

mhesselbarth · 2024-04-16T14:43:53Z

Should be now included on the dev branch. So the following code should run

# Load the packages
library(stars)
library(terra)
library(sf)
library(units)

background <- terra::rast(system.file('extdata/europegrid_50km.tif', package='ibis.iSDM'))
virtual_points <- sf::st_read(system.file('extdata/input_data.gpkg', package='ibis.iSDM'),
                              'points', quiet = TRUE)

ll <- list.files(system.file("extdata/predictors_presfuture/", package = "ibis.iSDM", mustWork = TRUE),
                 "*.nc", full.names = TRUE)

pred_current <- terra::rast()
for(i in ll) suppressWarnings(pred_current <- c(pred_current, terra::rast(i, lyrs = 1)))
names(pred_current) <- tools::file_path_sans_ext(basename(ll))

terra::units(pred_current) <- NULL

pred_future <- stars::read_stars(ll) |> stars:::slice.stars('Time', 2:86)
sf::st_crs(pred_future) <- sf::st_crs(4326) # Set projection

names(pred_future) <- names(pred_current)
pred_future <- units::drop_units(pred_future)

modf <- distribution(background) |>
  add_predictors(pred_current, transform = 'none', derivates = 'none') |>
  add_biodiversity_poipo(virtual_points, field_occurrence = 'Observed',
                         formula = "observed ~ bio01 + bio12") |>
  add_biodiversity_poipo(virtual_points, field_occurrence = 'Observed',
                         formula = "observed ~ crops + primf") |>
  engine_glmnet() |>
  train(method_integration = "predictor")

sc <- scenario(modf, copy_model = FALSE) |>
  add_predictors(pred_future, transform = 'none') |>
  project()

Martin-Jung added a commit that referenced this issue Mar 19, 2024

🐛 Updated predictor derivate function to support quadratic and hinge …

86c40db

…transforms #106

Martin-Jung self-assigned this Mar 21, 2024

Martin-Jung added the bug Something isn't working label Mar 21, 2024

Martin-Jung added a commit that referenced this issue Mar 28, 2024

Further fixes to derivates and scenario variable preparations #106

a640e40

Martin-Jung added a commit that referenced this issue Mar 28, 2024

Small bug fix #106 and changed saving to mdim arrays #109

78d6bee

mhesselbarth mentioned this issue Apr 16, 2024

project for method_integration = "predictor" #119

Merged

mhesselbarth closed this as completed Apr 26, 2024

Martin-Jung mentioned this issue May 9, 2024

Factor variables are not correctly handled anymore #125

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issue with `project()` when used with integrated model #106

Issue with `project()` when used with integrated model #106

cwaldock1 commented Mar 18, 2024 •

edited

Loading

cwaldock1 commented Mar 18, 2024

Martin-Jung commented Mar 18, 2024

cwaldock1 commented Mar 18, 2024

Martin-Jung commented Mar 18, 2024 •

edited

Loading

cwaldock1 commented Mar 18, 2024 •

edited

Loading

mhesselbarth commented Apr 10, 2024

Martin-Jung commented Apr 10, 2024

mhesselbarth commented Apr 11, 2024

mhesselbarth commented Apr 16, 2024

Issue with project() when used with integrated model #106

Issue with project() when used with integrated model #106

Comments

cwaldock1 commented Mar 18, 2024 • edited Loading

cwaldock1 commented Mar 18, 2024

Martin-Jung commented Mar 18, 2024

cwaldock1 commented Mar 18, 2024

Martin-Jung commented Mar 18, 2024 • edited Loading

cwaldock1 commented Mar 18, 2024 • edited Loading

mhesselbarth commented Apr 10, 2024

Martin-Jung commented Apr 10, 2024

mhesselbarth commented Apr 11, 2024

mhesselbarth commented Apr 16, 2024

Issue with `project()` when used with integrated model #106

Issue with `project()` when used with integrated model #106

cwaldock1 commented Mar 18, 2024 •

edited

Loading

Martin-Jung commented Mar 18, 2024 •

edited

Loading

cwaldock1 commented Mar 18, 2024 •

edited

Loading