Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with project() when used with integrated model #106

Closed
cwaldock1 opened this issue Mar 18, 2024 · 9 comments
Closed

Issue with project() when used with integrated model #106

cwaldock1 opened this issue Mar 18, 2024 · 9 comments
Assignees
Labels
bug Something isn't working

Comments

@cwaldock1
Copy link

cwaldock1 commented Mar 18, 2024

I would like to use the project function with integrated SDMs, and found in my own code that I get the error:

Error: Model predictors are missing from the scenario predictor!

This error is also reproduced by adding a two types of data to the Biodiversity distribution model object. The model contains a new predictor when integrated that is not passed along to the scenario object, and then is not found in the project() call.

Below is a modified version of your tutorial that reproduces the issue, but perhaps I am also inputing something wrong in the calls?

Many thanks for any assistance,
Conor

# Load the packages
library(ibis.iSDM)
library(stars)
library(xgboost)
library(terra)
library(igraph)
library(ggplot2)
library(ncdf4)
library(assertthat)

# Don't print out as many messages
options("ibis.setupmessages" = FALSE)

# Background and biodiversity data
background <- terra::rast(system.file('extdata/europegrid_50km.tif', package='ibis.iSDM'))
virtual_points <- sf::st_read(system.file('extdata/input_data.gpkg', package='ibis.iSDM'), 'points', quiet = TRUE)

# In addition we will use the species data to generate a presence-absence dataset with pseudo-absence points.
# Here we first specify the settings to use:
ass <- pseudoabs_settings(background = background, nrpoints = 200,
                          method =  "random")
virtual_pseudoabs <- add_pseudoabsence(df = virtual_points, 
                                       field_occurrence = "Observed",
                                       settings = ass)

# Note we are loading different predictors than in previous examples
# These are in netcdf4 format, a format specific for storing spatial-temporal data including metadata.
ll <- list.files(system.file("extdata/predictors_presfuture/", package = "ibis.iSDM", mustWork = TRUE), "*.nc",full.names = TRUE)

# From those list of predictors are first loading the current ones as raster data
# We are loading only data from the very first, contemporary time step for model fitting
pred_current <- terra::rast()
for(i in ll) suppressWarnings( pred_current <- c(pred_current, terra::rast(i, lyrs = 1) ) )
names(pred_current) <- tools::file_path_sans_ext(basename(ll))

# Get future predictors
# These we will load in using the stars package and also ignoring the first time step
pred_future <- stars::read_stars(ll) |> stars:::slice.stars('Time', 2:86)
st_crs(pred_future) <- st_crs(4326) # Set projection
# Rename future predictors to those of current
names(pred_future) <- names(pred_current)

# Plot the test data
plot(pred_current['secdf'],
     col = colorRampPalette(c("grey20", "orange", "lightgreen", "green"))(10),
     main = "Share of secondary vegetation")



#### train model with the above data ----

# Train model adding the data loaded above
x <- distribution(background) |> 
  add_biodiversity_poipo(virtual_points, field_occurrence = 'Observed', name = 'Virtual points') |> 

#### HERE IS THE ONLY CHANGE
  add_biodiversity_poipa(virtual_pseudoabs, field_occurrence = "Observed") |>

  # Note that we scale the predictors here
  add_predictors(pred_current, transform = 'scale',derivates = 'none') |> 
  engine_glmnet(alpha = 0) 

# Train the model
modf <- train(x, runname = 'Simple PPM', inference_only = T, verbose = FALSE)
modf$model
# Add a threshold to this model by getting 05 percentile of values
# modf <- threshold(modf, method = 'percentile', value = 0.05)

# Now lets create a scenarios object via scenarios
sc <- scenario(modf) |> 
  # Apply the same variable transformations as above. 
  add_predictors(pred_future, transform = 'scale') |>
  project() #### ERROR OCCURS HERE 

@cwaldock1
Copy link
Author

I also found the same error to occur when using quadratic terms in specified in:
add_predictors(pred_current, transform = 'scale',derivates = 'quad')

@Martin-Jung
Copy link
Collaborator

Hi,
you are using the engine_glmnet() which does not support joint integration or similar. Instead the default, as set by the parameter method_integration in train(), is to sequentially add previous predictions (e.g. from the first to second last dataset) to the next prediction as covariate. See the help file for other options of integration.

However, when making a future prediction you are only providing a set of predetermined variables (pred_future) which does not have the prediction made before. See summary(modf) for the variable names.
Any projection needs to have the same variables included (or at least variables named the same) that were used to train the model!
So you would need to think of a way how the predicted distribution can be added to pred_future.

I also found the same error to occur when using quadratic terms in specified in:
add_predictors(pred_current, transform = 'scale',derivates = 'quad')

Same as above, you simply forgot in this case to also create derivates for the future variables?
e.g. add_predictors(pred_future, derivates = 'quad')

@cwaldock1
Copy link
Author

Hi Martin, Thanks for the quick response.

For the first case, with the intergration through addition of covariates from a first model the add_predictors is missing the "poipo_X_simulated_mean" or "poipo_X_mean" objects. Although possible to extract from the trained model objects created, it would perhaps be more intuitive if this was automated for the user.

In the second case, I have added your suggestion but it still does not work. The piping of add_predictors to the scenario objects do not work with the derivates in the example below with your solution.

Thanks again!
Conor

# https://iiasa.github.io/ibis.iSDM/articles/04_biodiversity_projections.html

# Load the packages
library(ibis.iSDM)
library(stars)
library(xgboost)
library(terra)
library(igraph)
library(ggplot2)
library(ncdf4)
library(assertthat)

# Don't print out as many messages
options("ibis.setupmessages" = FALSE)

# Background and biodiversity data
background <- terra::rast(system.file('extdata/europegrid_50km.tif', package='ibis.iSDM'))
virtual_points <- sf::st_read(system.file('extdata/input_data.gpkg', package='ibis.iSDM'), 'points', quiet = TRUE)

# In addition we will use the species data to generate a presence-absence dataset with pseudo-absence points.
# Here we first specify the settings to use:
ass <- pseudoabs_settings(background = background, nrpoints = 200,
                          method =  "random")
virtual_pseudoabs <- add_pseudoabsence(df = virtual_points, 
                                       field_occurrence = "Observed",
                                       settings = ass)

# Note we are loading different predictors than in previous examples
# These are in netcdf4 format, a format specific for storing spatial-temporal data including metadata.
ll <- list.files(system.file("extdata/predictors_presfuture/", package = "ibis.iSDM", mustWork = TRUE), "*.nc",full.names = TRUE)

# From those list of predictors are first loading the current ones as raster data
# We are loading only data from the very first, contemporary time step for model fitting
pred_current <- terra::rast()
for(i in ll) suppressWarnings( pred_current <- c(pred_current, terra::rast(i, lyrs = 1) ) )
names(pred_current) <- tools::file_path_sans_ext(basename(ll))

# Get future predictors
# These we will load in using the stars package and also ignoring the first time step
pred_future <- stars::read_stars(ll) |> stars:::slice.stars('Time', 2:86)
st_crs(pred_future) <- st_crs(4326) # Set projection
# Rename future predictors to those of current
names(pred_future) <- names(pred_current)

# Plot the test data
plot(pred_current['secdf'],
     col = colorRampPalette(c("grey20", "orange", "lightgreen", "green"))(10),
     main = "Share of secondary vegetation")



#### train model with the above data ----

# Train model adding the data loaded above
x <- distribution(background) |> 
  add_biodiversity_poipo(virtual_points, field_occurrence = 'Observed', name = 'Virtual points') |> 
#   add_biodiversity_poipa(virtual_pseudoabs, field_occurrence = "Observed")|>
  # Note that we scale the predictors here
  add_predictors(pred_current, transform = 'scale', derivates = 'quad') |> 
  engine_glmnet(alpha = 0) 

# Train the model
modf <- train(x, runname = 'glmnet test', inference_only = T, verbose = FALSE)
modf$model

# Now lets create a scenarios object via scenarios
sc <- scenario(modf) |> 
  # Apply the same variable transformations as above. 
  add_predictors(pred_future, transform = 'scale', derivates = 'quad') |>
  project()

> sessionInfo()
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

time zone: Europe/Zurich
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] glmnetUtils_1.1.9       glmnet_4.1-8            lubridate_1.9.3        
 [4] rnaturalearthdata_1.0.0 rnaturalearth_1.0.1     assertthat_0.2.1       
 [7] ncdf4_1.22              ggplot2_3.5.0           igraph_2.0.2           
[10] terra_1.7-71            xgboost_1.7.7.1         stars_0.6-4            
[13] sf_1.0-15               abind_1.4-5             ibis.iSDM_0.1.3        

loaded via a namespace (and not attached):
 [1] gtable_0.3.4                  shape_1.4.6.1                 remotes_2.4.2.1              
 [4] lattice_0.22-5                numDeriv_2016.8-1.1           tzdb_0.4.0                   
 [7] vctrs_0.6.5                   tools_4.3.3                   generics_0.1.3               
[10] curl_5.2.1                    stats4_4.3.3                  parallel_4.3.3               
[13] tibble_3.2.1                  proxy_0.4-27                  fansi_1.0.6                  
[16] pkgconfig_2.0.3               Matrix_1.6-5                  KernSmooth_2.23-22           
[19] data.table_1.15.2             uuid_1.2-0                    lifecycle_1.0.4              
[22] fmesher_0.1.5                 compiler_4.3.3                MatrixModels_0.5-3           
[25] progress_1.2.3                mnormt_2.1.1                  munsell_0.5.0                
[28] codetools_0.2-19              class_7.3-22                  pillar_1.9.0                 
[31] crayon_1.5.2                  classInt_0.4-10               lwgeom_0.2-14                
[34] wk_0.9.1                      iterators_1.0.14              foreach_1.5.2                
[37] inlabru_2.10.1                tidyselect_1.2.1              dplyr_1.1.4                  
[40] splines_4.3.3                 grid_4.3.3                    colorspace_2.1-0             
[43] cli_3.6.2                     magrittr_2.0.3                survival_3.5-8               
[46] utf8_1.2.4                    e1071_1.7-14                  sn_2.1.1                     
[49] readr_2.1.5                   withr_3.0.0                   prettyunits_1.2.0            
[52] scales_1.3.0                  sp_2.1-3                      bit64_4.0.5                  
[55] timechange_0.3.0              httr_1.4.7                    bit_4.0.5                    
[58] rnaturalearthhires_1.0.0.9000 hms_1.1.3                     s2_1.1.6                     
[61] rlang_1.1.3                   Rcpp_1.0.12                   INLA_24.02.09                
[64] glue_1.7.0                    DBI_1.2.2                     rstudioapi_0.15.0            
[67] vroom_1.6.5                   jsonlite_1.8.8                plyr_1.8.9                   
[70] R6_2.5.1                      units_0.8-5     

@Martin-Jung
Copy link
Collaborator

Martin-Jung commented Mar 18, 2024

For the first case, with the intergration through addition of covariates from a first model the add_predictors is missing the "poipo_X_simulated_mean" or "poipo_X_mean" objects. Although possible to extract from the trained model objects created, it would perhaps be more intuitive if this was automated for the user.

Sure, but this might only work for your case, not for one where someone for example one fits 2 iterated models with different formula or predictors (like a sequenced climate and land-use model). In this case it would be better to pass the entire model over to support these predictions. See also issue #78 for the same point made. Pull requests welcome :)

In the second case, I have added your suggestion but it still does not work. The piping of add_predictors to the scenario objects do not work with the derivates in the example below with your solution.

You are right, this has worked in the past. I will take a look or maybe @mhesselbarth can?
(Note: I checked the respective script and there is bunch of code additions that i did not make and do not work for stars?)

  • Essentially these should be identical after creating 'quad' transformed derivates, so I suspect somewhere a bug in the stars add_predictor() script.
x$get_predictor_names()
sc$get_predictor_names()

@cwaldock1
Copy link
Author

cwaldock1 commented Mar 18, 2024

Sure, but this might only work for your case, not for one where someone for example one fits 2 iterated models with different formula or predictors (like a sequenced climate and land-use model).

I see, yes. So one would have to iteratively fit and predict. And then for scenarios follow the additions through with updated suitabilities from the first models.

Thanks for time invested in explanations and updates.

@mhesselbarth
Copy link
Collaborator

FYI I started to work on this on a development branch (https://github.com/iiasa/ibis.iSDM/tree/project). Please be aware that at this point, this is highly experimental!

@Martin-Jung
Copy link
Collaborator

FYI I started to work on this on a development branch (https://github.com/iiasa/ibis.iSDM/tree/project). Please be aware that at this point, this is highly experimental!

Made some fixes f79b88d in dev to projection extent mismatches when using project. Make sure to merge to your branch.

@mhesselbarth
Copy link
Collaborator

I can already smell the merge conflict 😄

@mhesselbarth
Copy link
Collaborator

Should be now included on the dev branch. So the following code should run

# Load the packages
library(stars)
library(terra)
library(sf)
library(units)

background <- terra::rast(system.file('extdata/europegrid_50km.tif', package='ibis.iSDM'))
virtual_points <- sf::st_read(system.file('extdata/input_data.gpkg', package='ibis.iSDM'),
                              'points', quiet = TRUE)

ll <- list.files(system.file("extdata/predictors_presfuture/", package = "ibis.iSDM", mustWork = TRUE),
                 "*.nc", full.names = TRUE)

pred_current <- terra::rast()
for(i in ll) suppressWarnings(pred_current <- c(pred_current, terra::rast(i, lyrs = 1)))
names(pred_current) <- tools::file_path_sans_ext(basename(ll))

terra::units(pred_current) <- NULL

pred_future <- stars::read_stars(ll) |> stars:::slice.stars('Time', 2:86)
sf::st_crs(pred_future) <- sf::st_crs(4326) # Set projection

names(pred_future) <- names(pred_current)
pred_future <- units::drop_units(pred_future)

modf <- distribution(background) |>
  add_predictors(pred_current, transform = 'none', derivates = 'none') |>
  add_biodiversity_poipo(virtual_points, field_occurrence = 'Observed',
                         formula = "observed ~ bio01 + bio12") |>
  add_biodiversity_poipo(virtual_points, field_occurrence = 'Observed',
                         formula = "observed ~ crops + primf") |>
  engine_glmnet() |>
  train(method_integration = "predictor")

sc <- scenario(modf, copy_model = FALSE) |>
  add_predictors(pred_future, transform = 'none') |>
  project()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants