Regression running tune_bayes on mars with tune 1.1.2 #720

Open
dramanica opened this issue Sep 15, 2023 · 3 comments
Labels
upkeep maintenance, infrastructure, and similar

Comments

@dramanica

With the latest version of tune (1.1.2), using tune_bayes on a mars model raises lots of errors. No errors are raised with version 1.1.1. Here is a reprex, showing the errors with version 1.1.2.

library(tidymodels)
two_rec <- recipe(Class~.,data=two_class_dat)
mars_spec <- parsnip::mars() %>% 
  parsnip::set_engine("earth") %>%
  parsnip::set_mode("classification")
mars_tune_spec <- parsnip::mars(num_terms=tune()) %>% 
  parsnip::set_engine("earth") %>%
  parsnip::set_mode("classification")
two_tune_wkflow <-# new workflow object
  workflow() %>% # use workflow function
  add_recipe(two_rec) %>% # add the new recipe
  add_model(mars_tune_spec)
two_cv <- vfold_cv(two_class_dat, v=3)

two_bayer_res <- tune_bayes(two_tune_wkflow,
                            resamples = two_cv, initial=8)
#> Warning in max(grid[[nm]], na.rm = TRUE): no non-missing arguments to max;
#> returning -Inf
#> → A | error:   `num_terms` should be >= 1.
#> There were issues with some computations   A: x1
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more information.
#> no non-missing arguments to max; returning -Inf
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> Warning in max(grid[[nm]], na.rm = TRUE): no non-missing arguments to max;
#> returning -Inf
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> Warning in max(grid[[nm]], na.rm = TRUE): no non-missing arguments to max;
#> returning -Inf
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> Warning in max(grid[[nm]], na.rm = TRUE): no non-missing arguments to max;
#> returning -Inf
#> There were issues with some computations   A: x14
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more information.
#> no non-missing arguments to max; returning -Inf
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> Warning in max(grid[[nm]], na.rm = TRUE): no non-missing arguments to max;
#> returning -Inf
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> Warning in max(grid[[nm]], na.rm = TRUE): no non-missing arguments to max;
#> returning -Inf
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> Warning in max(grid[[nm]], na.rm = TRUE): no non-missing arguments to max;
#> returning -Inf
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> Warning in max(grid[[nm]], na.rm = TRUE): no non-missing arguments to max;
#> returning -Inf
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> ! No improvement for 10 iterations; returning current results.
#> There were issues with some computations   A: x14
#> There were issues with some computations   A: x30

Created on 2023-09-15 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.1 (2023-06-16)
#>  os       Ubuntu 22.04.3 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en_GB:en
#>  collate  en_GB.UTF-8
#>  ctype    en_GB.UTF-8
#>  tz       Europe/London
#>  date     2023-09-15
#>  pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package       * version    date (UTC) lib source
#>  backports       1.4.1      2021-12-13 [1] CRAN (R 4.3.0)
#>  broom         * 1.0.5      2023-06-09 [1] CRAN (R 4.3.1)
#>  class           7.3-22     2023-05-03 [3] CRAN (R 4.3.1)
#>  cli             3.6.1      2023-03-23 [1] CRAN (R 4.3.0)
#>  codetools       0.2-19     2023-02-01 [3] CRAN (R 4.2.2)
#>  colorspace      2.1-0      2023-01-23 [1] CRAN (R 4.3.0)
#>  data.table      1.14.8     2023-02-17 [1] CRAN (R 4.3.0)
#>  dials         * 1.2.0      2023-04-03 [1] CRAN (R 4.3.0)
#>  DiceDesign      1.9        2021-02-13 [1] CRAN (R 4.3.0)
#>  digest          0.6.33     2023-07-07 [1] CRAN (R 4.3.1)
#>  dplyr         * 1.1.2      2023-04-20 [1] CRAN (R 4.3.0)
#>  earth         * 5.3.2      2023-01-26 [1] CRAN (R 4.3.0)
#>  ellipsis        0.3.2      2021-04-29 [1] CRAN (R 4.3.0)
#>  evaluate        0.21       2023-05-05 [1] CRAN (R 4.3.0)
#>  fansi           1.0.4      2023-01-22 [1] CRAN (R 4.3.0)
#>  fastmap         1.1.1      2023-02-24 [1] CRAN (R 4.3.0)
#>  foreach         1.5.2      2022-02-02 [1] CRAN (R 4.3.0)
#>  Formula       * 1.2-5      2023-02-24 [1] CRAN (R 4.3.0)
#>  fs              1.6.3      2023-07-20 [1] CRAN (R 4.3.1)
#>  furrr           0.3.1      2022-08-15 [1] CRAN (R 4.3.0)
#>  future          1.33.0     2023-07-01 [1] CRAN (R 4.3.1)
#>  future.apply    1.11.0     2023-05-21 [1] CRAN (R 4.3.0)
#>  generics        0.1.3      2022-07-05 [1] CRAN (R 4.3.0)
#>  ggplot2       * 3.4.2      2023-04-03 [1] CRAN (R 4.3.0)
#>  globals         0.16.2     2022-11-21 [1] CRAN (R 4.3.0)
#>  glue            1.6.2      2022-02-24 [1] CRAN (R 4.3.0)
#>  gower           1.0.1      2022-12-22 [1] CRAN (R 4.3.0)
#>  GPfit           1.0-8      2019-02-08 [1] CRAN (R 4.3.0)
#>  gtable          0.3.3      2023-03-21 [1] CRAN (R 4.3.0)
#>  hardhat         1.3.0      2023-03-30 [1] CRAN (R 4.3.0)
#>  htmltools       0.5.5      2023-03-23 [1] CRAN (R 4.3.0)
#>  infer         * 1.0.4      2022-12-02 [1] CRAN (R 4.3.0)
#>  ipred           0.9-14     2023-03-09 [1] CRAN (R 4.3.0)
#>  iterators       1.0.14     2022-02-05 [1] CRAN (R 4.3.0)
#>  knitr           1.43       2023-05-25 [1] CRAN (R 4.3.0)
#>  lattice         0.21-8     2023-04-05 [3] CRAN (R 4.3.0)
#>  lava            1.7.2.1    2023-02-27 [1] CRAN (R 4.3.0)
#>  lhs             1.1.6      2022-12-17 [1] CRAN (R 4.3.0)
#>  lifecycle       1.0.3      2022-10-07 [1] CRAN (R 4.3.0)
#>  listenv         0.9.0      2022-12-16 [1] CRAN (R 4.3.0)
#>  lubridate       1.9.2      2023-02-10 [1] CRAN (R 4.3.0)
#>  magrittr        2.0.3      2022-03-30 [1] CRAN (R 4.3.0)
#>  MASS            7.3-60     2023-05-04 [3] CRAN (R 4.3.1)
#>  Matrix          1.6-0      2023-07-08 [3] CRAN (R 4.3.1)
#>  modeldata     * 1.1.0      2023-01-25 [1] CRAN (R 4.3.0)
#>  munsell         0.5.0      2018-06-12 [1] CRAN (R 4.3.0)
#>  nnet            7.3-19     2023-05-03 [3] CRAN (R 4.3.1)
#>  parallelly      1.36.0     2023-05-26 [1] CRAN (R 4.3.0)
#>  parsnip       * 1.1.0      2023-04-12 [1] CRAN (R 4.3.0)
#>  pillar          1.9.0      2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig       2.0.3      2019-09-22 [1] CRAN (R 4.3.0)
#>  plotmo        * 3.6.2      2022-05-21 [1] CRAN (R 4.3.0)
#>  plotrix       * 3.8-2      2021-09-08 [1] CRAN (R 4.3.0)
#>  prodlim         2023.03.31 2023-04-02 [1] CRAN (R 4.3.0)
#>  purrr         * 1.0.1      2023-01-10 [1] CRAN (R 4.3.0)
#>  R.cache         0.16.0     2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3     1.8.2      2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo            1.25.0     2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils         2.12.2     2022-11-11 [1] CRAN (R 4.3.0)
#>  R6              2.5.1      2021-08-19 [1] CRAN (R 4.3.0)
#>  Rcpp            1.0.11     2023-07-06 [1] CRAN (R 4.3.1)
#>  recipes       * 1.0.6      2023-04-25 [1] CRAN (R 4.3.0)
#>  reprex          2.0.2      2022-08-17 [1] CRAN (R 4.3.0)
#>  rlang           1.1.1      2023-04-28 [1] CRAN (R 4.3.0)
#>  rmarkdown       2.23       2023-07-01 [1] CRAN (R 4.3.1)
#>  rpart           4.1.19     2022-10-21 [3] CRAN (R 4.2.1)
#>  rsample       * 1.2.0      2023-08-23 [1] CRAN (R 4.3.1)
#>  rstudioapi      0.15.0     2023-07-07 [1] CRAN (R 4.3.1)
#>  scales        * 1.2.1      2022-08-20 [1] CRAN (R 4.3.0)
#>  sessioninfo     1.2.2      2021-12-06 [1] CRAN (R 4.3.0)
#>  styler          1.10.1     2023-06-05 [1] CRAN (R 4.3.1)
#>  survival        3.5-5      2023-03-12 [3] CRAN (R 4.3.1)
#>  TeachingDemos * 2.12       2020-04-07 [1] CRAN (R 4.3.0)
#>  tibble        * 3.2.1      2023-03-20 [1] CRAN (R 4.3.0)
#>  tidymodels    * 1.1.0      2023-05-01 [1] CRAN (R 4.3.0)
#>  tidyr         * 1.3.0      2023-01-24 [1] CRAN (R 4.3.0)
#>  tidyselect      1.2.0      2022-10-10 [1] CRAN (R 4.3.0)
#>  timechange      0.2.0      2023-01-11 [1] CRAN (R 4.3.0)
#>  timeDate        4022.108   2023-01-07 [1] CRAN (R 4.3.0)
#>  tune          * 1.1.2      2023-08-23 [1] CRAN (R 4.3.1)
#>  utf8            1.2.3      2023-01-31 [1] CRAN (R 4.3.0)
#>  vctrs           0.6.3      2023-06-14 [1] CRAN (R 4.3.1)
#>  withr           2.5.0      2022-03-03 [1] CRAN (R 4.3.0)
#>  workflows     * 1.1.3      2023-02-22 [1] CRAN (R 4.3.0)
#>  workflowsets  * 1.0.1      2023-04-06 [1] CRAN (R 4.3.0)
#>  xfun            0.39       2023-04-20 [1] CRAN (R 4.3.0)
#>  yaml            2.3.7      2023-01-23 [1] CRAN (R 4.3.0)
#>  yardstick     * 1.2.0      2023-04-21 [1] CRAN (R 4.3.0)
#> 
#>  [1] /home/andrea/R/x86_64-pc-linux-gnu-library
#>  [2] /usr/lib/R/site-library
#>  [3] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@simonpcouch
Contributor

Thanks for the issue! This is a strange one.

With tune 1.1.1, I see:

two_bayer_res %>% collect_metrics()
#> # A tibble: 8 × 8
#>   num_terms .metric  .estimator  mean     n std_err .config              .iter
#>       <int> <chr>    <chr>      <dbl> <int>   <dbl> <chr>                <int>
#> 1         2 accuracy binary     0.784     3  0.0195 Preprocessor1_Model1     0
#> 2         2 roc_auc  binary     0.866     3  0.0101 Preprocessor1_Model1     0
#> 3         3 accuracy binary     0.829     3  0.0122 Preprocessor1_Model2     0
#> 4         3 roc_auc  binary     0.885     3  0.0105 Preprocessor1_Model2     0
#> 5         4 accuracy binary     0.822     3  0.0174 Preprocessor1_Model3     0
#> 6         4 roc_auc  binary     0.883     3  0.0116 Preprocessor1_Model3     0
#> 7         5 accuracy binary     0.819     3  0.0150 Preprocessor1_Model4     0
#> 8         5 roc_auc  binary     0.883     3  0.0110 Preprocessor1_Model4     0

Note that all .iter are 0, i.e. the Bayesian search never started after the initial grid. I see the same output with 1.1.2.

Given the few changes that were made in 1.1.2, this seems to be an issue with our logging previously (resolved in #682) rather than a newly introduced issue in tuning. Still need to troubleshoot what that newly surfaced issue is.

@simonpcouch
Contributor

Ah, yes. With 1.1.2, if you set control = control_bayes(TRUE, TRUE), then you'll see the errors go away, as before.

The underlying issue is that the GP can't predict any possible new points. The default num_terms() parameter object will only result in searches across integers in [2, 5]. That initial search covers all of those possible num_terms values, so pred_gp() returns early, noting that there were no more candidate models.

tune/R/tune_bayes.R

Lines 375 to 380 in 74854a5

candidates <-
  pred_gp(
    gp_mod, param_info,
    control = control,
    current = mean_stats %>% dplyr::select(dplyr::all_of(param_info$id))
  )

tune/R/tune_bayes.R

Lines 584 to 591 in 74854a5

if (nrow(pred_grid) == 0) {
  msg <- "No remaining candidate models"
} else {
  msg <- "An error occurred when creating candidates parameters: "
  msg <- paste(msg, as.character(object))
}
tune_log(control, split = NULL, task = msg, type = "warning")
return(pred_grid %>% dplyr::mutate(.mean = NA_real_, .sd = NA_real_))
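
As a rough illustration of why no candidates remain, here is a base R sketch. It is not tune's actual pred_gp() code, and the variable names are made up for this example. It compares the integers allowed by the [2, 5] range with the values the initial grid already evaluated:

```r
# Hypothetical illustration only; this is not how tune implements pred_gp().
# Integers allowed by the num_terms range described above.
possible_num_terms <- 2:5

# Values the initial grid already evaluated (see the collect_metrics() output).
evaluated_num_terms <- c(2L, 3L, 4L, 5L)

# Candidates the GP could still propose: values not yet evaluated.
remaining <- setdiff(possible_num_terms, evaluated_num_terms)

length(remaining)
#> [1] 0
```

With nothing left to propose, the candidate grid has zero rows, which is the `nrow(pred_grid) == 0` branch quoted above.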

That message is passed to tune_log() and then promptly ignored due to verbosity settings, resulting in the "`num_terms` should be >= 1." error downstream (since the GP passes on an NA for num_terms). I think that logic was written before early returns from tune_bayes_workflow() were properly caught and intermediate results returned, and it was maybe(?) implemented that way so that some results made it out of tune_bayes() in that case. I'd argue we ought to stop optimization and exit early with a more informative error in this case.
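
To make the "exit early with a more informative error" idea concrete, here is a minimal base R sketch. The helper `check_candidates()` is hypothetical and does not exist in tune; it only shows the shape such a check might take, assuming the candidate grid is a data frame:

```r
# Hypothetical sketch; `check_candidates()` is not a function in tune.
check_candidates <- function(pred_grid) {
  if (nrow(pred_grid) == 0) {
    # Stop here instead of passing NA parameter values downstream.
    stop(
      "No remaining candidate models: every value in the parameter range ",
      "has already been evaluated.",
      call. = FALSE
    )
  }
  pred_grid
}

# An empty candidate grid, as in the case discussed above.
empty_grid <- data.frame(num_terms = integer(0))

# Capture the error message rather than halting the script.
result <- tryCatch(
  check_candidates(empty_grid),
  error = function(e) conditionMessage(e)
)
result
```

The point of the design is that the failure is reported where the candidate set is found to be empty, rather than surfacing later as an unrelated-looking `num_terms` validation error.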

I think I'll wait on addressing this in favor of a refactor of tune_log() which should simplify the machinery for early exits.

@dramanica
Author

Thank you @simonpcouch for looking into this. I have just checked, and the situation is the same in the original analysis that inspired the reprex, so at least this is not a corner case due to the example dataset.
Waiting for a refactor of tune_log() sounds sensible; it is now clear how to manage this issue in the meantime.
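
One possible way to manage this in the meantime is sketched below. It is an assumption, not something spelled out in the thread: widen the num_terms range so that untried values remain after the initial grid, and pass the updated parameter set through `param_info`. The upper bound of 10 and `initial = 4` are arbitrary choices for illustration, and the named `control_bayes()` arguments assume that the two positional `TRUE` values above correspond to `verbose` and `verbose_iter`.

```r
library(tidymodels)

# Same setup as the reprex above.
two_rec <- recipe(Class ~ ., data = two_class_dat)
mars_tune_spec <- mars(num_terms = tune()) %>%
  set_engine("earth") %>%
  set_mode("classification")
two_tune_wkflow <- workflow() %>%
  add_recipe(two_rec) %>%
  add_model(mars_tune_spec)

set.seed(1)
two_cv <- vfold_cv(two_class_dat, v = 3)

# Assumption: widen the range beyond [2, 5] so the GP has values left to
# propose after the initial grid. The upper bound of 10 is arbitrary.
mars_params <- two_tune_wkflow %>%
  extract_parameter_set_dials() %>%
  update(num_terms = num_terms(range = c(2L, 10L)))

two_bayes_res <- tune_bayes(
  two_tune_wkflow,
  resamples = two_cv,
  param_info = mars_params,
  initial = 4,
  # Named form of control_bayes(TRUE, TRUE), assuming the first two
  # arguments are `verbose` and `verbose_iter`.
  control = control_bayes(verbose = TRUE, verbose_iter = TRUE)
)
```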

@simonpcouch simonpcouch added the upkeep maintenance, infrastructure, and similar label Oct 31, 2023

2 participants