
Releases: tidymodels/bonsai

bonsai 0.3.1

23 Jul 15:48
1e6bf6e
  • Fixed a bug where "aorsf" models would not successfully fit in socket cluster workers (i.e., with plan(multisession)) unless another engine requiring bonsai had been fitted in the worker (#85).
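As a quick check of the fix, a sketch like the following should now succeed in a fresh socket cluster worker (assuming the future and aorsf packages are installed; mtcars is just a stand-in dataset):

library(bonsai)
library(future)

plan(multisession)

# previously, this fit failed in a fresh worker unless another
# bonsai-provided engine had already been fitted there
f <- future({
  rand_forest(mode = "regression") %>%
    set_engine("aorsf") %>%
    fit(mpg ~ ., data = mtcars)
})
value(f)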

bonsai 0.3.0

24 Jun 14:29
1a11997
  • Introduced support for accelerated oblique random forests for the "classification" and "regression" modes using the new "aorsf" engine (#78 by @bcjaeger).
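For example, a minimal fit with the new engine might look like the following sketch (assuming the aorsf package is installed):

library(bonsai)

rand_forest(mode = "classification") %>%
  set_engine("aorsf") %>%
  fit(Species ~ ., data = iris)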

  • Enabled passing Dataset Parameters to the "lightgbm" engine. To pass an argument that would usually be supplied as an element of the param argument to lightgbm::lgb.Dataset(), pass it directly through the ellipses in set_engine() (#77). For example:
boost_tree() %>% set_engine("lightgbm", linear_tree = TRUE)

  • Enabled case weights with the "lightgbm" engine (#72 by @p-schaefer).
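For instance, weights can be supplied through the usual tidymodels case-weight machinery; below is a minimal sketch, assuming importance weights are among the weight types allowed for this engine (the weights themselves are illustrative):

library(bonsai)
library(workflows)

# illustrative per-row importance weights
mtcars$wts <- hardhat::importance_weights(runif(nrow(mtcars)))

workflow() %>%
  add_case_weights(wts) %>%
  add_formula(mpg ~ cyl + disp + hp) %>%
  add_model(boost_tree(mode = "regression") %>% set_engine("lightgbm")) %>%
  fit(data = mtcars)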

  • Fixed issues in metadata for the "partykit" engine for rand_forest() where some engine arguments were mistakenly protected (#74).

  • Addressed a type check error when fitting lightgbm model specifications with arguments mistakenly left as tune() (#79).

bonsai 0.2.1

29 Nov 19:16
436eb95
  • The most recent dials and parsnip releases introduced tuning integration for the lightgbm num_leaves engine argument! The num_leaves parameter sets the maximum number of leaves per tree and is an important tuning parameter for lightgbm (tidymodels/dials#256, tidymodels/parsnip#838). With the newest versions of dials, parsnip, and bonsai installed, tune this argument by marking the num_leaves engine argument for tuning when defining your model specification:
boost_tree() %>% set_engine("lightgbm", num_leaves = tune())
  • Fixed a bug where lightgbm's parallelism argument num_threads was overridden when passed via param rather than as a main argument. By default, then, lightgbm will fit sequentially rather than with num_threads = foreach::getDoParWorkers(). Users can still set num_threads via the engine arguments to set_engine("lightgbm"):
boost_tree() %>% set_engine("lightgbm", num_threads = x)

Note that, when tuning hyperparameters with the tune package, detection of a parallel backend will still work as usual.

  • The boost_tree() argument stop_iter now maps to the lightgbm::lgb.train() argument early_stopping_round rather than its alias early_stopping_rounds. This does not affect parsnip's interface to lightgbm (i.e., via boost_tree() %>% set_engine("lightgbm")), though it will introduce errors in code that uses the train_lightgbm() wrapper directly and sets the lightgbm::lgb.train() argument early_stopping_round by its alias early_stopping_rounds via train_lightgbm()'s ....
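The unaffected parsnip-level interface looks like the sketch below; here, validation is assumed to be the engine-specific argument (described in ?details_boost_tree_lightgbm) that holds out a proportion of the data so early stopping can activate:

library(bonsai)

boost_tree(mode = "regression", stop_iter = 10) %>%
  set_engine("lightgbm", validation = 0.2)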

  • Disallowed passing main model arguments as engine arguments to set_engine("lightgbm", ...) via their aliases. That is, if a main argument is marked for tuning and its lightgbm alias is supplied as an engine argument, bonsai will now error rather than supplying both to lightgbm and leaving that package to resolve the aliases. Users can still interface with non-main boost_tree() arguments via their lightgbm aliases (#53).
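For example, assuming num_trees is one of lightgbm's aliases for num_iterations (the parameter that boost_tree()'s trees argument maps to), a sketch like this now triggers bonsai's new error:

library(bonsai)
library(tune)

# bonsai now errors rather than passing along both `trees`
# and its lightgbm alias `num_trees`
boost_tree(trees = tune()) %>%
  set_engine("lightgbm", num_trees = 100)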

bonsai 0.2.0

31 Aug 18:18
5f3787b
  • Enabled bagging with lightgbm via the sample_size argument to boost_tree() (#32 and tidymodels/parsnip#768). The following docs, now available in ?details_boost_tree_lightgbm, describe the interface in detail:

The sample_size argument is translated to the bagging_fraction parameter in the param argument of lgb.train. The argument is interpreted by lightgbm as a proportion rather than a count, so bonsai internally reparameterizes the sample_size argument with dials::sample_prop() during tuning.

To effectively enable bagging, the user would also need to set the bagging_freq argument to lightgbm. bagging_freq defaults to 0, which means bagging is disabled, and a bagging_freq argument of k means that the booster will perform bagging at every kth boosting iteration. Thus, by default, the sample_size argument would be ignored without setting this argument manually. Other boosting libraries, like xgboost, do not have an analogous argument to bagging_freq and use k = 1 when the analogue to bagging_fraction is in (0, 1). bonsai will thus automatically set bagging_freq = 1 in set_engine("lightgbm", ...) if sample_size (i.e., bagging_fraction) is not equal to 1 and no bagging_freq value is supplied. This default can be overridden by setting the bagging_freq argument to set_engine() manually.
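For instance, the following sketch enables bagging on 70% of rows; in the first specification bonsai fills in bagging_freq = 1 automatically, and the second overrides that frequency:

library(bonsai)

# bonsai sets bagging_freq = 1 automatically since sample_size != 1
boost_tree(mode = "regression", sample_size = 0.7) %>%
  set_engine("lightgbm")

# override the default: bag at every 5th boosting iteration instead
boost_tree(mode = "regression", sample_size = 0.7) %>%
  set_engine("lightgbm", bagging_freq = 5)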

  • Corrected mapping of the mtry argument in boost_tree with the lightgbm engine. mtry previously mapped to the feature_fraction argument to lgb.train but was documented as mapping to an argument more closely resembling feature_fraction_bynode. mtry now maps to feature_fraction_bynode.

    This means that code that set feature_fraction_bynode as an argument to set_engine() will now error, and the user can now pass feature_fraction to set_engine() without raising an error.
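In code, the corrected behavior looks like this sketch:

library(bonsai)

# `mtry` now maps to lightgbm's feature_fraction_bynode
boost_tree(mode = "regression", mtry = 3) %>%
  set_engine("lightgbm")

# `feature_fraction` may now be passed as an engine argument without error
boost_tree(mode = "regression") %>%
  set_engine("lightgbm", feature_fraction = 0.8)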

  • Fixed an error when fitting lightgbm with the engine argument objective = "tweedie" and response values less than 1.
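That is, a specification along these lines now fits even when the outcome includes values below 1:

library(bonsai)

boost_tree(mode = "regression") %>%
  set_engine("lightgbm", objective = "tweedie")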

  • A number of documentation improvements, increases in testing coverage, and changes to internals in anticipation of the 4.0.0 release of the lightgbm package. Thank you to @jameslamb for the effort and expertise!

bonsai 0.1.0

23 Jun 11:28
0700751


v0.1.0 is the initial release of the bonsai package!

bonsai provides bindings for additional tree-based model engines for use with the parsnip package.

This package is based on the work done in the treesnip repository by Athos Damiani, Daniel Falbel, and Roel Hogervorst. bonsai is the official CRAN version of the package; new development will reside here.

Installation

You can install the initial release of bonsai with:

install.packages("bonsai")

You can install the development version of bonsai from GitHub with:

# install.packages("pak")
pak::pak("tidymodels/bonsai")

Available Engines

The bonsai package provides additional engines for the models in the following table:

model          engine    mode
boost_tree     lightgbm  regression
boost_tree     lightgbm  classification
decision_tree  partykit  regression
decision_tree  partykit  classification
rand_forest    partykit  regression
rand_forest    partykit  classification
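As a quick start, these engines are used in the usual parsnip fashion; for example, a conditional inference tree via partykit (a sketch, assuming the partykit package is installed):

library(bonsai)

decision_tree() %>%
  set_mode("classification") %>%
  set_engine("partykit") %>%
  fit(Species ~ ., data = iris)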

Code of Conduct

Please note that the bonsai project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.