Document various tree methods. (#6564)

dmlc · Jan 2, 2021 · 2b049b3
1 parent fa13992
commit 2b049b3
Showing 3 changed files with 106 additions and 3 deletions.
diff --git a/doc/index.rst b/doc/index.rst
@@ -22,6 +22,7 @@ Contents
   XGBoost User Forum <https://discuss.xgboost.ai>
   GPU support <gpu/index>
   parameter
+  treemethod
   Python package <python/index>
   R package <R-package/index>
   JVM package <jvm/index>

diff --git a/doc/parameter.rst b/doc/parameter.rst
@@ -74,8 +74,8 @@ Parameters for Tree Booster
 
 * ``max_depth`` [default=6]
 
-  - Maximum depth of a tree. Increasing this value will make the model more complex and more likely to overfit. 0 is only accepted in ``lossguided`` growing policy when tree_method is set as ``hist`` and it indicates no limit on depth. Beware that XGBoost aggressively consumes memory when training a deep tree.
-  - range: [0,∞] (0 is only accepted in ``lossguided`` growing policy when tree_method is set as ``hist``)
+  - Maximum depth of a tree. Increasing this value will make the model more complex and more likely to overfit. 0 is only accepted with the ``lossguide`` growing policy when ``tree_method`` is set to ``hist`` or ``gpu_hist``, and it indicates no limit on depth. Beware that XGBoost aggressively consumes memory when training a deep tree.
+  - range: [0,∞] (0 is only accepted with the ``lossguide`` growing policy when ``tree_method`` is set to ``hist`` or ``gpu_hist``)
 
 * ``min_child_weight`` [default=1]
 
@@ -130,7 +130,7 @@ Parameters for Tree Booster
 
 * ``tree_method`` string [default= ``auto``]
 
-  - The tree construction algorithm used in XGBoost. See description in the `reference paper <http://arxiv.org/abs/1603.02754>`_.
+  - The tree construction algorithm used in XGBoost. See description in the `reference paper <http://arxiv.org/abs/1603.02754>`_ and :doc:`treemethod`.
   - XGBoost supports  ``approx``, ``hist`` and ``gpu_hist`` for distributed training.  Experimental support for external memory is available for ``approx`` and ``gpu_hist``.
 
   - Choices: ``auto``, ``exact``, ``approx``, ``hist``, ``gpu_hist``, this is a

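The ``max_depth`` rule in the hunk above can be sketched as a parameter dictionary. This is a minimal illustration, not part of the commit: the parameter names are taken from the XGBoost parameter docs (the growing policy is selected with ``grow_policy``, whose value is spelled ``lossguide``), the numeric values are arbitrary, and ``unlimited_depth_allowed`` is a hypothetical helper that only mirrors the documented rule.

```python
# Illustrative values only; such a dict would normally be passed to xgboost.train.
params = {
    "tree_method": "hist",       # "gpu_hist" is the other method that accepts max_depth=0
    "grow_policy": "lossguide",  # loss-guided (leaf-wise) growth
    "max_depth": 0,              # 0 means no limit on depth
    "max_leaves": 64,            # arbitrary bound on tree size by leaf count
    "max_bin": 256,
}


def unlimited_depth_allowed(p):
    """Hypothetical check mirroring the documented rule; not an XGBoost API."""
    if p.get("max_depth", 6) != 0:
        return True  # a positive depth limit is accepted by every tree method
    return (
        p.get("grow_policy") == "lossguide"
        and p.get("tree_method") in ("hist", "gpu_hist")
    )
```

Bounding ``max_leaves`` is a design choice here: with no depth limit, the leaf count is what keeps tree size and memory use in check.
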
diff --git a/doc/treemethod.rst b/doc/treemethod.rst
@@ -0,0 +1,102 @@
+####################
+XGBoost Tree methods
+####################
+
+
+For training boosted tree models, there are 2 parameters used for choosing algorithms,
+namely ``updater`` and ``tree_method``.  XGBoost has 4 built-in tree methods, namely
+``exact``, ``approx``, ``hist`` and ``gpu_hist``.  Along with these tree methods, there
+are also some free-standing updaters including ``grow_local_histmaker``, ``refresh``,
+``prune`` and ``sync``.  The parameter ``updater`` is more primitive than ``tree_method``
+as the latter is just a pre-configuration of the former.  The difference is mostly
+historical: each updater requires some specific configuration and might have missing
+features.  As we move forward, the gap between them is becoming less and less
+relevant.  We will collectively document them under tree methods.
+
+**************
+Exact Solution
+**************
+
+Exact means XGBoost considers all candidates from data for tree splitting, but the
+underlying objective is still interpreted as a Taylor expansion.
+
+1. ``exact``: The vanilla gradient boosting tree algorithm described in the `reference paper
+   <http://arxiv.org/abs/1603.02754>`_.  During each split finding procedure, it iterates
+   over every entry of input data.  It's more accurate than the other greedy methods but
+   slow in computation.  It also doesn't support distributed training, as XGBoost
+   employs row-splitting data distribution while the ``exact`` tree method works on a
+   sorted column format.  This tree method can be used with parameter ``tree_method`` set
+   to ``exact``.
+
+
+**********************
+Approximated Solutions
+**********************
+
+As the ``exact`` tree method is slow and not scalable, we often employ approximated
+training algorithms.  These algorithms build a gradient histogram for each node and
+iterate through the histogram instead of the real dataset.  The implementations in
+XGBoost are introduced below.
+
+1. ``grow_local_histmaker`` updater: An approximation tree method described in the
+   `reference paper <http://arxiv.org/abs/1603.02754>`_.  This updater is rarely used in
+   practice, so it's still an updater rather than a tree method.  During split finding, it
+   first runs weighted GK sketching on the data points belonging to the current node to
+   find split candidates, using the hessian as weights.  The histogram is built upon this
+   per-node sketch.  It's faster than ``exact`` in some applications, but still slow.
+
+2. ``approx`` tree method: An approximation tree method described in the `reference paper
+   <http://arxiv.org/abs/1603.02754>`_.  Unlike ``grow_local_histmaker``, it runs
+   sketching before building each tree using all the rows (rows belonging to the root)
+   instead of the per-node dataset.  As with the ``grow_local_histmaker`` updater, the
+   hessian is used as weights during sketching.  The algorithm can be accessed by setting
+   ``tree_method`` to ``approx``.
+
+3. ``hist`` tree method: An approximation tree method used in LightGBM with slight
+   differences in implementation.  It runs sketching before training using only
+   user-provided weights instead of the hessian.  The subsequent per-node histogram is built
+   upon this global sketch.  This is the fastest algorithm as it runs sketching only once.
+   The algorithm can be accessed by setting ``tree_method`` to ``hist``.
+
+4. ``gpu_hist`` tree method: The ``gpu_hist`` tree method is a GPU implementation of
+   ``hist``, with additional support for gradient based sampling.  The algorithm can be
+   accessed by setting ``tree_method`` to ``gpu_hist``.
+
+************
+Implications
+************
+
+Some objectives like ``reg:squarederror`` have a constant hessian.  In this case, ``hist``
+or ``gpu_hist`` should be preferred, as weighted sketching doesn't make sense with
+constant weights.  When using objectives with a non-constant hessian, ``approx`` sometimes
+yields better accuracy, but with slower computation.  Most of the time, using ``hist`` or
+``gpu_hist`` with a higher ``max_bin`` can achieve similar or even superior accuracy while
+maintaining good performance.  However, as XGBoost is largely driven by community effort,
+the actual implementations differ somewhat from the pure math description.  Results might
+differ slightly from expectation, which we are currently trying to overcome.
+
+**************
+Other Updaters
+**************
+
+1. ``prune``: Prunes the built tree using the ``gamma`` parameter.  ``prune`` is usually
+   used as part of other tree methods.
+2. ``refresh``: Refreshes the statistics of built trees on a new training dataset.
+3. ``sync``: Synchronizes the tree among workers when running distributed training.
+
+****************
+Removed Updaters
+****************
+
+Two updaters were removed during development due to maintainability concerns.  We describe
+them here solely for the interest of documentation.  The first one is distributed colmaker,
+which was a distributed version of the exact tree method.  It required specialization for
+a column-based splitting strategy and a different prediction procedure.  As the exact tree
+method is slow by itself and scaling is even less efficient, we removed it entirely.  The
+second one is ``skmaker``.  Per-node weighted sketching employed by
+``grow_local_histmaker`` is slow; ``skmaker`` was unmaintained and seems to have been a
+workaround that tried to eliminate the histogram creation step by using sketching values
+directly during split evaluation.  It was never tested and contained some unknown bugs, so
+we decided to remove it and focus our resources on more promising algorithms instead.  For
+accuracy, most of the time ``approx``, ``hist`` and ``gpu_hist`` are enough with some
+parameter tuning, so removing these updaters doesn't have any real practical impact.
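
The difference between the exact and histogram-based approaches documented in ``treemethod.rst`` can be sketched for a single feature in plain Python. This is a toy illustration under stated assumptions, not XGBoost's implementation: the function names are hypothetical, the gain drops the 1/2 factor and the ``gamma`` penalty from the reference paper, and the cut points are passed in by hand instead of coming from a quantile sketch.

```python
from bisect import bisect_left


def gain(gl, hl, gr, hr, lam=1.0):
    """Simplified split gain from gradient/hessian sums (no 1/2 factor, no gamma)."""
    def score(g, h):
        return g * g / (h + lam)
    return score(gl, hl) + score(gr, hr) - score(gl + gr, hl + hr)


def exact_best_split(x, g, h, lam=1.0):
    """Scan every boundary between distinct sorted feature values."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    G, H = sum(g), sum(h)
    gl = hl = 0.0
    best = (0.0, None)  # (gain, threshold)
    for k in range(len(order) - 1):
        i, j = order[k], order[k + 1]
        gl += g[i]
        hl += h[i]
        if x[i] == x[j]:
            continue  # cannot split between equal values
        cand = gain(gl, hl, G - gl, H - hl, lam)
        if cand > best[0]:
            best = (cand, (x[i] + x[j]) / 2)
    return best


def hist_best_split(x, g, h, cuts, lam=1.0):
    """Accumulate gradients into bins, then scan only the bin boundaries."""
    n_bins = len(cuts) + 1
    hist = [[0.0, 0.0] for _ in range(n_bins)]
    for xi, gi, hi in zip(x, g, h):
        b = bisect_left(cuts, xi)  # bin b holds values <= cuts[b]
        hist[b][0] += gi
        hist[b][1] += hi
    G, H = sum(g), sum(h)
    gl = hl = 0.0
    best = (0.0, None)  # (gain, threshold), rows with x <= threshold go left
    for b in range(n_bins - 1):
        gl += hist[b][0]
        hl += hist[b][1]
        cand = gain(gl, hl, G - gl, H - hl, lam)
        if cand > best[0]:
            best = (cand, cuts[b])
    return best


# Squared-error gradients at prediction 0.5 for labels [0,0,0,0,1,1,1,1]; hessian is 1.
x = [1, 2, 3, 4, 5, 6, 7, 8]
g = [0.5] * 4 + [-0.5] * 4
h = [1.0] * 8

exact = exact_best_split(x, g, h)
fine = hist_best_split(x, g, h, cuts=[2, 4, 6])
coarse = hist_best_split(x, g, h, cuts=[3, 6])
```

With cut points that include the true boundary, the histogram scan finds the same gain as the exact scan while visiting three candidates instead of seven. With the coarser cuts it settles for a weaker split, which is the accuracy trade-off that a higher ``max_bin`` is meant to reduce.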