Native Pseudo-Huber loss support #5479

Closed
LionOrCatThatIsTheQuestion opened this issue Apr 3, 2020 · 14 comments · Fixed by #5647 or #7727

@LionOrCatThatIsTheQuestion (Contributor) commented Apr 3, 2020

Would it be possible to support Pseudo-Huber-loss (https://en.wikipedia.org/wiki/Huber_loss#Pseudo-Huber_loss_function) natively?

I implemented it as a custom loss function (I use the Python SKLearn API):

import numpy as np

def huber_approx_obj(y_true, y_pred):
    # Gradient and hessian of the pseudo-Huber loss with delta = 1
    z = y_pred - y_true
    delta = 1
    scale = 1 + (z / delta) ** 2
    scale_sqrt = np.sqrt(scale)
    grad = z / scale_sqrt
    hess = 1 / (scale * scale_sqrt)
    return grad, hess

but the feature importance plots don't support custom loss functions (and a custom loss also slows down training compared to 'reg:squarederror').

The basic problem is the need for a robust regression objective; MSE can be sensitive to outliers in applications.
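As a sanity check on the objective above, the analytic gradient and hessian can be compared against finite differences of the pseudo-Huber loss itself. A minimal numpy-only sketch (the helper names here are illustrative, not part of any API):

```python
import numpy as np

def pseudo_huber(z, delta=1.0):
    # Pseudo-Huber loss as a function of the residual z = y_pred - y_true
    return delta ** 2 * (np.sqrt(1 + (z / delta) ** 2) - 1)

def huber_approx_obj(y_true, y_pred):
    # Same gradient/hessian as the custom objective above (delta = 1)
    z = y_pred - y_true
    scale = 1 + z ** 2
    scale_sqrt = np.sqrt(scale)
    return z / scale_sqrt, 1 / (scale * scale_sqrt)

# Compare the analytic gradient against central finite differences
z = np.linspace(-5.0, 5.0, 11)
grad, hess = huber_approx_obj(np.zeros_like(z), z)
eps = 1e-6
grad_fd = (pseudo_huber(z + eps) - pseudo_huber(z - eps)) / (2 * eps)
print(np.allclose(grad, grad_fd, atol=1e-5))
```

The hessian 1/(scale * scale_sqrt) is strictly positive everywhere, which is what makes this objective well behaved for Newton-style boosting.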

@trivialfis (Member)

@LionOrCatThatIsTheQuestion Would you like to make a PR for this? It should just be a simple class defined in src/objective/regression_loss.h. I am happy to help. ;-)

@LionOrCatThatIsTheQuestion (Contributor, Author) commented Apr 15, 2020

@trivialfis What evaluation metric should I use? RMSE or MAE would be my first guess.

Here is my code so far:

struct PseudoHuberError {

  XGBOOST_DEVICE static bst_float PredTransform(bst_float x) { 
    return x; 
  }

  XGBOOST_DEVICE static bool CheckLabel(bst_float label) {
    return true;
  }
 
  XGBOOST_DEVICE static bst_float FirstOrderGradient(bst_float predt, bst_float label) {
    const float z = predt - label;
    const float scale_sqrt = std::sqrt(1 + std::pow(z,2));
    return z/scale_sqrt;
  }
 
  XGBOOST_DEVICE static bst_float SecondOrderGradient(bst_float predt, bst_float label) {
    const float scale = 1 + std::pow(predt - label,2);
    const float scale_sqrt = std::sqrt(scale);
    return 1/(scale*scale_sqrt);
  }
 
  static bst_float ProbToMargin(bst_float base_score) { 
    return base_score; 
  }
 
  static const char* LabelErrorMsg() {
    return "";
  }
 
  static const char* DefaultEvalMetric() { 
    return "mae"; 
  }
 
  static const char* Name() { 
    return "reg:pseudohubererror"; 
  }
};

@trivialfis trivialfis self-assigned this Apr 15, 2020
@trivialfis (Member)

Does it make sense to use Pseudo-Huber loss as a metric?

@LionOrCatThatIsTheQuestion (Contributor, Author) commented Apr 20, 2020

I guess Pseudo-Huber loss would be an option too (it seems natural to choose the same metric as the loss function?), or MAE.

The idea was to implement Pseudo-Huber loss as a twice-differentiable approximation of MAE, so on second thought MSE as the metric kind of defeats the original purpose.

@LionOrCatThatIsTheQuestion (Contributor, Author)

The advantage of MAE (and also MSE) is that they are more naturally interpretable. Pseudo-Huber loss does not take the same values as MAE when abs(y_pred - y_true) > 1; it just has the same linear shape, as opposed to quadratic.
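To make that concrete, here is a small numpy sketch comparing the two: near zero pseudo-Huber behaves like 0.5 * z**2, while for large residuals it approaches |z| - 1, i.e. the same slope as MAE but offset, so the absolute values differ:

```python
import numpy as np

def pseudo_huber(z, delta=1.0):
    # Pseudo-Huber loss of the residual z, delta = 1 by default
    return delta ** 2 * (np.sqrt(1 + (z / delta) ** 2) - 1)

residuals = np.array([0.1, 1.0, 10.0, 100.0])
mae = np.abs(residuals)
phe = pseudo_huber(residuals)
# For z = 100: MAE = 100, pseudo-Huber ~ 99.005 -- the same linear
# shape, but not the same value
print(np.column_stack([residuals, mae, phe]))
```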

@trivialfis (Member)

@LionOrCatThatIsTheQuestion We can set the default metric to be huber, as users can specify other metrics if they like. To me, using huber as the default metric seems appropriate here. You can add a metric in src/metric/elementwise_metric.cu. Feel free to ping me if you have any issue around the code base.

@LionOrCatThatIsTheQuestion (Contributor, Author) commented Apr 22, 2020

@trivialfis Could you explain what the function GetFinal(...) does? I used MAE as reference:

struct EvalRowPHE {
  char const *Name() const {
    return "phe";
  }
  XGBOOST_DEVICE bst_float EvalRow(bst_float label, bst_float pred) const {
    bst_float diff = label - pred;
    return std::sqrt(1 + diff * diff) - 1;
  }
  static bst_float GetFinal(bst_float esum, bst_float wsum) {
    return wsum == 0 ? esum : esum / wsum;
  }
};

@trivialfis (Member)

@LionOrCatThatIsTheQuestion Is there any reason we should fix \delta to be 1?

@trivialfis (Member) commented May 7, 2020

For normal cases GetFinal is just a way to compute weighted mean. One special case is the gamma deviance, which is weighted deviance: https://rdrr.io/cran/MetricsWeighted/man/deviance_gamma.html
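In other words, for this metric GetFinal would just divide the weighted error sum by the total weight. A Python sketch of the same computation (the names here are illustrative, not part of the xgboost API):

```python
import math

def eval_row_phe(label, pred):
    # Per-row pseudo-Huber error, delta = 1
    diff = label - pred
    return math.sqrt(1 + diff * diff) - 1

def get_final(esum, wsum):
    # Weighted mean of the row errors; fall back to the raw sum
    # if the total weight is zero
    return esum if wsum == 0 else esum / wsum

labels, preds, weights = [1.0, 2.0, 3.0], [1.5, 2.0, 5.0], [1.0, 1.0, 1.0]
esum = sum(w * eval_row_phe(y, p) for y, p, w in zip(labels, preds, weights))
metric = get_final(esum, sum(weights))
```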

@LionOrCatThatIsTheQuestion (Contributor, Author)

\delta should be 1 by default, but adjustable would be better than fixed. The question is more whether it's possible, and how, to implement an additional parameter for a metric.

E.g. in the sklearn interface, I would just use the keyword 'reg:pseudohubererror' to specify the metric.

@QuantHao


Hi, is it possible to relax the constraint that delta equals 1, so that users could choose another delta, such as 1.35, to achieve 95% statistical efficiency? Or maybe just set its default to 1.35 to be compatible with sklearn?

Reference: https://scikit-learn.org/stable/modules/linear_model.html#huber-regression
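For reference, the delta-parameterized gradient and hessian would be a small change to the earlier snippet. A sketch (delta = 1.35 here mirrors sklearn's HuberRegressor epsilon default, not anything currently in xgboost):

```python
import numpy as np

def pseudo_huber_obj(y_true, y_pred, delta=1.35):
    # Gradient/hessian of delta**2 * (sqrt(1 + (z/delta)**2) - 1)
    z = y_pred - y_true
    scale = 1 + (z / delta) ** 2
    scale_sqrt = np.sqrt(scale)
    return z / scale_sqrt, 1 / (scale * scale_sqrt)
```

The delta**2 prefactor cancels in the derivatives, so only the scale term changes relative to the delta = 1 version.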

@pidefrem commented Jan 12, 2021

> \delta should be 1 by default, but adjustable would be better than fixed - the question is more if its possible and how to implement an additional parameter for a metric?
>
> e.g. in sklearn interface, I would just use the keyword 'reg:pseudohubererror' to specify the metric

Passing an additional parameter for a metric is done for poisson regression and tweedie regression for example. See:

// declare parameter
struct TweedieRegressionParam : public XGBoostParameter<TweedieRegressionParam> {
  float tweedie_variance_power;
  DMLC_DECLARE_PARAMETER(TweedieRegressionParam) {
    DMLC_DECLARE_FIELD(tweedie_variance_power).set_range(1.0f, 2.0f).set_default(1.5f)
        .describe("Tweedie variance power. Must be between in range [1, 2).");
  }
};

Making delta a parameter would imply some refactoring because PseudoHuberError has only static member functions:

struct PseudoHuberError {
  XGBOOST_DEVICE static bst_float PredTransform(bst_float x) {
    return x;
  }
  XGBOOST_DEVICE static bool CheckLabel(bst_float) {
    return true;
  }
  XGBOOST_DEVICE static bst_float FirstOrderGradient(bst_float predt, bst_float label) {
    const float z = predt - label;
    const float scale_sqrt = std::sqrt(1 + std::pow(z, 2));
    return z / scale_sqrt;
  }
  // ...

PseudoHuberError is used as a template parameter to RegLossObj:

template<typename Loss>
class RegLossObj : public ObjFunction {
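Until delta becomes a native parameter on the C++ side, the Python sklearn interface can get the same effect with a closure that bakes delta into a custom objective. A sketch (passing the result as XGBRegressor's objective argument is an assumption about usage, not a new API):

```python
import numpy as np

def make_pseudo_huber_obj(delta):
    # Returns a custom objective with delta captured in the closure
    def obj(y_true, y_pred):
        z = y_pred - y_true
        scale = 1 + (z / delta) ** 2
        scale_sqrt = np.sqrt(scale)
        return z / scale_sqrt, 1 / (scale * scale_sqrt)
    return obj

# e.g. XGBRegressor(objective=make_pseudo_huber_obj(1.35))
```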

@trivialfis (Member)

Reopening as a reminder.

@trivialfis trivialfis reopened this Jan 12, 2021
@chan4899 commented Jan 7, 2022

Any plans on making delta adjustable? It turns out Pseudo-Huber loss is very effective against outliers, and I would like to tune delta further to get better results.
