Native Pseudo-Huber loss support #5479

Closed
LionOrCatThatIsTheQuestion opened this issue Apr 3, 2020 · 14 comments · Fixed by #5647 or #7727

@LionOrCatThatIsTheQuestion (Contributor) commented Apr 3, 2020

Would it be possible to support Pseudo-Huber-loss (https://en.wikipedia.org/wiki/Huber_loss#Pseudo-Huber_loss_function) natively?

I implemented it as a custom loss function (I use the Python SKLearn API):

import numpy as np

def huber_approx_obj(y_true, y_pred):
    # Gradient and hessian of the pseudo-Huber loss with delta = 1
    z = y_pred - y_true
    delta = 1
    scale = 1 + (z / delta) ** 2
    scale_sqrt = np.sqrt(scale)
    grad = z / scale_sqrt
    hess = 1 / (scale * scale_sqrt)
    return grad, hess

but the feature importance plots don't support custom loss functions (and a custom loss also slows down training compared to 'reg:squarederror').

The basic problem is the need for a robust regression objective; MSE can be sensitive to outliers in applications.
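As a sanity check on the objective above, the analytic gradient and hessian can be compared against finite differences of the pseudo-Huber loss itself. A minimal numpy-only sketch (the helper names here are illustrative, not part of any API):

```python
import numpy as np

def pseudo_huber(z, delta=1.0):
    # Pseudo-Huber loss as a function of the residual z = y_pred - y_true
    return delta ** 2 * (np.sqrt(1 + (z / delta) ** 2) - 1)

def huber_approx_obj(y_true, y_pred):
    # Same gradient/hessian as the custom objective above (delta = 1)
    z = y_pred - y_true
    scale = 1 + z ** 2
    scale_sqrt = np.sqrt(scale)
    return z / scale_sqrt, 1 / (scale * scale_sqrt)

# Compare the analytic gradient against central finite differences
z = np.linspace(-5.0, 5.0, 11)
grad, hess = huber_approx_obj(np.zeros_like(z), z)
eps = 1e-6
grad_fd = (pseudo_huber(z + eps) - pseudo_huber(z - eps)) / (2 * eps)
print(np.allclose(grad, grad_fd, atol=1e-5))
```

The hessian 1/(scale * scale_sqrt) is strictly positive everywhere, which is what makes this objective well behaved for Newton-style boosting.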

@trivialfis (Member)

@LionOrCatThatIsTheQuestion Would you like to make a PR for this? It should just be a simple class defined in src/objective/regression_loss.h. I am happy to help. ;-)

@LionOrCatThatIsTheQuestion (Contributor, Author) commented Apr 15, 2020

@trivialfis What evaluation metric should I use? RMSE or MAE would be my first guess.

Here is my code so far:

struct PseudoHuberError {

  XGBOOST_DEVICE static bst_float PredTransform(bst_float x) { 
    return x; 
  }

  XGBOOST_DEVICE static bool CheckLabel(bst_float label) {
    return true;
  }
 
  XGBOOST_DEVICE static bst_float FirstOrderGradient(bst_float predt, bst_float label) {
    const float z = predt - label;
    const float scale_sqrt = std::sqrt(1 + std::pow(z,2));
    return z/scale_sqrt;
  }
 
  XGBOOST_DEVICE static bst_float SecondOrderGradient(bst_float predt, bst_float label) {
    const float scale = 1 + std::pow(predt - label,2);
    const float scale_sqrt = std::sqrt(scale);
    return 1/(scale*scale_sqrt);
  }
 
  static bst_float ProbToMargin(bst_float base_score) { 
    return base_score; 
  }
 
  static const char* LabelErrorMsg() {
    return "";
  }
 
  static const char* DefaultEvalMetric() { 
    return "mae"; 
  }
 
  static const char* Name() { 
    return "reg:pseudohubererror"; 
  }
};

@trivialfis trivialfis self-assigned this Apr 15, 2020
@trivialfis (Member)

Does it make sense to use Pseudo-Huber loss as a metric?

@LionOrCatThatIsTheQuestion (Contributor, Author) commented Apr 20, 2020

I guess Pseudo-Huber loss would be an option too (it seems natural to choose the same metric as the loss function?), or MAE.

The idea was to implement Pseudo-Huber loss as a twice-differentiable approximation of MAE, so on second thought MSE as the metric kind of defeats the original purpose.

@LionOrCatThatIsTheQuestion (Contributor, Author)

The advantage of MAE (and also MSE) is that they are more naturally interpretable. Pseudo-Huber loss does not take the same values as MAE when abs(y_pred - y_true) > 1; it just has the same linear shape, as opposed to quadratic.
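To make that concrete, here is a small numpy sketch comparing the two: near zero pseudo-Huber behaves like 0.5 * z**2, while for large residuals it approaches |z| - 1, i.e. the same slope as MAE but offset, so the absolute values differ:

```python
import numpy as np

def pseudo_huber(z, delta=1.0):
    # Pseudo-Huber loss of the residual z, delta = 1 by default
    return delta ** 2 * (np.sqrt(1 + (z / delta) ** 2) - 1)

residuals = np.array([0.1, 1.0, 10.0, 100.0])
mae = np.abs(residuals)
phe = pseudo_huber(residuals)
# For z = 100: MAE = 100, pseudo-Huber ~ 99.005 -- the same linear
# shape, but not the same value
print(np.column_stack([residuals, mae, phe]))
```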

@trivialfis (Member)

@LionOrCatThatIsTheQuestion We can set the default metric to be huber, as users can specify other metrics if they like. To me, using huber as the default metric seems appropriate here. You can add a metric in src/metric/elementwise_metric.cu. Feel free to ping me if you have any issue around the code base.

@LionOrCatThatIsTheQuestion (Contributor, Author) commented Apr 22, 2020

@trivialfis Could you explain what the function GetFinal(...) does? I used MAE as reference:

struct EvalRowPHE {
  char const *Name() const {
    return "phe";
  }
  XGBOOST_DEVICE bst_float EvalRow(bst_float label, bst_float pred) const {
    bst_float diff = label - pred;
    return std::sqrt(1 + diff * diff) - 1;
  }
  static bst_float GetFinal(bst_float esum, bst_float wsum) {
    return wsum == 0 ? esum : esum / wsum;
  }
};

@trivialfis (Member)

@LionOrCatThatIsTheQuestion Is there any reason we should fix \delta to be 1?

@trivialfis (Member) commented May 7, 2020

For normal cases GetFinal is just a way to compute weighted mean. One special case is the gamma deviance, which is weighted deviance: https://rdrr.io/cran/MetricsWeighted/man/deviance_gamma.html
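In other words, for this metric GetFinal would just divide the weighted error sum by the total weight. A Python sketch of the same computation (the names here are illustrative, not part of the xgboost API):

```python
import math

def eval_row_phe(label, pred):
    # Per-row pseudo-Huber error, delta = 1
    diff = label - pred
    return math.sqrt(1 + diff * diff) - 1

def get_final(esum, wsum):
    # Weighted mean of the row errors; fall back to the raw sum
    # if the total weight is zero
    return esum if wsum == 0 else esum / wsum

labels, preds, weights = [1.0, 2.0, 3.0], [1.5, 2.0, 5.0], [1.0, 1.0, 1.0]
esum = sum(w * eval_row_phe(y, p) for y, p, w in zip(labels, preds, weights))
metric = get_final(esum, sum(weights))
```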

@LionOrCatThatIsTheQuestion (Contributor, Author)

\delta should be 1 by default, but adjustable would be better than fixed. The question is more whether it's possible, and how, to implement an additional parameter for a metric.

E.g. in the sklearn interface, I would just use the keyword 'reg:pseudohubererror' to specify the metric.

@QuantHao


Hi, is it possible to relax the constraint that delta equals 1, so that users could choose another delta, such as 1.35, to achieve 95% statistical efficiency? Or maybe just set its default to 1.35 to be compatible with sklearn?

Reference: https://scikit-learn.org/stable/modules/linear_model.html#huber-regression
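For reference, the delta-parameterized gradient and hessian would be a small change to the earlier snippet. A sketch (delta = 1.35 here mirrors sklearn's HuberRegressor epsilon default, not anything currently in xgboost):

```python
import numpy as np

def pseudo_huber_obj(y_true, y_pred, delta=1.35):
    # Gradient/hessian of delta**2 * (sqrt(1 + (z/delta)**2) - 1)
    z = y_pred - y_true
    scale = 1 + (z / delta) ** 2
    scale_sqrt = np.sqrt(scale)
    return z / scale_sqrt, 1 / (scale * scale_sqrt)
```

The delta**2 prefactor cancels in the derivatives, so only the scale term changes relative to the delta = 1 version.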

@pidefrem commented Jan 12, 2021

> \delta should be 1 by default, but adjustable would be better than fixed - the question is more if its possible and how to implement an additional parameter for a metric?
>
> e.g. in sklearn interface, I would just use the keyword 'reg:pseudohubererror' to specify the metric

Passing an additional parameter for a metric is done for poisson regression and tweedie regression for example. See:

// declare parameter
struct TweedieRegressionParam : public XGBoostParameter<TweedieRegressionParam> {
  float tweedie_variance_power;
  DMLC_DECLARE_PARAMETER(TweedieRegressionParam) {
    DMLC_DECLARE_FIELD(tweedie_variance_power).set_range(1.0f, 2.0f).set_default(1.5f)
        .describe("Tweedie variance power. Must be between in range [1, 2).");
  }
};

Making delta a parameter would imply some refactoring because PseudoHuberError has only static member functions:

struct PseudoHuberError {
  XGBOOST_DEVICE static bst_float PredTransform(bst_float x) {
    return x;
  }
  XGBOOST_DEVICE static bool CheckLabel(bst_float) {
    return true;
  }
  XGBOOST_DEVICE static bst_float FirstOrderGradient(bst_float predt, bst_float label) {
    const float z = predt - label;
    const float scale_sqrt = std::sqrt(1 + std::pow(z, 2));
    return z / scale_sqrt;
  }
  // ...

PseudoHuberError is used as a template parameter to RegLossObj:

template<typename Loss>
class RegLossObj : public ObjFunction {
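Until delta becomes a native parameter on the C++ side, the Python sklearn interface can get the same effect with a closure that bakes delta into a custom objective. A sketch (passing the result as XGBRegressor's objective argument is an assumption about usage, not a new API):

```python
import numpy as np

def make_pseudo_huber_obj(delta):
    # Returns a custom objective with delta captured in the closure
    def obj(y_true, y_pred):
        z = y_pred - y_true
        scale = 1 + (z / delta) ** 2
        scale_sqrt = np.sqrt(scale)
        return z / scale_sqrt, 1 / (scale * scale_sqrt)
    return obj

# e.g. XGBRegressor(objective=make_pseudo_huber_obj(1.35))
```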

@trivialfis (Member)

Reopening as a reminder.

@trivialfis trivialfis reopened this Jan 12, 2021
@chan4899 commented Jan 7, 2022

Any plans on making delta adjustable? It turns out Pseudo-Huber loss is very effective against outliers, and I would like to tune delta further to get better results.
