
[python-package] refit sets init_score=0 #4951

Open · 2 comments

@TremaMiguel (Contributor) commented Jan 15, 2022

Description

Refitting appears to set init_score=0, so the changes observed in a refitted booster can only be attributed to the new dataset if the first booster was trained with init_score=0 and refit is called with decay_rate=0. Quoting @jmoralez:

... it seems that it sets init_score=0. So if we use refit with the same data and decay_rate=0.0 I'd expect to get the same results, however that seems to only be true if we set init_score=0 in the first booster. So with the current test, the assertion about the different scores passes even if we send in the same data to refit without modifying any of the dataset arguments.

reference: #4894 (comment)
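
To make the quoted expectation concrete, here is a minimal sketch of the same-data check (my own illustration, not part of the original report): refitting on the exact same data with decay_rate=0.0 would be expected to reproduce the original predictions, but reportedly it only does so when the training Dataset was built with init_score=0.

from sklearn.datasets import load_breast_cancer

import lightgbm as lgb
import numpy as np

X, y = load_breast_cancer(return_X_y=True)

# Train without an explicit init_score on the Dataset.
gbm = lgb.train({'objective': 'binary', 'verbose': -1, 'seed': 123},
                lgb.Dataset(X, y), num_boost_round=10)

# Refit on the exact same data; decay_rate=0.0 replaces the old leaf values entirely.
refitted = gbm.refit(data=X, label=y, decay_rate=0.0)

# Expected to print True, but per the report it is only True when the
# original Dataset is created with init_score=np.zeros(y.size).
print(np.allclose(gbm.predict(X), refitted.predict(X)))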

Reproducible example

from sklearn.datasets import load_breast_cancer
from sklearn.metrics import log_loss

import lightgbm as lgb
import numpy as np

# Train the first booster with init_score explicitly set to 0.
X, y = load_breast_cancer(return_X_y=True)
lgb_train = lgb.Dataset(X, y, init_score=np.zeros(y.size))
train_params = {
    'objective': 'binary',
    'verbose': -1,
    'seed': 123
}
gbm = lgb.train(train_params, lgb_train, num_boost_round=10)
non_weight_err_pred = log_loss(y, gbm.predict(X))

# Refit on the same data, but with random weights and custom Dataset parameters.
refit_weight = np.random.rand(y.shape[0])
dataset_params = {
    'max_bin': 260,
    'min_data_in_bin': 5,
    'data_random_seed': 123,
}
new_gbm = gbm.refit(
    data=X,
    label=y,
    weight=refit_weight,
    dataset_params=dataset_params,
    decay_rate=0.0,
)
weight_err_pred = log_loss(y, new_gbm.predict(X))

# The refitted booster should reflect the new weights and Dataset parameters.
train_set_params = new_gbm.train_set.get_params()
stored_weights = new_gbm.train_set.get_weight()
assert weight_err_pred != non_weight_err_pred
assert train_set_params["max_bin"] == 260
assert train_set_params["min_data_in_bin"] == 5
assert train_set_params["data_random_seed"] == 123
np.testing.assert_allclose(stored_weights, refit_weight)
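
For context (my reading of the Booster.refit docstring, not part of the original report), decay_rate blends the old and new leaf values when refitting:

leaf_output = decay_rate * old_leaf_output + (1.0 - decay_rate) * new_leaf_output

so with decay_rate=0.0 the tree structure is kept but the leaf values are computed entirely from the new (here, weighted) data, which is why the example expects weight_err_pred to differ from non_weight_err_pred.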

Environment info

LightGBM v3.3.1.99

@jameslamb (Collaborator)

Thanks for opening this!

Two very minor suggestions that might help you when contributing to projects on GitHub like this one in the future:

  1. When you're opening an issue based on a specific conversation, please link to that conversation. In this case, that is [python-package] support customizing Dataset creation in Booster.refit() (fixes #3038) #4894 (comment).
  2. GitHub supports syntax highlighting (see GitHub's documentation on creating and highlighting code blocks), so when you create a code block in markdown anywhere on GitHub, consider adding the specific language you're using to the block, as shown below.
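
For illustration (a snippet added here to show the markdown syntax, not taken from the original thread), tagging a fenced block with the language name enables highlighting:

```python
import lightgbm as lgb

print(lgb.__version__)
```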

I've edited this issue's description with both of those changes, just wanted to let you know.

@TremaMiguel (Contributor, Author)

I appreciate the clarification, thanks!
