
[python-package] refit sets init_score=0 #4951

Open · 2 comments

@TremaMiguel (Contributor) commented Jan 15, 2022

Description

Refitting appears to set init_score=0, so the changes observed in a refitted booster can only be attributed to the new dataset if the first booster was trained with init_score=0 and refit is called with decay_rate=0. Quoting @jmoralez:

... it seems that it sets init_score=0. So if we use refit with the same data and decay_rate=0.0 I'd expect to get the same results, however that seems to only be true if we set init_score=0 in the first booster. So with the current test, the assertion about the different scores passes even if we send in the same data to refit without modifying any of the dataset arguments.

reference: #4894 (comment)
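
To make the quoted expectation concrete, here is a minimal sketch of the same-data check (my own illustration, not part of the original report): refitting on the exact same data with decay_rate=0.0 would be expected to reproduce the original predictions, but reportedly it only does so when the training Dataset was built with init_score=0.

from sklearn.datasets import load_breast_cancer

import lightgbm as lgb
import numpy as np

X, y = load_breast_cancer(return_X_y=True)

# Train without an explicit init_score on the Dataset.
gbm = lgb.train({'objective': 'binary', 'verbose': -1, 'seed': 123},
                lgb.Dataset(X, y), num_boost_round=10)

# Refit on the exact same data; decay_rate=0.0 replaces the old leaf values entirely.
refitted = gbm.refit(data=X, label=y, decay_rate=0.0)

# Expected to print True, but per the report it is only True when the
# original Dataset is created with init_score=np.zeros(y.size).
print(np.allclose(gbm.predict(X), refitted.predict(X)))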

Reproducible example

from sklearn.datasets import load_breast_cancer
from sklearn.metrics import log_loss

import lightgbm as lgb
import numpy as np

# Train the first booster with init_score explicitly set to 0.
X, y = load_breast_cancer(return_X_y=True)
lgb_train = lgb.Dataset(X, y, init_score=np.zeros(y.size))
train_params = {
    'objective': 'binary',
    'verbose': -1,
    'seed': 123
}
gbm = lgb.train(train_params, lgb_train, num_boost_round=10)
non_weight_err_pred = log_loss(y, gbm.predict(X))

# Refit on the same data, but with random weights and custom Dataset parameters.
refit_weight = np.random.rand(y.shape[0])
dataset_params = {
    'max_bin': 260,
    'min_data_in_bin': 5,
    'data_random_seed': 123,
}
new_gbm = gbm.refit(
    data=X,
    label=y,
    weight=refit_weight,
    dataset_params=dataset_params,
    decay_rate=0.0,
)
weight_err_pred = log_loss(y, new_gbm.predict(X))

# The refitted booster should reflect the new weights and Dataset parameters.
train_set_params = new_gbm.train_set.get_params()
stored_weights = new_gbm.train_set.get_weight()
assert weight_err_pred != non_weight_err_pred
assert train_set_params["max_bin"] == 260
assert train_set_params["min_data_in_bin"] == 5
assert train_set_params["data_random_seed"] == 123
np.testing.assert_allclose(stored_weights, refit_weight)
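
For context (my reading of the Booster.refit docstring, not part of the original report), decay_rate blends the old and new leaf values when refitting:

leaf_output = decay_rate * old_leaf_output + (1.0 - decay_rate) * new_leaf_output

so with decay_rate=0.0 the tree structure is kept but the leaf values are computed entirely from the new (here, weighted) data, which is why the example expects weight_err_pred to differ from non_weight_err_pred.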

Environment info

LightGBM v3.3.1.99

@jameslamb (Collaborator)

Thanks for opening this!

Two very minor suggestions that might help you when contributing to projects on GitHub like this one in the future:

  1. When you're opening an issue based on a specific conversation, please link to that conversation. In this case, that is [python-package] support customizing Dataset creation in Booster.refit() (fixes #3038) #4894 (comment).
  2. GitHub supports syntax highlighting (see GitHub's documentation on creating and highlighting code blocks), so when you create a code block in markdown anywhere on GitHub, consider adding the specific language you're using to the block, as shown below.
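
For illustration (a snippet added here to show the markdown syntax, not taken from the original thread), tagging a fenced block with the language name enables highlighting:

```python
import lightgbm as lgb

print(lgb.__version__)
```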

I've edited this issue's description with both of those changes, just wanted to let you know.

@TremaMiguel (Contributor, Author)

I appreciate the clarification, thanks!
