Integrate transform implementation with LightGBM, add separate header file support #4734

Closed

@chjinche (Contributor) commented Oct 28, 2021

This PR introduces a built-in way for LightGBM to support feature transformation, which makes development and iteration more convenient. Compared with transforming the data separately in advance, this approach keeps the transformation and the model in one place, so offline (training) and online (inference) behavior stay consistent.

The basic idea of the integration is to apply feature transformation while the data is being parsed. Specifically:

  • For training, a transform file is passed in during dataset initialization. The file is used to create a parser with the transform attached, and the transform then takes effect every time a line is parsed. The transform file content is saved as a section in the trained model file.
  • For inference, the experience is the same as before: if the model contains a transform section, the transform is restored when the model is loaded.
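The training and inference flow above can be illustrated with a minimal Python sketch. It is not code from this PR: the transform-file format (one `<op> <column_index>` line per output feature, with column 0 as the label), the `begin transform`/`end transform` section markers, and the names `TransformParser`, `load_transform`, `save_model_text` and `load_model_text` are all hypothetical. The sketch shows a parser that applies the transform each time it parses one line, writes the transform text into the model text as its own section, and restores it when the model text is loaded.

```python
import math
from typing import Callable, List, Optional, Tuple

# Hypothetical transform-file format (an assumption, not LightGBM's):
# one "<op> <column_index>" line per output feature; column 0 is the label.
_OPS = {
    "identity": lambda x: x,
    "log1p": math.log1p,
    "square": lambda x: x * x,
}

# Hypothetical markers delimiting the transform section in the model text.
BEGIN, END = "begin transform", "end transform"


def load_transform(text: str) -> Callable[[List[str]], List[float]]:
    """Compile transform-file text into a function over one line's fields."""
    steps = []
    for raw in text.splitlines():
        if not raw.strip():
            continue  # ignore blank lines
        op, idx = raw.split()
        steps.append((_OPS[op], int(idx)))

    def transform(fields: List[str]) -> List[float]:
        return [fn(float(fields[i])) for fn, i in steps]

    return transform


class TransformParser:
    """Parses one delimited line; features come from the transform."""

    def __init__(self, transform_text: str, delimiter: str = ","):
        self.transform_text = transform_text  # kept so it can be saved with the model
        self.transform = load_transform(transform_text)
        self.delimiter = delimiter

    def parse_line(self, line: str) -> Tuple[float, List[float]]:
        # The transform runs on every call, i.e. every time one line is parsed.
        fields = line.rstrip("\n").split(self.delimiter)
        return float(fields[0]), self.transform(fields)


def save_model_text(trees_text: str, parser: Optional[TransformParser]) -> str:
    """Append the transform as its own section when one was used in training."""
    if parser is None:
        return trees_text
    return f"{trees_text}\n{BEGIN}\n{parser.transform_text.strip()}\n{END}\n"


def load_model_text(model_text: str) -> Tuple[str, Optional[TransformParser]]:
    """Split model text into trees and an optional restored parser."""
    if BEGIN not in model_text:
        return model_text, None  # no transform section: behave as before
    trees_text, rest = model_text.split(BEGIN, 1)
    transform_text = rest.split(END, 1)[0]
    return trees_text.rstrip("\n"), TransformParser(transform_text)


parser = TransformParser("log1p 1\nsquare 2")
print(parser.parse_line("1,0,3"))  # (1.0, [0.0, 9.0])
model_text = save_model_text("tree=0", parser)  # "tree=0" stands in for real trees
trees_text, restored = load_model_text(model_text)
print(restored.parse_line("1,0,3") == parser.parse_line("1,0,3"))  # True
```

Because the model text carries the transform, the inference side needs only the model file to reproduce the training-time features, which is the offline/online consistency benefit described at the start of the PR.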

Below are further details you may care about.

  • The integration introduces no changes to the C APIs or to the algorithm itself.
  • Add a new compile-time option, USE_TRANSFORM, which controls whether the transform implementation is built in. If users do not turn this option on, there is no interference with the original code base. Even when compiled with this option on, if the user does not explicitly pass a transform file, the code flow stays the same as before and the other parsers are tried for data extraction.
  • Both the C++ package and the Python package support the transform feature.
  • Add two parameters, transform_file and header_file (optional), to dataset initialization.
  • Add support for a separate header file, for cases where the user has many chunks of data without a header.
  • Add a transform task to CI to guard against regressions.
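The separate header file support and the unchanged default path can be sketched the same way. This is an illustrative Python sketch under stated assumptions, not the PR's implementation: `read_header` and `parse_chunks` are hypothetical names, the header file is assumed to hold one delimited line of column names, and the optional transform is modeled as a plain Python callable rather than a transform file. The header is read once and reused for every headerless chunk; when no transform is given, rows are returned exactly as parsed.

```python
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, float]


def read_header(header_text: str, delimiter: str = ",") -> List[str]:
    """Read column names from the first line of a separate header file."""
    first_line = header_text.splitlines()[0]
    return [name.strip() for name in first_line.split(delimiter)]


def parse_chunks(
    chunks: Iterable[str],
    header: List[str],
    transform: Optional[Callable[[Row], Row]] = None,
    delimiter: str = ",",
) -> List[Row]:
    """Parse headerless data chunks using one shared header."""
    rows: List[Row] = []
    for chunk in chunks:
        for line in chunk.splitlines():
            if not line.strip():
                continue  # skip blank lines between records
            fields = line.split(delimiter)
            if len(fields) != len(header):
                raise ValueError(
                    f"expected {len(header)} columns, got {len(fields)}: {line!r}"
                )
            row = {name: float(value) for name, value in zip(header, fields)}
            # No transform supplied: fall back to plain parsing, as before.
            rows.append(transform(row) if transform is not None else row)
    return rows


header = read_header("label,age,income\n")
chunks = ["1,30,5000\n0,45,7200\n", "1,22,3100\n"]
plain = parse_chunks(chunks, header)
scaled = parse_chunks(chunks, header, lambda r: {**r, "income": r["income"] / 1000})
print(plain[2])  # {'label': 1.0, 'age': 22.0, 'income': 3100.0}
print(scaled[0]["income"])  # 5.0
```

Keeping the header outside the data lets each chunk be appended or split without rewriting a header line into every file.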

@jameslamb (Collaborator)

Closing this PR for now; please refer to #4733 (comment).

@github-actions

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023