[FEA] Genetic Programming for Feature Engineering #2121

aerdem4 · 2020-04-22T09:28:05Z

Is your feature request related to a problem? Please describe.
Genetic programming (GP) is very useful for feature engineering, but its main challenge is time complexity. Luckily, GP is easily parallelizable, so I believe it is a good fit for cuML.

Example: Let's assume you have two columns, A and B, and a binary target that is 1 most of the time when A > B. This is very difficult for a tree-based model to learn, because each split looks at only one column, but GP can engineer the feature for you.
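To make the example concrete, here is a minimal sketch in plain Python. It assumes uniformly random columns and a noise-free target (the original says "most of the time"), and it uses a single-threshold rule as a stand-in for one split of a tree. The helper name `best_stump_accuracy` is made up for this illustration.

```python
import random

def best_stump_accuracy(values, labels):
    """Best accuracy of a one-split rule (x > t, or its negation) on one column."""
    n = len(values)
    best = 0.0
    for t in values:  # candidate thresholds are taken from the data itself
        hits = sum((v > t) == bool(y) for v, y in zip(values, labels))
        best = max(best, hits / n, (n - hits) / n)  # allow either polarity
    return best

rng = random.Random(0)
A = [rng.random() for _ in range(500)]
B = [rng.random() for _ in range(500)]
target = [int(a > b) for a, b in zip(A, B)]  # noise-free, for clarity only

# The engineered feature that GP would have to discover.
diff = [a - b for a, b in zip(A, B)]

acc_A = best_stump_accuracy(A, target)
acc_B = best_stump_accuracy(B, target)
acc_diff = best_stump_accuracy(diff, target)

print(f"single split on A:     {acc_A:.2f}")
print(f"single split on B:     {acc_B:.2f}")
print(f"single split on A - B: {acc_diff:.2f}")
```

A single split on either raw column leaves many rows misclassified, while one split on `A - B` separates the classes completely. A deep tree can approximate the diagonal boundary with many axis-aligned splits, but it needs far more data and depth to do so.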

Describe the solution you'd like
I would like to have the functionality of gplearn accelerated on the GPU (https://gplearn.readthedocs.io/en/stable/).

teju85 · 2020-04-22T18:25:21Z

@aerdem4 so, are you only looking for a GPU-accelerated SymbolicTransformer?

aerdem4 · 2020-04-22T19:47:25Z

@teju85 I think all of them are the same except for the metric. Multiple options for the metric would be nice, but Spearman is the most useful.
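For reference, a sketch of what a Spearman fitness metric computes: the Pearson correlation of the ranks, so any monotone relationship between a candidate feature and the target scores 1. This is a plain-Python illustration, not gplearn's or cuML's implementation, and the function names are chosen for this example.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant input: treat as uncorrelated
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

feature = [0.1, 0.5, 1.0, 2.0, 4.0]
target = [v ** 3 for v in feature]  # monotone but nonlinear in the feature

print("pearson: ", pearson(feature, target))
print("spearman:", spearman(feature, target))
```

Because a tree-based model only cares about the ordering of a feature's values, a rank-based score is a natural fit when the generated features feed into a GBM.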

JohnZed · 2020-04-29T21:55:37Z

Alright, whose idea of a joke was it to tag this with Good First Issue? I'm looking at you @WXBN ! ;)

teju85 · 2021-01-19T05:49:46Z

@aerdem4 we are going to have an intern provide us with an initial implementation of this in cuML! For starters, can we assume max program AST depth of 10 or so? Or do you think that's too low to begin with? In practice, what's the deepest program you've come across?

aerdem4 · 2021-01-19T08:39:03Z

@teju85 thanks for the good news! I think 10 is enough for AST depth. Generated features don't need to be very complex but should capture the interactions the model can't. If the intern needs any help, I would be happy to be involved btw.
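To put the depth cap in perspective, here is a small sketch of how depth relates to program size. The nested-tuple representation and the convention that a lone leaf has depth 0 are assumptions made for this illustration; gplearn and cuML may count depth differently.

```python
# Hypothetical representation: a program is either a column name (leaf)
# or a tuple of (function_name, child, child).

def depth(node):
    """Depth of a program tree; a lone leaf has depth 0."""
    if not isinstance(node, tuple):
        return 0
    return 1 + max(depth(child) for child in node[1:])

def max_nodes(max_depth, arity=2):
    """Node count of a full tree, i.e. the worst-case size under a depth cap."""
    return sum(arity ** level for level in range(max_depth + 1))

interaction = ("sub", "A", "B")               # the A > B style feature
nested = ("sub", ("mul", "A", "A"), "B")      # A*A - B

print(depth(interaction))  # depth of the simple interaction feature
print(depth(nested))
print(max_nodes(10))       # upper bound on nodes for binary functions
```

Under these assumptions a pairwise interaction such as `A - B` needs only depth 1, while a cap of 10 with binary functions still allows programs of up to 2047 nodes.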

teju85 · 2021-01-20T02:08:14Z

tagging @vimarsh6739 who'll be implementing this.

aerdem4 · 2021-01-27T11:07:24Z

A simple Kaggle test case: https://www.kaggle.com/c/loan-default-prediction

This dataset has 800 features. People claim that in this old competition GBM performs poorly unless the feature f527 - f528 (the difference of two columns) is extracted. There may be more complex magic features too.

I can also create artificial datasets that we can use to test whether GP can reverse-engineer the features that contribute to the target.
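One way such an artificial test could look, as a rough sketch: plant a known pairwise feature in random data and check that a search recovers it. The search below is an exhaustive scan over pairwise `add`/`sub`/`mul` expressions, standing in for GP (it is not genetic programming), and the column indices, noise level, and function names are all assumptions for illustration.

```python
import itertools
import operator
import random

def abs_pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5

def make_dataset(n_rows, n_cols, seed=0):
    rng = random.Random(seed)
    columns = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_cols)]
    # Planted signal: column 2 minus column 5, plus a little noise.
    target = [a - b + rng.gauss(0, 0.1) for a, b in zip(columns[2], columns[5])]
    return columns, target

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def search_pairwise(columns, target):
    """Score every (op, i, j) candidate and return the best one."""
    best_score, best_expr = 0.0, None
    pairs = itertools.combinations(range(len(columns)), 2)
    for (i, j), (name, fn) in itertools.product(pairs, OPS.items()):
        feature = [fn(a, b) for a, b in zip(columns[i], columns[j])]
        score = abs_pearson(feature, target)
        if score > best_score:
            best_score, best_expr = score, (name, i, j)
    return best_score, best_expr

columns, target = make_dataset(n_rows=300, n_cols=8)
score, found = search_pairwise(columns, target)
print(found, round(score, 3))
```

A GP implementation could be validated the same way: generate data with a planted expression, run the evolution, and check that the best program matches (or is equivalent to) the planted one.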

@vinaydes

This PR introduces/proposes some of the basic and core (GPU-friendly!) data structures for implementing gplearn in cuML, in order to address issue #2121. Tagging all who will be involved in this development: @vinaydes @venkywonka @vimarsh6739.

PS: It also contains an experimental register-based stack implementation that will be useful while implementing CUDA-based AST evaluation, which is needed for organizing tournaments.

Authors:
- Thejaswi. N. S (@teju85)

Approvers:
- Corey J. Nolet (@cjnolet)

URL: #3387
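To illustrate why a stack helps with AST evaluation, here is a rough Python sketch of evaluating a flattened program with a fixed-capacity stack. The PR's actual data structures are not shown in this thread, so the postfix token layout, the `MAX_STACK` capacity, and the function name are all assumptions for illustration, not the cuML implementation.

```python
import operator

BINARY = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
MAX_STACK = 16  # hypothetical fixed capacity, mimicking a small register-backed stack

def evaluate_postfix(program, row):
    """Evaluate a postfix token list against one row of named columns."""
    stack = [0.0] * MAX_STACK  # preallocated: no growth during evaluation
    top = 0                    # number of live entries
    for token in program:
        if isinstance(token, str) and token in BINARY:
            if top < 2:
                raise ValueError("malformed program: missing operand")
            right = stack[top - 1]
            left = stack[top - 2]
            top -= 2
            stack[top] = BINARY[token](left, right)
            top += 1
        else:
            if top == MAX_STACK:
                raise OverflowError("program needs a deeper stack")
            # A string names a column; anything else is a numeric constant.
            stack[top] = row[token] if isinstance(token, str) else float(token)
            top += 1
    if top != 1:
        raise ValueError("malformed program: leftover operands")
    return stack[0]

program = ["A", "A", "mul", "B", "sub"]  # A*A - B in postfix order
result = evaluate_postfix(program, {"A": 3.0, "B": 4.0})
print(result)
```

A flat token list with a bounded stack avoids recursion and pointer chasing, which is presumably what makes this style attractive on a GPU: every program in a tournament can be scored over the rows of the dataset with the same simple loop.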

github-actions · 2021-03-14T19:14:45Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

aerdem4 added the "? - Needs Triage" and "feature request" labels on Apr 22, 2020

viclafargue added the "New Algorithm", "proposal", and "good first issue" labels and removed the "? - Needs Triage" label on Apr 29, 2020

JohnZed removed the "good first issue" label on Apr 29, 2020

teju85 mentioned this issue Jan 20, 2021

[REVIEW] genetic programming initial structures #3387

Merged

github-actions bot added the inactive-30d label Mar 14, 2021
