feature selection updates #58

Merged
merged 4 commits into from
Nov 9, 2021

ppdebreuck
Owner

Feature selection:

  1. NaNs. NaNs in the featurized dataframe currently raise an error during feature selection, because the NMI computation cannot handle them. The fix is to preprocess the data exactly as is currently done for model fitting. This solves the NaN issue and also brings the data used for feature selection closer to the data the model actually sees. For now, the preprocessing is applied only when NaNs are present, but it could become the default behaviour in the future.

  2. Big datasets. Computing the NMI on big datasets is slow. A simple solution is to compute the NMI on a random sample of the data. As shown in my master's thesis, the NMI converges with fewer than 10,000 data points for most matminer features.
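
The two changes above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `prepare_for_selection`, `max_samples`, and `random_state` are hypothetical names, and mean imputation plus min-max scaling is only an assumed stand-in for whatever preprocessing the model fitting actually applies.

```python
import numpy as np
import pandas as pd


def prepare_for_selection(df, max_samples=10_000, random_state=0):
    """Hypothetical helper: clean and subsample features before computing the NMI."""
    out = df
    # Preprocess only when NaNs are present, as described in point 1.
    if out.isna().to_numpy().any():
        # Assumed stand-in preprocessing: mean imputation, then min-max scaling.
        # The actual change reuses the preprocessing from model fitting.
        out = out.fillna(out.mean())
        span = (out.max() - out.min()).replace(0, 1)  # avoid division by zero
        out = (out - out.min()) / span
    # Subsample rows so the NMI is estimated on at most max_samples points (point 2).
    if len(out) > max_samples:
        out = out.sample(n=max_samples, random_state=random_state)
    return out


# Toy data: 20,000 rows, 3 features, with some missing values injected.
rng = np.random.default_rng(0)
features = pd.DataFrame(rng.normal(size=(20_000, 3)), columns=["f1", "f2", "f3"])
features.iloc[::100, 0] = np.nan

prepared = prepare_for_selection(features)
print(prepared.shape, int(prepared.isna().sum().sum()))
```

The returned dataframe would then be passed to the NMI computation in place of the raw features. Note that a dataframe without NaNs and with at most `max_samples` rows is returned unchanged, which matches the "only when NaNs are present" behaviour described above.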

@ppdebreuck ppdebreuck marked this pull request as ready for review November 9, 2021 09:15
@ppdebreuck ppdebreuck merged commit f699775 into master Nov 9, 2021