
Speed up feature selection #1294

Merged
merged 13 commits into from
Jul 3, 2023


alex-hse-repository
Collaborator

@alex-hse-repository alex-hse-repository commented Jun 22, 2023

Before submitting (must do checklist)

  • Did you read the contribution guide?
  • Did you update the docs? We use the NumPy docstring format for all methods and classes.
  • Did you write any new necessary tests?
  • Did you update the CHANGELOG?

Proposed Changes

  1. Explicitly set ml_task="regression" in calculate_relevance_table. With an integer target, the task type was otherwise inferred incorrectly, which significantly slowed down get_statistics_relevance_table.
  2. Change how redundancy is computed in mrmr: it is now computed only within each segment, which reduces the corresponding time-complexity term from n_segments^2 to n_segments. Add a fast_redundancy flag for backward compatibility.
  3. For a further speed-up, we suggest using numba, since the same operation is performed on many pairs of columns.
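Item 1 hinges on how the task type is inferred when ml_task is left at its default. The sketch below is a hypothetical mimic of the rule documented for tsfresh's calculate_relevance_table (ml_task="auto": boolean, integer or object targets are treated as classification, anything else as regression); the helper names infer_ml_task_sketch and resolve_ml_task are illustrative and not part of etna or tsfresh. It assumes, as in tsfresh, that classification mode runs the relevance test roughly once per distinct target value, which is where the slowdown for an integer-valued time series target would come from.

```python
import numpy as np


def infer_ml_task_sketch(y: np.ndarray) -> str:
    """Hypothetical mimic of the documented ml_task="auto" inference rule."""
    # Boolean ("b"), signed/unsigned integer ("i"/"u") and object ("O") targets
    # are treated as classification; everything else as regression.
    if y.dtype.kind in "biuO":
        return "classification"
    return "regression"


def resolve_ml_task(y: np.ndarray, ml_task: str = "auto") -> str:
    """Return the task that would be used: inferred for "auto", otherwise as given."""
    return infer_ml_task_sketch(y) if ml_task == "auto" else ml_task


# An integer-valued target, e.g. daily sales counts (illustrative data).
y_int = np.array([3, 5, 8, 13, 5, 21], dtype=np.int64)

print(resolve_ml_task(y_int))                        # classification
print(resolve_ml_task(y_int, ml_task="regression"))  # regression
# In classification mode the test would run roughly once per distinct value.
print(len(np.unique(y_int)))                         # 5
```

Passing the task explicitly sidesteps the dtype-based guess entirely, which is why fixing it to "regression" is a cheap and safe change for forecasting targets.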
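Item 2 can be illustrated with a toy redundancy score. This is a minimal sketch, not etna's mrmr code: it assumes features stored as an array of shape (n_timestamps, n_segments, n_features) and uses the mean absolute Pearson correlation as the redundancy measure; the functions redundancy_all_pairs and redundancy_within_segment are hypothetical. Comparing one pair of features across all segment pairs costs n_segments^2 correlations, while restricting the comparison to the same segment (the idea behind fast_redundancy) costs n_segments.

```python
from typing import Tuple

import numpy as np


def _abs_corr(a: np.ndarray, b: np.ndarray) -> float:
    """Absolute Pearson correlation between two 1-D series."""
    return float(abs(np.corrcoef(a, b)[0, 1]))


def redundancy_all_pairs(x: np.ndarray, i: int, j: int) -> Tuple[float, int]:
    """Compare feature i of every segment with feature j of every segment."""
    n_segments = x.shape[1]
    scores = [
        _abs_corr(x[:, s, i], x[:, t, j])
        for s in range(n_segments)
        for t in range(n_segments)
    ]  # n_segments ** 2 correlations
    return float(np.mean(scores)), len(scores)


def redundancy_within_segment(x: np.ndarray, i: int, j: int) -> Tuple[float, int]:
    """Compare features i and j only inside the same segment."""
    n_segments = x.shape[1]
    scores = [_abs_corr(x[:, s, i], x[:, s, j]) for s in range(n_segments)]
    return float(np.mean(scores)), len(scores)  # n_segments correlations


rng = np.random.default_rng(0)
x = rng.normal(size=(50, 4, 3))      # 50 timestamps, 4 segments, 3 features
x[:, :, 1] = 2.0 * x[:, :, 0] + 1.0  # feature 1 duplicates feature 0 per segment

slow_score, slow_count = redundancy_all_pairs(x, 0, 1)
fast_score, fast_count = redundancy_within_segment(x, 0, 1)
print(slow_count, fast_count)        # 16 4
print(round(fast_score, 6))          # 1.0
```

In this toy data the all-pairs score is also pulled towards zero by unrelated cross-segment pairs, while the within-segment score reports the duplicated feature as fully redundant.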
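Item 3 is only a suggestion in this PR. As a hypothetical sketch of what it could look like, the kernel below computes absolute Pearson correlations for many column pairs in a plain loop, which numba's njit can compile to machine code; the function name pairwise_abs_corr and the (n_timestamps, n_columns) layout are assumptions, not etna code. If numba is not installed, the decorator falls back to a no-op so the same code still runs in pure Python.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # numba is optional in this sketch
    def njit(func):
        """No-op stand-in so the kernel still runs as plain Python."""
        return func


@njit
def pairwise_abs_corr(x, pairs):
    """Absolute Pearson correlation for each (i, j) column pair.

    x: float64 array of shape (n_timestamps, n_columns)
    pairs: int64 array of shape (n_pairs, 2) with column indices
    """
    out = np.empty(pairs.shape[0])
    for k in range(pairs.shape[0]):
        a = x[:, pairs[k, 0]]
        b = x[:, pairs[k, 1]]
        da = a - a.mean()
        db = b - b.mean()
        out[k] = abs((da * db).sum()) / np.sqrt((da * da).sum() * (db * db).sum())
    return out


rng = np.random.default_rng(1)
x = rng.normal(size=(200, 6))
# All pairs of distinct columns: the same operation repeated many times.
pairs = np.array([(i, j) for i in range(6) for j in range(i + 1, 6)], dtype=np.int64)
result = pairwise_abs_corr(x, pairs)
print(result.shape)  # (15,)
```

With numba, the first call pays a one-off compilation cost, so any benefit would show up only when the kernel is applied to many pairs or called repeatedly.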

Closing issues

closes #886

@alex-hse-repository alex-hse-repository added the enhancement New feature or request label Jun 22, 2023
@alex-hse-repository alex-hse-repository self-assigned this Jun 22, 2023
@codecov-commenter

codecov-commenter commented Jun 22, 2023

Codecov Report

Merging #1294 (e7d8a75) into master (99782c8) will decrease coverage by 0.24%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #1294      +/-   ##
==========================================
- Coverage   88.46%   88.22%   -0.24%     
==========================================
  Files         193      193              
  Lines       11750    11758       +8     
==========================================
- Hits        10395    10374      -21     
- Misses       1355     1384      +29     
Impacted Files Coverage Δ
etna/analysis/feature_relevance/relevance_table.py 100.00% <100.00%> (ø)
etna/analysis/feature_selection/mrmr_selection.py 100.00% <100.00%> (ø)
...transforms/feature_selection/feature_importance.py 98.88% <100.00%> (+0.01%) ⬆️

... and 2 files with indirect coverage changes


@github-actions

github-actions bot commented Jun 22, 2023

@github-actions github-actions bot temporarily deployed to pull request June 22, 2023 07:44 Inactive
@github-actions github-actions bot temporarily deployed to pull request June 22, 2023 12:39 Inactive
@github-actions github-actions bot temporarily deployed to pull request June 23, 2023 05:13 Inactive
@github-actions github-actions bot temporarily deployed to pull request July 3, 2023 09:11 Inactive
@github-actions github-actions bot temporarily deployed to pull request July 3, 2023 10:11 Inactive
@Mr-Geekman Mr-Geekman merged commit fc80b96 into master Jul 3, 2023
@Mr-Geekman Mr-Geekman deleted the issue-886 branch July 3, 2023 11:23
Development

Successfully merging this pull request may close these issues.

Try to speed up feature selection methods
3 participants