
Feature selection #875

Merged 36 commits on Jul 11, 2023
605636f
CHANGE: poetry lock
Jun 28, 2022
3c07118
usless commit
Jul 1, 2022
feea479
trash
Jul 13, 2022
8a5a7d9
Merge branch 'master' of https://github.com/tinkoff-ai/etna
Aug 3, 2022
fdc92f9
Merge branch 'master' of https://github.com/tinkoff-ai/etna
Aug 5, 2022
f7d7d12
Merge branch 'master' of https://github.com/tinkoff-ai/etna
Aug 17, 2022
4429aff
ADD: notebook with forecasting strategies
Aug 18, 2022
2b33edc
CHANGE: delete usless commits
Aug 18, 2022
d8a169c
CHANGE: delete usless changes
Aug 18, 2022
3e75c8b
CHANGE: make format notebook
Aug 18, 2022
90cfd03
CHANGE: update changelog
Aug 18, 2022
e8aaf6c
CHANGE: lexical bugs
Aug 18, 2022
9661e35
CHANGE: review notebook
Aug 19, 2022
7bfe9b7
CHANGE: reformat
Aug 19, 2022
51b842a
CHANGE: notebook
Aug 21, 2022
dce3143
CHANGE: spell bugs
Aug 22, 2022
e4357e9
Merge branch 'master' into forecasting_strategies
scanhex12 Aug 22, 2022
51cc530
feature selection initial commit
Aug 22, 2022
e6c55d4
CHANGE: make format
Aug 22, 2022
fb4033b
CHANGE: notebook
Aug 24, 2022
1745737
CHANGE: make format
Aug 24, 2022
48d03c4
CHANGE: changelog
Aug 24, 2022
8e64992
CHANGE: feature selection notebook
Aug 26, 2022
8a95e8a
CHANGE: make format notebook
Aug 26, 2022
8ebc767
Merge branch 'master' into feature_selection
scanhex12 Aug 26, 2022
f3d5d74
CHANGE: feature selection notebook
Aug 26, 2022
503575b
Merge remote-tracking branch 'origin/master' into feature_selection
Jul 4, 2023
c38332e
feature: add feature_selection transform
Jul 5, 2023
e788235
docs: update list of tutorials
Jul 5, 2023
d6b6ba7
chore: update changelog
Jul 5, 2023
166a7e4
style: reformat code
Jul 5, 2023
e1ba7ab
fix: improve notebook content, update docs for gale-shapley
Jul 6, 2023
e3e194a
Merge remote-tracking branch 'origin/master' into feature_selection
Mr-Geekman Jul 10, 2023
255798a
Merge remote-tracking branch 'origin/master' into feature_selection
Mr-Geekman Jul 10, 2023
a9be2e9
fix: rerun feature_selection notebook
Mr-Geekman Jul 11, 2023
3b37d92
Merge branch 'master' into feature_selection
Mr-Geekman Jul 11, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- `DeseasonalityTransform` ([#1307](https://github.com/tinkoff-ai/etna/pull/1307))
-
- Add extension with models from `statsforecast`: `StatsForecastARIMAModel`, `StatsForecastAutoARIMAModel`, `StatsForecastAutoCESModel`, `StatsForecastAutoETSModel`, `StatsForecastAutoThetaModel` ([#1295](https://github.com/tinkoff-ai/etna/pull/1297))
- Notebook `feature_selection` ([#875](https://github.com/tinkoff-ai/etna/pull/875))
-
-

16 changes: 10 additions & 6 deletions README.md
@@ -175,15 +175,19 @@ We have also prepared a set of tutorials for an easy introduction:
| [Get started](https://github.com/tinkoff-ai/etna/tree/master/examples/get_started.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tinkoff-ai/etna/master?filepath=examples/get_started.ipynb) |
| [Backtest](https://github.com/tinkoff-ai/etna/tree/master/examples/backtest.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tinkoff-ai/etna/master?filepath=examples/backtest.ipynb) |
| [EDA](https://github.com/tinkoff-ai/etna/tree/master/examples/EDA.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tinkoff-ai/etna/master?filepath=examples/EDA.ipynb) |
| [Outliers](https://github.com/tinkoff-ai/etna/tree/master/examples/outliers.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tinkoff-ai/etna/master?filepath=examples/outliers.ipynb) |
| [Clustering](https://github.com/tinkoff-ai/etna/tree/master/examples/clustering.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tinkoff-ai/etna/master?filepath=examples/clustering.ipynb) |
| [Regressors and exogenous data](https://github.com/tinkoff-ai/etna/tree/master/examples/exogenous_data.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tinkoff-ai/etna/master?filepath=examples/exogenous_data.ipynb) |
| [Custom model and transform](https://github.com/tinkoff-ai/etna/tree/master/examples/custom_transform_and_model.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tinkoff-ai/etna/master?filepath=examples/custom_transform_and_model.ipynb) |
| [Deep learning models](https://github.com/tinkoff-ai/etna/tree/master/examples/NN_examples.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tinkoff-ai/etna/master?filepath=examples/NN_examples.ipynb) |
| [Ensembles](https://github.com/tinkoff-ai/etna/tree/master/examples/ensembles.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tinkoff-ai/etna/master?filepath=examples/ensembles.ipynb) |
| [Custom Transform and Model](https://github.com/tinkoff-ai/etna/tree/master/examples/custom_transform_and_model.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tinkoff-ai/etna/master?filepath=examples/custom_transform_and_model.ipynb) |
| [Exogenous data](https://github.com/tinkoff-ai/etna/tree/master/examples/exogenous_data.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tinkoff-ai/etna/master?filepath=examples/exogenous_data.ipynb) |
| [Forecasting strategies](https://github.com/tinkoff-ai/etna/blob/master/examples/forecasting_strategies.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tinkoff-ai/etna/master?filepath=examples/forecasting_strategies.ipynb) |
| [Classification](https://github.com/tinkoff-ai/etna/blob/master/examples/classification.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tinkoff-ai/etna/master?filepath=examples/classification.ipynb) |
| [Outliers](https://github.com/tinkoff-ai/etna/tree/master/examples/outliers.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tinkoff-ai/etna/master?filepath=examples/outliers.ipynb) |
| [Forecasting strategies](https://github.com/tinkoff-ai/etna/tree/master/examples/forecasting_strategies.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tinkoff-ai/etna/master?filepath=examples/forecasting_strategies.ipynb) |
| [Forecast interpretation](https://github.com/tinkoff-ai/etna/tree/master/examples/forecast_interpretation.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tinkoff-ai/etna/master?filepath=examples/forecast_interpretation.ipynb) |
| [Clustering](https://github.com/tinkoff-ai/etna/tree/master/examples/clustering.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tinkoff-ai/etna/master?filepath=examples/clustering.ipynb) |
| [AutoML](https://github.com/tinkoff-ai/etna/tree/master/examples/automl.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tinkoff-ai/etna/master?filepath=examples/automl.ipynb) |
| [Inference: using saved pipeline on a new data](https://github.com/tinkoff-ai/etna/tree/master/examples/inference.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tinkoff-ai/etna/master?filepath=examples/inference.ipynb) |
| [Hierarchical time series](https://github.com/tinkoff-ai/etna/blob/master/examples/hierarchical_pipeline.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tinkoff-ai/etna/master?filepath=examples/hierarchical_pipeline.ipynb) |
| [Classification](https://github.com/tinkoff-ai/etna/blob/master/examples/classification.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tinkoff-ai/etna/master?filepath=examples/classification.ipynb) |
| [Feature selection](https://github.com/tinkoff-ai/etna/blob/master/examples/feature_selection.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tinkoff-ai/etna/master?filepath=examples/feature_selection.ipynb) |

## Documentation

1 change: 1 addition & 0 deletions etna/analysis/feature_relevance/relevance_table.py
@@ -36,6 +36,7 @@ def _prepare_df(df: pd.DataFrame, df_exog: pd.DataFrame, segment: str, regressor

def get_statistics_relevance_table(df: pd.DataFrame, df_exog: pd.DataFrame) -> pd.DataFrame:
"""Calculate relevance table with p-values from tsfresh.

Parameters
----------
df:
27 changes: 19 additions & 8 deletions etna/transforms/feature_selection/gale_shapley.py
@@ -87,7 +87,7 @@ def get_next_candidate(self) -> Optional[str]:

Returns
-------
name: str
name:
name of feature
"""
if self.last_candidate is None:
@@ -113,7 +113,7 @@ def check_segment(self, segment: str) -> bool:

Returns
-------
is_better: bool
is_better:
returns True if given segment is a better candidate than current match.
"""
if self.tmp_match is None or self.tmp_match_rank is None:
@@ -178,7 +178,7 @@ def _gale_shapley_iteration(self, available_segments: List[SegmentGaleShapley])

Returns
-------
success: bool
success:
True if there is at least one match attempt at the iteration

Notes
@@ -212,7 +212,7 @@ def __call__(self) -> Dict[str, str]:

Returns
-------
matching: Dict[str, str]
matching:
matching dict of segment x feature
"""
success_run = True
@@ -224,13 +224,23 @@ def __call__(self) -> Dict[str, str]:


class GaleShapleyFeatureSelectionTransform(BaseFeatureSelectionTransform):
"""GaleShapleyFeatureSelectionTransform provides feature filtering with Gale-Shapley matching algo according to relevance table.

"""Transform that provides feature filtering by Gale-Shapley matching algorithm according to the relevance table.

Notes
-----
The transform works with any type of features; however, most models work only with regressors.
Therefore, it is recommended to pass the regressors into the feature selection transforms.

As input, we have a table of relevances with size :math:`N_{f} \times N_{s}`, where :math:`N_{f}` is the number of features
and :math:`N_{s}` is the number of segments.
The filtering procedure consists of :math:`\lceil \frac{k}{N_{s}} \rceil` iterations, where :math:`k` is ``top_k``.
Each iteration:

- build a matching between segments and features with the `Gale–Shapley algorithm <https://en.wikipedia.org/wiki/Gale%E2%80%93Shapley_algorithm>`_
according to the relevance table; during the matching, segments send proposals to features;
- select features to add by taking the matched feature for each segment;
- add the selected features to the accumulated list of selected features, making sure the list does not exceed ``top_k`` items;
- remove the added features from future consideration.
"""

def __init__(
@@ -290,7 +300,8 @@ def _compute_gale_shapley_steps_number(top_k: int, n_segments: int, n_features:
return 1
if top_k < n_segments:
warnings.warn(
f"Given top_k={top_k} is less than n_segments. Algo will filter data without Gale-Shapley run."
f"Given top_k={top_k} is less than n_segments={n_segments}. "
f"Algo will filter data without Gale-Shapley run."
)
return 1
return ceil(top_k / n_segments)
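
As a worked example of the step count computed in this hunk, here is a minimal sketch that reproduces only the two branches visible above. The helper name `steps_number` is hypothetical, the warning is omitted, and the earlier branch of the original function (hidden above the hunk) is not modelled.

```python
from math import ceil


def steps_number(top_k: int, n_segments: int) -> int:
    """Hypothetical stand-in for the visible tail of the diffed function."""
    if top_k < n_segments:
        # Fewer features requested than segments: a single step is enough.
        return 1
    # Each Gale-Shapley run yields at most one feature per segment.
    return ceil(top_k / n_segments)


# (top_k, n_segments) pairs chosen for illustration only.
examples = [(5, 3), (6, 3), (2, 3)]
results = [steps_number(top_k, n_segments) for top_k, n_segments in examples]
print(results)  # [2, 2, 1]
```

With three segments, `top_k=5` and `top_k=6` both need two matching rounds, while `top_k=2` falls into the single-step branch.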
@@ -309,7 +320,7 @@ def _gale_shapley_iteration(

Returns
-------
matching dict: Dict[str, str]
matching dict:
dict of segment x feature
"""
gssegments = [
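
The iteration scheme described in the new class docstring can be sketched in plain Python. This is an illustrative approximation, not the code from `gale_shapley.py`: the function names, the nested-dict relevance table, and the toy scores are all assumptions made for the example, and ties are broken by sort order.

```python
from math import ceil
from typing import Dict, List


def gale_shapley(seg_prefs: Dict[str, List[str]], feat_prefs: Dict[str, List[str]]) -> Dict[str, str]:
    """Segments propose to features; returns a segment -> feature matching."""
    feat_rank = {f: {s: i for i, s in enumerate(prefs)} for f, prefs in feat_prefs.items()}
    next_idx = {s: 0 for s in seg_prefs}  # next feature each segment will propose to
    holder: Dict[str, str] = {}  # feature -> segment currently matched to it
    free = list(seg_prefs)
    while free:
        s = free.pop()
        if next_idx[s] >= len(seg_prefs[s]):
            continue  # segment has exhausted its list and stays unmatched
        f = seg_prefs[s][next_idx[s]]
        next_idx[s] += 1
        current = holder.get(f)
        if current is None:
            holder[f] = s
        elif feat_rank[f][s] < feat_rank[f][current]:
            holder[f] = s  # feature prefers the new proposer
            free.append(current)
        else:
            free.append(s)  # proposal rejected, segment tries its next feature
    return {s: f for f, s in holder.items()}


def select_features(relevance: Dict[str, Dict[str, float]], top_k: int) -> List[str]:
    """relevance[segment][feature] is a score; higher means more relevant (assumed layout)."""
    segments = sorted(relevance)
    remaining = sorted(next(iter(relevance.values())))
    selected: List[str] = []
    for _ in range(ceil(top_k / len(segments))):
        seg_prefs = {s: sorted(remaining, key=lambda f: -relevance[s][f]) for s in segments}
        feat_prefs = {f: sorted(segments, key=lambda s: -relevance[s][f]) for f in remaining}
        matching = gale_shapley(seg_prefs, feat_prefs)
        for s in segments:
            f = matching.get(s)
            if f is not None and len(selected) < top_k:
                selected.append(f)  # never exceed top_k
        # Added features are removed from future consideration.
        remaining = [f for f in remaining if f not in selected]
    return selected


# Toy relevance scores, invented for illustration.
relevance = {
    "A": {"f1": 0.9, "f2": 0.8, "f3": 0.1, "f4": 0.2},
    "B": {"f1": 0.7, "f2": 0.3, "f3": 0.6, "f4": 0.1},
}
selected = select_features(relevance, top_k=3)
print(selected)  # ['f1', 'f3', 'f2']
```

In the first round segment `A` wins `f1` and `B` falls back to `f3`; in the second round `A` takes `f2`, and the `top_k=3` cap prevents `f4` from being added.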
54 changes: 36 additions & 18 deletions examples/README.md
@@ -15,17 +15,17 @@ We have prepared a set of tutorials for an easy introduction:

#### 03. [EDA](https://github.com/tinkoff-ai/etna/tree/master/examples/EDA.ipynb)
- Visualization
- Plot
- Partial autocorrelation
- Cross-correlation
- Correlation heatmap
- Distribution
- Plot
- Partial autocorrelation
- Cross-correlation
- Correlation heatmap
- Distribution
- Outliers
- Median method
- Density method
- Median method
- Density method
- Change Points
- Change points plot
- Interactive change points plot
- Change points plot
- Interactive change points plot

#### 04. [Regressors and exogenous data](https://github.com/tinkoff-ai/etna/tree/master/examples/exogenous_data.ipynb)
- What is a regressor?
@@ -35,7 +35,7 @@ We have prepared a set of tutorials for an easy introduction:
- EDA
- Forecast with regressors

#### 05. [Custom model and transform](https://github.com/tinkoff-ai/etna/tree/master/examples/exogenous_data.ipynb)
#### 05. [Custom model and transform](https://github.com/tinkoff-ai/etna/tree/master/examples/custom_transform_and_model.ipynb)
- What is Transform and how it works
- Custom Transform
- Per-segment Custom Transform
@@ -56,10 +56,10 @@ We have prepared a set of tutorials for an easy introduction:

#### 08. [Outliers](https://github.com/tinkoff-ai/etna/tree/master/examples/outliers.ipynb)
- Point outliers
- Median method
- Density method
- Prediction interval method
- Histogram method
- Median method
- Density method
- Prediction interval method
- Histogram method
- Sequence outliers
- Interactive visualization
- Outliers imputation
@@ -96,11 +96,11 @@ We have prepared a set of tutorials for an easy introduction:

#### 13. [AutoML notebook](https://github.com/tinkoff-ai/etna/tree/master/examples/automl.ipynb)
- Hyperparameters tuning
- How `Tune` works
- Example
- How `Tune` works
- Example
- General AutoML
- How `Auto` works
- Example
- How `Auto` works
- Example

#### 14. Hyperparameter search
- [Optuna](https://github.com/tinkoff-ai/etna/tree/master/examples/optuna)
@@ -115,3 +115,21 @@ We have prepared a set of tutorials for an easy introduction:
- Hierarchical structure
- Reconciliation methods
- Exogenous variables for hierarchical forecasts

#### 17. [Classification](https://github.com/tinkoff-ai/etna/tree/master/examples/classification.ipynb)
- Classification
- Load Dataset
- Feature extraction
- Cross validation
- Predictability analysis
- Load Dataset
- Load pretrained analyzer
- Analyze segments predictability

#### 18. [Feature selection](https://github.com/tinkoff-ai/etna/tree/master/examples/feature_selection.ipynb)
- Loading Dataset
- Feature selection methods
- Intro to feature selection
- `TreeFeatureSelectionTransform`
- `GaleShapleyFeatureSelectionTransform`
- `MRMRFeatureSelectionTransform`
736 changes: 736 additions & 0 deletions examples/feature_selection.ipynb

Large diffs are not rendered by default.