Skip to content

Don't fill first timestamps in TimeSeriesImputerTransform #634

Merged
merged 3 commits into from
Apr 5, 2022

Conversation

Mr-Geekman
Copy link
Contributor

@Mr-Geekman Mr-Geekman commented Apr 1, 2022

IMPORTANT: Please do not create a Pull Request without creating an issue first.

Before submitting (must do checklist)

  • Did you read the contribution guide?
  • Did you update the docs? We use Numpy format for all the methods and classes.
  • Did you write any new necessary tests?
  • Did you update the CHANGELOG?

Type of Change

  • Examples / docs / tutorials / contributors update
  • Bug fix (non-breaking change which fixes an issue)
  • Improvement (non-breaking change which improves an existing feature)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Proposed Changes

Look #601, this is the implementation of the second solution.

Related Issue

#601.

Closing issues

Closes #601.

@Mr-Geekman Mr-Geekman added the enhancement New feature or request label Apr 1, 2022
@Mr-Geekman Mr-Geekman self-assigned this Apr 1, 2022
@Mr-Geekman Mr-Geekman added bug Something isn't working and removed enhancement New feature or request labels Apr 1, 2022
@codecov-commenter
Copy link

codecov-commenter commented Apr 1, 2022

Codecov Report

Merging #634 (8b6e151) into master (c4a71ac) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #634   +/-   ##
=======================================
  Coverage   84.26%   84.27%           
=======================================
  Files         118      118           
  Lines        6242     6243    +1     
=======================================
+ Hits         5260     5261    +1     
  Misses        982      982           
Impacted Files Coverage Δ
etna/transforms/missing_values/imputation.py 100.00% <100.00%> (ø)

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@@ -93,12 +99,6 @@ def transform(self, df: pd.DataFrame) -> pd.DataFrame:
result_df = df.copy()
cur_nans = result_df[result_df[self.in_column].isna()].index

# check if all values are nans
if cur_nans.shape[0] == result_df.shape[0] and self.strategy != ImputerMode.zero:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure we can drop this check?
how about transform call in TSDataset.make_future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In make_future we have train_values + future_values. If all values are NaNs in make_future then on fit stage they also was NaN (because there was only train_values) and we will get an exception on fit stage.

However, if in the future we will make a transform in make_future on part of train data (for optimization) we can face the situation when we have non-nan values on fit, but all nans on transform. But this whole situation looks troublesome: we want to make imputation by train values and we have non-nans train values on fit, but we lost them on transform stage and can't make a transformation. That means that we've already made a mistake by this separation of data on fit and transform and this mistake isn't really a problem of our transform.

@github-actions
Copy link

github-actions bot commented Apr 5, 2022

🚀 Deployed on https://deploy-preview-634--etna-docs.netlify.app

@github-actions github-actions bot temporarily deployed to pull request April 5, 2022 15:00 Inactive
@Mr-Geekman Mr-Geekman merged commit 858e894 into master Apr 5, 2022
@Mr-Geekman Mr-Geekman deleted the issue-601 branch April 5, 2022 15:23
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Different handling of nans at the beginning in TimeSeriesImputerTransform
3 participants