Skip to content

Fix TSDataset.make_future to handle hierarchy, quantiles, target components #1248

Merged
merged 5 commits into from
Apr 28, 2023

Conversation

Mr-Geekman
Copy link
Contributor

@Mr-Geekman Mr-Geekman commented Apr 26, 2023

Before submitting (must do checklist)

  • Did you read the contribution guide?
  • Did you update the docs? We use Numpy format for all the methods and classes.
  • Did you write any new necessary tests?
  • Did you update the CHANGELOG?

Proposed Changes

Look #1244.

Closing issues

Closes #1244.

@Mr-Geekman Mr-Geekman self-assigned this Apr 26, 2023
@Mr-Geekman Mr-Geekman changed the title Fix make_future to handle hierarchy, quantiles, target components Fix TSDataset.make_future to handle hierarchy, quantiles, target components Apr 26, 2023
@github-actions
Copy link

github-actions bot commented Apr 26, 2023

@github-actions github-actions bot temporarily deployed to pull request April 26, 2023 13:00 Inactive
@codecov-commenter
Copy link

codecov-commenter commented Apr 26, 2023

Codecov Report

Merging #1248 (e9c8880) into master (5b9783f) will decrease coverage by 0.02%.
The diff coverage is 100.00%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##           master    #1248      +/-   ##
==========================================
- Coverage   87.88%   87.86%   -0.02%     
==========================================
  Files         186      186              
  Lines       10639    10643       +4     
==========================================
+ Hits         9350     9352       +2     
- Misses       1289     1291       +2     
Impacted Files Coverage Δ
etna/datasets/tsdataset.py 92.95% <100.00%> (+0.05%) ⬆️

... and 1 file with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@@ -295,6 +297,13 @@ def make_future(
f"NaN-s will be used for missing values"
)

# remove components and quantiles
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of them are not added to raw_df, not sure that we really need to drop the explicitly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our fixtures we create some datasets, giving components and quantiles in raw_df. For example, you can look at fixtures that was used in test_make_future_removes_quantiles and test_make_future_removes_target_components. We can try to fix this fixtures, I think.

For components we can in some sense guarantee that there will be no components (because they are added using special method). But for quantiles we can't make such guarantee.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a good solution should consist of two steps:

  • Rework quantiles to be like components
  • Fix our fixtures to follow the procedure of adding quantiles and components

We can probably create some task for this.

Offtop, but I also think that we should somehow make a more formal restriction on columns that can be passed into df of TSDataset.__init__. Because we suggest that it is only endog, but we don't check this and in a lot of inner code don't follow this rule, so it can lead to confusing situations.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, the right solution is to rework quantiles. What kind of restrictions do you mean? Like allow only target?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are we going to do with this issue? Leave dropping this way and fix it after fixing quantiles?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should discuss do we want this restriction or not, but the proposal is to allow in df to be only columns timestamp, segment, target. We can also generate a good error message if smth went wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are we going to do with this issue? Leave dropping this way and fix it after fixing quantiles?

Probably, we can do like this, yes. This piece of code won't break anything (as I understand), but can safe us for now from some strange moments. We can add todo about fixing it after reworking quantiles. But we have a problem that we aren't really fixing todos. We can create a task on fixing quantiles and point out in that task to remove these redundant lines.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Let's leave TODO
  2. We can restrict df to have only timestamp, segment, target -- I guess it should prevent people adding exogs incorrectly. However it seems that in some places we create dataset out of df with extra columns ourselvs

tests/test_datasets/test_dataset.py Outdated Show resolved Hide resolved
@github-actions github-actions bot temporarily deployed to pull request April 27, 2023 11:28 Inactive
@github-actions github-actions bot temporarily deployed to pull request April 28, 2023 07:19 Inactive
@alex-hse-repository alex-hse-repository merged commit 7cbc065 into master Apr 28, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Lose hierarchical structure after make_future
3 participants