make plot_holiday work with prophet holiday format #708

Merged
merged 15 commits into from
May 31, 2022

Conversation

iKintosh
Contributor

@iKintosh iKintosh commented May 24, 2022

IMPORTANT: Please do not create a Pull Request without creating an issue first.

Before submitting (must do checklist)

  • Did you read the contribution guide?
  • Did you update the docs? We use Numpy format for all the methods and classes.
  • Did you write any new necessary tests?
  • Did you update the CHANGELOG?

Type of Change

  • Examples / docs / tutorials / contributors update
  • Bug fix (non-breaking change which fixes an issue)
  • Improvement (non-breaking change which improves an existing feature)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Proposed Changes

Make plot_holiday work with prophet holiday format.

Related Issue

Closing issues

Closes #702

@iKintosh
Contributor Author

import pandas as pd
from etna.datasets import TSDataset
from etna.analysis import plot_holidays

ts = TSDataset(data_in_etna_format, freq="D")
new_years = pd.DataFrame({
  'holiday': 'New Year',
  'ds': pd.to_datetime(['2015-01-01', '2016-01-01', '2017-01-01', '2018-01-01', '2019-01-01']),
  'lower_window': 10,
  'upper_window': 10,
})
holidays = pd.concat((new_years,))

plot_holidays(ts, holidays=holidays, segments=["Finland_KaggleMart_Kaggle Hat"], columns_num=1, figsize=(10,7))

[Screenshot, 2022-05-24 at 16:40:00]

@iKintosh
Contributor Author

iKintosh commented May 24, 2022

I suggest we add a converter function from the old etna format to the current (prophet) format.

No, in my opinion it is better to use a flag.

@github-actions

github-actions bot commented May 24, 2022

🚀 Deployed on https://deploy-preview-708--etna-docs.netlify.app

@github-actions github-actions bot temporarily deployed to pull request May 24, 2022 13:44 Inactive
@codecov-commenter

codecov-commenter commented May 24, 2022

Codecov Report

Merging #708 (36ce990) into master (cb28c3a) will decrease coverage by 32.68%.
The diff coverage is 15.55%.

❗ Current head 36ce990 differs from pull request most recent head 91a275e. Consider uploading reports for the commit 91a275e to get more accurate results

@@             Coverage Diff             @@
##           master     #708       +/-   ##
===========================================
- Coverage   83.73%   51.05%   -32.69%     
===========================================
  Files         120      120               
  Lines        6577     6615       +38     
===========================================
- Hits         5507     3377     -2130     
- Misses       1070     3238     +2168     
Impacted Files                               Coverage Δ
etna/analysis/plotters.py                    11.59% <15.55%> (-9.27%) ⬇️
etna/commands/__init__.py                    0.00% <0.00%> (-100.00%) ⬇️
etna/commands/backtest_command.py            0.00% <0.00%> (-96.43%) ⬇️
etna/commands/forecast_command.py            0.00% <0.00%> (-93.94%) ⬇️
etna/commands/__main__.py                    0.00% <0.00%> (-87.50%) ⬇️
etna/commands/resolvers.py                   0.00% <0.00%> (-80.00%) ⬇️
etna/analysis/outliers/density_outliers.py   22.44% <0.00%> (-75.52%) ⬇️
etna/datasets/datasets_generation.py         27.02% <0.00%> (-72.98%) ⬇️
etna/transforms/timestamp/time_flags.py      27.02% <0.00%> (-72.98%) ⬇️
etna/transforms/timestamp/fourier.py         28.00% <0.00%> (-72.00%) ⬇️
... and 70 more

@github-actions github-actions bot temporarily deployed to pull request May 25, 2022 09:36 Inactive
@iKintosh iKintosh force-pushed the feat/holiday_datetime_format_converter branch from f7b3802 to 264c08f Compare May 25, 2022 12:23
@github-actions github-actions bot temporarily deployed to pull request May 25, 2022 12:30 Inactive
ds = holidays[holidays["holiday"] == name]["ds"]
dt = [ds]
if "upper_window" in holidays.columns:
ds_upper_bound = ds + pd.to_timedelta(
Contributor

Will it work correctly if ts has, for example, a 15-minute frequency?

Try:

pd.to_timedelta(10, unit="15T")

It fails.

We should add something like this to the tests.
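
The failure mentioned above, and one possible way around it, can be sketched with pandas alone. This is only an illustration: it assumes the window is counted in steps of the dataset frequency, the variable names are not taken from the PR, and the newer alias "15min" is used in place of "15T" so the snippet also runs on recent pandas versions.

```python
import pandas as pd

freq = "15min"  # same frequency as "15T", spelled with the newer alias
holiday_start = pd.Timestamp("2020-01-01 00:00")
upper_window = 10  # number of extra steps after the holiday timestamp

# A frequency string with a multiplier is not a valid timedelta unit.
try:
    pd.to_timedelta(upper_window, unit=freq)
    unit_failed = False
except ValueError:
    unit_failed = True

# Building the window with date_range accepts multiplied frequencies.
window = pd.date_range(start=holiday_start, periods=upper_window + 1, freq=freq)
```

Here `window` holds the holiday timestamp plus ten 15-minute steps after it, so the same approach should carry over to frequencies such as "H" or "D".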

Contributor Author

I fixed it and added a test for this case.

holidays_df.loc[dt, name] = 1
return holidays_df


def plot_holidays(
Contributor

We should add a Raises block for the error with as_is.

Contributor Author

What do you mean?
I've done it:
raise ValueError("Parameter `as_is` should be used with `holiday`: pd.DataFrame, not string.")

Contributor

I mean Raises block in documentation:

Raises
------
ValueError:
    error description
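
For illustration, a docstring with such a Raises block might look like the sketch below. The helper `check_as_is` and its parameter descriptions are hypothetical and only show where the section goes in the Numpy format; the actual docstring in the PR belongs to plot_holidays.

```python
from typing import Union

import pandas as pd


def check_as_is(holidays: Union[str, pd.DataFrame], as_is: bool) -> None:
    """Validate the combination of ``holidays`` and ``as_is`` (hypothetical helper).

    Parameters
    ----------
    holidays:
        country code as a string or a dataframe with holidays
    as_is:
        whether the dataframe should be used without conversion

    Raises
    ------
    ValueError:
        if ``as_is`` is set while ``holidays`` is a string
    """
    if as_is and isinstance(holidays, str):
        raise ValueError("Parameter `as_is` should be used with `holiday`: pd.DataFrame, not string.")
```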

@@ -1336,27 +1395,22 @@ def plot_holidays(

* if str, then this is code of the country in `holidays <https://pypi.org/project/holidays/>`_ library;

* | if DataFrame, then dataframe with holidays is expected to have timestamp index with holiday names columns.
| In a holiday column values 0 represent absence of holiday in that timestamp, 1 represent the presence.
* if DataFrame, then dataframe is expected to be in prophet`s holiday format;
Contributor

I think we should add some info about the as_is logic here.

Contributor Author

Added on line 1409

Contributor

A user starts reading the documentation. While reading about the holidays parameter, they fix the expected format in their head, but a few lines later the description of as_is reveals that holidays can be in another format.

I think the documentation of holidays should state that its behavior depends on the as_is parameter and explain how. I don't think the description of the format should be spread across two parameters, holidays and as_is.
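
The two formats under discussion can be put side by side in a small sketch. It follows the descriptions in the docstring under review (a prophet-style frame with holiday and ds columns, versus a frame with a timestamp index and one 0/1 column per holiday). The conversion loop is only a pandas illustration that assumes daily frequency; it is not the code from this PR.

```python
import pandas as pd

# Prophet-style format: one row per holiday occurrence.
prophet_style = pd.DataFrame(
    {"holiday": "New Year", "ds": pd.to_datetime(["2020-01-01"]), "upper_window": 1}
)

# "As is" format: timestamp index, one 0/1 column per holiday name.
index = pd.date_range("2019-12-31", periods=5, freq="D")
as_is_style = pd.DataFrame(0, index=index, columns=["New Year"])

# Mark the holiday date and the following upper_window days with 1.
for _, row in prophet_style.iterrows():
    window = pd.date_range(start=row["ds"], periods=int(row["upper_window"]) + 1, freq="D")
    as_is_style.loc[as_is_style.index.isin(window), row["holiday"]] = 1
```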

@github-actions github-actions bot temporarily deployed to pull request May 25, 2022 14:49 Inactive
@github-actions github-actions bot temporarily deployed to pull request May 25, 2022 15:08 Inactive
assert df.sum().sum() == 4


def test_create_holidays_df_non_day_freq():
Contributor

Try using not only "H" but also "15T"; we can have digits in our frequency.

Contributor Author

I guess I've resolved it.

@github-actions github-actions bot temporarily deployed to pull request May 25, 2022 16:21 Inactive
@github-actions github-actions bot temporarily deployed to pull request May 25, 2022 17:02 Inactive
for bound in ds_upper_bound:
ds_add = ds + bound
dt.append(ds_add)
if "lower_window" in holidays.columns:
Collaborator

Prophet expects lower_window to be non-positive

Contributor Author

Sorry, my mistake

ds = holidays[holidays["holiday"] == name]["ds"]
dt = [ds]
if "upper_window" in holidays.columns:
periods = holidays[holidays["holiday"] == name]["upper_window"].fillna(0).tolist()[0] + 1
Collaborator

What happens if window=0?

Contributor Author

Nothing will happen, but I will add it as a test case.
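
A rough way to see the zero-window case: if the window is expanded into `upper_window + 1` periods starting at the holiday timestamp, a window of 0 yields just the holiday itself. The helper `window_dates` below is hypothetical and assumes the expansion is done with `pd.date_range`; the snippet under review only shows the `+ 1` on the period count.

```python
import pandas as pd


def window_dates(start: pd.Timestamp, upper_window: int, freq: str = "D") -> pd.DatetimeIndex:
    # Hypothetical helper: the holiday timestamp plus `upper_window` following steps.
    return pd.date_range(start=start, periods=upper_window + 1, freq=freq)


holiday_date = pd.Timestamp("2020-01-01")
zero_window = window_dates(holiday_date, 0)      # only the holiday itself
two_step_window = window_dates(holiday_date, 2)  # the holiday plus two more days
```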



def test_create_holidays_df_upper_window_only(simple_df):
"""Test if upper_window bounds are used even in case where holiday and TSDataset do not intersect."""
Collaborator

Maybe you need to rename the test so that this docstring is not needed; the current name is misleading.

Collaborator

Let's name it "test_create_holidays_df_upper_window_out_of_index" and remove the docstring.

def test_create_holidays_df_non_day_freq():
classic_df = generate_ar_df(periods=30, start_time="2020-01-01", n_segments=1, freq="H")
ts = TSDataset.to_dataset(classic_df)
holidays = pd.DataFrame({"holiday": "Christmas", "ds": pd.to_datetime(["2020-01-01"]), "upper_window": 3})
Collaborator

I guess you need to add a time to the date and check that the method generates exactly 3 points after the passed hour.
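
A test along these lines could look roughly like the sketch below. Since the signature of the function under test is not shown in this thread, `mark_upper_window` is a hypothetical pandas-only stand-in; the point is the assertion on the holiday hour plus exactly three following points.

```python
import pandas as pd


def mark_upper_window(index: pd.DatetimeIndex, holiday_ts: pd.Timestamp, upper_window: int, freq: str) -> pd.Series:
    """Hypothetical stand-in for the function under test: 0/1 flags over `index`."""
    window = pd.date_range(start=holiday_ts, periods=upper_window + 1, freq=freq)
    return pd.Series(index.isin(window).astype(int), index=index)


def test_upper_window_hourly_with_time():
    index = pd.date_range("2020-01-01", periods=30, freq="h")
    flags = mark_upper_window(index, pd.Timestamp("2020-01-01 05:00"), upper_window=3, freq="h")
    marked = flags[flags == 1].index
    # The holiday hour itself plus exactly 3 points after it.
    expected = pd.date_range("2020-01-01 05:00", periods=4, freq="h")
    assert marked.equals(expected)
```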

@iKintosh iKintosh force-pushed the feat/holiday_datetime_format_converter branch from 3ec22cc to e639c56 Compare May 31, 2022 07:27
@github-actions github-actions bot temporarily deployed to pull request May 31, 2022 07:33 Inactive
@github-actions github-actions bot temporarily deployed to pull request May 31, 2022 07:41 Inactive


assert "New Year" in df.columns


def test_create_holidays_df_upper_window(simple_df):
Collaborator

Add the same two tests for the lower_window

if "lower_window" in holidays.columns:
periods = holidays[holidays["holiday"] == name]["lower_window"].fillna(0).tolist()[0]
if periods > 0:
raise ValueError("Lower windows should be negative.")
Collaborator

Non-positive

if "upper_window" in holidays.columns:
periods = holidays[holidays["holiday"] == name]["upper_window"].fillna(0).tolist()[0]
if periods < 0:
raise ValueError("Upper windows should be positive.")
Collaborator

Non-negative
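
With both wording suggestions applied, the sign checks might read as in the sketch below. `validate_windows` is a hypothetical helper, and unlike the snippets under review (which look at the first value per holiday) it checks every row, which is an assumption about the intended behavior.

```python
import pandas as pd


def validate_windows(holidays: pd.DataFrame) -> None:
    """Hypothetical check of window signs in a prophet-style holidays frame."""
    # Zero is allowed for both windows, hence "non-positive" and "non-negative".
    if "lower_window" in holidays.columns and (holidays["lower_window"].fillna(0) > 0).any():
        raise ValueError("Lower windows should be non-positive.")
    if "upper_window" in holidays.columns and (holidays["upper_window"].fillna(0) < 0).any():
        raise ValueError("Upper windows should be non-negative.")
```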

Collaborator

@alex-hse-repository alex-hse-repository left a comment

👍

@alex-hse-repository alex-hse-repository enabled auto-merge (squash) May 31, 2022 09:37
@github-actions github-actions bot temporarily deployed to pull request May 31, 2022 09:43 Inactive
@alex-hse-repository alex-hse-repository merged commit e33f583 into master May 31, 2022
Development

Successfully merging this pull request may close these issues.

Add dataframe format converter to plot_holidays
4 participants