-
Notifications
You must be signed in to change notification settings - Fork 80
make plot_holiday work with prophet holiday format #708
make plot_holiday work with prophet holiday format #708
Conversation
import pandas as pd
from etna.datasets import TSDataset
from etna.analysis import plot_holidays
ts = TSDataset(data_in_etna_format, freq="D")
new_years = pd.DataFrame({
'holiday': 'New Year',
'ds': pd.to_datetime(['2015-01-01', '2016-01-01', '2017-01-01', '2018-01-01', '2019-01-01']),
'lower_window': 10,
'upper_window': 10,
})
holidays = pd.concat((new_years,))
plot_holidays(ts, holidays=holidays, segments=["Finland_KaggleMart_Kaggle Hat"], columns_num=1, figsize=(10,7)) |
No, it is better to use flag in my opinion. |
🚀 Deployed on https://deploy-preview-708--etna-docs.netlify.app |
Codecov Report
@@ Coverage Diff @@
## master #708 +/- ##
===========================================
- Coverage 83.73% 51.05% -32.69%
===========================================
Files 120 120
Lines 6577 6615 +38
===========================================
- Hits 5507 3377 -2130
- Misses 1070 3238 +2168
📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
f7b3802
to
264c08f
Compare
etna/analysis/plotters.py
Outdated
ds = holidays[holidays["holiday"] == name]["ds"] | ||
dt = [ds] | ||
if "upper_window" in holidays.columns: | ||
ds_upper_bound = ds + pd.to_timedelta( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it work correctly if ts
will have 15-minute frequency for example?
Try:
pd.to_timedelta(10, unit="15T")
It fails.
We should add smth like these into tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed it and added test for this case
holidays_df.loc[dt, name] = 1 | ||
return holidays_df | ||
|
||
|
||
def plot_holidays( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add Raises block for error with as_is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean?
I've done it
raise ValueError("Parameter
as_isshould be used with
holiday: pd.DataFrame, not string.")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean Raises block in documentation:
Raises
------
ValueError:
error description
@@ -1336,27 +1395,22 @@ def plot_holidays( | |||
|
|||
* if str, then this is code of the country in `holidays <https://pypi.org/project/holidays/>`_ library; | |||
|
|||
* | if DataFrame, then dataframe with holidays is expected to have timestamp index with holiday names columns. | |||
| In a holiday column values 0 represent absence of holiday in that timestamp, 1 represent the presence. | |||
* if DataFrame, then dataframe is expected to be in prophet`s holiday format; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add here some info about as_is logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added on line 1409
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
User starts reading documentation, during the reading about holidays
param he fixed in his head the expected format, but few lines later there is a description of as_is
format that discovers that holidays
can be in the other format.
I think that documentation of holidays
should cover that its behavior depends on as_is
parameter and explain it. I don't think that description of format should be spread across two parameters: holidays
and as_is
.
assert df.sum().sum() == 4 | ||
|
||
|
||
def test_create_holidays_df_non_day_freq(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try use not only "H" but "15T" we can have digits in our frequency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess, I've resolved it
for bound in ds_upper_bound: | ||
ds_add = ds + bound | ||
dt.append(ds_add) | ||
if "lower_window" in holidays.columns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prophet expects lower_window
to be non-positive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, my mistake
etna/analysis/plotters.py
Outdated
ds = holidays[holidays["holiday"] == name]["ds"] | ||
dt = [ds] | ||
if "upper_window" in holidays.columns: | ||
periods = holidays[holidays["holiday"] == name]["upper_window"].fillna(0).tolist()[0] + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if window=0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing will happen, but I will add it as a test case.
tests/test_analysis/test_plotters.py
Outdated
|
||
|
||
def test_create_holidays_df_upper_window_only(simple_df): | ||
"""Test if upper_window bounds are used even in case where holiday and TSDataset do not intersect.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be you need to rename the test to avoid this doctoring, the current name is misleading
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets name it "test_create_holidays_df_upper_window_out_of_index" and remove doctoring
tests/test_analysis/test_plotters.py
Outdated
def test_create_holidays_df_non_day_freq(): | ||
classic_df = generate_ar_df(periods=30, start_time="2020-01-01", n_segments=1, freq="H") | ||
ts = TSDataset.to_dataset(classic_df) | ||
holidays = pd.DataFrame({"holiday": "Christmas", "ds": pd.to_datetime(["2020-01-01"]), "upper_window": 3}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you need to add time into date and check that method generate exactly 3 points after the passed hour
3ec22cc
to
e639c56
Compare
tests/test_analysis/test_plotters.py
Outdated
|
||
|
||
def test_create_holidays_df_upper_window_only(simple_df): | ||
"""Test if upper_window bounds are used even in case where holiday and TSDataset do not intersect.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets name it "test_create_holidays_df_upper_window_out_of_index" and remove doctoring
assert "New Year" in df.columns | ||
|
||
|
||
def test_create_holidays_df_upper_window(simple_df): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the same two tests for the lower_window
etna/analysis/plotters.py
Outdated
if "lower_window" in holidays.columns: | ||
periods = holidays[holidays["holiday"] == name]["lower_window"].fillna(0).tolist()[0] | ||
if periods > 0: | ||
raise ValueError("Lower windows should be negative.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-positive
etna/analysis/plotters.py
Outdated
if "upper_window" in holidays.columns: | ||
periods = holidays[holidays["holiday"] == name]["upper_window"].fillna(0).tolist()[0] | ||
if periods < 0: | ||
raise ValueError("Upper windows should be positive.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-negative
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
7111893
to
f5163a6
Compare
IMPORTANT: Please do not create a Pull Request without creating an issue first.
Before submitting (must do checklist)
Type of Change
Proposed Changes
Make plot_holiday work with prophet holiday format.
Related Issue
Closing issues
Closes #702