Add duplicate_data
#305
Codecov Report
@@            Coverage Diff             @@
##           master     #305      +/-   ##
==========================================
+ Coverage   87.70%   87.76%   +0.06%
==========================================
  Files          96       97       +1
  Lines        4742     4766      +24
==========================================
+ Hits         4159     4183      +24
  Misses        583      583
# Conflicts: # CHANGELOG.md
🔥
etna/datasets/utils.py
Outdated
    long = "long"


def duplicate_data(df: pd.DataFrame, segments: Sequence[str], format: str = DataFrameFormat.long) -> pd.DataFrame:
I would choose the wide format by default, because it is the etna format. But that is just my opinion.
Fixed
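For readers following this thread, here is a minimal sketch of the difference between the two formats. It is not the code merged in this PR: `duplicate_data_sketch` and the `regressor_flag` column are hypothetical names, and the only assumption taken from the PR is that the input needs just a `timestamp` column. Long format repeats the rows once per segment with a `segment` column; wide format uses a `timestamp` index and `(segment, feature)` column pairs.

```python
from typing import Sequence

import pandas as pd


def duplicate_data_sketch(df: pd.DataFrame, segments: Sequence[str], format: str = "wide") -> pd.DataFrame:
    """Hypothetical illustration only; not the implementation from this PR."""
    if "timestamp" not in df.columns:
        raise ValueError("df must contain a 'timestamp' column")
    if len(segments) == 0:
        raise ValueError("segments must be non-empty")
    if format not in ("long", "wide"):
        raise ValueError(f"Unknown format: {format!r}")

    # Long format: one copy of the rows per segment, tagged with a 'segment' column.
    long_df = pd.concat([df.assign(segment=segment) for segment in segments], ignore_index=True)
    if format == "long":
        return long_df

    # Wide format: timestamp index, columns are (segment, feature) pairs.
    wide_df = long_df.set_index(["timestamp", "segment"]).unstack("segment")
    wide_df.columns = wide_df.columns.swaplevel(0, 1)
    wide_df.columns.names = ["segment", "feature"]
    return wide_df.sort_index(axis=1)


timestamp = pd.date_range("2020-03-10", periods=3, freq="D")
df_exog_raw = pd.DataFrame({"timestamp": timestamp, "regressor_flag": [False, False, True]})

long_df = duplicate_data_sketch(df_exog_raw, ["segment_0", "segment_1"], format="long")
wide_df = duplicate_data_sketch(df_exog_raw, ["segment_0", "segment_1"])
```

With the wide default, the result has the same layout that the docstring example passes to `TSDataset` as `df_exog`.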
Examples
--------
>>> from etna.datasets import generate_const_df
>>> from etna.datasets import duplicate_data
>>> from etna.datasets import TSDataset
>>> df = generate_const_df(
...    periods=50, start_time="2020-03-10",
...    n_segments=2, scale=1
... )
>>> timestamp = pd.date_range("2020-03-10", periods=100, freq="D")
>>> is_friday_13 = (timestamp.weekday == 4) & (timestamp.day == 13)
>>> df_exog_raw = pd.DataFrame({"timestamp": timestamp, "regressor_is_friday_13": is_friday_13})
>>> df_exog = duplicate_data(df_exog_raw, segments=["segment_0", "segment_1"], format="wide")
>>> df_ts_format = TSDataset.to_dataset(df)
>>> ts = TSDataset(df=df_ts_format, df_exog=df_exog, freq="D")
>>> ts.head()
segment                 segment_0                     segment_1
feature    regressor_is_friday_13 target regressor_is_friday_13 target
timestamp
2020-03-10                  False    1.0                  False    1.0
2020-03-11                  False    1.0                  False    1.0
2020-03-12                  False    1.0                  False    1.0
2020-03-13                   True    1.0                   True    1.0
2020-03-14                  False    1.0                  False    1.0
"""
Thanks for the example!
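As a quick sanity check on the example's output, the sketch below recomputes the Friday-the-13th flag with the standard library only, assuming the same 100-day range starting at 2020-03-10. The variable names are illustrative and not part of the PR. Both `datetime.date.weekday()` and pandas' `DatetimeIndex.weekday` count Monday as 0, so Friday is 4.

```python
from datetime import date, timedelta

# Same window as the docstring example: 100 daily timestamps from 2020-03-10.
start = date(2020, 3, 10)
days = [start + timedelta(days=i) for i in range(100)]

# weekday() == 4 means Friday (Monday is 0).
friday_13 = [d for d in days if d.weekday() == 4 and d.day == 13]
print(friday_13)
```

Only 2020-03-13 qualifies in that window, which matches the single `True` row shown in `ts.head()`.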
commit 381aeb3
Author: Carlosbg <[email protected]>
Date: Sat Nov 27 14:23:16 2021 +0100

    Changes tests to keep consistency with tinkoff-ai#313

    Fixes tinkoff-ai#313 to close tinkoff-ai#290

commit c074d3b
Author: Andrey Alekseev <[email protected]>
Date: Fri Nov 26 15:28:12 2021 +0300

    add acf plot; change eda notebook; (tinkoff-ai#318)

    * add acf plot; change eda notebook;
    * add changed to changelog

    Co-authored-by: an.alekseev <[email protected]>

commit 38623dc
Author: Mr-Geekman <[email protected]>
Date: Thu Nov 25 19:44:42 2021 +0300

    Add `duplicate_data` (tinkoff-ai#305)

    * Add utils file, function , tests for it
    * Add example for
    * Update changelog
    * Correct typos in docstring
    * Change default value for duplicate_data

commit c2070a1
Author: Andrey Alekseev <[email protected]>
Date: Thu Nov 25 19:44:17 2021 +0300

    add inverse transform as final step in forecast method; also rephrase… (tinkoff-ai#316)

    * add inverse transform as final step in forecast method; also rephrase _validate_backtest_dataset docstring
    * add inverse transform as final step in fit method; change test; change example

    Co-authored-by: an.alekseev <[email protected]>

commit e66058f
Author: Andrey Alekseev <[email protected]>
Date: Wed Nov 24 18:13:03 2021 +0300

    Parsing type hints in Sphinx documentation (tinkoff-ai#205)

    * update sphinx in order to parse type hints; make flake8-docstyle numpydocstyle compatible
    * update deps

commit e814219
Author: Martin Gabdushev <[email protected]>
Date: Wed Nov 24 18:10:04 2021 +0300

    :bomb: release 1.3.3 (tinkoff-ai#312)
IMPORTANT: Please do not create a Pull Request without creating an issue first.
Before submitting (must do checklist)
Type of Change
Proposed Changes
See #291. However, the format has changed to require only the `timestamp` column.
Related Issue
#291.
Closing issues
Closes #291.