Add duplicate_data #305

Merged
merged 6 commits into from
Nov 25, 2021

Conversation

Mr-Geekman
Contributor

@Mr-Geekman Mr-Geekman commented Nov 22, 2021

IMPORTANT: Please do not create a Pull Request without creating an issue first.

Before submitting (must do checklist)

  • Did you read the contribution guide?
  • Did you update the docs? We use Numpy format for all the methods and classes.
  • Did you write any new necessary tests?
  • Did you update the CHANGELOG?

Type of Change

  • Examples / docs / tutorials / contributors update
  • Bug fix (non-breaking change which fixes an issue)
  • Improvement (non-breaking change which improves an existing feature)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Proposed Changes

See #291. The format has changed, though: the input DataFrame now requires only a timestamp column.
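To make the timestamp-only input concrete, here is a minimal sketch of the long-format case. It is an illustration under assumptions, not the code from this PR: `duplicate_long_sketch` and the `regressor_flag` column are hypothetical names, and the only requirement taken from the description is that the input has a `timestamp` column.

```python
from typing import Sequence

import pandas as pd


def duplicate_long_sketch(df: pd.DataFrame, segments: Sequence[str]) -> pd.DataFrame:
    """Hypothetical stand-in: copy `df` once per segment, in long format."""
    if "timestamp" not in df.columns:
        raise ValueError("df must contain a 'timestamp' column")
    # One copy of the input per segment, each tagged with its segment name.
    copies = [df.assign(segment=segment) for segment in segments]
    return pd.concat(copies, ignore_index=True)


timestamp = pd.date_range("2020-03-10", periods=3, freq="D")
df_raw = pd.DataFrame({"timestamp": timestamp, "regressor_flag": [0, 1, 0]})
df_long = duplicate_long_sketch(df_raw, segments=["segment_0", "segment_1"])
print(df_long)
```

Each input row appears once per segment, so the caller does not need to build per-segment exogenous data by hand.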

Related Issue

#291.

Closing issues

Closes #291.

@Mr-Geekman Mr-Geekman added the enhancement New feature or request label Nov 22, 2021
@Mr-Geekman Mr-Geekman self-assigned this Nov 22, 2021
@Mr-Geekman Mr-Geekman marked this pull request as ready for review November 22, 2021 12:25
@codecov-commenter
codecov-commenter commented Nov 22, 2021

Codecov Report

Merging #305 (1f89bfc) into master (644b33a) will increase coverage by 0.06%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #305      +/-   ##
==========================================
+ Coverage   87.70%   87.76%   +0.06%     
==========================================
  Files          96       97       +1     
  Lines        4742     4766      +24     
==========================================
+ Hits         4159     4183      +24     
  Misses        583      583              
Impacted Files              Coverage Δ
etna/datasets/__init__.py   100.00% <100.00%> (ø)
etna/datasets/utils.py      100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

iKintosh
iKintosh previously approved these changes Nov 24, 2021
Contributor

@iKintosh iKintosh left a comment

🔥

long = "long"


def duplicate_data(df: pd.DataFrame, segments: Sequence[str], format: str = DataFrameFormat.long) -> pd.DataFrame:
Contributor

I would choose the wide format by default, because it is the etna format. But that's just my opinion.

Contributor Author

Fixed
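For readers unfamiliar with the two formats discussed in this thread, here is a rough sketch of how a long DataFrame maps to a wide one. It assumes the wide layout has a `timestamp` index and two column levels named `segment` and `feature`, as in the `ts.head()` output quoted in this PR. The `regressor_flag` column is hypothetical, and the pivot is plain pandas, not etna's conversion code.

```python
import pandas as pd

timestamp = pd.date_range("2020-03-10", periods=3, freq="D")

# Long format: one row per (timestamp, segment) pair.
df_long = pd.DataFrame(
    {
        "timestamp": list(timestamp) * 2,
        "segment": ["segment_0"] * 3 + ["segment_1"] * 3,
        "regressor_flag": [0, 1, 0] * 2,
    }
)

# Wide format: timestamp index, one column per (segment, feature) pair.
df_wide = df_long.pivot(index="timestamp", columns="segment")
# pivot puts the feature name on the outer column level; swap the levels
# so that the segment comes first.
df_wide = df_wide.reorder_levels([1, 0], axis=1).sort_index(axis=1)
df_wide.columns.names = ["segment", "feature"]
print(df_wide)
```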

Comment on lines +42 to +66
Examples
--------
>>> import pandas as pd
>>> from etna.datasets import generate_const_df
>>> from etna.datasets import duplicate_data
>>> from etna.datasets import TSDataset
>>> df = generate_const_df(
...     periods=50, start_time="2020-03-10",
...     n_segments=2, scale=1
... )
>>> timestamp = pd.date_range("2020-03-10", periods=100, freq="D")
>>> is_friday_13 = (timestamp.weekday == 4) & (timestamp.day == 13)
>>> df_exog_raw = pd.DataFrame({"timestamp": timestamp, "regressor_is_friday_13": is_friday_13})
>>> df_exog = duplicate_data(df_exog_raw, segments=["segment_0", "segment_1"], format="wide")
>>> df_ts_format = TSDataset.to_dataset(df)
>>> ts = TSDataset(df=df_ts_format, df_exog=df_exog, freq="D")
>>> ts.head()
segment                 segment_0                     segment_1
feature    regressor_is_friday_13 target regressor_is_friday_13 target
timestamp
2020-03-10                  False    1.0                  False    1.0
2020-03-11                  False    1.0                  False    1.0
2020-03-12                  False    1.0                  False    1.0
2020-03-13                   True    1.0                   True    1.0
2020-03-14                  False    1.0                  False    1.0
"""
Contributor

Thanks for the example!

@iKintosh iKintosh merged commit 38623dc into master Nov 25, 2021
Carlosbogo added a commit to Carlosbogo/etna that referenced this pull request Nov 27, 2021
commit 381aeb3
Author: Carlosbg <[email protected]>
Date:   Sat Nov 27 14:23:16 2021 +0100

    Changes tests to keep consistency with tinkoff-ai#313

    Fixes tinkoff-ai#313 to close tinkoff-ai#290

commit c074d3b
Author: Andrey Alekseev <[email protected]>
Date:   Fri Nov 26 15:28:12 2021 +0300

    add acf plot; change eda notebook; (tinkoff-ai#318)

    * add acf plot; change eda notebook;

    * add changed to changelog

    Co-authored-by: an.alekseev <[email protected]>

commit 38623dc
Author: Mr-Geekman <[email protected]>
Date:   Thu Nov 25 19:44:42 2021 +0300

    Add `duplicate_data` (tinkoff-ai#305)

    * Add utils file, function , tests for it

    * Add example for

    * Update changelog

    * Correct typos in docstring

    * Change default value for duplicate_data

commit c2070a1
Author: Andrey Alekseev <[email protected]>
Date:   Thu Nov 25 19:44:17 2021 +0300

    add inverse transform as final step in forecast method; also rephrase… (tinkoff-ai#316)

    * add inverse transform as final step in forecast method; also rephrase _validate_backtest_dataset docstring

    * add inverse transform as final step in fit method; change test; change example

    Co-authored-by: an.alekseev <[email protected]>

commit e66058f
Author: Andrey Alekseev <[email protected]>
Date:   Wed Nov 24 18:13:03 2021 +0300

    Parsing type hints in Sphinx documentation (tinkoff-ai#205)

    * update sphinx in order to parse type hints; make flake8-docstyle numpydocstyle compatible
    * update deps

commit e814219
Author: Martin Gabdushev <[email protected]>
Date:   Wed Nov 24 18:10:04 2021 +0300

    :bomb: release 1.3.3 (tinkoff-ai#312)
@iKintosh iKintosh deleted the issue-291 branch December 9, 2021 17:21
Successfully merging this pull request may close these issues.

Add function to duplicate exogenous data