New data access methods in TSDataset #809

alex-hse-repository · 2022-07-21T12:27:58Z

Before submitting (must do checklist)

Did you read the contribution guide?
Did you update the docs? We use Numpy format for all the methods and classes.
Did you write any new necessary tests?
Did you update the CHANGELOG?

Proposed Changes

Closing issues

closes #793

github-actions · 2022-07-21T12:31:32Z

🚀 Deployed on https://deploy-preview-809--etna-docs.netlify.app

Mr-Geekman · 2022-07-21T12:39:14Z

etna/datasets/tsdataset.py

-        return self.to_flatten(self.df)
+            if columns is None:
+                return self.df.copy()
+            return self.df.loc[:, self.idx[:, columns]].copy()


We should investigate whether passing ":" here in place of segments is a problem.

The problem is described here: #775.

I am not sure it matters here, as this method just returns a copy of the dataframe with the specified columns.

If it leads to column values being swapped (which was the mistake there), then you will get a broken dataframe. Ask the author of that bug report about it.

Made it deterministic and added a comment in #775 to fix this place as well.
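
For readers following this thread, here is a minimal sketch of the selection being discussed: picking feature columns out of a wide frame whose columns form a `(segment, feature)` MultiIndex. The toy frame, the segment and feature names, and the helper `select_features` are all made up for illustration; this is not the ETNA implementation, and naming the segments explicitly is only one possible alternative to passing ":".

```python
import numpy as np
import pandas as pd

# Toy wide-format frame: columns are a (segment, feature) MultiIndex.
# All names here are illustrative and not taken from the PR.
timestamps = pd.date_range("2022-01-01", periods=3, freq="D")
columns = pd.MultiIndex.from_product(
    [["segment_a", "segment_b"], ["exog", "target"]],
    names=["segment", "feature"],
)
values = np.arange(12, dtype=float).reshape(3, 4)
df = pd.DataFrame(values, index=timestamps, columns=columns)

idx = pd.IndexSlice


def select_features(frame, features, segments=None):
    """Return a copy of `frame` restricted to `features` (hypothetical helper)."""
    if segments is None:
        # Variant from the diff: slice over all segments with ":".
        return frame.loc[:, idx[:, features]].copy()
    # Alternative: name the segments explicitly instead of slicing.
    return frame.loc[:, idx[segments, features]].copy()


sliced = select_features(df, ["target"])
explicit = select_features(df, ["target"], segments=["segment_a", "segment_b"])
```

On this toy frame both variants pick the same two `target` columns, and because the result is a copy, modifying it leaves `df` untouched.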

codecov-commenter · 2022-07-21T12:57:37Z

Codecov Report

Merging #809 (cb14b02) into tsdataset_2 (e475d5d) will decrease coverage by 34.58%.
The diff coverage is 36.66%.

@@               Coverage Diff                @@
##           tsdataset_2     #809       +/-   ##
================================================
- Coverage        84.01%   49.43%   -34.59%     
================================================
  Files              125      125               
  Lines             7193     7218       +25     
================================================
- Hits              6043     3568     -2475     
- Misses            1150     3650     +2500

| Impacted Files | Coverage Δ | |
|---|---|---|
| etna/datasets/tsdataset.py | `64.32% <36.66%> (-26.38%)` | ⬇️ |
| etna/commands/__init__.py | `0.00% <0.00%> (-100.00%)` | ⬇️ |
| etna/commands/backtest_command.py | `0.00% <0.00%> (-97.06%)` | ⬇️ |
| etna/commands/forecast_command.py | `0.00% <0.00%> (-94.88%)` | ⬇️ |
| etna/commands/__main__.py | `0.00% <0.00%> (-87.50%)` | ⬇️ |
| etna/commands/resolvers.py | `0.00% <0.00%> (-80.00%)` | ⬇️ |
| etna/analysis/outliers/density_outliers.py | `22.44% <0.00%> (-75.52%)` | ⬇️ |
| etna/datasets/datasets_generation.py | `27.02% <0.00%> (-72.98%)` | ⬇️ |
| etna/transforms/timestamp/time_flags.py | `27.02% <0.00%> (-72.98%)` | ⬇️ |
| etna/transforms/timestamp/fourier.py | `28.00% <0.00%> (-72.00%)` | ⬇️ |

... and 75 more

Mr-Geekman · 2022-07-21T13:03:39Z

etna/datasets/tsdataset.py

        """Return pandas DataFrame with flatten index.

        Parameters
        ----------
        df:
            DataFrame in ETNA format.
-
+        columns:
+            List of columns to return


Don't we want to check that the columns are present? Or are we content with the error from pandas itself?

Actually, I do not get an error on pandas 1.3.5. In any case, an error from pandas would not be ambiguous: it would be clear that there is no such column in the dataset.
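
If relying on pandas turns out not to be enough (the reply above reports no error at all on pandas 1.3.5), an explicit check before the selection is one option. The sketch below is only an illustration of that idea: the helper name `check_columns_present` and the error message are hypothetical and not part of this PR.

```python
from typing import Iterable, List, Sequence


def check_columns_present(available: Iterable[str], requested: Sequence[str]) -> List[str]:
    """Return `requested` as a list if every name is available, else raise.

    Hypothetical helper for illustration; not part of the PR.
    """
    available_set = set(available)
    # Keep the caller's order so the error lists names as they were requested.
    missing = [name for name in requested if name not in available_set]
    if missing:
        raise ValueError(f"Columns not found in the dataset: {missing}")
    return list(requested)


# Illustrative feature names, assumed for this example only.
features = ["exog", "target"]
checked = check_columns_present(features, ["target"])

try:
    check_columns_present(features, ["target", "unknown"])
except ValueError as error:
    message = str(error)
```

Such a check makes the failure mode independent of the pandas version, at the cost of one extra pass over the requested names.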

Mr-Geekman · 2022-07-21T13:03:58Z

etna/datasets/tsdataset.py

@@ -595,7 +602,9 @@ def to_pandas(self, flatten: bool = False) -> pd.DataFrame:
            * If False, return pd.DataFrame with multiindex

            * If True, return with flatten index
-
+        columns:


Same here

Don't we want to check that the columns are present? Or are we content with the error from pandas itself?

tests/test_datasets/test_dataset.py

Mr-Geekman

Fix documentation and look at comments above.

tests/test_datasets/test_dataset.py

martins0n

👍

CHANGELOG.md

alex-hse-repository added 6 commits July 21, 2022 13:46

Add column attribute to to_pandas, to_flatten

5f7d70c

Add tests for to_pandas, to_flatten

3a0aef4

Add method remove_columns

4fa9691

Add tests

d35f087

Add method add_columns_from_pandas

b13ecc4

Add tests

14ce60b

alex-hse-repository added the enhancement New feature or request label Jul 21, 2022

alex-hse-repository self-assigned this Jul 21, 2022

Update Changelog

9e13e55

alex-hse-repository changed the title ~~Issue 793~~ New data access methods in TSDataset Jul 21, 2022

alex-hse-repository requested a review from Mr-Geekman July 21, 2022 12:31

github-actions bot temporarily deployed to pull request July 21, 2022 12:33 Inactive