🚀 Feature Request
For the sake of consistency, it may be useful to impose on both the dataframe returned by `TSDataset.to_dataset()` and the dataframe `df` constructed during `TSDataset.__init__()` the same column order that is already imposed on the dataframe returned by `TSDataset.to_flatten()`.
Currently, both the dataframe returned by `TSDataset.to_dataset()` and `TSDataset.df` sort "target" alphabetically together with the other features, while the dataframe returned by `TSDataset.to_flatten()` places "target" right after "timestamp" and "segment" and before the other features, which are in alphabetical order.
The order used by `TSDataset.to_flatten()` makes the "target" values easier to inspect (they are not hidden among many other features) and emphasises the special role of "target".
Proposal
I propose the following order of columns:
1. `timestamp`
2. `segment`
3. `target`
4. other columns in alphabetical order.
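The proposed rule can be stated as a short function. The sketch below uses a hypothetical helper, `proposed_column_order`, which is not part of etna; it only illustrates the ordering for a list of column names in the flat (long) format.

```python
from typing import Iterable, List

# Columns that keep a fixed position at the front (hypothetical constant).
KEY_COLUMNS = ("timestamp", "segment", "target")


def proposed_column_order(columns: Iterable[str]) -> List[str]:
    """Order column names as proposed: key columns first, then the rest alphabetically.

    Hypothetical helper for illustration only; "target" is placed before the
    other features only if it is present.
    """
    columns = list(columns)
    # Keep "timestamp", "segment" and "target" (when present) in this fixed order.
    head = [name for name in KEY_COLUMNS if name in columns]
    # Every other feature goes after them, sorted alphabetically.
    rest = sorted(name for name in columns if name not in KEY_COLUMNS)
    return head + rest


print(proposed_column_order(["regressor_b", "target", "segment", "timestamp", "exog_a"]))
# ['timestamp', 'segment', 'target', 'exog_a', 'regressor_b']
```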
How it can be done for `TSDataset.to_dataset()`:
1. Find the line `df_copy = df_copy.pivot(index="timestamp", columns="segment")` in `etna.datasets.tsdataset.py`.
2. Before it, reorder the columns of `df_copy` so that "target", if provided, comes before the other features. This could look like `feature_columns.remove("target")` followed on the next line by `df_copy = df_copy[["timestamp", "segment", "target"] + feature_columns]`, where `feature_columns` holds the feature columns other than "timestamp" and "segment".
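A standalone pandas sketch of the same idea follows. `to_dataset_target_first` is a hypothetical function, not etna's implementation, and it assumes the wide format uses MultiIndex columns named ("segment", "feature"), as in the output of `TSDataset.to_dataset()`. If the columns were sorted alphabetically after the pivot (for example with `sort_index(axis=1, level=(0, 1))`), "target" would end up among the other features again, so the sketch builds the final column order explicitly.

```python
import pandas as pd


def to_dataset_target_first(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical long-to-wide conversion that keeps "target" first in every segment.

    Not etna's code: it only mimics a ("segment", "feature") column layout.
    """
    df_copy = df.copy(deep=True)
    df_copy["timestamp"] = pd.to_datetime(df_copy["timestamp"])
    df_copy["segment"] = df_copy["segment"].astype(str)

    # "target" (if provided) first, then the remaining features alphabetically.
    feature_columns = sorted(
        c for c in df_copy.columns if c not in ("timestamp", "segment", "target")
    )
    if "target" in df_copy.columns:
        feature_columns = ["target"] + feature_columns
    df_copy = df_copy[["timestamp", "segment"] + feature_columns]

    # Long -> wide: pivot gives (feature, segment) columns; swap to (segment, feature).
    wide = df_copy.pivot(index="timestamp", columns="segment")
    wide = wide.swaplevel(0, 1, axis=1)
    wide.columns.names = ["segment", "feature"]

    # Segments sorted alphabetically, features in the order chosen above.
    segments = sorted(wide.columns.get_level_values("segment").unique())
    ordered = pd.MultiIndex.from_product(
        [segments, feature_columns], names=["segment", "feature"]
    )
    return wide.reindex(columns=ordered)
```

Segments stay in alphabetical order; only the order of features inside each segment changes.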
How it can be done for `TSDataset.__init__()`:
1. Find the line `df = pd.concat((df, self.df_exog), axis=1).loc[df.index].sort_index(axis=1, level=(0, 1))` in `etna.datasets.tsdataset.py`.
2. Change it so that "target" comes before the other columns, which stay sorted alphabetically.
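One way to express the new ordering is to replace the final `sort_index` call with an explicit sort key. The helper `sort_columns_target_first` below is hypothetical and assumes the columns are ("segment", "feature") pairs; the toy frames stand in for `df` and `self.df_exog`.

```python
import pandas as pd


def sort_columns_target_first(df: pd.DataFrame) -> pd.DataFrame:
    """Sort ("segment", "feature") columns by segment, with "target" first in each segment.

    Hypothetical replacement for .sort_index(axis=1, level=(0, 1)); the
    remaining features of every segment stay in alphabetical order.
    """
    # False sorts before True, so ("a", "target") precedes ("a", "alpha").
    ordered = sorted(df.columns, key=lambda col: (col[0], col[1] != "target", col[1]))
    return df.reindex(columns=pd.MultiIndex.from_tuples(ordered, names=df.columns.names))


# Toy wide-format data: one "target" column per segment plus exogenous features.
timestamps = pd.date_range("2021-01-01", periods=3, freq="D", name="timestamp")
df = pd.DataFrame(
    1.0,
    index=timestamps,
    columns=pd.MultiIndex.from_product([["b", "a"], ["target"]], names=["segment", "feature"]),
)
df_exog = pd.DataFrame(
    0.0,
    index=timestamps,
    columns=pd.MultiIndex.from_product([["a", "b"], ["zeta", "alpha"]], names=["segment", "feature"]),
)

# Same concatenation as in the line quoted above, with the final sort swapped out.
merged = sort_columns_target_first(pd.concat((df, df_exog), axis=1).loc[df.index])
print(list(merged.columns))
# [('a', 'target'), ('a', 'alpha'), ('a', 'zeta'), ('b', 'target'), ('b', 'alpha'), ('b', 'zeta')]
```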
Test cases
- Fix the doctest of `TSDataset.to_dataset()`.
- Make sure the existing tests pass.
- Add tests on the column order of both modified methods to `etna.tests.test_datasets.test_dataset.py`:
  - `test_to_dataset_correct_column_order` for `TSDataset.to_dataset()`;
  - `test_init_with_exog_correct_column_order` for `TSDataset.__init__()` with `df_exog is not None`.
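For the two proposed tests, a shared assertion helper could check the order segment by segment. `assert_target_first` is a hypothetical name, and the helper assumes the column MultiIndex has the levels ("segment", "feature") in that order.

```python
import pandas as pd


def assert_target_first(df: pd.DataFrame) -> None:
    """Assert the proposed feature order in a wide dataframe with ("segment", "feature") columns.

    Hypothetical test helper: in every segment, "target" (if present) must be
    the first feature, followed by the remaining features in alphabetical order.
    """
    # Group feature names by segment, preserving their current column order.
    features_by_segment = {}
    for segment, feature in df.columns:
        features_by_segment.setdefault(segment, []).append(feature)

    for segment, features in features_by_segment.items():
        others = sorted(name for name in features if name != "target")
        expected = (["target"] if "target" in features else []) + others
        assert features == expected, f"segment {segment!r}: got {features}, expected {expected}"
```

For example, `test_to_dataset_correct_column_order` could call `assert_target_first(TSDataset.to_dataset(df))`, and `test_init_with_exog_correct_column_order` could call `assert_target_first(ts.df)` for a `TSDataset` built with `df_exog`.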
Additional context
See issue #873 for a similar issue with `TSDataset.to_flatten()`.