-
Notifications
You must be signed in to change notification settings - Fork 80
Conversation
Codecov Report
@@ Coverage Diff @@
## master #431 +/- ##
==========================================
+ Coverage 87.80% 87.98% +0.18%
==========================================
Files 114 115 +1
Lines 5354 5435 +81
==========================================
+ Hits 4701 4782 +81
Misses 653 653
Continue to review full report at Codecov.
|
name of added column. If not given, use `self.__repr__()` or `regressor_{self.__repr__()}` if it is a regressor | ||
strategy: | ||
filling encoding in not fitted values: | ||
- If "new_value", then replace missing dates with '-1' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like "dates" isn't valid here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better to use values
here
|
||
|
||
class LE(preprocessing.LabelEncoder): | ||
def transform(self, y: pd.Series, strategy): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to create Enum class like in other cases like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly okay. But need some changes.
Kind reminder: if you think that PR is not ready for merge you should make it a draft)
from etna.transforms.base import Transform | ||
|
||
|
||
class LE(preprocessing.LabelEncoder): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make it _LabelEncoder
, because this class is not intended for users and longer name makes it easier to understand class meaning.
class _OneSegmentLabelEncoderTransform(Transform): | ||
"""Replace the values in the column with the Label encoding""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no inverse transform
method on purpose?
name of added column. If not given, use `self.__repr__()` or `regressor_{self.__repr__()}` if it is a regressor | ||
strategy: | ||
filling encoding in not fitted values: | ||
- If "new_value", then replace missing dates with '-1' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better to use values
here
self.strategy = strategy | ||
super().__init__(transform=_OneSegmentLabelEncoderTransform(self.in_column, self.out_column, self.strategy)) | ||
|
||
def _get_out_column(self, out_column: Optional[str]) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be named get_column_name
as in other transforms
return self.__repr__() | ||
|
||
|
||
############################### |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to review what's below this line?
elif strategy == "mean": | ||
filling_value = np.mean(encoded[~np.isin(y, diff)]) | ||
else: | ||
raise ValueError(f"There are no '{strategy}' strategy exists") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This message doesn't look very good.
May be better: The strategy {strategy} doesn't exist.
if self.out_column: | ||
return self.out_column | ||
if self.in_column.startswith("regressor"): | ||
temp_transform = LabelBinarizerTransform(in_column=self.in_column, out_column=self.out_column) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we use here just self.__repr__
?
if self.out_column: | ||
return self.out_column | ||
if self.in_column.startswith("regressor"): | ||
temp_transform = LabelEncoderTransform( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we use here just self.__repr__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is about this?
return result_df | ||
|
||
|
||
class LabelBinarizerTransform(PerSegmentWrapper): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add description to the class. What is its purpose.
tests/test_transforms/test_encoders/test_categorical_transform.py
Outdated
Show resolved
Hide resolved
|
||
@pytest.fixture | ||
def two_df_with_new_values(): | ||
df1 = TSDataset.to_dataset(generate_periodic_df(3, "2020-01-01", 10, 2, n_segments=2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use named parameters here because it is not obvious what is the meaning of all these parameters.
- If "mean", then replace missing dates using the mean in encoded column | ||
- If "none", then replace missing dates with None | ||
inplace: | ||
if True, apply resampling inplace to in_column, if False, add transformed column to dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resampling is irrelevant here
self.strategy = strategy | ||
self.out_column = out_column | ||
super().__init__( | ||
transform=_OneSegmentLabelEncoderTransform(self.in_column, self.out_column, self.strategy, self.inplace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use keyword parameters here like in_column=self.in_column.
This can be a reason for warning during creation with out_column=None.
|
||
encoded = _encode(y, uniques=self.classes_, check_unknown=False).astype(float) | ||
|
||
if strategy == "None": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Threre is no such value as "None", only "none".
("mean", np.array([[0, 0], [1, 0], [0.5, 0]])), | ||
], | ||
) | ||
def test_new_value_label(two_df_with_new_values, strategy, expected_values): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
label_encoder in the name of the test is omitted.
same issue with word order in description
) | ||
|
||
|
||
def test_label_encoder(df_for_categorical_encoding): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be better to name "test_label_encoder_simple"
|
||
|
||
def test_label_encoder(df_for_categorical_encoding): | ||
"""Test LabelEncoderTransform correct works.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be more correct to use
"Test that LabelEncoderTransform works correct in a simple case."
return result_df | ||
|
||
|
||
class OneHotEncoderTransform(PerSegmentWrapper): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Describe how it works with new values.
class OneHotEncoderTransform(PerSegmentWrapper): | ||
""" | ||
Encode categorical feature as a one-hot numeric features. | ||
If unknown category is encountered during transform, the resulting one-hot encoded columns for this feature will be all zeros. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we have to add empty line after the first line with short description: numpydoc.
future_ts = train_ts.make_future(10) | ||
forecast_ts = model.forecast(future_ts) | ||
r2 = R2() | ||
assert 1 - r2(forecast_ts, test_ts)["segment_0"] < 1e-5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In metric we expect y_true
to be the first parameter.
return train_ts, test_ts | ||
|
||
|
||
def test_ohe_sanity(ts_for_ohe_sanity): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should
- Add description to the test.
- Move it to the very of the file (fixture should be moved to the end of fixtures for this file).
|
||
df_to_forecast["segment_0", "target"] = df_regressors["segment_0"]["regressor_0"][:100].apply(f) | ||
ts = TSDataset(df=df_to_forecast, freq="D", df_exog=df_regressors) | ||
train_ts, test_ts = ts.train_test_split(test_size=10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should move train_test_split into the test, because they have the same parameter horizon=10.
|
||
@pytest.fixture | ||
def ts_for_ohe_sanity(): | ||
df_to_forecast = generate_ar_df(100, start_time="2021-01-01", n_segments=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add periods parameter in cases like this.
self.strategy = strategy | ||
self.out_column = out_column | ||
self.out_column = self._get_column_name() | ||
super().__init__(transform=_OneSegmentLabelEncoderTransform(self.in_column, self.out_column, self.strategy)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use keyword parameters to create object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like everything is okay.
IMPORTANT: Please do not create a Pull Request without creating an issue first.
Before submitting (must do checklist)
Type of Change
Proposed Changes
Related Issue
Closing issues
Closes #355