Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor ScalingTransformer to Inherit from BaseNumericTransformer and Update Tests #284

Merged
merged 12 commits into from
Sep 19, 2024

Conversation

favouribude1
Copy link
Contributor

Summary of Refactoring

Class Inheritance: Changed inheritance from BaseTransformer to BaseNumericTransformer in the ScalingTransformer class

Scaler Initialization: Used a dictionary mapping for scaler types to avoid repetitive if-else statements a

Added checks for scaler_kwargs to ensure they are correctly formatted.

fit() and transform() methods now relying on BaseNumericTransformer for numeric checks.                                                 

Modified the tests based on refactored code

Copy link
Contributor

@davidhopkinson26 davidhopkinson26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @favouribude1, thanks for the work here. The changes to the transformer are good but there is still some more to do to refactor the tests into the new format. Please take another look at the issue #275 and an example like #268 to see what changes are needed. We should be able to delete a lot of tests here 🔥

@@ -427,7 +429,7 @@ def transform(self, X: pd.DataFrame) -> pd.DataFrame:
return X


class ScalingTransformer(BaseTransformer):
class ScalingTransformer(BaseNumericTransformer):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P: thanks for the clean up here it is much nicer now!

@@ -123,7 +123,7 @@ def test_super_init_called(self, mocker):


class TestCheckNumericColumns:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

B: we now set up our test classes to inherit a set of generic tests from other test modules. See the guidance in the issue #275 #268 for an example in this numeric module. You need to set up the test classes for init, fit and transform to inherit from a parent test class and add

@classmethod def setup_class(cls): cls.transformer_name = "ScalingTransformer"

to each so that the inherited tests can function (they look up initialised transformers and minimal dataframes from conftest.py using this)

After doing this check_numeric will be covered by tests in BaseNumericTransformerFitTests and BaseNumericTransformerTransformTests so these tests can be deleted form here once those tests have been inherited.

You can also delete any implementation tests so we are just testing functionality. There should be far fewer tests in this module when you are done!

def test_exception_raised(self):
"""Test an exception is raised if non numeric columns are passed in X."""
df = d.create_df_2()
def test_non_numeric_exception_raised(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: This test is already inherited from BaseNumericTransformerFitTests so I don't think it is needed here. Am I missing osmething that is not covered by the test in the base class? https://github.com/lvgig/tubular/blob/feature/refactor_ScalingTransformer_tests/tests/numeric/test_BaseNumericTransformer.py

Copy link
Contributor Author

@favouribude1 favouribude1 Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the base class tests and saw that it doesn’t cover the specific error messages and checks needed for ScalingTransformer. The base test is more general and doesn’t handle the exact way ScalingTransformer raises errors with non-numeric data.

When I removed the specific test from ScalingTransformer, it caused the test to fail. So, I kept the test in ScalingTransformer to ensure everything works as expected. If you’d like, I can look into adjusting the base tests to cover this, or we can stick with the current setup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how so? as far as I can tell the handling should be the same. fit calls super.fit which calls the same _check_numeric method so seems like something is going wrong if the error is returning something different here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

had a bit more of a dig and it looks like we forgot to implement any tests for fit for LogTransformer so the inheritance of this test has not been tested before this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this case the test doesn't work because the x = initialized_transformers[self.transformer_name] takes the default initialised transformer for ScalingTransformer and that transformer has only columns = ['a']. Since the df we want to check only has non numeric data in column b we are not throwing an error. adding b to columns in conftest.minimal_attribute_dict['ScalingTransformer] seems to fix the issue without breaking other tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this. I have now resolved it and it's working fine now

Copy link
Contributor

@davidhopkinson26 davidhopkinson26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @favouribude1! All looks good apart from one possibly redundant test I have flagged in comments.

There are also a couple of merge conflicts to resolve and you need to update the changelog.

CHANGELOG.rst Outdated
@@ -46,6 +46,8 @@ Changed
- DatetimeInfoExtractor.mappings_provided changed from a dict.keys() object to list so transformer is serialisable. `#258 <https://github.com/lvgig/tubular/pull/258>`_
- Created BaseNumericTransformer class to support test refactor of numeric file `#266 <https://github.com/lvgig/tubular/pull/266>`_
- Updated testing approach for LogTransformer `#268 <https://github.com/lvgig/tubular/pull/268>`_
- Updated `ScalingTransformer` default columns to `["a", "b"]` in `minimal_attribute_dict` to fix test issues.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs a line in the changelog, its just part of the below point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise happy with changes so happy to approve once we delete this line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the unnecessary changelog entry

@davidhopkinson26
Copy link
Contributor

Thanks favour, happy with this now

@davidhopkinson26 davidhopkinson26 merged commit 6b7905f into main Sep 19, 2024
12 checks passed
@davidhopkinson26 davidhopkinson26 deleted the feature/refactor_ScalingTransformer_tests branch September 19, 2024 10:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bring ScalingTransformer in line with new testing set up
3 participants