Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistent args and generic tests for columns: str or list transformers #150

Open
davidhopkinson26 opened this issue Dec 18, 2023 · 6 comments
Labels
breaking change change which would break backwards compatibility

Comments

@davidhopkinson26
Copy link
Contributor

davidhopkinson26 commented Dec 18, 2023

What?
This is a ticket for homogenising and implementing the generic tests for the transformers which take columns: str or list. These makes up the majority of transformers. A list of the transformers can be read from the attached spreadsheet (see end of description).

Why?
Have identified as part of scoping #138 and #68 that for usability it makes sense to keep some inconsistency in transformers columns arguments. However for majority of transformers where columns can be a string or list of column names we can create the generic testing framework discussed in #138 in place of testing a base transformer and then implicitly testing functionality by testing calls to super() methods in child classes.

Other transformers can be grouped together in another issue, made consistent and tested together separately (e.g transformers which take two column inputs, transformers which take only a list)

How?
For all cases where it makes sense, ensure the argument defining which columns the transformer operates on is called 'columns' and is able to take a str or list input.

We will also include those which have columns: str, list or None. For now It seems best to remove the 'None' option as it is only included sporadically. (In future we may want to add in functionality to check column types and compatibility with transformer and run on all columns which pass these criteria for columns=None.)

Replace implementation tests in each transformer with a set of tests which loop over the identified transformers testing functionality defined in BaseTransformer.

Transformer Consistent Args Audit.xlsx

@davidhopkinson26 davidhopkinson26 added CI improvements to continuous integration breaking change change which would break backwards compatibility labels Dec 18, 2023
@TommyMatthews
Copy link
Contributor

For reference, there are 29 transformers in the columns str or list group, 17 of which need None removing and 7 of which will need more significant refactoring.

@TommyMatthews
Copy link
Contributor

TommyMatthews commented Feb 21, 2024

Overview:

This details the steps required to refactor the testing in the tubular package so that we test behaviour, not implementation.

In order to do this, we have created a tree of inheritable test classes. Between them these test classes contain tests for the full suite of behaviours that will be inherited when you inherit from the transformer in question, allowing these behaviours to be tested explicitly, rather than implicitly through implementation tests such as test_super_transform_called .

Refactoring a test file to bring it into the new framework mainly consists of selecting the correct classes to inherit from when writing the unit tests. The inherited tests will all run automatically and test for the inherited behaviours, thus the only tests that need to explicitly be written into the test file are the tests specific to that transformer (e.g. expected output tests). The full process for bringing a transformer into this framework is detailed in the sub-page "Test refactor PR process".

When working through this checklist it is important to check that test classes have been written for the relevant inherited transformers before beginning the refactor of a child transformer, for example make sure that CappingTransformer has been done before it's child class OutOfRangeNullTransformer.

This work looks at every transformer and so provides a good opportunity to audit Tubular! If you see anything you think could be improve, or spot any redundancy or duplicity in the package, please raise a github issue and give David H a msg (smile)

Checklist (tickets to be added):

Base:

BaseTransformer
DataFrameMethodTransformer

Capping:

CappingTransformrmer
OutOfRangeNullTransformer (do Capping first!)

Comparison

EqualityChecker

Dates:

These need a bit more thought - BaseDateTransformer only provides helper functions - not obvious how to check that the desired behaviour is obtained in child classes without repeating test code.

BaseDateTransformer
DateDiffLeapYearTransformer
DateDifferenceTransformer
ToDatetimeTransformer
SeriesDtMethodTransformer
BetweenDatesTransformer
DatetimeInfoExtractor

Imputers:

BaseImputer
ArbitraryImputer
MedianImputer
MeanImputer
ModeImputer
NearestMeanResponseImputer

Mapping:

BaseMappingTransformer
BaseMappingTransformMixin
MappingTransformer
CrossColumnMappingTransformer
CrossColumnMultiplyTransfornmer
CrossColumnAddTransformer

Misc:

SetValueTransformer
SetColumnDtype

Nominal

BaseNominalTransformer
NominalToIntegerTransformer
GroupRareLevelsTransformer
MeanResponseTransformer
OrdinalEncoderTransformer
OneHotEncodingTransformer

Numeric

LogTransformer
CutTransformer
TwoColumnOperatorTransformer
ScalingTransformer
InteractionTransformer
PCATransformer

Strings

SeriesStrMethodTransformer
StringConcatenator

@TommyMatthews
Copy link
Contributor

TommyMatthews commented Feb 21, 2024

Test Refactor PR Process:

This is the procedure for ticking a single sub group of transformers from the checklist in the overview page. Included are example PRs for each category of transformer.

Contents:

Process overview: step by step approach
Example PRs: example PRs covering the different scenarios
Consistent arg standards
Inheriting the appropriate test classes

Process overview:

For each transformer you need to:

  • Refactor the transformer to meet consistent argument standards (see below).
  • Update "minimal attribute dict" fixture in conftest.py with the correct set of minimal arguments required
  • Set up the TestInit, TestFit and TestTransform methods to inherit the appropriate test classes (see below) and then copy the setup_class class method into each test class, changing transformer_name to the transformer you are testing.
  • Add a "TestOtherBaseBehaviour" class to the bottom of the test file.
  • Debug the failing tests from the base test suite. This may require:
  • Changing the transformer to access different columns from the data
  • Updating the test data with something that works for this transformer (and changing the columns targeted by the transformer)
  • Delete any unnecessary tests from original transformer test file (i.e. tests testing inherited behaviour that is now tested by the inherited test classes).
  • Bring the test file in line with best practices by insuring there are no implementation tests and all desired behaviours are tested for (which are not covered by the base test).
  • Create PR into main

Example PRs: (TO ADD)

Str or list:
Two column:
Columns from dict:
Special case:

Consistent arg standards:

Columns str or list:

Transformers taking columns as a single string or list should have a columns and, where relevant, new_column_names arguments. The transformer should operate on each column in columns (performing the same operation each time). Remove None as an option for columns - we are no longer keeping this functionality.

Two column:

Transformers taking two columns should have an upper_column and lower_column argument - there is no need for operating on multiple pairs of columns at once at this stage.

From dict:

These transformers read the columns in from e.g. a mapping dictionary. They can be left as they are - they're already all consistent.

Other cases:

Bring them inline with the above as close as possible but don't worry too much.

Inheriting the appropriate test classes:

Some thought is required to inherit the appropriate test classes but if should be fairly clear by looking at the inheritance tree of the transformer you are testing - the only quirks are:

If the transformer inherits directly from BaseTransformer, you will have to pick the correct interface class given how the transformer takes the columns input. (e.g. ColumnsFromDictInitTests)
If your transformer is not a direct child of BaseTransformer, but the intermediate class only affects the e.g. transform method, the transform class may inherit from the direct parent transform test class but the other methods will inherit from the BaseTransformer test classes.

@TommyMatthews
Copy link
Contributor

Please make a Jira ticket for each transformer when you start work

@limlam96
Copy link
Contributor

limlam96 commented Apr 2, 2024

Capturing state-of-play

NOTE: this work is now tracked by this milestone

Base:
BaseTransformer
DataFrameMethodTransformer
UpperColumnLowerColumnInitTests PR

Capping:
BaseCappingTransformer PR
CappingTransformer PR
OutOfRangeNullTransformer (do Capping first!) PR

Comparison:
EqualityChecker PR

Dates (grouping these in larger issue initially as more complicated (Issue) )
Refactor module in prep for test refactors PR
BaseDateTransformer #273
DateDiffLeapYearTransformer
DateDifferenceTransformer
ToDatetimeTransformer
SeriesDtMethodTransformer
BetweenDatesTransformer
DatetimeInfoExtractor

Imputers:

BaseImputer
ArbitraryImputer PR
MedianImputer PR
MeanImputer PR
ModeImputer PR
NearestMeanResponseImputer Issue

Mapping:
BaseMappingTransformer
BaseMappingTransformMixin
MappingTransformer
Base CrossColumnMapping Test Classes PR1, PR2
CrossColumnMappingTransformer PR
CrossColumnMultiplyTransfornmer PR
CrossColumnAddTransformer PR

Misc:
SetValueTransformer PR
SetColumnDtype PR

Nominal:
BaseNominalTransformer PR
NominalToIntegerTransformer Issue
GroupRareLevelsTransformer Issue
MeanResponseTransformer PR
OrdinalEncoderTransformer Issue
OneHotEncodingTransformer Issue

Numeric:
LogTransformer PR
CutTransformer #272
TwoColumnOperatorTransformer #274
ScalingTransformer #275
InteractionTransformer #276
PCATransformer #277

Strings:
SeriesStrMethodTransformer #278
StringConcatenator #279

Total complete = 26/43

@limlam96
Copy link
Contributor

@davidhopkinson26 same Q here - I think this is now fully captured by individual issues, am I okay to just close?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change change which would break backwards compatibility
Projects
None yet
Development

No branches or pull requests

3 participants