-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consistent args and generic tests for columns: str or list transformers #150
Comments
For reference, there are 29 transformers in the columns str or list group, 17 of which need None removing and 7 of which will need more significant refactoring. |
Overview: This details the steps required to refactor the testing in the tubular package so that we test behaviour, not implementation. In order to do this, we have created a tree of inheritable test classes. Between them these test classes contain tests for the full suite of behaviours that will be inherited when you inherit from the transformer in question, allowing these behaviours to be tested explicitly, rather than implicitly through implementation tests such as test_super_transform_called . Refactoring a test file to bring it into the new framework mainly consists of selecting the correct classes to inherit from when writing the unit tests. The inherited tests will all run automatically and test for the inherited behaviours, thus the only tests that need to explicitly be written into the test file are the tests specific to that transformer (e.g. expected output tests). The full process for bringing a transformer into this framework is detailed in the sub-page "Test refactor PR process". When working through this checklist it is important to check that test classes have been written for the relevant inherited transformers before beginning the refactor of a child transformer, for example make sure that CappingTransformer has been done before it's child class OutOfRangeNullTransformer. This work looks at every transformer and so provides a good opportunity to audit Tubular! If you see anything you think could be improve, or spot any redundancy or duplicity in the package, please raise a github issue and give David H a msg (smile) Checklist (tickets to be added): Base: BaseTransformer Capping: CappingTransformrmer Comparison EqualityChecker Dates: These need a bit more thought - BaseDateTransformer only provides helper functions - not obvious how to check that the desired behaviour is obtained in child classes without repeating test code. BaseDateTransformer Imputers: BaseImputer Mapping: BaseMappingTransformer Misc: SetValueTransformer Nominal BaseNominalTransformer Numeric LogTransformer Strings SeriesStrMethodTransformer |
Test Refactor PR Process: This is the procedure for ticking a single sub group of transformers from the checklist in the overview page. Included are example PRs for each category of transformer. Contents: Process overview: step by step approach Process overview: For each transformer you need to:
Example PRs: (TO ADD) Str or list: Consistent arg standards: Columns str or list: Transformers taking columns as a single string or list should have a columns and, where relevant, new_column_names arguments. The transformer should operate on each column in columns (performing the same operation each time). Remove None as an option for columns - we are no longer keeping this functionality. Two column: Transformers taking two columns should have an upper_column and lower_column argument - there is no need for operating on multiple pairs of columns at once at this stage. From dict: These transformers read the columns in from e.g. a mapping dictionary. They can be left as they are - they're already all consistent. Other cases: Bring them inline with the above as close as possible but don't worry too much. Inheriting the appropriate test classes: Some thought is required to inherit the appropriate test classes but if should be fairly clear by looking at the inheritance tree of the transformer you are testing - the only quirks are: If the transformer inherits directly from BaseTransformer, you will have to pick the correct interface class given how the transformer takes the columns input. (e.g. ColumnsFromDictInitTests) |
Please make a Jira ticket for each transformer when you start work |
Capturing state-of-play NOTE: this work is now tracked by this milestone Base: Capping: Comparison: Dates (grouping these in larger issue initially as more complicated (Issue) ) Imputers:
Mapping: Misc: Nominal: Numeric: Strings: Total complete = 26/43 |
@davidhopkinson26 same Q here - I think this is now fully captured by individual issues, am I okay to just close? |
What?
This is a ticket for homogenising and implementing the generic tests for the transformers which take columns: str or list. These makes up the majority of transformers. A list of the transformers can be read from the attached spreadsheet (see end of description).
Why?
Have identified as part of scoping #138 and #68 that for usability it makes sense to keep some inconsistency in transformers columns arguments. However for majority of transformers where columns can be a string or list of column names we can create the generic testing framework discussed in #138 in place of testing a base transformer and then implicitly testing functionality by testing calls to super() methods in child classes.
Other transformers can be grouped together in another issue, made consistent and tested together separately (e.g transformers which take two column inputs, transformers which take only a list)
How?
For all cases where it makes sense, ensure the argument defining which columns the transformer operates on is called 'columns' and is able to take a str or list input.
We will also include those which have columns: str, list or None. For now It seems best to remove the 'None' option as it is only included sporadically. (In future we may want to add in functionality to check column types and compatibility with transformer and run on all columns which pass these criteria for columns=None.)
Replace implementation tests in each transformer with a set of tests which loop over the identified transformers testing functionality defined in BaseTransformer.
Transformer Consistent Args Audit.xlsx
The text was updated successfully, but these errors were encountered: