Adding type hints #116 #128

Merged: 62 commits, Dec 29, 2023
54b4be7
Merge pull request #15 from lvgig/develop
richardangell Oct 5, 2021
878fa0d
Merge pull request #23 from lvgig/develop
richardangell Nov 3, 2021
facd429
Merge pull request #29 from lvgig/develop
richardangell Nov 9, 2021
20994ad
Merge pull request #36 from lvgig/develop
richardangell Nov 13, 2021
f544249
Merge pull request #39 from lvgig/develop
richardangell Jan 28, 2022
423b14c
Update quick-start.rst
davidhopkinson26 Jan 19, 2023
d122842
Revert "Update quick-start.rst"
davidhopkinson26 Jan 19, 2023
0098637
v0.3.3 changes into main (#64)
davidhopkinson26 Jan 19, 2023
cf158de
Revert "v0.3.3 changes into main (#64)"
davidhopkinson26 Mar 15, 2023
fee9828
Merge pull request #79 from lvgig/revert-64-develop
davidhopkinson26 Mar 21, 2023
a547304
Merge pull request #78 from lvgig/develop
davidhopkinson26 Mar 21, 2023
242a8fe
Merge pull request #82 from lvgig/develop
davidhopkinson26 Apr 27, 2023
7da13ce
Merge pull request #88 from lvgig/develop
davidhopkinson26 May 24, 2023
77681f3
Merge pull request #89 from lvgig/develop
davidhopkinson26 May 24, 2023
16682bc
Merge pull request #91 from lvgig/develop
Sarah-TaylorKnight Jul 5, 2023
cfd1dd8
Merge pull request #95 from lvgig/develop
davidhopkinson26 Jul 5, 2023
700f384
Merge pull request #97 from lvgig/develop
davidhopkinson26 Jul 10, 2023
a393710
Merge pull request #100 from lvgig/develop
davidhopkinson26 Jul 10, 2023
7655bc3
base.py passing ANN001
Sarah-TaylorKnight Jul 20, 2023
c759b3e
base.py passing ANN201, ANN202, ANN205
Sarah-TaylorKnight Jul 20, 2023
a4785e8
capping.py passing ANN001, ANN002
Sarah-TaylorKnight Jul 20, 2023
3a3486c
ANN auto fix
Sarah-TaylorKnight Jul 21, 2023
f806c34
Remove Number type
Sarah-TaylorKnight Jul 21, 2023
7e541bc
Merge pull request #126 from lvgig/develop
davidhopkinson26 Jul 24, 2023
119e58a
capping.py passing ANN201
Sarah-TaylorKnight Jul 27, 2023
98f414e
dates.py passing ANN001
Sarah-TaylorKnight Jul 27, 2023
0e1dfa5
dates.py passing ANN201
Sarah-TaylorKnight Jul 27, 2023
a99be85
dates.py passing ANN202
Sarah-TaylorKnight Jul 27, 2023
150e91d
imputers now compliant with ANN except ANN101 and ANN003
davidhopkinson26 Sep 11, 2023
3d27141
mapping.py passes ANN except ANN101 and ANN003
davidhopkinson26 Sep 11, 2023
85c6e3f
misc.py now passes ANN except ANN101 and ANN003
davidhopkinson26 Sep 11, 2023
12cd3fd
updated columns typehints to str | list[str] in imputers.py and misc.py
davidhopkinson26 Sep 11, 2023
8d90589
updated mappings typehint to dict[str, dict]
davidhopkinson26 Sep 11, 2023
d61d0f8
nominal.py passing ANN except ANN101 and ANN003
davidhopkinson26 Sep 11, 2023
cdc71a2
numeric.py passing ANN except ANN101 and ANN003
davidhopkinson26 Sep 11, 2023
cea9079
added np.random.RandomState to typehint for PCATransformer
davidhopkinson26 Sep 11, 2023
fcf2248
strings.py passing ANN except ANN101 and ANN003
davidhopkinson26 Sep 11, 2023
9f671f9
added ANN to ruff rules with ignore ANN101, ANN003 and all ANN for te…
davidhopkinson26 Sep 12, 2023
4950399
added annotations to examples/Data-Science-Festival-Workshop/plotting…
davidhopkinson26 Sep 12, 2023
70d573b
Merge branch 'develop' into feature/116-ann-flake8-annotations
davidhopkinson26 Sep 12, 2023
ce6917e
corrected merge conflict resolution errors in DatetimeInfoExtractor
davidhopkinson26 Sep 12, 2023
bca1eea
added from __future__ import annotations to allow use of | in type hi…
davidhopkinson26 Sep 12, 2023
5b20aa9
removed ANN003 from exclusions and added typehints to **kwargs across…
davidhopkinson26 Sep 12, 2023
d709704
Merge pull request #154 from lvgig/develop
davidhopkinson26 Dec 19, 2023
8fb3c5a
:rotating_light: Continue to import pandas as it is used throughout t…
adamsardar Dec 28, 2023
d2af343
:label: Include None type for potential capping values
adamsardar Dec 28, 2023
70c8e3a
:label: More explicit dict structure for argument splicing into corre…
adamsardar Dec 28, 2023
31707bb
:label: More explicit dict structure for argument splicing into corre…
adamsardar Dec 28, 2023
e00ac7d
:label: More explicit dict structure for argument splicing into corre…
adamsardar Dec 28, 2023
3b4d149
:label: Move from use of Any to object on advice from MyPy.
adamsardar Dec 28, 2023
55e5a6d
Merge branch 'main' into feature/116-ann-flake8-annotations
adamsardar Dec 28, 2023
7a8b6b4
:label: More explicit types for splice args into pd method
adamsardar Dec 28, 2023
d623a3b
:label: Add typehints for methods merged in from main. Mostly private…
adamsardar Dec 28, 2023
3a9ff8e
:art: black consistency
adamsardar Dec 28, 2023
a87d70a
:alien: Failing tests as the result of stipulation on an array in pd.…
adamsardar Dec 28, 2023
9d20ed9
:art: Appease black
adamsardar Dec 28, 2023
354d1cd
:art: Fix COM812 and black format
adamsardar Dec 28, 2023
fab2640
:label: Prefer `object` over `Any`
adamsardar Dec 28, 2023
1f12d44
:label: Erroneous inclusion of None in typehint - parent class constr…
adamsardar Dec 29, 2023
4b3f757
:sparkles: DateDiffLeapYearTransformer now uses a sensible default wh…
adamsardar Dec 29, 2023
e875124
:sparkles: DateDiffLeapYearTransformer now uses a sensible default wh…
adamsardar Dec 29, 2023
102b5ea
:label: Expand on type behaviour in docs
adamsardar Dec 29, 2023
8 changes: 5 additions & 3 deletions .ruff.toml
@@ -3,11 +3,12 @@
# McCabe complexity (`C901`) by default.
select = ["E", "F", "W", "I", "UP", "ASYNC", "YTT", "A", "COM", "C4", "T10", "EM",
"FA", "ISC", "PIE", "PYI", "Q", "RSE", "RET", "SLOT", "SIM", "TID", "TCH", "INT",
"PD", "PGH", "PLC", "PLE", "PLW", "FLY", "NPY", "PERF", "B", "DTZ"]
"PD", "PGH", "PLC", "PLE", "PLW", "FLY", "NPY", "PERF", "B", "DTZ", "ANN"]


# ignore E501 - linelength limit (covered by black except in docstrings)
# and PD901 - use of df variable name
ignore = ["E501", "PD901"]
ignore = ["E501", "PD901", "ANN101"]

# Allow autofix for all enabled rules (when `--fix`) is provided.
fixable = ["ALL"]
@@ -33,4 +34,5 @@ target-version = "py38"

# Ignore `E402` (import violations) in all `__init__.py` file.
[per-file-ignores]
"__init__.py" = ["E402", "F401"]
"__init__.py" = ["E402", "F401"]
"tests/*" = ["ANN"]
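
For readers unfamiliar with the `ANN` rule family enabled above, here is a rough sketch of the kind of gap it reports, using the `list_reqs` signature change from `setup.py` in this PR. `missing_annotations` is a hypothetical helper written for illustration only (it is not part of ruff or tubular); it skips `self`, loosely mirroring the `ANN101` ignore.

```python
import inspect


def missing_annotations(func):
    """Hypothetical helper: list parameters (and 'return') lacking annotations."""
    sig = inspect.signature(func)
    missing = [
        name
        for name, param in sig.parameters.items()
        # Skip `self`, loosely mirroring the ANN101 ignore in .ruff.toml.
        if name != "self" and param.annotation is inspect.Parameter.empty
    ]
    if sig.return_annotation is inspect.Signature.empty:
        missing.append("return")
    return missing


# Signature before this PR: no argument or return annotations.
def list_reqs_before(fname="requirements.txt"):
    with open(fname) as fd:
        return fd.read().splitlines()


# Signature after this PR: fully annotated.
def list_reqs_after(fname: str = "requirements.txt") -> list:
    with open(fname) as fd:
        return fd.read().splitlines()


print(missing_annotations(list_reqs_before))  # ['fname', 'return']
print(missing_annotations(list_reqs_after))  # []
```

The real linter works on source code rather than live objects, so this is only an approximation of what it checks.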
9 changes: 6 additions & 3 deletions examples/Data-Science-Festival-Workshop/plotting.py
@@ -1,16 +1,19 @@
def one_way_summary_plot(df, column, response="y"):
import pandas as pd


def one_way_summary_plot(df: pd.DataFrame, column: str, response: str = "y") -> None:
"""Function to produce a rough one-way summary plot of a specific column.

Specifically plot average response (right y axis) and number of records (left
y axis) by the selected column (x axis).

"""
agg = df.groupby(column).agg({column: ["count"], "y": ["mean"]})
agg = df.groupby(column).agg({column: ["count"], response: ["mean"]})

ax = agg.plot.bar(y=(column, "count"), ylabel="count", figsize=(8, 5))

agg.plot(
y=("y", "mean"),
y=(response, "mean"),
style=":",
marker=".",
c="k",
2 changes: 1 addition & 1 deletion setup.py
@@ -19,7 +19,7 @@
raise RuntimeError(msg)


def list_reqs(fname="requirements.txt"):
def list_reqs(fname: str = "requirements.txt") -> list:
with open(fname) as fd:
return fd.read().splitlines()

19 changes: 19 additions & 0 deletions tests/dates/test_DateDiffLeapYearTransformer.py
@@ -167,6 +167,25 @@ def test_inputs_set_to_attribute(self):
msg="Attributes for DateDiffLeapYearTransformer set in init",
)

def test_inputs_set_to_attribute_name_not_set(self):
"""Test that new_column_name falls back to its default when it is not passed, and that the column arguments are saved in attributes of the same name."""
x = DateDiffLeapYearTransformer(
column_lower="dummy_1",
column_upper="dummy_2",
drop_cols=True,
)

ta.classes.test_object_attributes(
obj=x,
expected_attributes={
"column_lower": "dummy_1",
"column_upper": "dummy_2",
"columns": ["dummy_1", "dummy_2"],
"new_column_name": "dummy_2_dummy_1_datediff",
},
msg="Attributes for DateDiffLeapYearTransformer set in init",
)


class TestTransform:
"""Tests for DateDiffLeapYearTransformer.transform()."""
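
The new test above expects `new_column_name` to default to `"dummy_2_dummy_1_datediff"` when it is not supplied. The transformer code that does this is not shown in this part of the diff, so the following is only a sketch of a fallback that would satisfy the test. `default_datediff_name` is a hypothetical helper, and the naming pattern is inferred from the expected attribute.

```python
from typing import Optional


def default_datediff_name(
    column_lower: str,
    column_upper: str,
    new_column_name: Optional[str] = None,
) -> str:
    """Hypothetical helper: pick the output column name."""
    if new_column_name is not None:
        return new_column_name
    # Assumed pattern, taken from the expected attribute in the test above.
    return f"{column_upper}_{column_lower}_datediff"


print(default_datediff_name("dummy_1", "dummy_2"))  # dummy_2_dummy_1_datediff
```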
2 changes: 1 addition & 1 deletion tests/numeric/test_CutTransformer.py
@@ -135,7 +135,7 @@ def test_pd_cut_call(self, mocker):

expected_call_args = {
0: {
"args": (d.create_df_9()["a"],),
"args": (d.create_df_9()["a"].to_numpy(),),
"kwargs": {"bins": 3, "right": False, "precision": 2},
},
}
39 changes: 23 additions & 16 deletions tubular/base.py
@@ -2,6 +2,8 @@
from. These transformers contain key checks to be applied in all cases.
"""

from __future__ import annotations

import warnings

import pandas as pd
@@ -46,11 +48,16 @@ class BaseTransformer(TransformerMixin, BaseEstimator):

"""

def classname(self):
def classname(self) -> str:
"""Method that returns the name of the current class when called."""
return type(self).__name__

def __init__(self, columns=None, copy=True, verbose=False) -> None:
def __init__(
self,
columns: list[str] | str | None = None,
copy: bool = True,
verbose: bool = False,
) -> None:
self.version_ = __version__

if not isinstance(verbose, bool):
@@ -92,7 +99,7 @@ def __init__(self, columns=None, copy=True, verbose=False) -> None:

self.copy = copy

def fit(self, X, y=None):
def fit(self, X: pd.DataFrame, y: pd.Series | None = None) -> BaseTransformer:
"""Base transformer fit method, checks X and y types. Currently only pandas DataFrames are allowed for X
and DataFrames or Series for y.

@@ -130,7 +137,7 @@ def fit(self, X, y=None):

return self

def _combine_X_y(self, X, y):
def _combine_X_y(self, X: pd.DataFrame, y: pd.Series) -> pd.DataFrame:
"""Combine X and y by adding a new column with the values of y to a copy of X.

The new column response column will be called `_temporary_response`.
@@ -171,7 +178,7 @@ def _combine_X_y(self, X, y):

return X_y

def transform(self, X):
def transform(self, X: pd.DataFrame) -> pd.DataFrame:
"""Base transformer transform method; checks X type (pandas DataFrame only) and copies data if requested.

Transform calls the columns_check method which will check columns in columns attribute are in X.
@@ -201,7 +208,7 @@ def transform(self, X):

return X

def check_is_fitted(self, attribute):
def check_is_fitted(self, attribute: str) -> None:
"""Check if particular attributes are on the object. This is useful to do before running transform to avoid
trying to transform data without first running the fit method.

@@ -215,7 +222,7 @@ def check_is_fitted(self, attribute):
"""
check_is_fitted(self, attribute)

def columns_check(self, X):
def columns_check(self, X: pd.DataFrame) -> None:
"""Method to check that the columns attribute is set and all values are present in X.

Parameters
@@ -240,7 +247,7 @@ def columns_check(self, X):
if c not in X.columns.to_numpy():
raise ValueError(f"{self.classname()}: variable " + c + " is not in X")

def columns_set_or_check(self, X):
def columns_set_or_check(self, X: pd.DataFrame) -> None:
"""Function to check or set columns attribute.

If the columns attribute is None then set it to all columns in X. Otherwise run the columns_check method.
@@ -262,7 +269,7 @@ def columns_set_or_check(self, X):
self.columns_check(X)

@staticmethod
def check_weights_column(X, weights_column):
def check_weights_column(X: pd.DataFrame, weights_column: str) -> None:
"""Helper method for validating weights column in dataframe.

Args:
@@ -345,12 +352,12 @@ class DataFrameMethodTransformer(BaseTransformer):

def __init__(
self,
new_column_name,
pd_method_name,
columns,
pd_method_kwargs=None,
drop_original=False,
**kwargs,
new_column_name: list[str] | str,
pd_method_name: str,
columns: list[str] | str | None,
pd_method_kwargs: dict[str, object] | None = None,
drop_original: bool = False,
**kwargs: dict[str, bool],
) -> None:
super().__init__(columns=columns, **kwargs)

@@ -397,7 +404,7 @@ def __init__(
msg = f'{self.classname()}: error accessing "{pd_method_name}" method on pd.DataFrame object - pd_method_name should be a pd.DataFrame method'
raise AttributeError(msg) from err

def transform(self, X):
def transform(self, X: pd.DataFrame) -> pd.DataFrame:
"""Transform input pandas DataFrame (X) using the given pandas.DataFrame method and assign the output
back to column or columns in X.

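The `from __future__ import annotations` line added at the top of this module is what lets hints such as `list[str] | str` and a `-> BaseTransformer` return type inside the class body work with `target-version = "py38"`: annotations are stored as strings rather than evaluated when the function is defined. A minimal sketch follows, assuming a hypothetical `ToyTransformer` class (not from tubular). The module text is executed from a string only to keep the example self-contained; in a real module the import simply sits at the top of the file.

```python
# Hypothetical module source; ToyTransformer is an illustrative stand-in.
MODULE_SOURCE = '''
from __future__ import annotations


class ToyTransformer:
    def __init__(self, columns: list[str] | str | None = None) -> None:
        self.columns = columns

    def fit(self) -> ToyTransformer:
        # The class name is usable here because the hint is not evaluated.
        return self
'''

namespace = {}
exec(MODULE_SOURCE, namespace)
ToyTransformer = namespace["ToyTransformer"]

init_hints = ToyTransformer.__init__.__annotations__
fit_hints = ToyTransformer.fit.__annotations__

# Both hints are kept as plain strings instead of evaluated objects.
print(type(init_hints["columns"]).__name__)  # str
print(fit_hints["return"])  # ToyTransformer
```

Because the hints stay as strings, tools that need real types at runtime have to resolve them explicitly, for example with `typing.get_type_hints`, which can still fail on older interpreters for syntax they do not support.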
46 changes: 31 additions & 15 deletions tubular/capping.py
@@ -1,5 +1,7 @@
"""This module contains a transformer that applies capping to numeric columns."""

from __future__ import annotations

import copy
import warnings

@@ -61,10 +63,10 @@ class CappingTransformer(BaseTransformer):

def __init__(
self,
capping_values=None,
quantiles=None,
weights_column=None,
**kwargs,
capping_values: dict[str, list[int | float | None]] | None = None,
quantiles: dict[str, list[int | float]] | None = None,
weights_column: str | None = None,
**kwargs: dict[str, bool],
) -> None:
if capping_values is None and quantiles is None:
msg = f"{self.classname()}: both capping_values and quantiles are None, either supply capping values in the capping_values argument or supply quantiles that can be learnt in the fit method"
@@ -100,7 +102,11 @@ def __init__(
self.weights_column = weights_column
self._replacement_values = copy.deepcopy(self.capping_values)

def check_capping_values_dict(self, capping_values_dict, dict_name):
def check_capping_values_dict(
self,
capping_values_dict: dict[str, list[int | float | None]],
dict_name: str,
) -> None:
"""Performs checks on a dictionary passed to ."""
if type(capping_values_dict) is not dict:
msg = f"{self.classname()}: {dict_name} should be dict of columns and capping values"
@@ -139,7 +145,7 @@ def check_capping_values_dict(self, capping_values_dict, dict_name):
msg = f"{self.classname()}: both values are None for key {k}"
raise ValueError(msg)
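
To make the `dict[str, list[int | float | None]]` hint concrete: each key is a column name and each value is a list of bounds in which an entry may be `None`, but not every entry. The sketch below is a reduced, hypothetical check based only on the two messages visible in this diff. `check_capping_values` is not the library method, and the `TypeError` is an assumption, since the diff does not show which exception is raised for a non-dict.

```python
from typing import Dict, List, Optional, Union

# Spelled with typing generics so the sketch also runs on Python 3.8.
CappingValues = Dict[str, List[Optional[Union[int, float]]]]


def check_capping_values(capping_values: CappingValues) -> None:
    """Hypothetical, reduced version of the checks hinted at in the diff."""
    if type(capping_values) is not dict:
        # Exception type is an assumption; the diff only shows the message.
        raise TypeError("capping_values should be dict of columns and capping values")
    for key, bounds in capping_values.items():
        if all(bound is None for bound in bounds):
            raise ValueError(f"both values are None for key {key}")


# One-sided caps are fine: no lower bound for "a", no upper bound for "b".
check_capping_values({"a": [None, 10], "b": [0.5, None]})
```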

def fit(self, X, y=None):
def fit(self, X: pd.DataFrame, y: None = None) -> CappingTransformer:
"""Learn capping values from input data X.

Calculates the quantiles to cap at given the quantiles dictionary supplied
@@ -185,7 +191,12 @@ def fit(self, X, y=None):

return self

def prepare_quantiles(self, values, quantiles, sample_weight=None):
def prepare_quantiles(
self,
values: pd.Series | np.array,
quantiles: list[float],
sample_weight: pd.Series | np.array | None = None,
) -> list[int | float]:
"""Method to call the weighted_quantile method and prepare the outputs.

If there are no None values in the supplied quantiles then the outputs from weighted_quantile
@@ -230,7 +241,12 @@ def prepare_quantiles(self, values, quantiles, sample_weight=None):

return results

def weighted_quantile(self, values, quantiles, sample_weight=None):
def weighted_quantile(
self,
values: pd.Series | np.array,
quantiles: list[float],
sample_weight: pd.Series | np.array | None = None,
) -> list[int | float]:
"""Method to calculate weighted quantiles.

This method is adapted from the "Completely vectorized numpy solution" answer from user
@@ -328,7 +344,7 @@ def weighted_quantile(self, values, quantiles, sample_weight=None):

return list(np.interp(quantiles, weighted_quantiles, values))

def transform(self, X):
def transform(self, X: pd.DataFrame) -> pd.DataFrame:
"""Apply capping to columns in X.

If cap_value_max is set, any values above cap_value_max will be set to cap_value_max. If cap_value_min
@@ -440,10 +456,10 @@ class OutOfRangeNullTransformer(CappingTransformer):

def __init__(
self,
capping_values=None,
quantiles=None,
weights_column=None,
**kwargs,
capping_values: dict[str, list[int | float | None]] | None = None,
quantiles: dict[str, list[int | float]] | None = None,
weights_column: str | None = None,
**kwargs: dict[str, bool],
) -> None:
super().__init__(
capping_values=capping_values,
@@ -454,7 +470,7 @@ def __init__(

self.set_replacement_values()

def set_replacement_values(self):
def set_replacement_values(self) -> None:
"""Method to set the _replacement_values to have all null values.

Keeps the existing keys in the _replacement_values dict and sets all values (except None) in the lists to np.NaN. Any None
@@ -468,7 +484,7 @@ def set_replacement_values(self):

self._replacement_values[k] = null_replacements_list

def fit(self, X, y=None):
def fit(self, X: pd.DataFrame, y: None = None) -> OutOfRangeNullTransformer:
"""Learn capping values from input data X.

Calculates the quantiles to cap at given the quantiles dictionary supplied
6 changes: 4 additions & 2 deletions tubular/comparison.py
@@ -1,4 +1,6 @@
import pandas as pd
from __future__ import annotations

import pandas as pd # noqa: TCH002

from tubular.base import BaseTransformer

@@ -27,7 +29,7 @@ def __init__(
columns: list,
new_col_name: str,
drop_original: bool = False,
**kwargs,
**kwargs: dict[str, bool],
) -> None:
super().__init__(columns=columns, **kwargs)
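
A note on the `**kwargs: dict[str, bool]` hints used throughout this PR: under PEP 484 the annotation on `**kwargs` describes each keyword value, not the mapping as a whole. The sketch below uses a hypothetical `init_like` function (not from tubular) to show the `**kwargs: bool` spelling, which may be closer to the intent if the forwarded keywords are `copy` and `verbose`.

```python
import inspect


def init_like(columns: list, **kwargs: bool) -> None:
    """Hypothetical stand-in for a transformer __init__ that forwards kwargs."""
    # Inside the body, a type checker treats `kwargs` as dict[str, bool].
    print(sorted(kwargs))


kwargs_param = inspect.signature(init_like).parameters["kwargs"]

# The annotation is attached to the individual values of the var-keyword parameter.
print(kwargs_param.kind is inspect.Parameter.VAR_KEYWORD)  # True
print(kwargs_param.annotation is bool)  # True

init_like(["a"], copy=True, verbose=False)  # prints ['copy', 'verbose']
```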
