TYP: interval.pyi #44922

twoertwein · 2021-12-16T03:43:27Z

~~Currently rebased on top of #44339.~~

This is the second-to-last part of @erictraut's #43744 (offsets.pyi is still missing).

pep8speaks · 2021-12-16T03:43:34Z

Hello @twoertwein! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file pandas/io/excel/_odfreader.py:

Line 89:89: E501 line too long (89 > 88 characters)

Comment last updated at 2022-02-21 04:04:50 UTC

twoertwein · 2021-12-16T16:36:00Z

pandas/core/arrays/interval.py

@@ -200,6 +201,9 @@ class IntervalArray(IntervalMixin, ExtensionArray):
    ndim = 1
    can_hold_na = True
    _na_value = _fill_value = np.nan
+    _left: np.ndarray
+    _right: np.ndarray


I'm not sure about the type of _left and _right, see mypy error for _from_sequence in this file.

twoertwein · 2021-12-16T20:04:36Z

pandas/_typing.py

@@ -83,7 +83,7 @@
 PythonScalar = Union[str, int, float, bool]
 DatetimeLikeScalar = Union["Period", "Timestamp", "Timedelta"]
 PandasScalar = Union["Period", "Timestamp", "Timedelta", "Interval"]
-Scalar = Union[PythonScalar, PandasScalar]
+Scalar = Union[PythonScalar, PandasScalar, np.datetime64, np.timedelta64]


Before this PR, Interval resolved to Any. Some functions that are supposed to return a Scalar can return np.datetime64 / np.timedelta64 which was previously unnoticed.

pandas/core/indexes/interval.py

jreback · 2021-12-19T22:43:42Z

I like _ScalarT better as a name for the TypeVar (compared to _S in timestamps.pyi).

jreback · 2021-12-19T22:43:48Z

cc @simonjayhawkins

jreback · 2021-12-19T22:44:18Z

cc @jbrockmendel

jbrockmendel · 2021-12-20T15:31:40Z

pandas/_libs/interval.pyi

+    @property
+    def mid(self) -> float: ...
+    @property
+    def length(self) -> float: ...


could be timedelta?

It is now generic but IntervalMixin[datetime].length is annotated to return a datetime. I don't think this exception can be achieved with overloads (overlapping overloads with different return type).

jbrockmendel · 2021-12-20T15:36:57Z

pandas/core/reshape/pivot.py

@@ -482,11 +482,20 @@ def pivot(
    if columns is None:
        raise TypeError("pivot() missing 1 required argument: 'columns'")

-    columns_listlike = com.convert_to_list_like(columns)
+    # error: Argument 1 to "convert_to_list_like" has incompatible type "Hashable";


does this suggest that the 'columns' annotation may be too loose?

convert_to_list_like seems to be too strict: it literally accepts anything.

twoertwein · 2021-12-21T03:17:14Z

I just realized that typing interval.pyi is much messier than I expected:

A TypeVar cannot be defined by a generic type, but Interval is now a generic type which depends on Scalar (includes Interval) and uses it as a TypeVar
IntervalMixin should also be a Generic (the two float return types should be _ScalarT), but IntervalTree also inherits from IntervalMixin, which means that the generic type of IntervalMixin is not only Scalar but also np.ndarray
__sub__ and length should reflect the datetime/timedelta mechanics (datetime - datetime = timedelta)
and __add__ should probably not allow datetime + datetime (but datetime + timedelta)

I think the only way to address the above is to define a Protocol (it needs to support +, -, <, ...) and then use this protocol as the bound of a TypeVar, which is then used by IntervalMixin & co. The datetime/timedelta mechanics would then need to be addressed by overloads.
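A minimal sketch of the protocol-bound TypeVar idea, using only the standard library. The names `_Orderable` and `SimpleInterval` are hypothetical stand-ins for illustration, not the pandas classes, and the protocol lists only a few of the operators an endpoint would need:

```python
from __future__ import annotations

from datetime import datetime, timedelta
from typing import Any, Generic, Protocol, TypeVar


class _Orderable(Protocol):
    # Hypothetical protocol: only the operations this toy interval relies on.
    def __lt__(self, other: Any) -> bool: ...
    def __le__(self, other: Any) -> bool: ...
    def __add__(self, other: Any) -> Any: ...
    def __sub__(self, other: Any) -> Any: ...


# The protocol is the bound, so the TypeVar no longer depends on Scalar.
_OrderableT = TypeVar("_OrderableT", bound=_Orderable)


class SimpleInterval(Generic[_OrderableT]):
    """Toy stand-in for Interval (assumption: right-closed, like the pandas default)."""

    def __init__(self, left: _OrderableT, right: _OrderableT) -> None:
        if right < left:
            raise ValueError("left side of interval must be <= right side")
        self.left = left
        self.right = right

    def __contains__(self, key: _OrderableT) -> bool:
        # Open on the left, closed on the right.
        return self.left < key and key <= self.right


ints = SimpleInterval(10, 20)
dates = SimpleInterval(datetime(2022, 1, 1), datetime(2022, 1, 2))
print(15 in ints)                # membership on an int interval
print(dates.right - dates.left)  # datetime - datetime gives a timedelta
```

The last line shows why a plain `_OrderableT` return type is not enough for `length`: subtracting two datetime endpoints yields a timedelta, which is the case the overloads would have to cover.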

Marking this PR as a draft until I have a working version with a protocol.

twoertwein · 2022-01-08T17:04:23Z

Version 2: more aligned with the documentation (any "orderable scalar" is accepted) which also breaks the Scalar-Interval dependency.

twoertwein · 2022-01-08T17:07:31Z

pandas/core/common.py

@@ -517,7 +516,7 @@ def f(x):


 def convert_to_list_like(
-    values: Scalar | Iterable | AnyArrayLike,
+    values: Hashable | Iterable | AnyArrayLike,


All scalars should also be hashable.
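To illustrate the claim, here is a small standard-library sketch. `to_list_like` is a hypothetical, simplified stand-in for `convert_to_list_like` (the real function also handles array-likes and other cases):

```python
from __future__ import annotations

from collections.abc import Hashable, Iterable
from datetime import datetime
from typing import Any


def to_list_like(values: Hashable | Iterable[Any]) -> list[Any]:
    """Hypothetical simplification: wrap scalars, materialize other iterables."""
    if isinstance(values, list):
        return values
    if isinstance(values, Iterable) and not isinstance(values, (str, bytes)):
        return list(values)
    return [values]


# Typical scalars all satisfy the Hashable ABC at runtime.
scalars = (1, 1.5, "a", True, datetime(2022, 1, 1))
print(all(isinstance(s, Hashable) for s in scalars))

# A list is iterable but not hashable, so it is covered by the Iterable branch.
print(isinstance([1, 2], Hashable))
```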

jreback · 2022-01-16T17:36:29Z

@twoertwein if you can merge master

jreback · 2022-01-17T14:02:58Z

cc @simonjayhawkins if any comments

twoertwein · 2022-02-19T03:31:34Z

pandas/io/parsers/python_parser.py

@@ -893,7 +893,7 @@ def _clear_buffer(self) -> None:

    def _get_index_name(
        self, columns: list[Hashable]
-    ) -> tuple[list[Hashable] | None, list[Hashable], list[Hashable]]:
+    ) -> tuple[Sequence[Hashable] | None, list[Hashable], list[Hashable]]:


Without this, we would get:

pandas/io/parsers/python_parser.py:949: error: Incompatible return value type (got "Tuple[List[Union[Union[str, int, float, bool], Union[Period, Timestamp, Timedelta, Interval[Any]], datetime64, timedelta64]], List[Hashable], List[Hashable]]", expected "Tuple[Optional[List[Hashable]], List[Hashable], List[Hashable]]") [return-value]
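The error looks like the usual list invariance issue: a `list` of a narrower element type is not accepted where `list[Hashable]` is expected, because the callee could append any hashable to it, whereas the read-only `Sequence` is covariant. A minimal sketch with hypothetical helper names:

```python
from __future__ import annotations

from collections.abc import Hashable, Sequence


def first_names(names: list[str]) -> Sequence[Hashable] | None:
    # Accepted by type checkers: Sequence is covariant, so a list[str]
    # is also a Sequence[Hashable].
    return names or None


def append_key(keys: list[Hashable]) -> None:
    # Legal for list[Hashable]; this is why a list[str] must not be passed
    # here: it would end up containing an int.
    keys.append(42)


names = ["a", "b"]
print(first_names(names))

keys: list[Hashable] = ["a"]
append_key(keys)
print(keys)
```

Annotating the first tuple element as `Sequence[Hashable] | None` therefore avoids the error without casting the list.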

Dr-Irv · 2022-02-20T16:44:55Z

I think that this is based on stuff I've put in the Microsoft stubs. I recently did a PR there that addresses a few issues. See https://github.com/microsoft/python-type-stubs/pull/167/files

One thing that I had to test was getting the type right for intervals based on int or float compared to Timestamp.

I added the following test in that PR in the MS stubs:

import pandas as pd


def test_interval_length() -> None:
    i1 = pd.Interval(pd.Timestamp("2000-01-01"), pd.Timestamp("2000-01-02"), closed="both")
    reveal_type(i1.length, expected_string="Timedelta")
    i1.length.total_seconds()

    i2 = pd.Interval(10, 20)
    reveal_type(i2.length, expected_type=int)

    i3 = pd.Interval(13.2, 19.5)
    reveal_type(i3.length, expected_type=float)

mypy doesn't support the expected_type or expected_string arguments to reveal_type, but you ought to make sure that your stubs infer the type of length from the arguments passed to the Interval constructor, as in the cases above.

twoertwein · 2022-02-20T21:52:37Z

Thanks @Dr-Irv! It definitely makes sense to add exceptions for pandas/important stdlib types. I think I found a way to encode this (using your workaround for overload+decorators):

# note: mypy doesn't support overloading properties
# based on github.com/microsoft/python-type-stubs/pull/167
class _LengthDescriptor:
    @overload
    def __get__(self, instance: IntervalMixin[Timestamp], owner: Any) -> Timedelta: ...
    @overload
    def __get__(self, instance: IntervalMixin[datetime], owner: Any) -> timedelta: ...
    @overload
    def __get__(
        self, instance: IntervalMixin[_OrderableT], owner: Any
    ) -> _OrderableT: ...

class IntervalMixin(Generic[_OrderableT]):
    ...
    length: _LengthDescriptor

Unfortunately, there doesn't seem to be a generic way to handle other exceptions (Interval can be used with more classes than just int/float/Timestamp).

Still need to debug this locally - getting some "has-type/misc" mypy errors and pyright doesn't pick up on it (probably an issue with TypeVar'ed Protocol in Generic classes - will create an issue on pyright).
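For reference, a runnable toy version of the descriptor approach. `Span` and `_T` are hypothetical names, and stdlib `datetime`/`timedelta` stand in for Timestamp/Timedelta so the sketch has no pandas dependency; a type checker may still flag the two overloads as overlapping:

```python
from __future__ import annotations

from datetime import datetime, timedelta
from typing import Any, Generic, TypeVar, overload

_T = TypeVar("_T")


class _LengthDescriptor:
    # Specific case first: datetime endpoints give a timedelta length.
    @overload
    def __get__(self, instance: Span[datetime], owner: Any) -> timedelta: ...
    # Fallback: the length has the same type as the endpoints.
    @overload
    def __get__(self, instance: Span[_T], owner: Any) -> _T: ...
    def __get__(self, instance: Any, owner: Any) -> Any:
        # Only instance access is handled in this sketch.
        return instance.right - instance.left


class Span(Generic[_T]):
    """Toy stand-in for IntervalMixin; not the pandas class."""

    length = _LengthDescriptor()

    def __init__(self, left: _T, right: _T) -> None:
        self.left = left
        self.right = right


numeric = Span(10, 20)
dated = Span(datetime(2000, 1, 1), datetime(2000, 1, 2))
print(numeric.length)
print(dated.length)
```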

Dr-Irv · 2022-02-20T22:00:15Z

> Unfortunately, there doesn't seem to be a generic way to handle other exceptions (Interval can be used with more classes than just int/float/Timestamp).

Not sure about that. See below:

>>> pd.Interval(datetime.datetime(year=2022, month=3, day=15), datetime.datetime(year=2022, month=4, day=15))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas\_libs\interval.pyx", line 325, in pandas._libs.interval.Interval.__init__
  File "pandas\_libs\interval.pyx", line 345, in pandas._libs.interval.Interval._validate_endpoint
ValueError: Only numeric, Timestamp and Timedelta endpoints are allowed when constructing an Interval.

So I think we don't need to worry about datetime endpoints.

twoertwein · 2022-02-20T22:26:27Z

In that case, 1) your approach might be much simpler and 2) the documentation needs to be updated.

Dr-Irv · 2022-02-21T04:11:40Z

pandas/_libs/interval.pyi

+    def __str__(self) -> str: ...
+    # TODO: could return Interval with different type
+    def __add__(
+        self, y: numbers.Number | np.timedelta64 | timedelta


I think you need to make the operators type specific. For Interval[Timestamp], you can only add and subtract Timedelta. For the numeric ones, you can only add/subtract/multiply/divide float or int. See the latest in microsoft/python-type-stubs#167

Dr-Irv · 2022-02-21T04:12:48Z

pandas/_libs/interval.pyi

+    def __sub__(
+        self, y: numbers.Number | np.timedelta64 | timedelta
+    ) -> Interval[_OrderableT]: ...
+    def __mul__(self, y: numbers.Number) -> Interval[_OrderableT]: ...


multiply and divide don't apply for the Timestamp intervals
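One way to make the operators type specific is to annotate `self` in each overload. The sketch below is an assumption about how that could look, not the code from either PR; `Span` is a hypothetical toy class, and stdlib `datetime`/`timedelta` stand in for Timestamp/Timedelta:

```python
from __future__ import annotations

from datetime import datetime, timedelta
from typing import Any, Generic, TypeVar, overload

_T = TypeVar("_T")


class Span(Generic[_T]):
    """Toy stand-in for Interval; not the pandas class."""

    def __init__(self, left: _T, right: _T) -> None:
        self.left = left
        self.right = right

    # Addition: numeric spans take numbers, datetime spans take timedeltas.
    @overload
    def __add__(self: Span[int], y: int) -> Span[int]: ...
    @overload
    def __add__(self: Span[float], y: float) -> Span[float]: ...
    @overload
    def __add__(self: Span[datetime], y: timedelta) -> Span[datetime]: ...
    def __add__(self, y: Any) -> Any:
        return Span(self.left + y, self.right + y)

    # Multiplication is declared for numeric spans only, so a type checker
    # would reject it on Span[datetime].
    @overload
    def __mul__(self: Span[int], y: int) -> Span[int]: ...
    @overload
    def __mul__(self: Span[float], y: float) -> Span[float]: ...
    def __mul__(self, y: Any) -> Any:
        return Span(self.left * y, self.right * y)


shifted = Span(datetime(2022, 1, 1), datetime(2022, 1, 2)) + timedelta(days=1)
scaled = Span(1, 3) * 2
print(shifted.left, scaled.right)
```

At runtime, multiplying the datetime span still fails with a TypeError from `datetime` itself; the overloads only move that error to type-check time.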

Dr-Irv · 2022-02-21T04:17:48Z

pandas/_libs/interval.pyi

+    def __truediv__(self, y: numbers.Number) -> Interval[_OrderableT]: ...
+    def __floordiv__(self, y: numbers.Number) -> Interval[_OrderableT]: ...
+    def __hash__(self) -> int: ...
+    def __contains__(self: Interval[_OrderableT], key: _OrderableT) -> bool: ...


I think you have to be explicit here about the 4 different types. You can't have __contains__() support testing an integer inside a Timestamp interval. See PR microsoft/python-type-stubs#167

The two _OrderableT bind to the same type simultaneously.

i2 = pd.Interval(10, 20)
i2.__contains__(4)  # ok
i2.__contains__(4.0)  # error: Unsupported operand types for in ("float" and "Interval[int]") [operator]
i3 = pd.Interval(13.2, 19.5)
i3.__contains__(4)  # ok
i3.__contains__(4.0)  # ok
i3.__contains__(pd.Timestamp(0))  # error: Unsupported operand types for in ("Timestamp" and "Interval[float]") [operator]

Thank you for your feedback! I will integrate it over the next few days. If you prefer, I can also leave the operators as-is and leave it to you - doesn't feel great copy&pasting your code ;)

I think that i2.__contains__(4.0) should be allowed. If you have an interval based on integers, you can test whether a float is inside.
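A hedged sketch of how explicit `__contains__` overloads could allow a float key on an int-based interval while still rejecting unrelated types. `Span` is a hypothetical toy class and stdlib `datetime` stands in for Timestamp; this is not the code from either PR:

```python
from __future__ import annotations

from datetime import datetime
from typing import Any, Generic, TypeVar, overload

_T = TypeVar("_T")


class Span(Generic[_T]):
    """Toy stand-in for Interval (assumption: right-closed)."""

    def __init__(self, left: _T, right: _T) -> None:
        self.left = left
        self.right = right

    # Numeric spans accept any real number; type checkers treat int as
    # compatible with float, so `key: float` covers both.
    @overload
    def __contains__(self: Span[int], key: float) -> bool: ...
    @overload
    def __contains__(self: Span[float], key: float) -> bool: ...
    # Datetime spans accept only datetime keys.
    @overload
    def __contains__(self: Span[datetime], key: datetime) -> bool: ...
    def __contains__(self, key: Any) -> bool:
        return self.left < key <= self.right


i2 = Span(10, 20)
i3 = Span(13.2, 19.5)
print(15.5 in i2)  # float key on an int span
print(4 in i3)     # int key on a float span
```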

Dr-Irv · 2022-02-21T04:25:19Z

pandas/core/indexes/interval.py

@@ -663,7 +672,9 @@ def _get_indexer(
            # homogeneous scalar index: use IntervalTree
            # we should always have self._should_partial_index(target) here
            target = self._maybe_convert_i8(target)
-            indexer = self._engine.get_indexer(target.values)
+            # error: Argument 1 to "get_indexer" of "IntervalTree" has incompatible type


why not add typing to _maybe_convert_i8 ?

simonjayhawkins · 2022-02-21T10:54:06Z

> cc @simonjayhawkins if any comments

yep. lgtm. always happy to have PRs that add more types merged if mypy is green.

@Dr-Irv comments and pre-commit failure outstanding.

twoertwein · 2022-02-23T14:53:02Z

Closing in favor of #46098

twoertwein marked this pull request as draft December 16, 2021 03:43

twoertwein changed the title ~~interval.pyi~~ TYP: interval.pyi Dec 16, 2021

twoertwein mentioned this pull request Dec 16, 2021

TYP: enable reportGeneralTypeIssues but add many exceptions #44855

Closed

twoertwein marked this pull request as ready for review December 16, 2021 16:39

twoertwein added the Typing type annotations, mypy/pyright type checking label Dec 16, 2021

jreback added this to the 1.4 milestone Dec 19, 2021

jreback requested changes Dec 19, 2021
