feat(python): Hide `polars.testing.*` in pytest stack traces #14399

kalekundert · 2024-02-09T21:05:12Z

Currently, pytest generates very long and uninformative tracebacks for failing tests that use polars.testing.assert_frame_equal():

import polars as pl
from polars.testing import assert_frame_equal

def test_df():
    df1 = pl.DataFrame({'a': [1, 2]})
    df2 = pl.DataFrame({'a': [1, 3]})
    assert_frame_equal(df1, df2)

Long stack trace

============================= test session starts ==============================
platform linux -- Python 3.10.0, pytest-8.0.0, pluggy-1.4.0
rootdir: /home/kale
plugins: cov-4.0.0, env-0.6.2, subtests-0.7.0, xdoctest-1.1.1, hypothesis-6.39.4, devtools-0.12.2, unordered-0.4.1, hydra-core-1.3.2, pytest_tmp_files-0.0.0, xonsh-0.12.1, xdist-2.5.0, forked-1.4.0, anyio-3.7.1, typeguard-4.0.0
collected 1 item

test_df.py F                                                             [100%]

=================================== FAILURES ===================================
___________________________________ test_df ____________________________________

left = shape: (2,)
Series: 'a' [i64]
[
	1
	2
]
right = shape: (2,)
Series: 'a' [i64]
[
	1
	3
]

    def _assert_series_values_equal(
        left: Series,
        right: Series,
        *,
        check_exact: bool,
        rtol: float,
        atol: float,
        categorical_as_str: bool,
    ) -> None:
        """Assert that the values in both Series are equal."""
        # Handle categoricals
        if categorical_as_str:
            if left.dtype == Categorical:
                left = left.cast(String)
            if right.dtype == Categorical:
                right = right.cast(String)

        # Handle decimals
        # TODO: Delete this branch when Decimal equality is implemented
        # https://github.com/pola-rs/polars/issues/12118
        if left.dtype == Decimal:
            left = left.cast(Float64)
        if right.dtype == Decimal:
            right = right.cast(Float64)

        # Determine unequal elements
        try:
            unequal = left.ne_missing(right)
        except ComputeError as exc:
            raise_assertion_error(
                "Series",
                "incompatible data types",
                left=left.dtype,
                right=right.dtype,
                cause=exc,
            )

        # Check nested dtypes in separate function
        if _comparing_nested_floats(left.dtype, right.dtype):
            try:
                _assert_series_nested_values_equal(
                    left=left.filter(unequal),
                    right=right.filter(unequal),
                    check_exact=check_exact,
                    rtol=rtol,
                    atol=atol,
                    categorical_as_str=categorical_as_str,
                )
            except AssertionError as exc:
                raise_assertion_error(
                    "Series",
                    "nested value mismatch",
                    left=left.to_list(),
                    right=right.to_list(),
                    cause=exc,
                )
            else:  # All nested values match
                return

        # If no differences found during exact checking, we're done
        if not unequal.any():
            return

        # Only do inexact checking for float types
        if check_exact or not left.dtype.is_float() or not right.dtype.is_float():
>           raise_assertion_error(
                "Series", "exact value mismatch", left=left.to_list(), right=right.to_list()
            )
E           AssertionError: Series are different (exact value mismatch)
E           [left]:  [1, 2]
E           [right]: [1, 3]

.pyenv/versions/3.10.0/lib/python3.10/site-packages/polars/testing/asserts/series.py:180: AssertionError

The above exception was the direct cause of the following exception:

    def test_df():
        df1 = pl.DataFrame({'a': [1, 2]})
        df2 = pl.DataFrame({'a': [1, 3]})
>       assert_frame_equal(df1, df2)
E       AssertionError: DataFrames are different (value mismatch for column 'a')
E       [left]:  [1, 2]
E       [right]: [1, 3]

test_df.py:7: AssertionError
=========================== short test summary info ============================
FAILED test_df.py::test_df - AssertionError: DataFrames are different (value mismatch for column 'a')
============================== 1 failed in 0.83s ===============================

The issue is that pytest includes internal polars.testing functions in the traceback. This is pretty much never helpful, since whatever bug is causing the test to fail is going to be in the user's code, not polar's code. To accommodate situations like this, pytest excludes frames that define __tracebackhide__ = True from the stack traces it generates.

In this PR, I added the above definition to every polars.testing function that might end up in a stack trace (i.e. any function that can raise an assertion error). I also wrote unit tests that actually run pytest and check that the resulting stdout doesn't mention any internal testing functions. The above unit test would now produce the following output:

============================= test session starts ==============================
platform linux -- Python 3.10.11, pytest-8.0.0, pluggy-1.4.0
rootdir: /home/kbk8
plugins: hypothesis-6.97.4, xdist-3.5.0, cov-4.1.0
collected 1 item

../../../../test_df.py F                                                 [100%]

=================================== FAILURES ===================================
___________________________________ test_df ____________________________________

E   AssertionError: Series are different (exact value mismatch)
    [left]:  [1, 2]
    [right]: [1, 3]
All traceback entries are hidden. Pass `--full-trace` to see hidden and internal frames.

The above exception was the direct cause of the following exception:

    def test_df():
        df1 = pl.DataFrame({'a': [1, 2]})
        df2 = pl.DataFrame({'a': [1, 3]})
>       assert_frame_equal(df1, df2)
E       AssertionError: DataFrames are different (value mismatch for column 'a')
E       [left]:  [1, 2]
E       [right]: [1, 3]

/home/kbk8/test_df.py:7: AssertionError
=========================== short test summary info ============================
FAILED ../../../../test_df.py::test_df - AssertionError: DataFrames are different (value mismatch for column 'a')
============================== 1 failed in 1.48s ===============================

Going beyond the changes I actually made in this PR, I think the above stack trace is still more complicated than it needs to be. The issue is that assert_frame_equal() invokes _assert_series_values_equal(). The latter raises an assertion error, then the former raises another assertion error "from" the first. So we end up with two exceptions, one that says the data frames are unequal and another that says the reason is an "exact value mismatch".

I don't think this is an intuitive or clear way to present this information. Instead, I'd recommend something like this:

============================= test session starts ==============================
platform linux -- Python 3.10.11, pytest-8.0.0, pluggy-1.4.0
rootdir: /home/kbk8
plugins: hypothesis-6.97.4, xdist-3.5.0, cov-4.1.0
collected 1 item

../../../../test_df.py F                                                 [100%]

=================================== FAILURES ===================================
___________________________________ test_df ____________________________________

    def test_df():
        df1 = pl.DataFrame({'a': [1, 2]})
        df2 = pl.DataFrame({'a': [1, 3]})
>       assert_frame_equal(df1, df2)
E       AssertionError: DataFrames are different (exact value mismatch for column 'a')
E       [left]:  [1, 2]
E       [right]: [1, 3]

/home/kbk8/test_df.py:7: AssertionError
=========================== short test summary info ============================
FAILED ../../../../test_df.py::test_df - AssertionError: DataFrames are different (value mismatch for column 'a')
============================== 1 failed in 0.97s ===============================

Note that there's now just a single exception, which combines the information from both exceptions. This would be a very easy change to make. I decided against doing it here, since it's outside the scope of what I was originally trying to do and seems potentially controversial. But if the maintainers would like this change, let me know and I'll either add it to this PR or make a new one.

stinodego · 2024-02-09T23:08:41Z

Thanks for this! We've looked into __tracebackhide__ before but didn't get it to work as nicely as we hoped.

I'll look into this more closely soon.

stinodego

Thanks - this looks great and can be merged as-is.

I only have a small concern that the test_tracebackhide tests are not very maintainable as there is a lot of code in there in the form of multiline strings. But I wouldn't know a better way to do this, so I guess it's fine.

Going beyond the changes I actually made in this PR, I think the above stack trace is still more complicated than it needs to be. The issue is that assert_frame_equal() invokes _assert_series_values_equal(). The latter raises an assertion error, then the former raises another assertion error "from" the first. So we end up with two exceptions, one that says the data frames are unequal and another that says the reason is an "exact value mismatch".

Please feel free to adjust this as well! Looking forward to the PR.

When that PR is merged, we can remove the recommendation to use --tb=short from the assertion util docstrings, as well as remove it from our own pyproject.toml pytest settings.

alexander-beedie · 2024-02-12T13:25:30Z

In this PR, I added the above definition to every polars.testing function that might end up in a stack trace (i.e. any function that can raise an assertion error).

Nice one; very helpful, thanks :)

fix(python): hide polars.testing.* in pytest stack traces

33cfb51

kalekundert requested review from ritchie46, stinodego, c-peters, alexander-beedie and MarcoGorelli as code owners February 9, 2024 21:05

github-actions bot added fix Bug fix python Related to Python Polars labels Feb 9, 2024

stinodego changed the title ~~fix(python): hide polars.testing.* in pytest stack traces~~ fix(python): Hide polars.testing.* in pytest stack traces Feb 11, 2024

stinodego changed the title ~~fix(python): Hide polars.testing.* in pytest stack traces~~ feat(python): Hide polars.testing.* in pytest stack traces Feb 11, 2024

stinodego removed the fix Bug fix label Feb 11, 2024

github-actions bot added the enhancement New feature or an improvement of an existing feature label Feb 11, 2024

stinodego approved these changes Feb 11, 2024

View reviewed changes

stinodego merged commit 81bd89c into pola-rs:main Feb 11, 2024
15 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(python): Hide `polars.testing.*` in pytest stack traces #14399

feat(python): Hide `polars.testing.*` in pytest stack traces #14399

kalekundert commented Feb 9, 2024 •

edited

Loading

stinodego commented Feb 9, 2024

stinodego left a comment •

edited

Loading

alexander-beedie commented Feb 12, 2024

feat(python): Hide polars.testing.* in pytest stack traces #14399

feat(python): Hide polars.testing.* in pytest stack traces #14399

Conversation

kalekundert commented Feb 9, 2024 • edited Loading

stinodego commented Feb 9, 2024

stinodego left a comment • edited Loading

Choose a reason for hiding this comment

alexander-beedie commented Feb 12, 2024

feat(python): Hide `polars.testing.*` in pytest stack traces #14399

feat(python): Hide `polars.testing.*` in pytest stack traces #14399

kalekundert commented Feb 9, 2024 •

edited

Loading

stinodego left a comment •

edited

Loading