Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(python): Hide polars.testing.* in pytest stack traces #14399

Merged
merged 1 commit into from
Feb 11, 2024

Conversation

kalekundert
Copy link
Contributor

@kalekundert kalekundert commented Feb 9, 2024

Currently, pytest generates very long and uninformative tracebacks for failing tests that use polars.testing.assert_frame_equal():

import polars as pl
from polars.testing import assert_frame_equal

def test_df():
    df1 = pl.DataFrame({'a': [1, 2]})
    df2 = pl.DataFrame({'a': [1, 3]})
    assert_frame_equal(df1, df2)
Long stack trace
============================= test session starts ==============================
platform linux -- Python 3.10.0, pytest-8.0.0, pluggy-1.4.0
rootdir: /home/kale
plugins: cov-4.0.0, env-0.6.2, subtests-0.7.0, xdoctest-1.1.1, hypothesis-6.39.4, devtools-0.12.2, unordered-0.4.1, hydra-core-1.3.2, pytest_tmp_files-0.0.0, xonsh-0.12.1, xdist-2.5.0, forked-1.4.0, anyio-3.7.1, typeguard-4.0.0
collected 1 item

test_df.py F                                                             [100%]

=================================== FAILURES ===================================
___________________________________ test_df ____________________________________

left = shape: (2,)
Series: 'a' [i64]
[
	1
	2
]
right = shape: (2,)
Series: 'a' [i64]
[
	1
	3
]

    def _assert_series_values_equal(
        left: Series,
        right: Series,
        *,
        check_exact: bool,
        rtol: float,
        atol: float,
        categorical_as_str: bool,
    ) -> None:
        """Assert that the values in both Series are equal."""
        # Handle categoricals
        if categorical_as_str:
            if left.dtype == Categorical:
                left = left.cast(String)
            if right.dtype == Categorical:
                right = right.cast(String)

        # Handle decimals
        # TODO: Delete this branch when Decimal equality is implemented
        # https://github.com/pola-rs/polars/issues/12118
        if left.dtype == Decimal:
            left = left.cast(Float64)
        if right.dtype == Decimal:
            right = right.cast(Float64)

        # Determine unequal elements
        try:
            unequal = left.ne_missing(right)
        except ComputeError as exc:
            raise_assertion_error(
                "Series",
                "incompatible data types",
                left=left.dtype,
                right=right.dtype,
                cause=exc,
            )

        # Check nested dtypes in separate function
        if _comparing_nested_floats(left.dtype, right.dtype):
            try:
                _assert_series_nested_values_equal(
                    left=left.filter(unequal),
                    right=right.filter(unequal),
                    check_exact=check_exact,
                    rtol=rtol,
                    atol=atol,
                    categorical_as_str=categorical_as_str,
                )
            except AssertionError as exc:
                raise_assertion_error(
                    "Series",
                    "nested value mismatch",
                    left=left.to_list(),
                    right=right.to_list(),
                    cause=exc,
                )
            else:  # All nested values match
                return

        # If no differences found during exact checking, we're done
        if not unequal.any():
            return

        # Only do inexact checking for float types
        if check_exact or not left.dtype.is_float() or not right.dtype.is_float():
>           raise_assertion_error(
                "Series", "exact value mismatch", left=left.to_list(), right=right.to_list()
            )
E           AssertionError: Series are different (exact value mismatch)
E           [left]:  [1, 2]
E           [right]: [1, 3]

.pyenv/versions/3.10.0/lib/python3.10/site-packages/polars/testing/asserts/series.py:180: AssertionError

The above exception was the direct cause of the following exception:

    def test_df():
        df1 = pl.DataFrame({'a': [1, 2]})
        df2 = pl.DataFrame({'a': [1, 3]})
>       assert_frame_equal(df1, df2)
E       AssertionError: DataFrames are different (value mismatch for column 'a')
E       [left]:  [1, 2]
E       [right]: [1, 3]

test_df.py:7: AssertionError
=========================== short test summary info ============================
FAILED test_df.py::test_df - AssertionError: DataFrames are different (value mismatch for column 'a')
============================== 1 failed in 0.83s ===============================

The issue is that pytest includes internal polars.testing functions in the traceback. This is pretty much never helpful, since whatever bug is causing the test to fail is going to be in the user's code, not polar's code. To accommodate situations like this, pytest excludes frames that define __tracebackhide__ = True from the stack traces it generates.

In this PR, I added the above definition to every polars.testing function that might end up in a stack trace (i.e. any function that can raise an assertion error). I also wrote unit tests that actually run pytest and check that the resulting stdout doesn't mention any internal testing functions. The above unit test would now produce the following output:

============================= test session starts ==============================
platform linux -- Python 3.10.11, pytest-8.0.0, pluggy-1.4.0
rootdir: /home/kbk8
plugins: hypothesis-6.97.4, xdist-3.5.0, cov-4.1.0
collected 1 item

../../../../test_df.py F                                                 [100%]

=================================== FAILURES ===================================
___________________________________ test_df ____________________________________

E   AssertionError: Series are different (exact value mismatch)
    [left]:  [1, 2]
    [right]: [1, 3]
All traceback entries are hidden. Pass `--full-trace` to see hidden and internal frames.

The above exception was the direct cause of the following exception:

    def test_df():
        df1 = pl.DataFrame({'a': [1, 2]})
        df2 = pl.DataFrame({'a': [1, 3]})
>       assert_frame_equal(df1, df2)
E       AssertionError: DataFrames are different (value mismatch for column 'a')
E       [left]:  [1, 2]
E       [right]: [1, 3]

/home/kbk8/test_df.py:7: AssertionError
=========================== short test summary info ============================
FAILED ../../../../test_df.py::test_df - AssertionError: DataFrames are different (value mismatch for column 'a')
============================== 1 failed in 1.48s ===============================

Going beyond the changes I actually made in this PR, I think the above stack trace is still more complicated than it needs to be. The issue is that assert_frame_equal() invokes _assert_series_values_equal(). The latter raises an assertion error, then the former raises another assertion error "from" the first. So we end up with two exceptions, one that says the data frames are unequal and another that says the reason is an "exact value mismatch".

I don't think this is an intuitive or clear way to present this information. Instead, I'd recommend something like this:

============================= test session starts ==============================
platform linux -- Python 3.10.11, pytest-8.0.0, pluggy-1.4.0
rootdir: /home/kbk8
plugins: hypothesis-6.97.4, xdist-3.5.0, cov-4.1.0
collected 1 item

../../../../test_df.py F                                                 [100%]

=================================== FAILURES ===================================
___________________________________ test_df ____________________________________

    def test_df():
        df1 = pl.DataFrame({'a': [1, 2]})
        df2 = pl.DataFrame({'a': [1, 3]})
>       assert_frame_equal(df1, df2)
E       AssertionError: DataFrames are different (exact value mismatch for column 'a')
E       [left]:  [1, 2]
E       [right]: [1, 3]

/home/kbk8/test_df.py:7: AssertionError
=========================== short test summary info ============================
FAILED ../../../../test_df.py::test_df - AssertionError: DataFrames are different (value mismatch for column 'a')
============================== 1 failed in 0.97s ===============================

Note that there's now just a single exception, which combines the information from both exceptions. This would be a very easy change to make. I decided against doing it here, since it's outside the scope of what I was originally trying to do and seems potentially controversial. But if the maintainers would like this change, let me know and I'll either add it to this PR or make a new one.

@stinodego
Copy link
Member

Thanks for this! We've looked into __tracebackhide__ before but didn't get it to work as nicely as we hoped.

I'll look into this more closely soon.

@stinodego stinodego changed the title fix(python): hide polars.testing.* in pytest stack traces fix(python): Hide polars.testing.* in pytest stack traces Feb 11, 2024
@stinodego stinodego changed the title fix(python): Hide polars.testing.* in pytest stack traces feat(python): Hide polars.testing.* in pytest stack traces Feb 11, 2024
@stinodego stinodego removed the fix Bug fix label Feb 11, 2024
@github-actions github-actions bot added the enhancement New feature or an improvement of an existing feature label Feb 11, 2024
Copy link
Member

@stinodego stinodego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - this looks great and can be merged as-is.

I only have a small concern that the test_tracebackhide tests are not very maintainable as there is a lot of code in there in the form of multiline strings. But I wouldn't know a better way to do this, so I guess it's fine.

Going beyond the changes I actually made in this PR, I think the above stack trace is still more complicated than it needs to be. The issue is that assert_frame_equal() invokes _assert_series_values_equal(). The latter raises an assertion error, then the former raises another assertion error "from" the first. So we end up with two exceptions, one that says the data frames are unequal and another that says the reason is an "exact value mismatch".

Please feel free to adjust this as well! Looking forward to the PR.

When that PR is merged, we can remove the recommendation to use --tb=short from the assertion util docstrings, as well as remove it from our own pyproject.toml pytest settings.

@stinodego stinodego merged commit 81bd89c into pola-rs:main Feb 11, 2024
15 checks passed
@alexander-beedie
Copy link
Collaborator

In this PR, I added the above definition to every polars.testing function that might end up in a stack trace (i.e. any function that can raise an assertion error).

Nice one; very helpful, thanks :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants