Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suspected PyErr_Fetch() behavior change #102594

Closed
rwgk opened this issue Mar 11, 2023 · 22 comments · Fixed by #102675
Closed

Suspected PyErr_Fetch() behavior change #102594

rwgk opened this issue Mar 11, 2023 · 22 comments · Fixed by #102675
Labels
3.12 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@rwgk
Copy link

rwgk commented Mar 11, 2023

Bug report

While testing pybind11 with Python 3.12alpha6 I ran into what looks like a PyErr_Fetch() behavior change. I also tried with the main branch @ c6858d1.

I think the issue reduces to:

The special twist is that the exception type involved raises a ValueError in its __init__:

For easy reference the same code copy-pasted here:

class FlakyException(Exception):
    def __init__(self, failure_point):
        if failure_point == "failure_point_init":
            raise ValueError("triggered_failure_point_init")
        self.failure_point = failure_point
  • Up to and including 3.12alpha3: PyErr_Fetch() produces the FlakyException type (this here).

  • With 3.12alpha6: PyErr_Fetch() produces the ValueError type instead.

Additional detail:

Up to and including 3.12alpha3: only PyErr_NormalizeException() hits the ValueError in FlakyException._init__, but not PyErr_Fetch().

Your environment

  • CPython versions tested on: 3.12alpha6, main branch @ c6858d1
  • Operating system and architecture: Debian 5.19.11 clang14, ubuntu22 gcc11

Linked PRs

@rwgk rwgk added the type-bug An unexpected behavior, bug, or error label Mar 11, 2023
@kumaraditya303 kumaraditya303 added interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.12 bugs and security fixes labels Mar 11, 2023
@iritkatriel iritkatriel removed their assignment Mar 11, 2023
@iritkatriel
Copy link
Member

Thanks @corona10 . A cc in a comment is enough to bring an issue to someone's attention (assigned means something different and could prevent others from taking a look).

@corona10
Copy link
Member

Ooop sorry. I will care!

@iritkatriel
Copy link
Member

iritkatriel commented Mar 11, 2023

@rwgk Can you try to reproduce this in a unit test? We now have in Lib/test/test_capi/test_exceptions.py a test called test_set_object that calls SetObject from python. I will backport this test to 3.11 so that we can use it there too: #102602 .

@iritkatriel
Copy link
Member

@rwgk If not with the python unit test as described above, a c program reproducing the issue would also be helpful.

@rwgk
Copy link
Author

rwgk commented Mar 12, 2023

I created two PRs:

GHA for #102606 never finished, but local testing passes.

GHA result for the #102607:

======================================================================
FAIL: test_fetch_exception_with_broken_init (test.test_capi.test_exceptions.Test_ErrSetAndRestore.test_fetch_exception_with_broken_init)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/test/test_capi/test_exceptions.py", line 179, in test_fetch_exception_with_broken_init
    self.assertEqual(fetched_type_name, 'FlakyException')
AssertionError: 'ValueError' != 'FlakyException'
- ValueError
+ FlakyException
----------------------------------------------------------------------

FWIW, I also tested locally with efc985a: Behavior also/already changed.

@iritkatriel
Copy link
Member

Thank you for the full repro. When I tried to just call _testcapi.exc_set_object_fetch(FlakyException, ()) from python I didn't see the change, because the ValueError gets normalised while it's being raised.

The change in behaviour is in SetObject - the exception is now being normalised there, and stored in the interpreter in its normalised form, instead of being stored in the denormalised form and needing to be normalised after Fetch.

In the case of your exception, which raises a value error from its __init__, normalising it after fetch would result in the ValueError, so there is no difference when you are in python, but if you just call PyErr_Fetch and look at the result you see a change in behaviour, because you can observe the unnormalised exception. Note that PyErr_Fetch is deprecated in 3.12 in favour of PyErr_GetRaisedException(), which always returns a normalised exception.

I think we should also deprecate PyErr_SetObject, and direct users to use PyErr_SetRaisedException() instead (so they will have to create the exception instance beforehand, and there will be no room for confusion about which exception is being set as raised).

As for the change in behaviour reported here - I don't see a way to make this snippet behave as before, without reverting #101607. And I don't think we want to revert that for this edge case. I think we should document the change and deprecate PyErr_SetObject. There is a precedent for something similar in 3.11 - see the edge cases mentioned as changed due to bpo-45711 in the release notes.

CC @markshannon
CC @Yhg1s as release manager.

@markshannon
Copy link
Member

The docs are vague.

I think the old behavior was broken and that it is now fixed.

  1. The docs use the term, "error indicator" singular, which would imply that an exception is created by PyErr_SetObject() from the type, value arguments.
  2. The new behavior is consistent with Python:
>>> class BorkedException(Exception):
...    def __init__(self):
...        raise ValueError("wot?")
...
>>> raise BorkedException
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __init__
ValueError: wot?

Issue about the strange behavior of PyErr_SetObject() #101578

+1 to deprecating PyErr_SetObject().

@rwgk
Copy link
Author

rwgk commented Mar 13, 2023

The new behavior masks errors in the error handling, which is ... terrible, in lack of a more diplomatic word.

A behavior change seems desirable, but if the behavior is changed, I think it's important to make a decision based on a careful and thorough assessment of the pros and cons.

When I was working on pybind/pybind11#1895 last year, I was struck by the apparent blissful ignorance in the PyErr_* APIs. Implicitly the APIs assume that the error handling code is bug free. That's of course not a reality, in particular in large-scale systems (I'm at Google). I know from first-hand experience that masked errors in the error handling can take weeks to troubleshoot, especially in a complex system with hundreds of thousands of dependencies (no exaggeration). (Actually, we never got to the bottom of one particular problem that was triggered by a change I made deep down in the core of our stack. I wound up spending a couple months revamping some core components around PyCLIF, and then sinking a couple weeks just into pybind/pybind11#1895, in preparation for making pybind11 a critical part of the Google infrastructure.)

This code is the result of extensive experiments and discussions (under pybind/pybind11#1895) to find a way to NOT mask errors in the error handling:

https://github.com/pybind/pybind11/blob/442261da585536521ff459b1457b2904895f23b4/include/pybind11/pytypes.h#L474-L527

I believe

  • the PyErr_* APIs need to be extended to clearly diagnose and report errors in the error handling, similar in spirit to what's done in that pybind11 code (reporting a MISMATCH of original and normalized active exception types or better, e.g. terminating the process might be safest, although the fallout will probably be somewhere between expensive and impractical to handle).
  • Until there is a plan to do that cleanly, it will be best to not let apparently unintentional behavior changes slip in, because that will most likely become an additional obstacle on a path to a clean solution.
>>> raise BorkedException
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __init__
ValueError: wot?

That's not good, too, for the same reasons. It masks errors and can potentially add weeks of delays to projects like upgrading from Python 3.N to 3.N+1, or modernizing core components more generally.

@iritkatriel
Copy link
Member

The new APIs don’t need to deal with mismatch between normalised and non-normalised exceptions because they only have normalised exceptions now. We are in the process of deprecating the non-normalised form.

@markshannon
Copy link
Member

The new behavior masks errors in the error handling

I don't think it does. If instantiation of the BorkedException raises a ValueError, then that ValueError is what you get as the exception. Setting the exception to BorkedException would mask the error.

What did you expect to happen in this case?

A behavior change seems desirable, but if the behavior is changed, I think it's important to make a decision based on a careful and thorough assessment of the pros and cons.

We did. There really don't seem to be any cons. The earlier normalization seems to surface latent bugs (such as the one in your FlakyException).

If you are handling flaky exception classes, why not create the exception first, then set the exception?

Here's the code to create, then set the exception:

PyObject *ex = PyObject_CallOneArg(exc_type, value);
if (ex == NULL) {
    /* handle the error */
}
else {
    PyErr_SetRaisedException(ex);
}

@gvanrossum
Copy link
Member

I don't consider process termination when exception normalization fails acceptable, given that the example FlakyException has a failure in its pure Python code that doesn't compromise the Python VM state at all.

A possible compromise could be to raise a third exception, whose constructor by design cannot fail (except when running out of memory, in which case we can just let the MemoryError bubble up), and whose purpose is to clearly indicate that exception construction failed. It could be a C version of the following:

class ExceptionNormalizationFailed(BaseException):
    def __init__(self, intended: type[BaseException], arg: object, actual: BaseException):
        super().__init__(intended, arg, actual)
        self.intended = intended
        self.arg = arg
        self.actual = actual
        # maybe also set __cause__ or __context__ to actual? And/or set __traceback__ from actual.__traceback__?
    def __str__(self):
        return f"Attempt to construct {self.intended!r} instance with arg[s] {self.arg!r} raised {self.actual!r}"

(This is just a strawman design, all the details are up for discussion.)

This would be raised whenever normalization raises an exception that isn't an instance of the intended exception. In the BorkedException example we'd end up with something like this:

>>> raise BorkedException
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __init__
ExceptionNormalizationFailed: Attempt to construct <class '__main__.BorkedException> with arg[s] None raised ValueError('wot?')

Of course this would still be a change in behavior compared to Python 3.11 and before, so I don't know if it's acceptable to the OP.

@rwgk
Copy link
Author

rwgk commented Mar 13, 2023

Of course this would still be a change in behavior compared to Python 3.11 and before, so I don't know if it's acceptable to the OP.

I really like this suggestion and would be more than happy to tweak pybind11 accordingly.

@gvanrossum
Copy link
Member

I don't consider process termination when exception normalization fails acceptable

Was anyone suggesting that?

The OP:

the PyErr_* APIs need to be extended to clearly diagnose and report errors in the error handling, similar in spirit to what's done in that pybind11 code (reporting a MISMATCH of original and normalized active exception types or better, e.g. terminating the process might be safest, although the fallout will probably be somewhere between expensive and impractical to handle).

@rwgk
Copy link
Author

rwgk commented Mar 13, 2023

I don't consider process termination when exception normalization fails acceptable

Was anyone suggesting that?

The OP:

the PyErr_* APIs need to be extended to clearly diagnose and report errors in the error handling, similar in spirit to what's done in that pybind11 code (reporting a MISMATCH of original and normalized active exception types or better, e.g. terminating the process might be safest, although the fallout will probably be somewhere between expensive and impractical to handle).

I like ExceptionNormalizationFailed much better. (Terminating the process is only better than masking, IMO.)

@rwgk
Copy link
Author

rwgk commented Mar 13, 2023

If you are handling flaky exception classes, why not create the exception first, then set the exception?

I think in the context of pybind11 that's not really an option, because that would push the responsibility to the user code.

The pattern is:

... user code does something (anything) ...
throw pybind11::error_already_set();

When pybind11::error_already_set is constructed, the very first interaction with the Python C API is PyErr_Fetch() (here).

@iritkatriel
Copy link
Member

I really like this suggestion and would be more than happy to tweak pybind11 accordingly.

If you're already tweaking pybind11, wouldn't you rather follow @markshannon 's suggestion and have full control over how you handle errors?

I see the advantage of ExceptionNormalizationFailed in that it can provide full information about what happened in programs that call PyErr_SetObject as is. But really we should deprecate construct+set in one function.

@iritkatriel
Copy link
Member

How about if instead of ExceptionNormalizationFailed, we raise the exception that came from __init__ with a PEP-678 note about the normalisation failure?

@gvanrossum
Copy link
Member

A note sounds great because it is much simpler than the proposed new exception.

@rwgk
Copy link
Author

rwgk commented Mar 13, 2023

A note sounds great because it is much simpler than the proposed new exception.

Sounds good to me, too. (I didn't know about notes until just now TBH.)
As long as there is a discoverable trace of what happened, ideally with the suggested repr of the original args, I think we have what we need to avoid extremely expensive troubleshooting.

@gvanrossum
Copy link
Member

What if repr of the args raises? It's quite common to have user classes with poorly behaving __repr__. Or even repr of the intended class? (Harder but possible using a metaclass.) Or of the actual exception? We could just make the note contain a placeholder like "???" instead then.

@iritkatriel
Copy link
Member

I’ll check what the traceback printing code does with such errors and do the same thing.

@rwgk
Copy link
Author

rwgk commented Mar 13, 2023

What if repr of the args raises? It's quite common to have user classes with poorly behaving __repr__. Or even repr of the intended class? (Harder but possible using a metaclass.) Or of the actual exception? We could just make the note contain a placeholder like "???" instead then.

Sounds good. The approach I took for something similar/related in pybind11 is to generate messages like MESSAGE UNAVAILABLE DUE TO ANOTHER EXCEPTION (here).

I made an attempt to generate messages that can relatively easily be pin-pointed in potentially gigantic log files.

carljm added a commit to carljm/cpython that referenced this issue Mar 17, 2023
* main: (34 commits)
  pythongh-102701: Fix overflow in dictobject.c (pythonGH-102750)
  pythonGH-78530: add support for generators in `asyncio.wait` (python#102761)
  Increase stack reserve size for Windows debug builds to avoid test crashes (pythonGH-102764)
  pythongh-102755: Add PyErr_DisplayException(exc) (python#102756)
  Fix outdated note about 'int' rounding or truncating (python#102736)
  pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives (python#102760)
  pythongh-99726: Improves correctness of stat results for Windows, and uses faster API when available (pythonGH-102149)
  pythongh-102192: remove redundant exception fields from ssl module socket (python#102466)
  pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives (python#102743)
  pythongh-102737: Un-ignore ceval.c in the CI globals check (pythongh-102745)
  pythonGH-102748: remove legacy support for generator based coroutines from `asyncio.iscoroutine` (python#102749)
  pythongh-102721: Improve coverage of `_collections_abc._CallableGenericAlias` (python#102722)
  pythonGH-102653: Make recipe docstring show the correct distribution (python#102742)
  Add comments to `{typing,_collections_abc}._type_repr` about each other (python#102752)
  pythongh-102594: PyErr_SetObject adds note to exception raised on normalization error (python#102675)
  pythongh-94440: Fix issue of ProcessPoolExecutor shutdown hanging (python#94468)
  pythonGH-100112:  avoid using iterable coroutines in asyncio internally (python#100128)
  pythongh-102690: Use Edge as fallback in webbrowser instead of IE (python#102691)
  pythongh-102660: Fix Refleaks in import.c (python#102744)
  pythongh-102738: remove from cases generator the code related to register instructions (python#102739)
  ...
Fidget-Spinner pushed a commit to Fidget-Spinner/cpython that referenced this issue Mar 27, 2023
warsaw pushed a commit to warsaw/cpython that referenced this issue Apr 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants