Fix functional.h bug + introduce test to verify that it is fixed #4254

EthanSteinberg · 2022-10-19T04:54:07Z

Description

Functional.h is currently buggy when working with manually defined CPython functions. You get segfaults whenever you try to convert a manually defined CPython function to a std::function.

This PR adds a test that illustrates the bug as well as some #ifdef guards around the broken code in functional.h

https://github.com/pybind/pybind11/pull/4254/files#diff-89e29c03817f899ff72bdb90772cb781df9c82044ae712fb68487be02dd7e3cdL49-R72 in particular is the cause of the bug.

As that code path is just an optimization, it can be removed without breaking other pieces of code. However, it will be a significant performance regression.

Suggested changelog entry:

* Fix segfault bug when passing foreign native functions to functional.h

Skylion007 · 2022-10-19T05:39:43Z

Oof, @rwgk got any Python experts to ping about this?

rwgk · 2022-10-19T05:51:30Z

            auto *cfunc_self = PyCFunction_GET_SELF(cfunc.ptr());
            if (isinstance<capsule>(cfunc_self)) {

Did you already try to do something like this?

            auto *cfunc_self = PyCFunction_GET_SELF(cfunc.ptr());
            if (cfunc_self == nullptr) {
                PyErr_Clear();
            } else if (isinstance<capsule>(cfunc_self)) {

include/pybind11/functional.h

Skylion007 · 2022-10-19T17:40:58Z

@rwgk @lalaland @henryiii I added workaround, but unclear if I should keep the new pytype ctor with it's current argument ordering. It might be the best way to disambiguate whether the dtor func takes the capsule itself or the underlying void value. Alternatively, we may want to create a custom tag long term.

include/pybind11/functional.h

include/pybind11/pytypes.h

include/pybind11/functional.h

EthanSteinberg · 2022-10-19T19:28:20Z

@Skylion007 I don't think it would be an abi break to set a name and check it in functional.h

This is an optimization path, so if we incorrectly ignore a capsule, that's ok as it will just run slower.

Skylion007 · 2022-10-20T14:30:14Z

Okay @Lalaland/@rwgk, whenever I try to set_name on the rec_capsules I am getting segfaults (likely from the lifetime of the name string???).

We could try to strdup the cstr in the relevant bindings and ensure to free it on our dtor, but that seems error prone. This is super weird as I never call free it on and I set it a global constexpr like `constexpr const char *FUNCTION_RECORD_CAPSULE_NAME = "pybind11_function_record" just set_name with either the method or the ctor so we shouldn't have to mange the lifetime at all since it's constexpr.

The current iteration at least ensures we won't try to convert pointers for named_capsules, but it would be great if we could give func_record_capsules their own name without segfaulting. Unfortunately, I do not have time to look into this further.

EthanSteinberg · 2022-10-20T15:00:17Z

Sorry for the precommit noise(can't get pre-commit to run on my machine right now), but that should fix the problem and assign a name.

include/pybind11/pybind11.h

Skylion007 · 2022-10-21T16:20:28Z

@lalaland One scenario I am worried about. Suppose we have multiple different translation units, is the address guaranteed to be the same across them? I know particularly it will be with gcc/llvm, but is this guaranteed or just their default behavior? Particularly with link time optimized turned off? I am worried where splitting up a large translation unit could accidentally cause a performance regression in this case which is unexpected behavior. Part of the issue is that we do not have a regression test for this optimization.

Thoughts on this behavior @rwgk? Should we really avoid an std::strcmp here and do this pointer black magic?

EthanSteinberg · 2022-10-21T16:27:27Z

Actually, my pointer black magic logic is/was incorrect. I just pushed a fixed version.

inline const char *function_capsule_name() {
-    static const char *name = "pybind11_function_capsule";
+    static char name[] = "pybind11_function_capsule";
     return name;
}

Skylion007 · 2022-10-21T16:37:45Z

Actually, my pointer black magic logic is/was incorrect. I just pushed a fixed version.
inline const char *function_capsule_name() {
-    static const char *name = "pybind11_function_capsule";
+    static char name[] = "pybind11_function_capsule";
     return name;
}

How did you test this? Do you have a URL link to why this works and the const char pointer did not?

I also just extracted out a helper function to make the code a bit more maintainable.

EthanSteinberg · 2022-10-21T16:39:54Z

I know particularly it will be with gcc/llvm, but is this guaranteed or just their default behavior?

Per the documentation of local_internals (which uses the same trick), this should work. Each .so should get their own copy.

pybind11/include/pybind11/detail/internals.h

Line 471 in f743bdf

    
           // the internals struct (above) is shared between all the modules. local_internals are only

EthanSteinberg · 2022-10-21T16:41:40Z

Actually, let's just put this key in local_internals. So we only have this hack implemented in one location.

Skylion007 · 2022-10-21T16:55:09Z

@lalaland It should definitely put in global internals if you want to do that, but then that definitely is an ABI break. Thoughts @rwgk ? I'd keep it as it is now.

EthanSteinberg · 2022-10-21T16:56:38Z

Ok, I moved this to local_internals, and bumped the ABI version flag.

How did you test this? Do you have a URL link to why this works and the const char pointer did not?

Per the C++ standard, in order to ensure that you have a unique pointer you need to actually allocate memory. See the text in https://stackoverflow.com/questions/50610274/what-is-the-rationale-behind-returning-unique-addresses-for-allocations-of-zero

@lalaland It should definitely put in global internals if you want to do that, but then that definitely is an ABI break. Thoughts @rwgk ? I'd keep it as it is now.

You still need it in local internals to avoid ODR violations.

Skylion007

@lalaland It really should be in internals, not local internals. Callbacks are frequently and often shared between modules so this will make performance worse.

Skylion007 · 2022-10-21T17:08:30Z

Also putting this in internals does not solve the issue I have with this, mainly that the check is too strict and overly brittle. Any minor change will cause issues. Instead, we could even have a function_record_specific ABI and store that in the function_name string itself. No reason to rely on all this pointer stuff and it would be a lot less brittle. We can manually increment the rec_capsule name when we change the function record too.

include/pybind11/detail/internals.h

EthanSteinberg · 2022-10-21T17:11:48Z

Also putting this in internals does not solve the issue I have with this, mainly that the check is too strict and overly brittle. Any minor change will cause issues. Instead, we could even have a function_record_specific ABI and store that in the function_name string itself. No reason to rely on all this pointer stuff and it would be a lot less brittle. We can manually increment the rec_capsule name when we change the function record too.

Yeah, this is too smart by half. Simply having a regular name that we strcmp against with a manual version check would be a lot easier to understand.

It's a tiny bit more error-prone, but that should be OK. I'll swap it out for that.

include/pybind11/pybind11.h

rwgk

Awesome work, thanks a lot for finding & fixing this bug!

include/pybind11/detail/internals.h

include/pybind11/pybind11.h

include/pybind11/detail/internals.h

rwgk · 2022-11-01T17:16:34Z

The one CI failure is a notorious flake again. Good to ignore.

I locally retested with all sanitizers (ASAN, MSAN, UBSAN, TSAN), everything passes (this PR rebased on the smart_holder branch).

I'll also redo the global testing (easy, but will take a few hours).

include/pybind11/detail/internals.h

rwgk · 2022-11-02T07:23:05Z

I'll also redo the global testing (easy, but will take a few hours).

Global testing passed again (internal ID OCL:482903768:BASE:485514840:1667371815628:a4b009b9).

Note that the global testing now includes running all pybind11 unit tests with all sanitizers (recent change).

rwgk · 2022-11-02T18:32:46Z

Thanks again Ethan for the fix, and Aaron for the reviews!

rwgk · 2022-11-17T15:52:19Z

Hi @lalaland, based on what I learned & implemented under PR #4329, I'm wondering if we could improve on this PR, by not making any changes to the internals struct.

What I have in mind would be similar to this commit: 8737bea

In case the hash changes in the future (b/o rebase), this is the commit message:

    Split out `get_native_enum_type_map()`, leaving `struct internals` untouched (compared to master).

What do you think?

The current implementation under PR #4329 has a lot of code duplication, but we could work on that pretty easily I think.

EthanSteinberg · 2022-11-17T16:01:05Z

@rwgk If I understand your suggested change correctly, it looks like you are proposing to split our global variable "internals" into multiple "internals-like" variables? (When I say "internals-like" I mean a global variable that is shared across modules and thus needs to be ABI stable).

The main advantage of that approach is that it would give us a bit more flexibility, but I think the maintenance costs would be really really bad.

Right now ABI issues are theoretically straightforward. There is one shared struct "internals" that has all the stuff that has to be ABI stable with one ABI version. Code on different ABI versions just doesn't talk to each other, keeping things safe.

With your change there would be many "internals" structs that all have to have ABI stability with a very confusing mix of versions. It would be a lot harder to understand what code under what ABI version is talking to what other code.

If you think we need that flexibility, go for it, but we should be really aware of the price we are paying for that flexibility.

rwgk · 2022-11-17T16:17:06Z

The main advantage of that approach is that it would give us a bit more flexibility, but I think the maintenance costs would be really really bad.

I totally agree, but ... the alternative is far worse, and it's connected to how widely pybind11 is in use: the moment we bump the internals version, we're essentially cutting off interoperability with all existing wheels, silently. There will suddenly be TypeErrors without any direct information about the root cause, and how to fix it.

Say, there are 10 extensions — in wheels — with cross-extension type conversion dependencies, in the general sense that's a graph with 10 nodes. Now one of them is rebuilt with a new internals version. It is disconnected from that graph, or to offer an alternative viewpoint, a new graph is started. The 10 interdependent extensions will be fully operational again only after all 10 wheels are rebuilt.

Now go from 10 to 20 to 100.

In the Google-internal world there is no problem: I can switch everybody with a single commit.

Externally, waiting for 10 wheels to get updated takes 10 weeks, 20 10 months, 100 takes years.

I think the only way to mitigate that is to

look as the internals-like structures as carved into stone when they are released,
copy-paste-edit new versions,
provide backwards compatibility for N years, before the old internals-like structures are finally purged.

EthanSteinberg · 2022-11-17T17:02:20Z

@rwgk Why don't we tie our internals version to the Python version? And only release new internals when a new Python version gets released?

Python already has an unstable ABI. If people are recompiling anyways, might as well force them to use a newer version of pybind11's ABI.

pybind11 will always assume that there is a 1-1 map between Python version and pybind11 ABI.

EthanSteinberg · 2022-11-17T17:08:20Z

Tying our internals version to the Python version will make upgrading the internals version much much easier, as we simply wait for the next Python release and people are already being forced to recompile the world.

rwgk · 2022-11-17T18:59:45Z

Interesting idea. I need to think about this more.

One issue that comes to mind: that basically freezes the pybind11 internals version for a given Python release. Unless we additionally have a migration mechanism***, we're stuck with it for 5 years (EOL for that Python release).

***If we have an additional migration mechanism, there isn't much gained by tying to Python releases.

High-level impression: It'll only help a little bit. A momentary free lunch at Python release time, only benefiting that Python release and future releases 1, 2, ... years later.

EthanSteinberg · 2022-11-17T21:15:55Z

We would of course add an option to force use the latest ABI for users like Google that do control everything.

rwgk · 2022-11-17T22:23:08Z

We would of course add an option to force use the latest ABI for users like Google that do control everything.

Two days ago (no kidding) I learned the hard way that that does not really work:

Since 2021-09-20 Google is running with internals version 5.
When the mujoco project is exported from the Google mono repo to https://github.com/deepmind/mujoco, @saran-t undoes my patch, to put it back to internals version 4, when building binaries for testing (I'm not sure Saran: are those uploaded to GitHub?).
Saran also ships a cmake file with the hash of the smart_holder branch that matches what we are using internally (modulo the internals version patch).

Stepping back:

Now imagine I put the native_enum feature I'm working on under PR WIP: Conversions between Python's native (stdlib) enum and C++ enums. #4329 behind an internals bump: people can happily use native_enums internally, but it will not be available externally.
Unless I bump the internals version on the smart_holder branch directly, instead of using a Google-internal patch.
But then smart_holder-based extensions will no longer interoperate with "normal" pybind11 extensions.

I think it will be hell, every day a little bit more, as both the use of pybind11 in general, and the smart_holder branch (exported Google projects), grow.

Going to "rolling internals" ...

removed - deprecated - current

would allow us to manage change, rather than hitting walls all the time.

Concretely:

internals == v4 (forever / until removed)

internals_like_a_v1, 2, 3 ...
internals_like_b_v1, 2, 3 ...

requires a little bookkeeping (in our minds & documentation), and keeping compatibility code around for a while.

But it will work much better for the world at large. — IOW a moderate amount of extra work for a handful of people working on pybind11 code (but also more flexibility!), happiness for hundreds or thousands of other projects.

What is the minimum version for any of the internals-like structures could be tied to the Python version. Actually, this could work out very nicely: when we drop a Python version, we know we can drop all deprecated internals-like structures that are only used with that Python version.

EthanSteinberg · 2022-11-18T00:20:35Z

Yeah, that sounds good to me. The key part is letting us bump the internals version with every Python release.

* Illustrate bug in functional.h * style: pre-commit fixes * Make functional casting more robust / add workaround * Make function_record* casting even more robust * See if this fixes PyPy issue * It still fails on PyPy sadly * Do not make new CTOR just yet * Fix test * Add name to ensure correctness * style: pre-commit fixes * Clean up tests + remove ifdef guards * Add comments * Improve comments, error handling, and safety * Fix compile error * Fix magic logic * Extract helper function * Fix func signature * move to local internals * style: pre-commit fixes * Switch to simpler design * style: pre-commit fixes * Move to function_record * style: pre-commit fixes * Switch to internals, update tests and docs * Fix lint * Oops, forgot to resolve last comment * Fix typo * Update in response to comments * Implement suggestion to improve test * Update comment * Simple fixes Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Aaron Gokaslan <[email protected]>

rwgk · 2022-11-21T12:21:24Z

Yeah, that sounds good to me. The key part is letting us bump the internals version with every Python release.

FYI: I folded cross_extension_shared_state into PR #4329. With that it is really easy to handle multiple internals-like shared states.

(That PR is way too big now and will need to be split up, but I'm currently focused on testing the new native_enum in the context I need it in, which could take a while.)

Skylion007 reviewed Oct 19, 2022

View reviewed changes

include/pybind11/functional.h Outdated Show resolved Hide resolved

Skylion007 reviewed Oct 19, 2022

View reviewed changes

include/pybind11/functional.h Outdated Show resolved Hide resolved

Skylion007 reviewed Oct 19, 2022

View reviewed changes

include/pybind11/pytypes.h Outdated Show resolved Hide resolved

Skylion007 reviewed Oct 19, 2022

View reviewed changes

include/pybind11/functional.h Outdated Show resolved Hide resolved

EthanSteinberg requested a review from henryiii as a code owner October 20, 2022 15:47

EthanSteinberg force-pushed the functional_bug branch from 8549ef2 to 460369f Compare October 20, 2022 15:48

EthanSteinberg changed the title ~~Add test to illustrate bug in functional.h~~ Fix function.h bug + introduce test to verify that it is fixed Oct 20, 2022

Skylion007 reviewed Oct 20, 2022

View reviewed changes