Fix functional.h bug + introduce test to verify that it is fixed #4254
Conversation
Oof, @rwgk, got any Python experts to ping about this?
Did you already try to do something like this?
@rwgk @lalaland @henryiii I added a workaround, but it's unclear if I should keep the new pytype ctor with its current argument ordering. It might be the best way to disambiguate whether the dtor func takes the capsule itself or the underlying void pointer. Alternatively, we may want to create a custom tag long term.
@Skylion007 I don't think it would be an ABI break to set a name and check it in functional.h. This is an optimization path, so if we incorrectly ignore a capsule, that's OK: it will just run slower.
Okay @Lalaland / @rwgk, whenever I try to set_name on the rec_capsules I get segfaults (likely from the lifetime of the name string?). We could try to strdup the C string in the relevant bindings and make sure to free it in our dtor, but that seems error-prone. This is super weird, since I never free it and I set it from a global constexpr like `constexpr const char *FUNCTION_RECORD_CAPSULE_NAME = "pybind11_function_record"`, and I just call set_name in either the method or the ctor, so we shouldn't have to manage the lifetime at all, since it's constexpr. The current iteration at least ensures we won't try to convert pointers for named capsules, but it would be great if we could give func_record capsules their own name without segfaulting. Unfortunately, I do not have time to look into this further.
Sorry for the pre-commit noise (I can't get pre-commit to run on my machine right now), but that should fix the problem and assign a name.
@lalaland One scenario I am worried about: suppose we have multiple different translation units. Is the address guaranteed to be the same across them? I know it will be with gcc/llvm in particular, but is this guaranteed or just their default behavior? Particularly with link-time optimization turned off? I am worried that splitting up a large translation unit could accidentally cause a performance regression in this case, which would be unexpected behavior. Part of the issue is that we do not have a regression test for this optimization. Thoughts on this behavior, @rwgk? Should we really avoid an std::strcmp here and do this pointer black magic?
Actually, my pointer black magic logic is/was incorrect. I just pushed a fixed version.
How did you test this? Do you have a link explaining why this works when the const char pointer did not? I also just extracted a helper function to make the code a bit more maintainable.
Per the documentation of local_internals (which uses the same trick), this should work. Each .so should get its own copy. (See pybind11/include/pybind11/detail/internals.h, line 471 at f743bdf.)
Actually, let's just put this key in local_internals. So we only have this hack implemented in one location. |
Ok, I moved this to local_internals, and bumped the ABI version flag.
Per the C++ standard, in order to ensure that you have a unique pointer you need to actually allocate memory. See the text in https://stackoverflow.com/questions/50610274/what-is-the-rationale-behind-returning-unique-addresses-for-allocations-of-zero
You still need it in local internals to avoid ODR violations.
@lalaland It really should be in internals, not local_internals. Callbacks are frequently shared between modules, so this will make performance worse.
Also, putting this in internals does not solve the issue I have with this, mainly that the check is too strict and overly brittle. Any minor change will cause issues. Instead, we could have a function_record-specific ABI version and store that in the name string itself. There is no reason to rely on all this pointer stuff, and it would be a lot less brittle. We can manually increment the rec_capsule name when we change the function record, too.
Yeah, this is too smart by half. Simply having a regular name that we strcmp against with a manual version check would be a lot easier to understand. It's a tiny bit more error-prone, but that should be OK. I'll swap it out for that. |
7ae479e
to
58a4315
Compare
Awesome work, thanks a lot for finding & fixing this bug!
The one CI failure is a notorious flake again; good to ignore. I locally retested with all sanitizers (ASAN, MSAN, UBSAN, TSAN) and everything passes (this PR rebased on the smart_holder branch). I'll also redo the global testing (easy, but it will take a few hours).
Global testing passed again (internal ID OCL:482903768:BASE:485514840:1667371815628:a4b009b9). Note that the global testing now includes running all pybind11 unit tests with all sanitizers (recent change). |
Thanks again Ethan for the fix, and Aaron for the reviews!
Hi @lalaland, based on what I learned & implemented under PR #4329, I'm wondering if we could improve on this PR by not making any changes to the … What I have in mind would be similar to this commit: 8737bea. In case the hash changes in the future (because of a rebase), this is the commit message:
What do you think? The current implementation under PR #4329 has a lot of code duplication, but we could work on that pretty easily I think. |
@rwgk If I understand your suggested change correctly, it looks like you are proposing to split our global variable "internals" into multiple "internals-like" variables? (When I say "internals-like" I mean a global variable that is shared across modules and thus needs to be ABI stable.)

The main advantage of that approach is that it would give us a bit more flexibility, but I think the maintenance costs would be really bad. Right now, ABI issues are theoretically straightforward: there is one shared struct, "internals", that holds all the stuff that has to be ABI stable, with one ABI version. Code on different ABI versions just doesn't talk to each other, keeping things safe.

With your change there would be many "internals" structs that all have to maintain ABI stability, with a very confusing mix of versions. It would be a lot harder to understand what code under what ABI version is talking to what other code. If you think we need that flexibility, go for it, but we should be really aware of the price we are paying for it.
I totally agree, but ... the alternative is far worse, and it's connected to how widely pybind11 is in use: the moment we bump the internals version, we're essentially cutting off interoperability with all existing wheels, silently. There will suddenly be …

Say there are 10 extensions, in wheels, with cross-extension type conversion dependencies; in the general sense that's a graph with 10 nodes. Now one of them is rebuilt with a new internals version. Now go from 10 to 20 to 100.

In the Google-internal world there is no problem: I can switch everybody with a single commit. Externally, waiting for 10 wheels to get updated takes 10 weeks; 20 takes 10 months; 100 takes years. I think the only way to mitigate that is to
@rwgk Why don't we tie our internals version to the Python version? And only release new internals when a new Python version gets released? Python already has an unstable ABI. If people are recompiling anyways, might as well force them to use a newer version of pybind11's ABI. pybind11 will always assume that there is a 1-1 map between Python version and pybind11 ABI. |
Tying our internals version to the Python version will make upgrading the internals version much much easier, as we simply wait for the next Python release and people are already being forced to recompile the world. |
Interesting idea. I need to think about this more. One issue that comes to mind: that basically freezes the pybind11 internals for the lifetime of each Python release. If we have an additional migration mechanism, there isn't much gained by tying to Python releases. High-level impression: it'll only help a little bit. A momentary free lunch at Python release time, only benefiting that Python release and future releases 1, 2, ... years later.
We would of course add an option to force use the latest ABI for users like Google that do control everything. |
Two days ago (no kidding) I learned the hard way that that does not really work:
Stepping back:
I think it will be hell, every day a little bit more, as both the use of pybind11 in general, and the smart_holder branch (exported Google projects), grow. Going to "rolling internals" (removed, deprecated, current) would allow us to manage change, rather than hitting walls all the time. Concretely: internals == v4 (forever / until removed); internals_like_a_v1, 2, 3, ... This requires a little bookkeeping (in our minds & documentation), and keeping compatibility code around for a while, but it will work much better for the world at large. IOW, a moderate amount of extra work for a handful of people working on pybind11 code (but also more flexibility!), and happiness for hundreds or thousands of other projects. The minimum version for any of the internals-like structures could be tied to the Python version. Actually, this could work out very nicely: when we drop a Python version, we know we can drop all deprecated internals-like structures that are only used with that Python version.
Yeah, that sounds good to me. The key part is letting us bump the internals version with every Python release. |
* Illustrate bug in functional.h
* style: pre-commit fixes
* Make functional casting more robust / add workaround
* Make function_record* casting even more robust
* See if this fixes PyPy issue
* It still fails on PyPy sadly
* Do not make new CTOR just yet
* Fix test
* Add name to ensure correctness
* style: pre-commit fixes
* Clean up tests + remove ifdef guards
* Add comments
* Improve comments, error handling, and safety
* Fix compile error
* Fix magic logic
* Extract helper function
* Fix func signature
* move to local internals
* style: pre-commit fixes
* Switch to simpler design
* style: pre-commit fixes
* Move to function_record
* style: pre-commit fixes
* Switch to internals, update tests and docs
* Fix lint
* Oops, forgot to resolve last comment
* Fix typo
* Update in response to comments
* Implement suggestion to improve test
* Update comment
* Simple fixes

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Aaron Gokaslan <[email protected]>
FYI: I folded … (That PR is way too big now and will need to be split up, but I'm currently focused on testing the new ….)
Description
functional.h is currently buggy when working with manually defined CPython functions: you get segfaults whenever you try to convert a manually defined CPython function to a std::function.
This PR adds a test that illustrates the bug, as well as some #ifdef guards around the broken code in functional.h.
https://github.com/pybind/pybind11/pull/4254/files#diff-89e29c03817f899ff72bdb90772cb781df9c82044ae712fb68487be02dd7e3cdL49-R72 in particular is the cause of the bug.
As that code path is just an optimization, it can be removed without breaking other pieces of code. However, removing it would be a significant performance regression.
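The trigger for the bug is a callable implemented directly in C (rather than a pybind11-bound function) being handed to a binding that expects a std::function. Such C-implemented callables are easy to observe from pure Python, without pybind11 involved; a small illustration:

```python
import types

# Builtins such as `len` and `print` are implemented in C, not in Python
# and not via pybind11 -- the kind of callable that the broken optimization
# path in functional.h mishandled.
assert isinstance(len, types.BuiltinFunctionType)
assert isinstance(print, types.BuiltinFunctionType)

# An ordinary Python function, by contrast, is a plain FunctionType:
def plain(x):
    return x + 1

assert isinstance(plain, types.FunctionType)
assert not isinstance(plain, types.BuiltinFunctionType)
```

A robust cpp_function extraction therefore has to positively identify pybind11's own function-record capsule (e.g. by name) instead of assuming every C-level callable carries one.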
Suggested changelog entry: