Detect and fail if using mismatched holders #2644
This adds a check, when registering a class or a function with a holder return type, that the same wrapped type hasn't previously been seen with a different holder type. This fixes pybind#1138 by detecting the failure; currently, attempting to use two different holder types (e.g. unique_ptr<T> and shared_ptr<T>) in different places can segfault because we don't have any type safety on the holder instances.
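The idea of the check can be sketched independently of pybind11. The following is a minimal illustration, not the PR's implementation: `holder_registry`, `note_holder` and `Widget` are hypothetical names, and the registry simply remembers the first holder type seen for each wrapped type and throws when a different one shows up later.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// Hypothetical registry (not pybind11 API): remembers which holder type was
// first seen for each wrapped type and rejects any later, different holder.
class holder_registry {
public:
    template <typename T, typename Holder>
    void note_holder() {
        const std::type_index holder(typeid(Holder));
        // emplace() leaves an existing entry untouched and reports that via .second.
        auto result = seen_.emplace(std::type_index(typeid(T)), holder);
        if (!result.second && result.first->second != holder) {
            throw std::runtime_error(std::string("mismatched holder type for ")
                                     + typeid(T).name());
        }
    }

private:
    std::unordered_map<std::type_index, std::type_index> seen_;
};

struct Widget {};

// Usage: returns true if the conflicting unique_ptr holder is rejected.
inline bool mismatch_is_detected() {
    holder_registry registry;
    registry.note_holder<Widget, std::shared_ptr<Widget>>();
    registry.note_holder<Widget, std::shared_ptr<Widget>>();  // same holder again: accepted
    try {
        registry.note_holder<Widget, std::unique_ptr<Widget>>();
    } catch (const std::runtime_error &) {
        return true;
    }
    return false;
}
```

Failing loudly at registration replaces the undefined behavior of reinterpreting one smart pointer as another.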
…stered type_info Having replaced the global map holders_seen, we can only check return values against already registered types. Hence, we need to replace the check at initialization time with a check at call time (when casting the return value).
A derived class needs to use the same holder type as its base class(es). So far, the check was constrained to the default holder vs. custom holder types. Thus, we replace the simple check (based on the default_holder flag) with a more elaborate one, comparing holder type names.
The two commits e2aa4ec and 502bf24 realize the idea suggested in #1161 (comment) to store holder types locally in a type's record. Another shortcoming of the current implementation is the checking of holder compatibility for derived types. A derived class needs to use the same holder type as its base class(es). So far, the check was restricted to the default holder vs. a custom holder type: pybind11/include/pybind11/attr.h Line 289 in 06b673a
However, holder compatibility should be ensured in general, also between two custom holder types. Thus, 7852e7d attempts to replace the simple check (based on the default_holder flag) with a more elaborate one, comparing holder type names. Unfortunately, I wasn't able to handle this via compile-time checks as the holder type info is only available as a std::type_info, not as a template type. For this reason, the check is limited to comparing stringified type names. Furthermore, I had to ignore any type details, e.g. deleters (cde5c11), to correctly handle a unit test changing the visibility of the destructor: pybind11/tests/test_smart_ptr.cpp Lines 207 to 212 in 06b673a
Maybe, a cleaner approach would be to implicitly auto-convert between holder types as suggested in #1161 (comment). However, this would add a lot of runtime overhead again. If desired, this can be handled by explicit wrappers for type conversion.
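To make the name-based comparison concrete, here is a rough sketch of what "comparing stringified type names while ignoring details such as deleters" could look like. It is an assumption about the general technique, not the code in 7852e7d or cde5c11: the helper names are hypothetical, and the inputs are assumed to be already-demangled type names.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reduce a demangled holder name such as
// "std::unique_ptr<Foo, Deleter>" to its template name "std::unique_ptr".
// A name without '<' is returned unchanged.
inline std::string holder_template_name(const std::string &demangled) {
    return demangled.substr(0, demangled.find('<'));
}

// Two holders count as compatible here if their template names match,
// regardless of the wrapped type or any extra template arguments.
inline bool same_holder_family(const std::string &a, const std::string &b) {
    return holder_template_name(a) == holder_template_name(b);
}

// Usage: a base and a derived class with different deleters still match,
// while unique_ptr vs. shared_ptr does not.
inline void example() {
    assert(same_holder_family("std::unique_ptr<Base, std::default_delete<Base>>",
                              "std::unique_ptr<Derived, CustomDeleter>"));
    assert(!same_holder_family("std::unique_ptr<Base>", "std::shared_ptr<Base>"));
}
```

Dropping everything after the first `<` is what lets a derived class with a custom deleter pass, but it also shows why the approach only works for holders whose names follow the `Holder<Type>` pattern.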
…stered type_info Improve wording and variable name: - seen_holder_name -> holder_name - seen holder -> declared holder
…in registered type_info The runtime cost of checking holder compatibility for function return types can be limited to definition time (instead of call time) if we require types to be registered before a function is defined. This allows us to look up the type_info record and validate the holder type as expected. As we cannot call functions with an unregistered return type anyway, I think this is a good compromise.
My last commit resolves this drawback by introducing the (new) constraint that a function's return type has to be registered before defining the function. As functions with an unregistered return type cannot be called anyway, I think this is a good compromise. I think this is ready for review. Before merging, I will clean up the commit history (fixups), which I kept for now to ease reviewing.
…r_type in registered type_info Simplify code using type_id<Return>()
To avoid costly runtime checks at function call time, compatibility of holder types is now checked once at function registration time. This assumes that all custom argument types are declared in advance!
As pointed out by @YannickJadoul, the costs of runtime checks at function call time are undesirable. The latest commit moves the check to function declaration time.
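The ordering constraint described above (register the type first, then define functions that return it) can be illustrated with a toy model. This is only a sketch under assumptions: `toy_module`, `declare_class` and `define_function_returning` are hypothetical stand-ins, not pybind11 functions, and the lookup table stands in for the registered type_info records.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <typeindex>
#include <typeinfo>

// Hypothetical stand-in for a binding module (not pybind11 API).
class toy_module {
public:
    // Records the holder type declared for T (first declaration wins).
    template <typename T, typename Holder>
    void declare_class() {
        holders_.emplace(std::type_index(typeid(T)), std::type_index(typeid(Holder)));
    }

    // Runs once, when a function returning Holder (holding a T) is defined.
    // Nothing is checked later, when the function is called.
    template <typename T, typename Holder>
    void define_function_returning() const {
        auto it = holders_.find(std::type_index(typeid(T)));
        if (it == holders_.end())
            throw std::runtime_error("return type must be registered before the function is defined");
        if (it->second != std::type_index(typeid(Holder)))
            throw std::runtime_error("function returns a holder type that differs from the declared one");
    }

private:
    std::map<std::type_index, std::type_index> holders_;
};

struct Message {};

// Usage: defining the function before the class is declared fails.
inline bool definition_before_registration_fails() {
    toy_module m;
    try {
        m.define_function_returning<Message, std::shared_ptr<Message>>();
    } catch (const std::runtime_error &) {
        return true;
    }
    return false;
}
```

The price of the cheaper check is visible here: the order of declarations inside a module becomes significant.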
(Sorry I posted this under the wrong PR a minute ago.) Hi @rhaschke , I looked through the changes, and to be totally honest, this PR is currently totally over my head. (I need to learn a lot more than I know at the moment to meaningfully review.) But I tried to use this PR Google-internal. I didn't get past the initial manual testing stage, first running into this error:
I changed the corresponding code to move the class_ definition for proto2::Message up. Trying again I got:
I know these bindings are used in ~100 extensions and pass thousands if not 100s of thousands of tests. Is it plausible that so many tests run successfully with mismatched holders? (Or could there be a problem with the mismatch detection?) I could dig in deeper, but wanted to check with you first. (The affected code is open source in theory, but the open source repo (pybind11_protobuf) is behind and doesn't have a working build system.)
There's no conversion from a |
Internally we added specialization of I think a generic mechanism to convert holder types would be super useful. I didn't go through the PR in great detail, so apologies if I'm way off the mark here, but could you make a I'm curious, how does this PR handle the case when implicit conversions of the held value are possible? E.g., one of the simplest cases would be the conversion from
Sorry for the delayed answer. I didn't receive emails over the last few days 😢
Before this PR, pybind11 was extremely sloppy regarding holder type compatibility. It just reinterpreted the When I changed the memory layout of the smart pointer returned from C++ in this unit test, I got a perfect segfault immediately.
To minimize runtime costs, I explicitly moved the check from call time (when the caster would be called as well) to function definition time. To realize the automatic holder type conversion that @rwgk is aiming for, I think one should replace the new compatibility check and try to inject a wrapper function instead that converts between holder types (if necessary).
Do you have an idea how this could be implemented? At first sight it looks similar to solving this problem: https://stackoverflow.com/questions/4972795/how-do-i-typecast-with-type-info I can only imagine something like a lookup table from the holders' |
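One way to picture a lookup table of this kind is a map keyed on the pair (source holder type, target holder type) that stores type-erased conversion functions. The sketch below is purely illustrative and makes several assumptions: all names (`converter_table`, `register_holder_conversion`, `convert_holder`) are hypothetical, and holders are passed around as `void *` for simplicity.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Hypothetical: moves the holder stored at `src` into the holder stored at `dst`.
using erased_converter = std::function<void(void *src, void *dst)>;
using converter_key = std::pair<std::type_index, std::type_index>;

inline std::map<converter_key, erased_converter> &converter_table() {
    static std::map<converter_key, erased_converter> table;
    return table;
}

// Registers a From -> To conversion; the concrete types are only known here.
template <typename From, typename To>
void register_holder_conversion() {
    const converter_key key{std::type_index(typeid(From)), std::type_index(typeid(To))};
    converter_table()[key] = [](void *src, void *dst) {
        *static_cast<To *>(dst) = To(std::move(*static_cast<From *>(src)));
    };
}

// Looks up a conversion using runtime type information only.
inline bool convert_holder(std::type_index from, void *src, std::type_index to, void *dst) {
    auto it = converter_table().find(converter_key{from, to});
    if (it == converter_table().end()) return false;
    it->second(src, dst);
    return true;
}

// Usage: a unique_ptr<int> is handed over to a shared_ptr<int>.
inline void example() {
    register_holder_conversion<std::unique_ptr<int>, std::shared_ptr<int>>();
    std::unique_ptr<int> source(new int(7));
    std::shared_ptr<int> target;
    bool ok = convert_holder(typeid(source), &source, typeid(target), &target);
    assert(ok);
    assert(target && *target == 7);
    assert(source == nullptr);  // ownership has moved
}
```

Every call pays for a map lookup and an indirect call, which is the kind of runtime overhead the discussion above is trying to avoid.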
I thought about this for a while over the last few days and came up with the idea of using templates. Let's assume the user declares an auto-converter like this:
template <typename IntrinsicType>
struct holder_converter_for {
    using holder_t = std::shared_ptr<IntrinsicType>; // Explicitly remember the holder type for the compiler
    static holder_t cast(holder_t p) { return p; } // pass-through converter
    static holder_t cast(std::unique_ptr<IntrinsicType> p) { return std::move(p); }
};
We could simply wrap all cast ops in function calls using this converter's methods. Using a type trait to just define the holder type for a custom type, we could maybe even use this template by default, and users would only need to specialize it if they want to support other auto-conversions. I'm not sure, though, whether this could really work out. For now, it's just a rough idea.
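As a quick sanity check of the converter idea, the template can be compiled on its own together with a small wrapper. The wrapper `normalize_holder` is a hypothetical name introduced only for this sketch; it is not part of pybind11 or of this PR.

```cpp
#include <cassert>
#include <memory>
#include <utility>

template <typename IntrinsicType>
struct holder_converter_for {
    using holder_t = std::shared_ptr<IntrinsicType>;  // the declared holder type
    static holder_t cast(holder_t p) { return p; }    // pass-through converter
    static holder_t cast(std::unique_ptr<IntrinsicType> p) { return std::move(p); }
};

// Hypothetical wrapper: normalizes whatever holder a bound function returns
// into the declared holder type via overload resolution on cast().
template <typename IntrinsicType, typename Returned>
typename holder_converter_for<IntrinsicType>::holder_t normalize_holder(Returned &&r) {
    return holder_converter_for<IntrinsicType>::cast(std::forward<Returned>(r));
}

// Usage: both a unique_ptr and a shared_ptr end up as shared_ptr<int>.
inline void example() {
    std::shared_ptr<int> from_unique = normalize_holder<int>(std::unique_ptr<int>(new int(3)));
    assert(from_unique && *from_unique == 3);
    std::shared_ptr<int> passed_through = normalize_holder<int>(from_unique);
    assert(passed_through.get() == from_unique.get());
}
```

Because the conversion is selected at compile time, a holder with no matching `cast` overload is rejected by the compiler rather than at runtime.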
Nice! I've made some initial comments.
My recommendation is to defer the type conversion table, and maybe save that for a follow-up PR?
@@ -1607,6 +1604,39 @@ template <typename base, typename holder> struct is_holder_type :
template <typename base, typename deleter> struct is_holder_type<base, std::unique_ptr<base, deleter>> :
    std::true_type {};

template <typename holder> using is_holder = any_of<
nit: To me, it looks like holder is really holder_caster. Should this be is_holder_caster?
struct HeldByUnique {};
// HeldByShared declared with shared_ptr holder, but used with unique_ptr later
py::class_<HeldByShared, std::shared_ptr<HeldByShared>>(m, "HeldByShared");
m.def("register_mismatch_return", [](py::module m) {
My brief reading of this PR leads me to believe that there's a sharp edge: a user could still hit a segfault if they call py::cast<mismatched_holder_type>(my_object). Totes get the idea of wanting to shift overhead from py::cast<>() (runtime) to function declarations (binding-time, e.g. .def()), but it means there are now distinct entry points to this type-checking that users can trigger :(
Possible resolutions:
- If it's worth the risk, then just mention it in the holder / smart pointer docs (as a diff in this PR).
- If it's not worth the risk, somehow enable raw py::cast<>() to do a runtime check. This may make for some crazy dumb plumbing, though :(
- Alternatively, eat the cost of runtime casts and funnel it through type_caster<>. That's what we do for our fork (RobotLocomotion/pybind11), and we haven't yet had to point fingers at performance there (though our use case may be simpler).
@YannickJadoul or @rwgk Any chance y'all have a good (and mebbe easy?) timing performance benchmark for this, to see what the risk is in these terms?
EDIT: Hm... for docs/benchmark.py, it's only for compilation time and size, and doesn't really dip into the more nuanced things (e.g. inheritance, custom type casters).
@EricCousineau-TRI wrote:
Any chance y'all have a good (and mebbe easy?) timing performance benchmark for this, to see what the risk is in these terms?
The TensorFlow core team has very sophisticated pybind11 benchmarks, but it's currently Google-internal only. The author already gave me permission to extract most of it for external view, including sources, but it may take me a few days (I want to show what I extract to the author for approval).
Sweet!!! That'd be awesome! I'll see if I can gather some common "complex" patterns from Drake (while maintaining feature parity with upstream).
Have you thought at all about where your benchmarks might live? I think the ones I'd generate would be simple enough (i.e. just pybind11 bits) to be in a benchmarking subdirectory.
I explicitly converted my previous code doing compatibility checks at cast time to this code, doing these tests at definition time only, because of efficiency concerns raised elsewhere.
I'm OK with the principle of trading 1 bit for 32 bits and to be able to make these checks.
The main thing bothering me is this string comparison. I feel a bit bad for saying this, because I don't have a great alternative, but it feels very brittle. For example, what about non-templated holder types for a certain type, which don't have this nice Holder<Type> name? (Maybe this is not a case that should be considered valid, though, but still.)
One alternative would be to require a constexpr auto generic_holder_typeid; field (or similar) in the holder_helper struct, which would give a single typeid for all holders? E.g. for unique_ptr<T> and shared_ptr<T> it could just be unique_ptr<void> and shared_ptr<void>? (unique_ptr<void> can't be instantiated but still has a typeid: https://godbolt.org/z/1179xG) Alternatively, we use a dummy type instead of void, but the main point is that all holder_helper<Holder<T>>::generic_holder_typeid could map to the same unique std::type_info.
I'm not sure how easy this would be, but I think we could find sensible defaults. Happy to do some work and try this out myself, if you think this might work?
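A rough sketch of the "generic typeid" idea, using the dummy-type variant rather than void, might look as follows. The trait name `generic_holder`, the tag type and the helper function are hypothetical and chosen only for illustration; the real `holder_helper` struct is not modelled here.

```cpp
#include <cassert>
#include <memory>
#include <typeindex>
#include <typeinfo>

struct generic_tag {};  // dummy element type standing in for "any T"

// Hypothetical trait: maps Holder<T, ...> to one representative type per holder
// family. The primary template is left undefined so unknown holders fail to compile.
template <typename Holder> struct generic_holder;

template <typename T, typename D>
struct generic_holder<std::unique_ptr<T, D>> { using type = std::unique_ptr<generic_tag>; };

template <typename T>
struct generic_holder<std::shared_ptr<T>> { using type = std::shared_ptr<generic_tag>; };

// A single comparable id for all holders of the same family.
template <typename Holder>
std::type_index generic_holder_id() {
    return std::type_index(typeid(typename generic_holder<Holder>::type));
}

struct Base {};
struct Derived : Base {};
struct NoopDeleter { void operator()(Derived *) const {} };

// Usage: base and derived holders match even with a custom deleter,
// while unique_ptr and shared_ptr stay distinct.
inline void example() {
    auto base_id = generic_holder_id<std::unique_ptr<Base>>();
    auto derived_id = generic_holder_id<std::unique_ptr<Derived, NoopDeleter>>();
    auto shared_id = generic_holder_id<std::shared_ptr<Base>>();
    assert(base_id == derived_id);
    assert(base_id != shared_id);
}
```

Compared with string comparison, the ids here are ordinary std::type_index values, and the mapping for each holder family is spelled out explicitly in a specialization, including whether the deleter is ignored.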
    /* true if this is a type registered with py::module_local */
    bool module_local : 1;
};

/// Tracks the `internals` and `type_info` ABI version independent of the main library version
-#define PYBIND11_INTERNALS_VERSION 4
+#define PYBIND11_INTERNALS_VERSION 5
This breaks ABI, so should ideally be part of a batch of ABI-breaking PRs.
@@ -1607,6 +1604,39 @@ template <typename base, typename holder> struct is_holder_type :
template <typename base, typename deleter> struct is_holder_type<base, std::unique_ptr<base, deleter>> :
    std::true_type {};

template <typename holder> using is_holder = any_of<
This already exists in cast.h, it seems?
// PYBIND11_DECLARE_HOLDER_TYPE holder types:
template <typename base, typename holder> struct is_holder_type :
std::is_base_of<detail::type_caster_holder<base, holder>, detail::type_caster<holder>> {};
// Specialization for always-supported unique_ptr holders:
template <typename base, typename deleter> struct is_holder_type<base, std::unique_ptr<base, deleter>> :
std::true_type {};
Yeah, I missed that. The existing definition should serve my purpose.
For the record: this is something that does need to pass @wjakob, once we have something we're proud of presenting. But @jagerman's old PR would probably be a good argument on this being a longstanding issue :-)
It's 8bits vs 32bits actually 😉
Aren't holder types always templated? But, I understand your hesitation.
Good idea! A remaining open issue (with both approaches) might be extra optional template arguments, e.g. the deleter of pybind11/tests/test_smart_ptr.cpp Lines 207 to 208 in b491b46
A
Yes, probably, but if you insist, I can make sure I set up some kind of holder where that doesn't work. But you're right it wouldn't be a frequent scenario, and maybe not even officially supported (the docs don't really mention it as possible or not-possible) ;-)
Hmmm, does that mean I can't return/pass a So yes (and actually,
Given your previous rebuttal on the string comparison, we'd need to test if this is worth the extra interface, though? And yes, the deleter is hairy. I'm not sure what to think of that. Would this be something to explicitly require users to opt in to (saying, "yes, it's fine that these holder types are different, I take responsibility")?
I converted this to Draft and removed the needs review label.
This is a rebase of #1161 augmented with a fix to handle mismatching holder types in function arguments and derived classes as well. All checks are performed at declaration time only. No runtime costs when calling functions!