Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Factory function crashes #526

Closed
nimamox opened this issue Nov 23, 2016 · 24 comments
Closed

Factory function crashes #526

nimamox opened this issue Nov 23, 2016 · 24 comments

Comments

@nimamox
Copy link

nimamox commented Nov 23, 2016

I'm new in using pybind11, so I apologize if my problem is stupid. I had previously written a Python binding for this library called Pismo File Mount, that seems to be broken since I've updated Python and pybind11. The library has a functionint PfmApiFactory (PfmApi** api) which retrieves an instance of the PfmApi interface. I've written a wrapper function to call newPfmApi and return it back to Python.

PfmApi* newPfmApi(){
    PfmApi* pfm = 0;
    PfmApiFactory(&pfm);
    return pfm;
}
m.def("newPfmApi", &newPfmApi);

Now this worked fine, and the previously-built library still works. But now after I compile the code, when I call newPfmApi, I get segfault. I inspected newPfmApi and PfmApiFactory is actually running without error and the instance is created, but upon returning pfm, it causes a segfault. I've also tested other return value policies, but to no avail:

m.def("newPfmApi", &newPfmApi, py::return_value_policy::reference);

The complete binding code is at http://paste.ubuntu.com/23520659/

@jagerman
Copy link
Member

since I've updated Python and pybind11

There may be some bug here, but it would be nice if you could definitely rule out the Python upgrade (e.g. by doing a completely clean build with the new Python, and by checking out the older pybind11 revision where things worked). Is it possible that you have mismatched python headers and library versions?

What you're describing seems okay (whether you want reference or not really depends on whether your factory is creating new objects that you have to destroy yourself, or returning pointers to objects that it manages internally; the former is the default assumption, the latter needs reference).

@wjakob
Copy link
Member

wjakob commented Nov 23, 2016

We'd be happy to take a look at this, but you will need to submit a (much smaller) self-contained example that reproduces the issue (see CONTRIBUTING.md for details). I'll close this ticket until then.

@wjakob wjakob closed this as completed Nov 23, 2016
@nimamox
Copy link
Author

nimamox commented Nov 24, 2016

OK. The smallest example I can come up with:

#include "pybind11/pybind11.h"
#include "pfmapi.h"
#include <iostream>

namespace py = pybind11;
PfmApi* newPfmApi(){
    PfmApi* pfm = 0;
    int err = PfmApiFactory(&pfm);
    if (!err){
        std::cout << "Pismo (version " << pfm->Version() << ") loaded." << std::endl;
    }
    return pfm;
}

PYBIND11_PLUGIN(pismo) {
    py::module m("pismo", "Pismo Bindings for Python");
    
    py::class_<PfmApi>(m, "PfmApi");
    m.def("newPfmApi", &newPfmApi, "");
    return m.ptr();
}

As I'm running macOS and was worried that system Python and the one installed by homebrew may be conflicting, I tested it on a Linux machine:

nima@nima-vm:~/pismo/pfmkit-183/pybinding$ c++ pismo.cpp -fPIC -shared -std=c++11 -I ~/pismo/pybind11-master/include -I ../include -fpermissive `python-config --cflags --ldflags --libs` -o pismo.so
nima@nima-vm:~/pismo/pfmkit-183/pybinding$ python -c "import pismo; pismo.newPfmApi()"
Pismo (version pfm.1.0.0.183) loaded.
Segmentation fault (core dumped)

@nimamox
Copy link
Author

nimamox commented Nov 24, 2016

Made it work eventually! I had a number of older releases for pybind11 on my system. I checked each and the oldest one worked. I'm not sure which version it is though! It doesn't have a _version.py file and setup.py says it's version 1.0, but comparing it to v1.0 tarball I just downloaded suggests they're not identical (e.g. uses PYBIND11_* instead of PYBIND_*). I can upload this version of pybind11 if you like, so you can track the bug. Any thoughts?

@jagerman
Copy link
Member

jagerman commented Nov 24, 2016

Your code is fine: with a proper C++ class, everything there works. This sort of factory creation is a pretty basic feature of pybind11 that is tested under multiple variations in the test code. This definitely looks to be a problem with PfmApi.

The following is just guesswork, but might help you out.

I took a quick look at the online API docs, and I think one possible issue is that PfmApi is an abstract class, but does not have a virtual destructor (but it does have a virtual Release() method which has documentation that seems to do what I'd expect a destructor to do).

Not supporting destruction from a polymorphic base class pointer, besides being fairly bad C++ design, will break pybind11's default holder (std::unique_ptr) because that attempts to delete the allocated instance--but apparently PfmApi isn't designed to be destroyed that way. (This design does not give me a high level of confidence in its C++ code).

You can, temporarily, try this, which will basically just disable deleting of PfmApi objects controlled by pybind (they'll just be leaked instead):

struct PfmDeleter { void operator()(PfmApi *pfm) { /* FIXME: destroy/cleanup pfm */ } };
py::class_<PfmApi, std::unique_ptr<PfmApi, PfmDeleter>>(m, "PfmApi");

If that fixes the segfault, you'll need to figure out what goes in the comment to properly do the cleanup. (It might just be pfm->Release(), but I'm just basing that on the fairly inadequate documentation) so that the C++ object get cleaned up properly when the associated python object does.

@nimamox
Copy link
Author

nimamox commented Nov 24, 2016

Thanks @jagerman. I really appreciate the response. I asked Pismo's main developer to take a look at your comments. Just to be clear, doesn't the fact that pybind11 v1.0 doesn't crash imply that there may be something broken with newer releases of pybind11?

@wjakob
Copy link
Member

wjakob commented Nov 24, 2016

Agreed, this looks like a design issue with the underlying Library rather than a Pybind11 problem -- what version of pybind11 crashes in this context is not relevant.

@nimamox
Copy link
Author

nimamox commented Nov 24, 2016

@jagerman, what if it doesn't fix the segfault? I'm a little confused:

struct PfmDeleter { void operator()(PfmApi *pfm) { std::cout << "PfmDeleter called!" << std::endl; } };
typedef std::unique_ptr<PfmApi, PfmDeleter> PfmApiDel;

PfmApiDel newPfmApiDel(){
    PfmApi* pfm = 0;
    int err = PfmApiFactory(&pfm);
    if (!err)
        std::cout << pfm->Version() << std::endl;
    return PfmApiDel (pfm);
}

PfmApi* newPfmApi(){
    PfmApi* pfm = 0;
    int err = PfmApiFactory(&pfm);
    if (!err)
        std::cout << pfm->Version() << std::endl;
    return pfm;
}

PYBIND11_PLUGIN(pismo) {
    py::module m("pismo", "Pismo Bindings for Python");
    py::class_<PfmApi, PfmApiDel>(m, "PfmApi");
    m.def("newPfmApi", &newPfmApi, "");
    m.def("newPfmApiDel", &newPfmApiDel, "");
    return m.ptr();
}

Using pybind11 v1.0 headers:

nima@macbook$ python -c "import pismo; pismo.newPfmApi(); print 'fini'"
pfm.1.0.0.183
PfmDeleter called!
fini
nima@macbook$ python -c "import pismo; pismo.newPfmApiDel(); print 'fini'"
pfm.1.0.0.183
PfmDeleter called!
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: Unable to convert function return value to a Python type! The signature was
	() -> std::__1::unique_ptr<PfmApi, PfmDeleter>

Using latest pybind11 headers:

nima@macbook$ python -c "import pismo; pismo.newPfmApi(); print 'fini'"
pfm.1.0.0.183
Segmentation fault: 11
nima@macbook$ python -c "import pismo; pismo.newPfmApiDel(); print 'fini'"
pfm.1.0.0.183
Segmentation fault: 11

@jlowe48
Copy link

jlowe48 commented Dec 2, 2016

@jagerman, Regarding your comments:

This definitely looks to be a problem with PfmApi.
... possible issue is that PfmApi is an abstract class, but does not have a virtual destructor (but it does have a virtual Release() method which has documentation that seems to do what I'd expect a destructor to do).
... besides being fairly bad C++ design, will break pybind11's default holder (std::unique_ptr) because that attempts to delete the allocated instance--but apparently PfmApi isn't designed to be destroyed that way. (This design does not give me a high level of confidence in its C++ code).

The factory+release pattern used in the PFM API, is very intentional, provides significant benefits for the PFM implementation and for client code, and is not that unusual. Calling this pattern "bad C++ design" demonstrates some lack of experience on your part, or at least a fairly close-minded view of software engineering. A more useful way to develop confidence in the PFM code is to measure maintainability, stability, portability, and supportability with real world projects, not by subjectively grading the Stroustrup-erificness of the cross-platform native C/C++ PFM API.

@jagerman
Copy link
Member

jagerman commented Dec 2, 2016

This is not the forum for discussion of the interface you choose to provide for your commercial, closed source software.

@nimamox
Copy link
Author

nimamox commented Dec 4, 2016

Regardless of the comments on PfmApi design, can you kindly tell me why it still crashes even when I set a custom deleter for unique_ptr? I would really appreciate it.

@jagerman
Copy link
Member

jagerman commented Dec 4, 2016

@nimamox - at this point I'd suggest running Python under gdb and getting a stack trace for the segfault. Right now your minimal test case involves an entire library whose source code isn't available. (If you could reproduce the segfault without using Pfm at all we would definitely address it). It might also be helpful to store the Python value in a variable to see whether the segfault is happening during construction (you will not see fini) or destruction (you will see fini before the segfault).

@jagerman
Copy link
Member

jagerman commented Dec 4, 2016

Also note that you don't need the newPfmApiDel function: the important part is telling pybind to use it as a holder (the second template argument to py::class_).

@nimamox
Copy link
Author

nimamox commented Dec 4, 2016

As you suspected, segfault occurs during construction. I forgot to mention, but I had also tested the code above with garbage collector being disabled and Python would still crash before printing fini.
Moreover, I went through the commits of pybind11, and found the exact commit, that is d7efa4f, after which the behavior changed and the factory function would crash upon return. Most of changes appear related to casting between C++ and Python types in cast.h. I toyed with functions of interest in cast.h in latest revision, and the exact line which causes the crash is

auto it = internals.registered_types_cpp.find(std::type_index(*type_info));

Any thoughts?

@jagerman
Copy link
Member

jagerman commented Dec 5, 2016

It may have something to do with PfmApi being a pure virtual class; I'll look into this (but I can't do so for at least a day or two).

@dean0x7d
Copy link
Member

dean0x7d commented Dec 5, 2016

The linked commit started using run-time type information: typeid(type_name) vs. typeid(variable_name). The latter looks up RTTI in the variable's vtable which should be located inside the library which defined the polymorphic type. I'm guessing the issue here is that the destination library doesn't support RTTI: it's probably a native C library with a C++ interface (or a native C++ library compiled with -fno-rtti, but C is more likely based on the API). In that case typeid is following the vtable, but it doesn't have any RTTI so it returns a junk pointer -> segfault on deference.

To confirm this, try wrapping the pointer in a simple struct and then use py::class_ with that struct, but not PfmApi* directly:

struct PfmApiWrapper {
    PfmApi* pfm = nullptr;

    PfmApiWrapper() {
        int err = PfmApiFactory(&pfm);
        if (!err)
            std::cout << pfm->Version() << std::endl;
        // you may want to throw on error here
    }
    ~PfmApiWrapper() {
        // release procedure goes here
    }
};

PYBIND11_PLUGIN(pismo) {
    py::module m("pismo", "Pismo Bindings for Python");
    py::class_<PfmApiWrapper>(m, "PfmApi")
        .def(py::init<>());
    return m.ptr();
}

If the following works correctly, then the RTTI lookup was the problem.

>>> import pismo
>>> pismo.PfmApi()

@nimamox
Copy link
Author

nimamox commented Dec 5, 2016

it's probably a native C library with a C++ interface

Thanks. This is exactly the case. The code snippet works correctly.

@jagerman
Copy link
Member

jagerman commented Dec 5, 2016

Ah yes, good observation @dean0x7d, that indeed seems likely here.

@jagerman
Copy link
Member

jagerman commented Dec 5, 2016

@dean0x7d -- I can't seem to find any reference indicating what typeid() gives back when RTTI isn't supported--it would be nice to throw an informative pybind11_fail in that case if it can somehow be detected.

@wjakob
Copy link
Member

wjakob commented Dec 5, 2016

I would have expected this to fail with a linker error. I suspect that this is on the fringes of compiler/linker implementations, so there probably isn't a whole lot we can do in pybind11.

@dean0x7d
Copy link
Member

dean0x7d commented Dec 5, 2016

@jagerman I don't think there's anything that can be detected. Disabling RTTI isn't part of the C++ standard so the typeid() call becomes undefined behavior -- no way around that. The C++ interface for the underlying C library is tricking the C++ compiler into working with the type, but the C library doesn't supply the data needed at runtime.

@jagerman
Copy link
Member

jagerman commented Dec 5, 2016

Yeah, I suppose so.

It's a bit surprising that the code even links properly though--if I try to compile code invoking typeid() on via linking to an .so compiled with -fno-rtti, I get a linking error (as @wjakob expected).

@dean0x7d
Copy link
Member

dean0x7d commented Dec 5, 2016

As far as I can see from the docs, the library is not linked at all, it's loaded at runtime. The precompiled C library emulates a vtable for C++ (which is probably UB right there or at the very least implementation defined behavior).

@jlowe48
Copy link

jlowe48 commented Dec 6, 2016

Currently the PFM API is implemented in a run-time loaded C++ shared library, compiled with RTTI disabled.

The C compatible interface definitions in pfmapi.h are dependent on the underlying platform ABI, different for each platform, but well defined.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants