Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: invalid shared ptr conversion leads to crash? #4365

Open
3 tasks done
bluescarni opened this issue Nov 28, 2022 · 8 comments
Open
3 tasks done

[BUG]: invalid shared ptr conversion leads to crash? #4365

bluescarni opened this issue Nov 28, 2022 · 8 comments

Comments

@bluescarni
Copy link
Contributor

bluescarni commented Nov 28, 2022

Required prerequisites

What version (or hash if on master) of pybind11 are you using?

2.10.1

Problem description

We are experiencing a crash in the exposition of a C++ class hierarchy which seems to be related to conversions between std::shared_ptrs within the class hierarchy. The attached minimal code is enough to reproduce the issue. Let me give a brief description.

On the C++ side, we have to base classes, Base0 and Base1, and a derived class Derived which inherits from the two bases. All classes are exposed via pybind11 using std::shared_ptr holder types, and the exposition of Derived signals that Base0 and Base1 are its bases.

Then, we expose a factory function make_object() which internally creates a shared ptr to Derived, converts it to std::shared_ptr< Base0 >, and then returns it. In the implementation of make_object(), we print to screen the memory addresses of the base and derived objects (more on that later).

Finally, we expose a print_object() function that takes in input a pointer to derived and tries to print to screen the size of its vector data member. It also prints to screen the pointer contained in the input shared pointer.

Now for the Python code:

In [1]: import example

In [2]: o = example.make_object(42)
ret der ptr: 0x557ad38c7a30
ret base ptr: 0x557ad38c7a50

In [3]: example.print_object(o)
der ptr: 0x557ad38c7a50
23501544953261

So there's something obviously going wrong here, since:

  • the vector data member definitely does not have 23501544953261 elements, and
  • the pointers seem to be all wrong. In particular, the memory address 0x557ad38c7a50 is a pointer to Base0 in make_object(), but it has magically been converted into a pointer to Derived in print_object(o), even though casting between Base0 and Derived should change the memory address in this hierarchy (as evidenced in the pointer values printed from make_object()).

In order to confirm that something fishy is going on, we can recompile the module with the address sanitizer on:

$ c++  -Wall -shared -std=c++11 -fPIC $(python3 -m pybind11 --includes) example.cpp -o example$(python3-config --extension-suffix) -O0 -fsanitize=address

Then we need to take care of invoking Python with the address sanitizer preloaded:

$ LD_PRELOAD=/home/yardbird/miniconda3/envs/heyoka_py_devel/lib/libasan.so ipython

(Note: this is from a conda installation, the libasan.so path should be adjusted as needed)

We can the rerun the code:

In [1]: import example

In [2]: o = example.make_object(42)
ret der ptr: 0x606000006c30
ret base ptr: 0x606000006c50

In [3]: example.print_object(o)
der ptr: 0x606000006c50
=================================================================
==13860==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x606000006c60 at pc 0x7fe2caf1dfcd bp 0x7ffe0813de20 sp 0x7ffe0813de18
READ of size 8 at 0x606000006c60 thread T0
    #0 0x7fe2caf1dfcc in std::vector<int, std::allocator<int> >::size() const (/home/yardbird/repos/bug_pybind11/example.cpython-39-x86_64-linux-gnu.so+0x8cfcc)
    #1 0x7fe2caedf9f7 in pybind11_init_example(pybind11::module_&)::{lambda(std::shared_ptr<Derived> const&)#2}::operator()(std::shared_ptr<Derived> const&) const (/home/yardbird/repos/bug_pybind11/example.cpython-39-x86_64-linux-gnu.so+0x4e9f7)
    #2 0x7fe2caee1dae in void pybind11::detail::argument_loader<std::shared_ptr<Derived> const&>::call_impl<void, pybind11_init_example(pybind11::module_&)::{lambda(std::shared_ptr<Derived> const&)#2}&, 0ul, pybind11::detail::void_type>(pybind11_init_example(pybind11::module_&)::{lambda(std::shared_ptr<Derived> const&)#2}&, pybind11::detail::index_sequence<0ul>, pybind11::detail::void_type&&) && (/home/yardbird/repos/bug_pybind11/example.cpython-39-x86_64-linux-gnu.so+0x50dae)
    #3 0x7fe2caee17aa in std::enable_if<std::is_void<void>::value, pybind11::detail::void_type>::type pybind11::detail::argument_loader<std::shared_ptr<Derived> const&>::call<void, pybind11::detail::void_type, pybind11_init_example(pybind11::module_&)::{lambda(std::shared_ptr<Derived> const&)#2}&>(pybind11_init_example(pybind11::module_&)::{lambda(std::shared_ptr<Derived> const&)#2}&) && (/home/yardbird/repos/bug_pybind11/example.cpython-39-x86_64-linux-gnu.so+0x507aa)

Thus it definitely looks like a wrong memory access.

I have tried to delve into the pybind11 source with a debugger to see what might be going on. I can't say I have fully understood the logic, but it seems to me that:

Please let me know if I can assist with more testing/triaging.

Reproducible example code

#include <memory>
#include <vector>
#include <iostream>

#include <pybind11/pybind11.h>

// The first base class.
struct Base0
{
    virtual ~Base0() { }
};

using Base0Ptr = std::shared_ptr< Base0 >;

// The second base class.
struct Base1
{
    virtual ~Base1() { }
    std::vector< int > vec = { 1, 2, 3, 4, 5 };
};

using Base1Ptr = std::shared_ptr< Base1 >;

// The derived class.
struct Derived : Base1, Base0
{
    virtual ~Derived() { }
};

using DerivedPtr = std::shared_ptr< Derived >;

PYBIND11_MODULE(example, m) {
    // Expose the bases.
    pybind11::class_< Base0, Base0Ptr > bs0( m, "Base0" );
    pybind11::class_< Base1, Base1Ptr > bs1( m, "Base1" );
    // Expose the derived class.
    pybind11::class_< Derived, DerivedPtr, Base0, Base1 >( m, "Derived" ).def( pybind11::init<>() );

    // A helper that returns a pointer to base.
    m.def( "make_object",
        []( int value ) -> Base0Ptr
        {
            auto ret_der = std::make_shared< Derived >();
            std::cout << "ret der ptr: " << ret_der.get() << std::endl;
            auto ret = Base0Ptr( ret_der );
            std::cout << "ret base ptr: " << ret.get() << std::endl;
            return ret;
        } );

    // A helper that accepts in input a pointer to derived.
    m.def( "print_object",
        []( const DerivedPtr& object )
        {
            std::cout << "der ptr: " << object.get() << std::endl;
            std::cout << object->vec.size() << std::endl;
        } );
}

Is this a regression? Put the last known working version here if it is.

Not a regression

@bluescarni bluescarni added the triage New bug, unverified label Nov 28, 2022
@bluescarni bluescarni changed the title [BUG]: invalid shared ptr conversion leads to crash [BUG]: invalid shared ptr conversion leads to crash? Nov 28, 2022
@Skylion007 Skylion007 added bug duplicate and removed triage New bug, unverified labels Nov 28, 2022
@Skylion007
Copy link
Collaborator

This is a known bug with multiple virtual bases unfortunately. See #3514

@bluescarni
Copy link
Contributor Author

@Skylion007 thanks for the pointer, I left a comment there.

@rwgk
Copy link
Collaborator

rwgk commented Nov 29, 2022

  • pybind11 forcefully casts via reinterpret_cast a shared_ptr<Base0> to shared_ptr<Derived>, which would be undefined behaviour in any case but works in practice if the pointers to Base0 and Derived coincide (which, in this case, they do NOT).

That's a great analysis that matches what I found some time ago:

I can only look more closely later.

The smart_holder code does not use that reinterpret_cast. Could you maybe try your reproducer with the smart_holder branch?

  • git clone https://github.com/pybind/pybind11.git -b smart_holder
  • Add PYBIND11_SMART_HOLDER_TYPE_CASTERS(Base0), etc. (see README_smart_holder.rst).
  • Replace class_ with classh, also remove Base0Ptr, Base1Ptr, DerivedPtr from the same lines.

Then recompile and run your reproducer again. Does it work?

@bluescarni
Copy link
Contributor Author

@rwgk thanks for the reply! I checked out the smart_holder branch and modified the reproducer to this:

#include <iostream>
#include <memory>
#include <vector>

#include <pybind11/pybind11.h>
#include <pybind11/smart_holder.h>

struct Base0 {
  virtual ~Base0() {}
};

using Base0Ptr = std::shared_ptr<Base0>;

struct Base1 {
  virtual ~Base1() {}
  std::vector<int> vec = {1, 2, 3, 4, 5};
};

using Base1Ptr = std::shared_ptr<Base1>;

struct Derived : Base1, Base0 {
  virtual ~Derived() {}
};

using DerivedPtr = std::shared_ptr<Derived>;

PYBIND11_SMART_HOLDER_TYPE_CASTERS(Base0)
PYBIND11_SMART_HOLDER_TYPE_CASTERS(Base1)
PYBIND11_SMART_HOLDER_TYPE_CASTERS(Derived)

PYBIND11_MODULE(example, m) {
  pybind11::classh<Base0> bs0(m, "Base0");
  pybind11::classh<Base1> bs1(m, "Base1");
  pybind11::classh<Derived, Base0, Base1>(m, "Derived").def(pybind11::init<>());

  m.def("make_object", [](int value) -> Base0Ptr {
    auto ret_der = std::make_shared<Derived>();
    std::cout << "ret der ptr: " << ret_der.get() << std::endl;
    auto ret = Base0Ptr(ret_der);
    std::cout << "ret base ptr: " << ret.get() << std::endl;
    return ret;
  });

  m.def("print_object", [](const DerivedPtr &object) {
    std::cout << "der ptr: " << object.get() << std::endl;
    std::cout << object->vec.size() << std::endl;
  });
}

Unfortunately, the behaviour has not changed and I still see the same problem. Am I doing something wrong?

@rwgk
Copy link
Collaborator

rwgk commented Nov 29, 2022

I need to find a block of time to look at this carefully.

Your reproducer looks valid at first sight.

It would give me a head start if you could transfer it to a pull request:

  • copy what you have to tests/test_mi_debug.cpp
  • also create a minimal tests/test_mi_debug.py (even an empty file will work as a start)
  • Add test_mi_debug to tests/CMakeLists.txt (around line 160, it's really obvious)

Then git push, follow the link shown by git push to create the PR. That will trigger the CI.

@Skylion007
Copy link
Collaborator

@rwgk Here is a reproducer from when I looked at it last: #3701

@bluescarni
Copy link
Contributor Author

@rwgk PR is up at #4374.

@rwgk
Copy link
Collaborator

rwgk commented Dec 1, 2022

@rwgk Here is a reproducer from when I looked at it last: #3701

Thanks for pointing out @Skylion007, this completely fell off my radar. I forgot what I was thinking when I was looking back then, maybe it was that the reproducer is not easy do understand. What's now under #4347 looks like an easier starting point for debugging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants