Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to create a numpy array of objects? #647

Closed
mwoehlke-kitware opened this issue Feb 6, 2017 · 18 comments
Closed

How to create a numpy array of objects? #647

mwoehlke-kitware opened this issue Feb 6, 2017 · 18 comments

Comments

@mwoehlke-kitware
Copy link
Contributor

(Is there a mailing list or better place for questions? I couldn't find one...)

I'm trying to wrap an API that returns Eigen vectors/matrices of stuff like "symbols" (for symbolic algebra). I think I have wrapping of the actual value types working, but trying to convert them to numpy objects is harder, because they aren't POD data; basically I need a numpy array of objects.

I think I have a partly working solution, adapted from hacking together bits of eigen.h, numpy.h and stl.h:

    // in type_caster<MyType>::cast
    array a(
        npy_format_descriptor<Scalar>::dtype(), // type
        { (size_t) src.size() }                 // shape
    );
    for (size_t i = 0; i < src.size(); ++i) {
        auto value_ = reinterpret_steal<object>(value_conv::cast(src(i), policy, parent));
        if (!value_)
            return handle();
        auto p = PyArray_GETPTR1(a.ptr(), i);
        PyArray_SETITEM(a.ptr(), p, value_.release().ptr()); // steals a reference
    }
    return a.release();

...however, I don't understand the way that the numpy C API is called. Apparently there is some dynamic function loading magic such that no numpy headers are ever actually included, but the "functions" I need to call for this to work (PyArray_GETPTR1, PyArray_SETITEM) are actually macros. What's the right way in pybind11 to call these? Is there some other mechanism I don't know to fill the items in the newly created numpy array? Or is there some other approach than creating the numpy array and then filling it?

For performance reasons, and because this needs to work with matrices also, I really want to go directly to numpy without an intermediate Python list.

(Note: the dtype is PyArray_DescrFromType_(NPY_OBJECT_), which I believe is correct? Also, this is the first time I've ever tried to work with numpy; please go easy on my lack of knowledge / experience.)

@aldanor
Copy link
Member

aldanor commented Feb 6, 2017

Do you really need np.array of objects, just wondering what's the use case? NPY_OBJECT is a really nasty dtype which doesn't behave like all other dtypes, in many different ways, so we don't explicitly support it. If it's an option, it's always better to use plain lists or something of the sort for dynamic objects.

That being said, you can always get the .ptr() and include numpy.h yourself to get access to numpy macros if you need that; pybind exposes some numpy functionality (non-macros) through capsule API, but it's quite limited.

@jagerman
Copy link
Member

jagerman commented Feb 6, 2017

As for PyArray_GETPTR1(obj, i) I believe a.mutable_data(i) is a drop-in replacement (and PyArray_GETPTR2(obj, i, j) -> a.mutable_data(i, j), etc.). But yeah, I would think a list of lists would be a better approach here, especially because the C++ -> Python conversion is never going to be possible without conversion/object wrapping (i.e. it is impossible to make numpy directly reference the Eigen-stored C++ objects).

@mwoehlke-kitware
Copy link
Contributor Author

Do you really need np.array of objects, just wondering what's the use case?

As I understand it, yes (I've been tasked to do this, but I'm not the end user; @rdeits might have a more definitive answer). The use case is to be able to do vector/matrix operations on the data. Our current work-around is to call np.array([x[i] for i in range(len(x))]) on the Python side with the Eigen container wrapped as an opaque type, which as I understand is accomplishing the same end result.

That being said, you can always [...] include numpy.h yourself

Okay; that seems like it may be the best solution; I wasn't sure because currently it seems pybind11 never includes any Python (or at least numpy) headers itself, so I wasn't sure if that was "not kosher" or something...

As for PyArray_GETPTR1(obj, i) I believe a.mutable_data(i) is a drop-in replacement

That looks plausible (granting that I'm not an expert on the bowels of either pybind11 nor numpy); thanks!

@rdeits
Copy link

rdeits commented Feb 6, 2017

There are a couple of cases in which having an ndarray of objects is actually pretty useful. The particular case I'm working on right now is a symbolic Variable type that's used in the optimization tools in https://github.com/RobotLocomotion/drake . Having a numpy array of Variables (rather than a list) means that I can do matrix/vector math on those variables (for example, multiplying a matrix of coefficients by a vector of variables to get a vector of linear expressions). Of course, that's possible with lists, but it requires re-implementing matrix multiplication yourself, which seems like a waste.

I've used the gurobi python interface in the past, and while it's quite nice, the fact that it returns its variables in a non-numpy container type means a lot of annoying re-implementation of basic things like dot(). I'd love to avoid that by leaning on numpy's built-in algorithms.

Of course, as @mwoehlke-kitware said, the manual conversion to np.array in python is a workable solution if automatic type conversion proves to be a mess.

@mwoehlke-kitware
Copy link
Contributor Author

I wasn't sure because currently it seems pybind11 never includes any Python (or at least numpy) headers itself, so I wasn't sure if that was "not kosher" or something...

Maybe I should clarify my intent. I want to teach pybind11 how to apply its magic Eigen↔NumPy wrapping to not-so-scalar types. I don't plan that this would be 100% automatic, but that the user would write something like PYBIND11_NUMPY_<something>(MyNotSoScalarType) in order to enable the magic. Particularly, this PYBIND11_NUMPY_<something> macro (naming suggestions welcome!) would specialize npy_format_descriptor for the specified type, and I would add additional code to eigen.h to handle such types (e.g. the above snippet). That is, I'd like to make this available to everyone.

@jagerman
Copy link
Member

jagerman commented Feb 6, 2017

Take a look at PR #610, if you haven't noticed it already; it pretty much rewrites the Eigen support. It doesn't particularly help in your case (the main point is to let numpy and eigen reference each others' data, which is impossible here), but will save either you or me some major merge conflicts down the line.

@dean0x7d
Copy link
Member

dean0x7d commented Feb 7, 2017

If I understood everything correctly, I think there's a very simple solution here:

py::array convert(const EigenVectorWithCustomScalar& v) {
    auto l = py::list();
    for (int i = 0; i < v.size(); ++i) {
        l.append(v[i]);
    }
    return py::array(l);
}

The last line does the expected conversion from a Python list of objects to a numpy array of objects (just like np.array(some_list) does in Python). There's an intermediate list created here so it may not be optimal, but it's super easy to get started (the objects themselves aren't going to be copied from list to array (only ++refcount), so there's only the cost of creating the list). For matrices, it would be the same thing, but just create a list of list and pass it to py::array.

@mwoehlke-kitware
Copy link
Contributor Author

Take a look at PR #610, if you haven't noticed it already

"Noticed" is about the right word 😄... but thanks for the reminder anyway. I'll take another look when I get that far; right now I'm just trying to get this working locally.

Since I need to convert between a NumPy array of PyObject*s and an Eigen matrix of C++ objects, I'm not sure if there is any hope of data sharing? Certainly there is not for an object originating in Python; otherwise, is it possible that the PyObject*s are reference wrappers over the objects in the Eigen matrix?

I think there's a very simple solution here [...]

I'm not in love with creating a bunch of intermediary lists, especially when I do have this working with populating the NumPy array directly.

@jagerman
Copy link
Member

jagerman commented Feb 7, 2017

I'm not sure if there is any hope of data sharing?

Assuming T is a pybind-registered type, it should be entirely possible to cast the pointer using return_value_policy::reference to get a Python object that wraps (but doesn't copy) the pointed-at object. That, plus your numpy handling, will get you a numpy array of objects that reference the underlying instances in the Eigen matrix. (It is absolutely necessary to keep that Eigen matrix alive as long as the numpy array lives, however!)

Going the other way doesn't seem possible, as far as I can see.

@aldanor
Copy link
Member

aldanor commented Feb 7, 2017

It is absolutely necessary to keep that Eigen matrix alive as long as the numpy array lives, however!

You can do that by setting the owner explicitly on the array, no?

@dean0x7d
Copy link
Member

dean0x7d commented Feb 7, 2017

I'm not in love with creating a bunch of intermediary lists, especially when I do have this working with populating the NumPy array directly.

Sure, makes sense. I just thought I'd offer an alternative which can work in a pinch without additional numpy headers.

@wjakob
Copy link
Member

wjakob commented Feb 14, 2017

I assume that this is resolved and will close the ticket -- feel free to comment in case it isn't.

@wjakob wjakob closed this as completed Feb 14, 2017
@mwoehlke-kitware
Copy link
Contributor Author

Does closing this imply you will not accept patches to upstream this functionality? I'm waiting on #610, but my intent was to refactor after that lands and then try to get this feature upstreamed, which would close this issue in the process.

To the extent that the issue is implicitly "...in stock pybind11", I don't think it should be closed unless it is fixed in upstream or is "wontfix".

@wjakob
Copy link
Member

wjakob commented Feb 14, 2017

No, that was definitely not my intent. It seemed to me that the discussion had converged with a number of alternative proposals/workaround. If your plan is still to submit a PR for this functionality, then we'll definitely take a good look at it. (though I generally can't promise to accept things before they've gone through a code review)

@wjakob wjakob reopened this Feb 14, 2017
@jagerman
Copy link
Member

I've only known @wjakob to not accept something if it was overly intrusive or of very marginal value: this seems neither. But I think closing this issue makes sense as a housekeeping measure: it's going to require a PR (where further discussion can happen).

@mwoehlke-kitware
Copy link
Contributor Author

I generally can't promise to accept things before they've gone through a code review

I wouldn't expect you to 😄.

I think closing this issue makes sense as a housekeeping measure

Well, obviously you are free to manage your project as you like. I usually leave feature request issues open until a PR provides the feature, and then the PR will close it. I think that's common practice, but...

@jagerman
Copy link
Member

I wish github had a convert-this-bug-into-a-PR feature, to avoid that sort of thing, but sadly it does not.

@wjakob
Copy link
Member

wjakob commented Mar 22, 2017

I'm closing this as a housekeeping measure, since this is not a bug. Further discussion can happen on a separate PR if @mwoehlke-kitware would like to contribute this functionality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants