pydrake: Enable automated binding generation #7889

Closed
EricCousineau-TRI opened this issue Jan 28, 2018 · 33 comments

@EricCousineau-TRI
Contributor

EricCousineau-TRI commented Jan 28, 2018

Motivation

Long story short, it's tedious to write bindings by hand. We have the current guidance, but even that is not comprehensive:
https://drake.mit.edu/doxygen_cxx/group__python__bindings.html

We have our bindings, and they work, but they're a bit burdensome to write and update (and deprecate/delete).
We would like to eventually have a solution where Drake (and other) devs don't even have to think about their bindings. In the interim, though, it'd be nice to save thought for the more creative/critical aspects ("this doesn't fit my usage" or "I should add a keep alive here").

To that end, we want to be able to bootstrap (but not completely replace) manual writing of bindings. Eventually, we'd like these bindings to be completely automatic.

Design, Etc.

See Kitware-TRI shared Google Docs.

Alternatives

See options listed in "Alternative Comparison":

https://gitlab.kitware.com/autopybind11/autopybind11/-/blob/master/README.rst
(Permalink: https://gitlab.kitware.com/autopybind11/autopybind11/-/blob/0376168b2f40e5a3378ea09e90df95470c26d194/README.rst)


OLD:

Some possible alternatives:

  • Keep with current method. At present, Python bindings are written manually using a fork of pybind11.
    • Pros:
      • Plays nicely with our templating approaches, since we explicitly and directly bind things.
      • Our fork teaches pybind11 to: (a) handle AutoDiffXd and Symbolic matrices, (b) handle unique_ptrs as arguments, (c) prevent Python derived class slicing.
    • Cons:
      • Requires a custom fork.
      • Slow compilation time?
      • Runtime overhead?
      • Extensibility?
  • Use binder to auto-generate pybind11 bindings
    • May need to teach it our own flavor of bindings, though.
    • Pros
      • Automated
      • Same Pros as pybind11.
    • Cons
      • Interface files may be only slightly less verbose than code (and less flexible; e.g. for-loops).
      • Overhead in compiling LLVM for the tools plugin.
      • Harder to explicitly bind
      • Same Cons as pybind11.
  • Use clif. Similar approach as binder: Uses LLVM for parsing, has interface files, etc.
    • Pros
      • Similar Pros to binder
      • Goal is a general IDL.
      • Supports unique_ptr as input and output
      • Supports inheritance and overriding virtual functions
    • Cons
      • Similar Cons to binder
      • Goal is a general IDL.
      • Does not mention support for shared_ptr
      • Does mention caveats for inheritance (does it prevent slicing?)
      • No out-of-the-box support for Eigen <-> NumPy. (Mentioned as a possibility for user extension.)
      • Extensibility?
  • Use a custom parsing solution, like gtSAM's approach with Cython.

At present, my goal with pybind11 is to provide a Pythonic view into the C++ implementation, not necessarily a pure Pythonic wrapping of Drake. Anything wanting to be purely Pythonic could wrap this implementation.

EDIT: Keywords 'cause GitHub's search functionality is a bit stiff:
automatic automate pyclif binder parsing generate generated

@jamiesnape
Contributor

For reference #4486. "Requires a custom fork" is really the only real downside I see for pybind11, though it is a big one. Otherwise, I think we are still in good shape.

Looking at the evolution of the Bazel internals, obviously there is going to be some good integration with CLIF at some point, but it is such an immature project externally to Google.

@EricCousineau-TRI
Contributor Author

Re-scoping this issue to focus on automated binding generation.

@EricCousineau-TRI changed the title from "Python Bindings: Consider alternatives" to "pydrake: Enable automated binding generation" on Dec 8, 2018
@EricCousineau-TRI
Contributor Author

FYI @thduynguyen

@EricCousineau-TRI
Contributor Author

Per f2f: I would like to have a Kitware go-to person for pybind, and this would be their highest priority item. Will think about how to codify this into this issue (or a separate issue).

Will need to discuss this more in April (new contract).

@EricCousineau-TRI
Contributor Author

\cc @m-chaturvedi

@EricCousineau-TRI
Contributor Author

EricCousineau-TRI commented Jun 6, 2019

Per discussions in this Slack thread:
https://drakedevelopers.slack.com/archives/C3YB3EV5W/p1559820276011400

@ggould-tri (and @jwnimmer-tri a while back) suggested having intermediate "starting point" bindings generated. This could be done with our current use of Python libclang in mkdoc, albeit not super robust, as it may lose information.
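As a rough illustration of the "starting point" idea, here is a minimal sketch that emits pybind11 source text for a developer to edit afterwards. It is not mkdoc or any existing tool: `ClassInfo`, `MethodInfo`, `emit_starter_binding`, and `my_ns::Widget` are hypothetical names, the class description is filled in by hand rather than read from libclang, and the emitted C++ assumes the usual `namespace py = pybind11;` alias.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MethodInfo:
    name: str


@dataclass
class ClassInfo:
    # Hypothetical, hand-filled stand-in for what a parser such as libclang would report.
    qualified_name: str
    methods: List[MethodInfo] = field(default_factory=list)

    @property
    def short_name(self) -> str:
        return self.qualified_name.split("::")[-1]


def emit_starter_binding(cls: ClassInfo, module_var: str = "m") -> str:
    """Return pybind11 C++ source text that a developer would then edit by hand."""
    lines = [f'py::class_<{cls.qualified_name}>({module_var}, "{cls.short_name}")']
    for method in cls.methods:
        lines.append(f'    .def("{method.name}", &{cls.qualified_name}::{method.name})')
    lines[-1] += ";"
    # The generator cannot know about lifetimes or overload intent, so flag it.
    lines.append("// TODO: review overloads, keep-alives, and docstrings by hand.")
    return "\n".join(lines)


example = ClassInfo("my_ns::Widget", [MethodInfo("Reset"), MethodInfo("size")])
print(emit_starter_binding(example))
```

The output is deliberately incomplete: overloaded methods would need `py::overload_cast` or similar, which is the kind of information a name-only description loses.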

Subsequent steps could leverage CastXML (the successor to gccxml) or some components of cppyy (cling, reflex, etc.)

(For cppyy, the old PyPy-centric docs indicate that it uses gccxml with a plan to use cling, while the top-level summary of its own project indicates that it is now using cling).

@jamiesnape
Contributor

FYI: The Kitware tool of choice is CastXML, which is used for the bindings of the (template-heavy) ITK.

@EricCousineau-TRI
Contributor Author

@jamiesnape mentioned that he could whip together a proof-of-concept for us.

@EricCousineau-TRI
Contributor Author

I'll write up a wish list / loose "requirements doc" for this feature, e.g. features to preserve (possibly in spirals), features to add.

@EricCousineau-TRI
Contributor Author

As part of this, we should make sure we spend some time visiting each of the solutions listed above, and make sure we include SWIG in that evaluation.

@EricCousineau-TRI
Contributor Author

EricCousineau-TRI commented May 26, 2020

Just as an FYI:
Had a chat with @rwgk a few days ago; he mentioned that he may pick up work again on PyCLIF to have it use pybind11, possibly switch over to tree-sitter if it's faster / lighter-weight for AST parsing / C++ introspection than clang, and is considering using pytype as a Pythonic way of specifying a C++ interface (rather than *.pyi files).

@jamiesnape
Contributor

What features in particular do you like in CLIF? Obviously, I am biased, but writing out all detail in the .clif files (or their replacement) is a little more work for the developer than I would like, though obviously more control.

@EricCousineau-TRI
Contributor Author

It's not that I particularly like CLIF in its current incantation (for the same reason as you mentioned - indirection via interface files). But wanted to record the convo and some related technologies.

@EricCousineau-TRI
Contributor Author

EricCousineau-TRI commented Jun 21, 2020

Also listed via recent commit to pybind11 documentation:
https://github.com/StatisKit/AutoWIG/blob/e68f0bd7/src/py/autowig/pybind11_generator.py

Not a lot of documentation, but will see how LLVM stuff gets interfaced (e.g. IR like XML as gccxml does, or directly via something else).

EDIT: They use libclang, so may have questionable coverage, but I do like that it's all Python.
https://github.com/StatisKit/AutoWIG/blob/e68f0bd7/src/py/autowig/libclang_parser.py

EDIT 2: Example posted here:
https://github.com/StatisKit/FP17

@EricCousineau-TRI
Contributor Author

Yet another binding project:
https://github.com/robotpy/robotpy-build

Came across in this pybind11 Gitter discussion:
https://gitter.im/pybind/Lobby?at=5ed1bc733ffa6106f1e28201

@rwgk

rwgk commented Jul 18, 2020

Hi @EricCousineau-TRI, autopybind11 looks more similar to PyCLIF than I realized before I just looked at their README: they also have an interface file, a C++ parser, code generator, and runtime. In PyCLIF's planned future the code generator + runtime will target pybind11. I'm also strongly considering replacing the existing C++ parser with something else; with what is wide open; CastXML looks interesting. When I'm done with that, too, really the only strong difference between autopybind11 and PyCLIF will be the syntax of the interface files. I can imagine that they could even be combined into one system.

@wlav

wlav commented Jul 19, 2020

@EricCousineau-TRI Pardon for jumping in, but cppyy is not a ROOT project: it's standalone, and is directly installable with PyPI (https://pypi.org/project/cppyy/) and/or conda (https://anaconda.org/conda-forge/cppyy). Note that the conda release is behind (I don't control it).

What happened with ROOT, and what you are linking above, is that cppyy was forked from PyROOT, with the ROOT team later re-integrating cppyy (the fork) back into PyROOT, replacing the old binder. But their version is crippled for backwards compatibility reasons. Those pythonizations are the ROOT-specific portions of the old PyROOT, ROOT-numpy integration etc. The rootpy project was an attempt to improve the "python" look-and-feel of PyROOT, but is dead AFAIK (the principal author left academia).

As for gccxml, yes that is long gone and you can just pip-install cppyy for PyPy, getting the same backend (Cling). However, the bindings portion of the PyPy version (_cppyy) is well behind the CPython one (CPyCppyy). I'm catching up with this, but there simply has been (and is) more demand on the CPython side, so that has been prioritized. All the basics work with PyPy, but esp. things like automatic cross-inheritance and callbacks are not there yet, and the CPython version is also way better at handling auto-instantiation of complicated templates (think stuff you'll find in Eigen or boost).

All parsing can be done at run-time by Cling. The cmake cppyy-generator is just there for precompiling and packaging; it's not material to the functioning of cppyy. Just cppyy.include() the headers you want to bind and cppyy.load_library() the shared libraries containing the compiled C++ that is being bound and you can try it out to see whether it works and performs on your most important use cases.
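A minimal sketch of that try-it-out workflow, assuming cppyy is installed. `cppyy.include`, `cppyy.load_library`, and `cppyy.gbl` are the calls named in cppyy's documentation; the helper `try_cppyy` and the header, library, and class names in the usage comment are hypothetical placeholders.

```python
def try_cppyy(headers, libraries):
    """Load headers and shared libraries into cppyy and return its global C++ namespace.

    Assumes cppyy is installed (e.g. from PyPI); the paths are supplied by the caller.
    """
    import cppyy  # Imported lazily so this file can be read without cppyy present.

    for header in headers:
        cppyy.include(header)  # Parsed at run time by Cling.
    for library in libraries:
        cppyy.load_library(library)  # Provides the compiled symbols being bound.
    return cppyy.gbl


# Hypothetical usage; "my_lib/widget.h", "libmy_lib", and my_ns::Widget are placeholders:
# gbl = try_cppyy(["my_lib/widget.h"], ["libmy_lib"])
# widget = gbl.my_ns.Widget()
```

Timing a few representative calls through the returned namespace would be one way to check the "works and performs" question on a project's own headers.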

The biggest caveat, compared to pybind11 or SWIG, is the dependency on Cling, and thus LLVM. See my thoughts on it here: https://cppyy.readthedocs.io/en/latest/philosophy.html#llvm-dependency

@EricCousineau-TRI
Contributor Author

EricCousineau-TRI commented Jul 24, 2020

@rwgk That sounds awesome - looking forward to it!!! For now, I'm playing with how Drake might consume autopybind11, and see if (for our use case) we can minimize the need for interface files, or at least bootstrap and generate template interface files to go off of (the information is all there).

@wlav Thanks for the info! I'll try to update our summaries to reflect what you've mentioned here.

Regarding the philosophy on where to place the main overhead burden, I believe I understand what you wrote in terms of memory footprint for run-time vs. compile-time.
However, our project has some pretty template-heavy stuff in our public API :( For that reason, I fear it may be a bit slow with Cling-based workflows, and at least for my use cases, the main benefit of Python is being able to quickly iterate across different interpreter sessions.

As a data point, a user had tried out Cling via Jupyter, consuming Drake, here:
#8576 (comment)
They found that it was rather slow, and due to the (deferred) template instantiations, it was perhaps not an optimized user experience.
Perhaps we could optimize Drake more for Cling-based workflows (e.g. do our instantiations better), but I think at present we want to take the slightly simpler (though perhaps more binary-laden) approach of precompilation b/c it's part of our main use case (C++ binaries used directly), though at the cost of having to pre-specify our instantiations and pay the overhead cost of pybind11's code generation.
Additionally, for a Cling-based approach, we'd need to redistribute the C++ source-level dependencies, not just the binary runtime dependencies, so it'd add a third category to our dependency list. EDIT: I'm dumb, our binary prereqs are for compiling with Drake 🤦

The other fear here is that if caching is the route for optimizing the user experience, then we may run into Julia-esque problems, and it may be hard to integrate with the hermetic setup we have with Bazel (e.g. having to define possibly incremental caching layers for Bazel targets).

I'll keep it in mind for the future, though!

@wlav

wlav commented Jul 24, 2020

@EricCousineau-TRI The date on that bug report precedes this (long) report: https://bitbucket.org/wlav/cppyy/issues/38/interpreted-vs-compiled-code which has resolved many performance issues relating to templates.

Further improvement is expected from precompiled modules (of the headers, not of the template instantiations), which isn't quite as problematic as caching, since these don't grow in size. It also allows, in theory, for distribution of binary representations of the headers, but I'm finding it fickle. It is also not for today, however, or at least not easily: it only works on Linux so far, so it's not enabled. Upstream promises to fix the final issues on MS Windows by the end of the summer. If they don't, then that time scale slides (and I'll have to put in effort myself to get what I want). In that case, it may not be there before the end of the year.

Anyway, not trying to convince you; just wanted to add context to the comments made earlier in this thread about the old PyPy and PyROOT stuff.

@EricCousineau-TRI
Contributor Author

EricCousineau-TRI commented Jul 24, 2020

Makes sense, and thank you for the reference!

@EricCousineau-TRI
Contributor Author

EricCousineau-TRI commented Jul 24, 2020

As a side note, while trying to track down different bindings of Clang itself (beyond clang.cindex), I realized I'd also forgotten about Qt for Python (Shiboken):
https://doc.qt.io/qtforpython/shiboken2/index.html
(not that I'm advocating for it, but just to see how they use Clang too)

EDIT: Er, can hardly even find the source, noping outta that.
https://code.qt.io/cgit/pyside

@wlav

wlav commented Jul 27, 2020

The thing of note for Shiboken is that their "type system" includes the in, out, and in/out specifications of Qt for function arguments. I.e. it is a specification outside of the bindings code itself, which gets used in the generation. I'm not aware of any other binding system to have such a feature: it's usually either part of the intermediate language (as e.g. in SWIG) or part of the manual encoding with flags (as e.g. in pybind11). In cppyy, I use a cop-out: modern C++ with smart pointers doesn't need it and well written code is predictable enough for pythonization rules to handle ownership. :) See the PyROOT pythonizations you reference above for some more examples. cppyy does resolve several clear cases automatically (e.g. automatic life lines if a return address falls within the memory of the instance it was called on). And one thing that upstream has indicated interest in, is to have a Clang pass generate this information. If instead of using function annotations for Cling it is written out in xml, then something like Shiboken could use it too.

Perhaps this is the repo (mirror, anyway) you were looking for: https://github.com/qtproject/pyside-pyside-setup

@EricCousineau-TRI
Contributor Author

EricCousineau-TRI commented Aug 1, 2020

Thanks! I rooted around there and came across some of the examples from pyside (putting this here for myself ;)
https://github.com/qtproject/pyside-pyside-setup/blob/dev/README.shiboken2-generator.md

Also, I just stumbled across this project, genpybind: https://github.com/kljohann/genpybind
This person solved the annotation problem by putting it straight in the code itself (e.g. preambles, keep alives, etc.). Not sure I could convince anyone to adopt this setup explicitly, but it looks like it handles a lot of the edge cases that I've been pondering.

FTR I came across this when tinkering with writing pybind11 bindings on top of clang's LibTooling ('cause masochism) - see crappy prototype here.

@EricCousineau-TRI
Contributor Author

EricCousineau-TRI commented Aug 1, 2020

FYI @wlav I've added an expanded writeup in autopybind11 on my perusal of cppyy and tried to reference our discussion here:
https://gitlab.kitware.com/autopybind11/autopybind11/-/merge_requests/32

@wlav

wlav commented Aug 3, 2020

That genpybind looks awesome; I especially like how those annotations look like Python ones. It's not unheard of for present-day C++ projects to consider Python a first-class citizen. cppyy also allows for conventional callbacks for pythonization in the C++ class, rather than requiring the pythonization to be pre-registered. I was surprised when this was first asked for, but now I already have another open request for conventional callbacks to control the Python -> C++ conversion.

I'm thinking that the genpybind approach can also be done externally to the class. In effect, that is the Shiboken xml file moved into C++ headers, so that both remain in the same source base and the C++ compiler can do type and parameter checks. Such may be a more acceptable approach for existing projects where Python is an add-on.

I read the changes to the writeup and just want to note that lazy binding can never be a performance drain: that is purely on the Python side and since everything is run-time in Python, you can only gain, unless in the rare case where you absolutely use every class and every function that was bound. (And even then, it'd surprise me if the difference were measurable in practical settings).
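To make the "pay only for what you use" point concrete, here is a small pure-Python sketch of lazy binding. It is not how cppyy is implemented: `LazyNamespace` and the entity names are hypothetical, and the "proxy" is just an empty Python class standing in for a bound C++ entity.

```python
class LazyNamespace:
    """Creates a proxy for a name only when that name is first looked up."""

    def __init__(self, available_names):
        self._available = set(available_names)
        self.created = []  # Records which proxies were actually built.

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal attribute lookup fails,
        # so this runs on the first access to each name and never again.
        if name.startswith("_") or name not in self._available:
            raise AttributeError(name)
        proxy = type(name, (), {"__doc__": f"Stand-in proxy for C++ entity {name}"})
        self.created.append(name)
        setattr(self, name, proxy)  # Cache so later lookups skip __getattr__.
        return proxy


ns = LazyNamespace(["Matrix", "Solver", "Plant"])
first = ns.Solver
second = ns.Solver
print(ns.created)  # Only "Solver" was built; "Matrix" and "Plant" cost nothing.
```

An eager binder would build all three proxies at import time, so the lazy version can only do the same amount of work or less.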

Using Cling is a different matter: it is the inclusion of many header files in what is effectively a single translation unit that can kill performance. However, it is not fundamental to the approach. First, many projects that have large, complicated sets of templates (e.g. Eigen, PCL, or boost for that matter), tend to subdivide the project into separate areas that tend to be used separately to make compilation simpler (e.g. Eigen/Dense vs. Eigen/Sparse). It is not possible to keep this subdivision in Cling if all the parts are used (b/c of the single translation unit approach), but if not, then making separate python modules out of them, instead of putting all headers together as would be the default approach, does the trick. Second, I patched the underlying Clang and Cling to accept extern declared explicit template instantiations as a promise for an available linker symbol later, so Cling (and Clang) will not try to do the instantiation itself. Providing the explicit instantiations is manual work, to be sure, but no worse than is needed for any other binder.

To be sure, I'm not disagreeing with the sentiment of the statement: the promise is to have your cake and eat it too, and it falls flat on that point. And if you can't have your cake, then there is no benefit to the Cling/LLVM dependency and memory overhead. (The calculus will change with modules, to be sure: in that case there will be more cake.) But the approach itself is not technically limiting performance.

@EricCousineau-TRI
Contributor Author

EricCousineau-TRI commented Sep 17, 2020

Updated overview with a brief motivational statement.

Also, @wlav: Sorry for the delay, but thank you for following up!
I'll make sure to include the mention about API coverage (e.g. "pay only for what you use").

And yeah, the annotations in genpybind look awesome. Basically, seems like one of the better forms of Python-binding attribute association out there (e.g. "ignore", "add keep alive", etc.).
To clarify, the ones that I've seen are:

  • Potentially ambiguous external associations: I have a symbol a::b::MyClass, so I use this (or some form) in a config file to associate some add'l attributes. This falls apart for complex symbols, e.g. overloads with many args, where the "coordinate" in this case is pretty much the full signature.
  • Explicit external associations. Overcome the above ambiguity issue by fully specifying the symbol. Works, but is super verbose.
  • Explicit inline associations as compiler annotations. Annotate the code directly, as is done by genpybind.
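The difference between the first two approaches can be sketched in a few lines. This is an illustration only: the overload set, `a::b::MyClass::Set`, and the helper names are hypothetical, and real signatures would come from a parser rather than a hand-written list.

```python
# Hypothetical overload set, written as (qualified name, parameter types).
OVERLOADS = [
    ("a::b::MyClass::Set", ("int",)),
    ("a::b::MyClass::Set", ("const std::string&",)),
    ("a::b::MyClass::Get", ()),
]


def signature(name, params):
    """Spell out the full "coordinate" of an overload."""
    return f"{name}({', '.join(params)})"


def match_by_name(key):
    """Name-only lookup: short key, but may return several overloads."""
    return [signature(n, p) for n, p in OVERLOADS if n == key]


def match_by_signature(key):
    """Full-signature lookup: at most one overload, at the cost of a long key."""
    return [signature(n, p) for n, p in OVERLOADS if signature(n, p) == key]


print(match_by_name("a::b::MyClass::Set"))
print(match_by_signature("a::b::MyClass::Set(const std::string&)"))
```

The name-only key matches both `Set` overloads, so an attribute such as "add keep alive" attached to it is ambiguous; the full-signature key is unambiguous but already unwieldy for a one-argument method.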

FTR, I do like genpybind the most, but I think there could be one more approach that is more expressive:
Annotations via docstrings. We already have annotations, like @exclude_from_pydrake_mkdoc or @pydrake_mkdoc_identifier{foobar} :
https://drake.mit.edu/doxygen_cxx/group__python__bindings.html#PydrakeDoc
Ultimately, these help specify the "coordinates" for docstring identifiers so it's easy to refer to that symbol when the "formula" for it becomes too complex, or to completely exclude the symbol.

At the risk of introducing "yet another method", I would like to vie for adding another "approach" of bindings attribute association:

  • Explicit inline associations as documentation associations. In this way, we can express a bit more information, e.g. using YAML, JSON, free-form Python, add'l C++ expressions, etc. For information that is implicit, no annotation is necessary; otherwise, add some info.

If the inline annotations are too verbose, or perhaps it's best to group a set of symbols, then these explicit associations can help to disambiguate symbols in the least verbose, and most explicit (and discoverable) format.
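As a sketch of what reading such documentation-based annotations might look like, the snippet below pulls the two existing directives out of a doc comment. The directive names are the ones mentioned above; the regular expressions, the `parse_binding_annotations` helper, and the returned dictionary layout are assumptions for illustration, not mkdoc's actual parsing.

```python
import re

# Directive names come from the pydrake mkdoc annotations mentioned above;
# the parsing rules here are an assumption for illustration.
_EXCLUDE = re.compile(r"@exclude_from_pydrake_mkdoc\b")
_IDENTIFIER = re.compile(r"@pydrake_mkdoc_identifier\{(\w+)\}")


def parse_binding_annotations(doc_comment: str) -> dict:
    """Extract binding hints from a C++ doc comment given as plain text."""
    match = _IDENTIFIER.search(doc_comment)
    return {
        "exclude": _EXCLUDE.search(doc_comment) is not None,
        "identifier": match.group(1) if match else None,
    }


comment = """
/// Computes the thing.
/// @pydrake_mkdoc_identifier{foobar}
"""
print(parse_binding_annotations(comment))
```

A symbol with no directives yields `exclude=False` and `identifier=None`, which matches the idea that implicit information needs no annotation.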

@wlav

wlav commented Sep 28, 2020

Yes, you're right and I don't know the complexity of Drake, but am currently dealing with two very complex (modern) C++ libraries where annotations would not be straightforward. For example, using template syntax to pick different types from different namespaces. This is, post-template instantiation, completely unambiguous, but not while parsing the source.

But anyway, there recently was a kick-off meeting to abstract the bulk of Cling into ClangREPL, which would then become an "official" Clang project (e.g. lldb would benefit), and a libIncremental to support, among others, language bindings. One thing that came up were custom passes that can collect specific information, incl. annotations, targeted at binders. You would then not be syntactically limited (could even put the info in comments). So e.g. an annotation could apply to a delimited set of overloads, rather than selecting them by name.

All this is long term: that project has a span of 3 years, at the end of which it is expected to deliver seamless integration of C++, Python, and some other yet-to-be-determined language, in jupyter notebooks. So also Python entities available to Cling as C++ proxies (I did this a long time ago as a proof-of-concept, still requiring casts to do type extraction, but there is no technical limitation to e.g. do cross-inheritance back-and-forth as often as you like).

Good things to come ... in 3 years. :)

@EricCousineau-TRI
Contributor Author

Good things to come ... in 3 years. :)

Ooh, all of that looks absolutely awesome!!! Is there a website / project / mailing list that I can subscribe to if I wanna keep tabs on this as well?

@wlav

wlav commented Oct 1, 2020

The original RFC, and its follow-up thread, is here (among many other places):
http://www.tuxmachines.org/node/139759

At the bottom is the Google group they intend to use, but it hasn't seen much activity yet (except for the initial kick-off meeting). The current focus of the main upstream devs is still Clang9, which is close to being released (I tried a port of cppyy already, and other than some initialization issues, it seems to work; that'll make a whole bunch of C++20 features available, although C++20 support as a whole isn't complete yet even in Clang master).

@EricCousineau-TRI
Contributor Author

EricCousineau-TRI commented Dec 31, 2020

FYI @rwgk Per recent convo, I've added a mention of FFIG here: https://gitlab.kitware.com/autopybind11/autopybind11/-/merge_requests/116
Marginally related holder discussion: pybind/pybind11#2646

@jwnimmer-tri
Collaborator

Closing out as "backlog unlikely to reach an actionable state within a reasonable timeframe".
