
Implicit conversion from Python string to C++ enum type #483

Closed
stukowski opened this issue Nov 4, 2016 · 9 comments · May be fixed by #1122

Comments

@stukowski
Contributor

Let me propose an extension:
Sometimes it would be nice if a string could be passed from a Python script to a C++ function where an enum parameter is expected. For example:

struct Pet {
    enum Kind {
        Dog = 0,
        Cat
    };
    Pet(Kind type) : type(type) { }
    Kind type;
};

Binding code:

auto pet = py::class_<Pet>(m, "Pet")
    .def(py::init<Pet::Kind>());

py::enum_<Pet::Kind>(pet, "Kind")
    .value("Dog", Pet::Kind::Dog)
    .value("Cat", Pet::Kind::Cat);

Python:

>>> p = Pet(Pet.Cat) # Works
>>> p = Pet("Cat")   # Would like to make this work too

In principle it is possible to add optional support for this implicit conversion from a Python string to the enum type. I propose to add the following method to the pybind11::enum_ class:

template <typename Type> class enum_ : public class_<Type> {
    ...

    /// Set up automatic Python string -> C++ enum conversion.
    enum_& enable_conversion_from_string() {
        auto entries = m_entries;
        def("__init__", [entries](Type& v, const char* str) {
            for (const auto& item : *entries) {
                if (!strcmp(str, item.second)) {
                    v = (Type) item.first;
                    return;
                }
            }
            std::string tname = typeid(Type).name();
            detail::clean_type_id(tname);
            throw value_error("\"" + std::string(str) + "\" is not a valid value for enum type " + tname);
        });
        implicitly_convertible<str, Type>();
        return *this;
    }
    ...
};

This would allow writing binding codes like this:

py::enum_<Pet::Kind>(pet, "Kind")
    .value("Dog", Pet::Kind::Dog)
    .value("Cat", Pet::Kind::Cat)
    .enable_conversion_from_string();

There is only one problem with this solution: The exception thrown by the conversion constructor when the string does not match any of the enum entries is swallowed by pybind11. Instead, a general cast error is raised because the conversion from string to the enum type failed.

I am wondering what you think about this idea. Does it make sense to offer an automatic conversion like this? At least in my project there is one situation where an automatic conversion from a string is desirable.

By the way: the pybind11::enum_ class constructor registers two __init__ methods: one initializing the value via assignment, the other via placement new. I think one of them is redundant. Furthermore, I noticed that the std::unordered_map allocated by the constructor is never freed. Would it make sense to use a shared_ptr<unordered_map> here so that the map eventually gets destroyed?

@jagerman
Member

jagerman commented Nov 4, 2016

The second __init__ indeed looks weird, and I don't think it will ever actually be invoked: pybind11 looks for overloads to call based on their signatures, and these two have identical signatures. It was added in 6fb4849 along with other changes to fix a problem, but I don't think the second constructor was actually part of the fix.

The non-deleted pointer is intentional: registration of classes/enums is permanent, and that pointer needs to stay alive for as long as the enum is registered (i.e., forever), because __repr__ uses it. The enum_ instance itself only persists during the initialization code, but its effects, which depend on that unordered map, remain forever.

@jagerman
Member

jagerman commented Nov 4, 2016

One other suggestion: the use of const char* as the value type in the map in the current code (and even more so in your code) seems to be complicating the code; changing it to std::string would be a decent cleanup. That would change __repr__ to:

        def("__repr__", [name, entries](Type value) -> std::string {
            auto it = entries->find((UnderlyingType) value);
            return name + (it == entries->end() ? std::string(".???") : "." + it->second);
        });

and would change your code to take a std::string and then use a regular == comparison instead of strcmp.

Another minor comment:

std::string tname = typeid(Type).name();
detail::clean_type_id(tname);
throw value_error("\"" + std::string(str) + "\" is not a valid value for enum type " + tname);

the get-name-and-clean-it step is already available via pybind11::type_id<Type>(), so, with the change of str to std::string, you can replace all of the above with:

throw value_error("\"" + str + "\" is not a valid value for enum type " + type_id<Type>());

Edit: actually, type_id<Type>() isn't really what you want here: you actually want the name of the Python type, so you'd need to store the name given to the constructor in a private member and then capture it by value in your added lambda.

@jagerman
Member

jagerman commented Nov 4, 2016

(I also suggest you code this up, add some tests for it, and submit a PR; that will make it a bit easier to discuss the code and see it in action.)

@wjakob
Member

wjakob commented Nov 6, 2016

All agreed: please resubmit the modified version as a PR, and we can take it from there. I will close this ticket, since this is a feature request rather than a bug in pybind11.

@wjakob wjakob closed this as completed Nov 6, 2016
@stukowski
Contributor Author

Thank you all for the suggestions. I'll submit a PR as soon as I find time to code this up.

As I already mentioned, I don't know yet how to generate a meaningful error from this conversion constructor. pybind11 swallows any exception thrown by the constructor function and generates a general casting error message instead. So the nice error message generated by my function never makes it to the Python caller. If you have any idea how to inform the user that his input string did not match any of the registered enum values, let me know.

@jagerman: My comment on the non-deleted map pointer was based on my (wrong) assumption that the enum_ instance (and with it all the lambdas it has registered) would get destroyed on Py_Finalize(). I have now learned that this is not the case and that the memory leak is intentional. Of course, there is no point in using a shared_ptr then.

@wjakob
Member

wjakob commented Nov 7, 2016

pybind11 swallows any exception thrown by the constructor function and generates a general casting error message instead.

This sounds quite mysterious to me. Why don't you submit your PR with that issue still present, and we can take a look?

@YannickJadoul
Collaborator

Has there been any progress on this pull request? If @stukowski is not planning to make one, I might try to implement the proposed solution such that I'm able to use it in my project.

@TD22057
Contributor

TD22057 commented Sep 28, 2017

+1, I'd really like this feature as well. Our current Python code generator does this, and it's really useful. I can't find an open or closed PR for this anywhere.

@ezyang

ezyang commented Sep 5, 2022

Using #1176 I was able to achieve this effect by defining a type_caster directly on the enum type while inheriting from type_caster_base. Example:

template <>
struct type_caster<c10::DispatchKey> : public type_caster_base<c10::DispatchKey> {
  using base = type_caster_base<c10::DispatchKey>;
  c10::DispatchKey tmp;

 public:
  bool load(handle src, bool convert) {
    if (base::load(src, convert)) {
      return true;
    } else if (py::isinstance(src, py::module_::import("builtins").attr("str"))) {
      tmp = c10::parseDispatchKey(py::cast<std::string>(src));
      value = &tmp;
      return true;
    }
    return false;
  }

  static handle cast(
      c10::DispatchKey src,
      return_value_policy policy,
      handle parent) {
    return base::cast(src, policy, parent);
  }
};

Unfortunately, you will have to implement the string parsing yourself somehow; I had a helper C++ function which I used.

I'm not entirely sure this doesn't leak, as I'm doing a dynamic allocation in order to get a value of the correct form.

EDIT: Unfortunately this leaks memory

EDIT 2: Updated with a non-leaking version, though IDK how kosher it is. I know that converters have some safety to avoid returning temporaries so hoping this is kosher.

ezyang added a commit to pytorch/pytorch that referenced this issue Sep 7, 2022
There are a number of situations where we have non-backend kernels (e.g., CompositeImplicitAutograd, batching rules) which we would like to port to Python, but we have no way to integrate these ports with the overall system while otherwise using preexisting C++ registrations. This PR changes that by introducing a Python dispatcher (which can have its own kernels directly in Python) that can interpose on ordinary C++ dispatch. The ingredients:

* We introduce a new PythonDispatcher dispatch key that has the same tenor as FuncTorchDynamicLayerFrontMode: it works by getting triggered *before* every other dispatch key, shunting to a Python implementation.
* The Python dispatcher is a per-interpreter global object that is enabled/disabled via the guards EnablePythonDispatcher/DisablePythonDispatcher. We don't make it compositional, as I have no idea what a compositional version of this feature would look like. Because it is global, we don't need to memory-manage it, so I use a simpler SafePyHandle (newly added) to control access to this pointer from non-Python C++. Like `__torch_dispatch__`, we use PyInterpreter to get to the Python interpreter to handle the dispatch.
* I need to reimplement dispatch table computation logic in Python. To do this, I expose a lot more helper functions for doing computations on alias dispatch keys and similar. I also improve the pybind11 handling for DispatchKey so that you can accept either the pybind11-bound enum or a string; this simplifies our binding code. See pybind/pybind11#483 (comment) for how this works; the technique is generally useful.
* I need to be able to call backend fallbacks. I do this by permitting you to call at a dispatch key which doesn't have a kernel for the operator; if the kernel doesn't exist, we check the backend fallback table instead.

One missing piece is how to integrate this with voz's infra for Python operators.

Signed-off-by: Edward Z. Yang <[email protected]>
