Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Adding string constructor for enum #1122

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

henryiii
Copy link
Collaborator

@henryiii henryiii commented Sep 29, 2017

This adds the ability to access an enum using a string. So the following works now:

from m import MyEnum
a = MyEnum.A
b = MyEnum("A")
a == b

Besides making it a bit easier to access enum values using a variable (vs. getattr), this also fixes #483 in a general way (and was based on comments there, combined with newer procedures). Assuming you have a C++ enum and function like this:

enum class MyEnum {
    A = 1,
    B
};

int read_enum(MyEnum e) {
    return e==MyEnum::A ? 1 : 2;
}    

You can now add to the bindings:

py::implicitly_convertible<std::string, MyEnum>();

And then this works:

m.read_enum("A");

That reuses and elevates a nice feature (implicitly_convertible), rather than adding something new to enum_'s syntax. If this looks good, I'll add some docs and remove the WIP flag. The issue pointed out in the PR, which is that the error is swallowed up and converted to a less-useful one also needs to be addressed.

@wjakob
Copy link
Member

wjakob commented Sep 29, 2017

It's hard to read the diff because the code was moved in the file -- could you fix that so that it's easier to review the PR?

@henryiii
Copy link
Collaborator Author

Oops, sorry, I originally needed to move it to add implicitly_convertible, since that comes later in the file, but now it doesn't need to be moved. I'll fix it.

@henryiii henryiii force-pushed the enum_string branch 2 times, most recently from 287aa03 to f73bcd4 Compare September 29, 2017 15:52
@henryiii
Copy link
Collaborator Author

Fixed. I've tried several different error types for the throw, but they all get converted to:

>       m.test_conversion_enum("NotEnum")
E       TypeError: test_conversion_enum(): incompatible function arguments. The following argument types are supported:
E           1. (arg0: pybind11_tests.enums.ConversionEnum) -> str
E
E       Invoked with: 'NotEnum'

@henryiii henryiii force-pushed the enum_string branch 2 times, most recently from c3eeb98 to b02e440 Compare September 29, 2017 17:27
@henryiii
Copy link
Collaborator Author

henryiii commented Sep 29, 2017

Any idea why there's an error on some but not all versions of python+compiler? (bad alloc error) I can simply it to this:

def(init([this](std::string value) -> Type {
    for (const auto &kv : m_entries) {
        std::string key = cast<str>(kv.first);
        if(value == key || key == m_name + "::" + value) {
            return cast<Type>(kv.second);
        }
    }
    throw value_error("\"" + value + "\" is not a valid value for enum type " + m_name);
}));

But still get the error on GCC Py2.7 when I run:

python -c "import pybind11_tests as m; print(m.enums.ConversionEnum('Convert1'))"

Newer compilers and Pythons seem to be happier. Most irritatingly, LLVM on macOS is happy, so I can't easily attach to python and debug in Xcode.

Edit: fixed. It was from using [this], which was unnecessary since this is all inside the class constructor.

@jagerman
Copy link
Member

I haven't reviewed the PR in detail yet (I'm tied up with other things at the moment--and anyway I'll wait for you to finish it off before I do), but as a first observation I think using implicitly_convertible for this is a great approach (in fact that is exactly what I would have done for this).

One observation: I think you can use py::implicitly_convertible<std::string, EnumType>(); rather than py::str for the "From" type.

@henryiii henryiii force-pushed the enum_string branch 6 times, most recently from 44b2da3 to 8e279ac Compare October 1, 2017 01:09
@YannickJadoul
Copy link
Collaborator

Suggestion: I already had added something like this in my project. There's however 2 slight differences that might be interesting for this PR:

  1. I added the extra from-string constructor only optionally (outside of the enum_ constructor) (such that not each enum exports this constructor)
  2. I added the possibility to optionally case-insensitively search for the value. Most of the time, enums do not have values that only differ by case, so MyEnum('a') seems like it could be useful in certain points of an API?

Especially, if this __init__ is always added to the class, it seems hard to have a user enable the case-insensitive comparison (since .def(init(...)) will result in a secondary overload) ?

For reference, this is how my version looks: (Surely, it's not better, but it might contain some interesting things?)

template <typename Type>
void make_implicitly_convertible_from_string(pybind11::enum_<Type> &enumType, bool ignoreCase=false)
{
	enumType.def(pybind11::init([enumType, ignoreCase](const std::string &value)
	                            {
		                            auto values = enumType.attr("__members__").template cast<pybind11::dict>();

		                            auto strValue = pybind11::str(value);
		                            if (values.contains(strValue)) {
			                            return Type(values[strValue].template cast<Type>());
		                            }

		                            if (ignoreCase) {
			                            auto upperStrValue = strValue.attr("upper")();
			                            for (auto &item : values) {
				                            if (item.first.attr("upper")().attr("__eq__")(upperStrValue).template cast<bool>()) {
					                            return Type(item.second.template cast<Type>());
				                            }
			                            }
		                            }

		                            throw pybind11::value_error("\"" + value + "\" is not a valid value for enum type " + enumType.attr("__name__").template cast<std::string>());
	                            }));

	pybind11::implicitly_convertible<std::string, Type>();
};

@henryiii
Copy link
Collaborator Author

henryiii commented Oct 1, 2017

I don't really like the idea of adding case insensitivity to just the string constructor; shouldn't MyEnum.a also work then? And there is no guarantee that enums be unique if you drop case insensitivity, either in Python 3 or C++. If someone really wanted string to enum case-insensitive, that could be simply added as an explicit overload to the function needing it.


However, to add that as a feature, a new py::case_insensitive flag that could be added to the constructor. I have a working test implementation. Main changes:

constexpr bool is_case_insensitive = detail::any_of<std::is_same<case_insensitive, Extra>...>::value;
// ... later ...
std::string key = pybind11::cast<pybind11::str>(is_case_insensitive ? kv.first.attr("upper") : kv.first);
if(is_case_insensitive) {
    value = pybind11::cast<pybind11::str>(pybind11::str(value).attr("upper")());
}

Eventually, one could also add a __getattr__ function and a check during construction to ensure there are no duplicates.

@YannickJadoul
Copy link
Collaborator

YannickJadoul commented Oct 1, 2017

@henryiii Nono, agreed that that's not a good default! But I'm just thinking of this file extension enum I have: all values are capitalized (according to the Python code conventions), but writing a all-lowercase version seems very natural. So for example a flag was what I was thinking of, indeed :-)

If someone really wanted string to enum case-insensitive, that could be simply added as an explicit overload to the function needing it.

That's the problem. I don't think you could do that easily, could you? Since the non-case-sensitive overload is already there and will already accept all calls with a single string argument?

(But of course, if no one else would think this feature is worth it, I could find some way around it, or consider dropping my support for it...)

By the way: for the case-sensitive implementation you have now, do you need this loop? Wouldn't just accessing the dictionary with this key (~ m_entries[py::str(value)]) be easier/more efficient than a linear loop?

@henryiii
Copy link
Collaborator Author

henryiii commented Oct 2, 2017

@YannickJadoul , shouldn't ".py" also be allowed for a file extension? I think what you really want is a custom string converter, not just case insensitivity, since you really have a string interface that then maps to an enum. In general, you probably should have two API functions:

.def("func", &func)
.def("func", [](std::string in){... find enum and call func ...});

However, "faking it" by adding a custom string converter for the enum was a nice trick to shorten the syntax, and that seems to be lost with this PR. The problem is that adding a new constructor doesn't precede the old one. Shouldn't the functions be traversed in reverse order, to allow new functions to override previously added ones?

Save for a better fix, a tag could be added to allow the string init to be turned on or off.

@YannickJadoul
Copy link
Collaborator

@henryiii Yeah, you could be right. That was just an example, of course. I do still think that case-insensitivity could be interesting (99% of all enums do not depend on case sensitivity, I'd guess), in some cases, but ... maybe it's too specific for a generic constructor, yes.

Shouldn't the functions be traversed in reverse order, to allow new functions to override previously added ones?

Most of the time, you'd all .def(...) them yourself on a class_ instance and you want to keep the order you define them in, no? (Also, such a change would potentially break lots of existing code, so I don't think you'll convince anyone ;-) )

@henryiii
Copy link
Collaborator Author

henryiii commented Oct 2, 2017

How about the flag idea for case-insensitivity?

py::enum_< MyEnum >(m, "MyEnum", py::case_insensitive())
    .def("A")
    .def("B")
};

Or would it be better to go back to the separate method:

py::enum_< MyEnum >(m, "MyEnum")
    .def("A")
    .def("B")
    .add_string_constructor()
};

The later would allow you to add your own string constructor instead.

@YannickJadoul
Copy link
Collaborator

YannickJadoul commented Oct 2, 2017

Hmm, I like both. I definitely like the flag thing; that would be perfect :-) (I also have a boolean in my quick code)

For other users, I don't know; maybe the .add_string_constructor() (possibly combined with that flag?) would be a nice idea? But then again, maybe that's too much; I actually do like that constructor :-)

Another option still would be enum.implicitly_convertible_from_string(), combining the .add_string_constructor() and implicitly_convertible? (Not sure I like it, but it's yet another option nevertheless.)

(EDIT: Thanks for considering, btw! Again, not claiming it nééds to be in there, but maybe others will like the feature, so since you're making a PR anyway, I could as well bring it up.)

@henryiii
Copy link
Collaborator Author

henryiii commented Oct 2, 2017

Too many choices! Someone official please inform or vote:

  1. Mandatory string constructor, as it is now in the PR. Simple, clean, but doesn't allow new string constructors to be added.
  2. Mandatory string constructor, but with a new flag, py::case_insensitive, that would mark the struct as case insensitive. This could also throw an error if two identical values are inserted, and possibly even allow case insensitive access. (PS: I have a stash where a proof of principle of this is implemented)
  3. Optional string constructor, either with:
    a. A flag, either turning it off or on
    b. An explicit function (probably the later). The function would be either:
    b1. enum.implicitly_convertible_from_string (adds implicitly_convertible)
    b2. enum.add_string_constructor (separate implicitly_convertable call).
    Either function could have an optional case-insensitive boolean.

@TD22057
Copy link
Contributor

TD22057 commented Oct 3, 2017

I definitely need this for my project so thank you very much. For us, case insensitivity isn't important - to my thinking it's a bit strange in that it's easy to see how "FOO" maps to Enum::FOO but not "fOo" but that's just me.

I'm not sure what "doesn't allow new string constructors" means in choice 1) so I wouldn't know how to vote - other than yes, I want this :-)

@@ -1377,6 +1377,14 @@ template <typename Type> class enum_ : public class_<Type> {
return m;
}, return_value_policy::copy);
def(init([](Scalar i) { return static_cast<Type>(i); }));
def(init([name, m_entries_ptr](std::string value) -> Type {
pybind11::dict values = reinterpret_borrow<pybind11::dict>(m_entries_ptr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't your code, but let's clean it up anyway: instead of auto m_entries_ptr = m_entries.inc_ref().ptr(); being a raw, incremented pointer, we can change it to: auto entries = m_entries;. Then all the lambdas (this, plus the others that do a reinterpret_borrow) can just capture entries and can simplify their reinterpret_borrow<dict>(m_entries_ptr) to just entries.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, much nicer the new way. Thanks!

def test_scoped_enum():
assert m.test_scoped_enum(m.ScopedEnum.Three) == "ScopedEnum::Three"
with pytest.raises(TypeError):
m.test_scoped_enum("Three")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a check here against the specific exception message, via:

with pytest.raises(TypeError) as excinfo:
    # ... the test call
assert str(excinfo.value) == "the expected exception message"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with that is this gives the wrong error message. I need help figuring out why the error message is being swallowed. I've tried stepping through it in Xcode, but finding where exceptions (at least this one) is being caught is tricky. I think it's inside a constructor/destructor somewhere. I think that directly calling the constructor works, but not if it's part of an implicit conversion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests for errors, one that passes (direct construction was okay) and one that fails (implicit construction).

@jagerman
Copy link
Member

jagerman commented Oct 6, 2017

I don't think case insensitivity is worth supporting or worrying about. It obviously isn't supported in C++, but neither is it supported by Python's (3.4+) enums string conversion:

import enum
class Color(enum.Enum):
    RED = 1
    green = 2

print(repr(Color["RED"])) 
print(repr(Color["green"])) 
print(repr(Color["Green"]))
<Color.RED: 1>
<Color.green: 2>
Traceback (most recent call last):
  File "<string>", line 9, in <module>
  File "/usr/lib/python3.5/enum.py", line 277, in __getitem__
    return cls._member_map_[name]
KeyError: 'Green'

@jagerman
Copy link
Member

jagerman commented Oct 6, 2017

I'm not sure what "doesn't allow new string constructors" means in choice 1)

You can define multiple constructors for an object in pybind; they are called in order of definition, and the first for which the given set of arguments can be converted from Python values to match the C++ argument types wins. What @henryiii means is that without his addition, you could always add an extra string-taking constructor that does case-insensitive (or any other sort of arbitrary lookup based on an input string), but if the core functionality unconditionally adds a string-taking constructor, that won't be possible anymore--the core constructor will always be checked first.

If this is really important to someone, I'd rather see a more general ability to inject a method definition at the beginning of the overload chain (e.g. cl.def("f", &C::f, py::priority_overload) or something along those lines).

@henryiii
Copy link
Collaborator Author

henryiii commented Oct 9, 2017

@jagerman, I like that idea much better. I'll try to look at it in a bit. (Added in #1131)

@henryiii henryiii mentioned this pull request Oct 9, 2017
@YannickJadoul
Copy link
Collaborator

Fair enough, if I'm completely alone in liking the case insensitivity, let's just forget about it :-)

@wjakob
Copy link
Member

wjakob commented Oct 9, 2017

An aside that is not immediately related to the change here: I'm wondering if it would make sense to create an abstract pybind11-internal type which is a base class of all enumerations. Many of the functions declared over and over again for each enumeration aren't really specific to that enumeration or could be implemented in a generic way, reducing object code bloat. Projects that use many enumerations have hundreds of redundant versions of these functions, which is a concern.

PS: While looking at the patch, it wasn't clear to me why the entries_ptr handle had to be changed to a py::object (which also comes with a object code penalty, since all functions with a capture object will now require further reference counting on creation and destruction)

@henryiii
Copy link
Collaborator Author

I improved the line I think you are talking about:

auto entries = m_entries;

a bit:

dict& entries = m_entries;

But functions that capture still make a copy of the py::dict object, I believe, adding reference counting. Did the old version ever release the reference it added, though? This change was suggested as a cleanup by @jagerman.

@wjakob
Copy link
Member

wjakob commented Oct 10, 2017

The old version read

auto m_entries_ptr = m_entries.inc_ref().ptr();

so it was just using a pointer to the underlying PyObject.

@henryiii
Copy link
Collaborator Author

Doesn't inc_ref add a (leaked) reference? I can change it back tomorrow.

@jagerman
Copy link
Member

Yes; but since we don't support type destruction, every type is essentially a leaked reference already. It's not all that "clean", but as long as types aren't unloadable, it's not actually creating any new leak.

@henryiii
Copy link
Collaborator Author

henryiii commented Oct 11, 2017

I've reverted the change, a git revert was just about all it took. Anyone have an idea about improving the error (type and/or message) on function argument implicit conversion? That's the "failing" test.

def test_conversion_enum_raises_implicit():
with pytest.raises(ValueError) as excinfo:
m.test_conversion_enum("Convert0")
assert str(excinfo.value) == "\"Convert0\" is not a valid value for enum type ConversionEnum"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test fails, since it gives a generic message and TypeError.

@zeroset
Copy link

zeroset commented Feb 25, 2021

What is the status of this PR? Was the feature implemented somewhere else already?

@EricCousineau-TRI
Copy link
Collaborator

EricCousineau-TRI commented Apr 18, 2021

Er, not sure if this approach is "Pythonic" with stdlib Enum. I had trouble replicating a constructor-based approach in Python: since stdlib uses metaclass, if I override MyEnum.__new__ then it activates some, uh, creative codepaths:
https://docs.python.org/3/library/enum.html#when-to-use-new-vs-init
https://github.com/python/cpython/blob/v3.6.9/Lib/enum.py#L481-L498
My dumb attempt: https://gist.github.com/EricCousineau-TRI/4f2a1a00bfd05cae04e3e4a8f1cbca76

Main concern here is the onus of maintaining this functionality if/when we're able to resolve #2332 to some extent, either by replication or as @wjakob mentioned above (comment) via actually using enum lib.

@henryiii henryiii modified the milestones: v2.7, v2.8 Jul 16, 2021
@henryiii henryiii modified the milestones: v2.8, v2.9 Oct 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implicit conversion from Python string to C++ enum type
7 participants