Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Python 3 enums #781

Closed
wants to merge 11 commits into from
Closed

WIP: Python 3 enums #781

wants to merge 11 commits into from

Conversation

aldanor
Copy link
Member

@aldanor aldanor commented Apr 5, 2017

This adds support for wrapping C/C++ enums as Python 3's enum.IntEnum (with @unique decorator applied). It would also work with enum34 backport. If neither is available, ImportError will be thrown. The existing enum_ wrappers are untouched and should work as before.

  • Don't derive enum_ from class_
  • Allow value().value().def().def() API, like the existing enum_<>
  • Add full tests for class_interface<> API
  • Add support for py::arithmetic
  • Add support for .export_values()
  • Rename to enum34
  • Implement automatic compile-time selection between enum_ and enum34
  • Update docs

(Closes #530)

@aldanor
Copy link
Member Author

aldanor commented Apr 5, 2017

Another point, implementation-wise, I guess it could be possible to avoid the unordered map of registered py3 enums (py3_enum_info::registry) and the caster flag (type_caster<...>::py3), and instead store py3_enum_info* directly in each enum type's type_caster<...>. Would that be preferable?

@jagerman
Copy link
Member

jagerman commented Apr 6, 2017

Thoughts on having this replace the current py::enum_ implementation (with the current code preserved as a detail-level fallback implementation when enum.IntEnum isn't available) instead of having dual enum implementations? 2.1 had some additions (in #666) to make enum_ behave more like enum.IntEnum already; this seems like a logical next step.

It would break some potential backwards compatibility in that you can't use enum_ as a class_ anymore, but I don't think that was documented to begin with, nor does it seem particularly useful for an enum implementation.

I definitely like the idea of storing a the values in a static variable in the type_caster<EnumType>. It would also let you store the keys as a std::underlying_type_t<EnumType> (rather than the long long--which might have a problem if someone stashed a (size_t) -1 in an enum).

@jagerman
Copy link
Member

jagerman commented Apr 6, 2017

Upon interpreter shutdown something gets decrefed too many times...

It's caused by the static object ctor and unique. Changing them to private, non-static local members with the import+attr as default values fixes it. (I don't see much point in having them static: it seems pretty unlikely that a py3_enum<T> instance for a given T is going to be instantiated a second time for the same T, and if it is, an extra import+attr is not a big deal).

@jagerman
Copy link
Member

jagerman commented Apr 6, 2017

I just had a thought about recreating the object each time you call value(): you could, perhaps, make value() return a temporary object which itself has a value() method that records new values, but doesn't call update() until destruction. Something like this (untested):

template <typename T>
class py3_enum {
...
    class value_loader {
    private:
        py3_enum &owner;
        friend class py3_enum;
        value_loader(py3_enum &e) : owner{e} {}
    public:
        value_loader &value(const char *name, T value) {
            owner.entries[name] = cast(static_cast...);
            return *this;
        }
        ~value_loader() { owner.update(); }
    };

    value_loader value(const char *name, T value) {
        value_loader vl(*this);
        vl.value(name, value);
        return vl;
    }
...
};

Then the common approach:

py_enum3<MyEnum>(m, "MyEnum")
    .value("v1", MyEnum.v1)
    .value("v2", MyEnum.v2)
    .value("v3", MyEnum.v3)
    ;

will only do the IntEnum construction once. (I suppose it could do it twice if it doesn't apply RVO, but that shouldn't be a big deal).

@aldanor
Copy link
Member Author

aldanor commented Apr 6, 2017

It's caused by the static object ctor and unique. Changing them to private, non-static local members with the import+attr as default values fixes it.

Thanks -- yes, that fixes it.

I just had a thought about recreating the object each time you call value(): you could, perhaps, make value() return a temporary object which itself has a value() method that records new values, but doesn't call update() until destruction.

This could work if you still call update() once in py3_enum's ctor, so that it works for empty enums. You could also potentially move update() into ~py3_enum(): assuming you don't assign it to a variable, the dtor will be called before the next line of code; elsewise, will be called at the end of scope.

@aldanor aldanor force-pushed the feature/py3-enum branch 2 times, most recently from a5d6d27 to c5b5b58 Compare April 6, 2017 09:21
@aldanor
Copy link
Member Author

aldanor commented Apr 6, 2017

The only disadvantage of the current implementation, as I see it, is that you can't define methods / properties on the newly created enum class, because it's not a class_<>. In fact, py3_enum is not even an object (should it be, somehow?).

Hypothetically, here's also an alternative implementation: define py3 enums through class_, somewhat similar to the current enum_. This, however, will require two features of class_ that we don't currently have:

  • Accept a class dict to be used at the type instantiation site -- because many metaclasses (like those of enum types) use that. This could be also a useful thing to have in general. I tried it out in a local branch and it's a trivial addition. However, it's not very useful without having the next point implemented.
  • A non-trivial part is actually calling metaclass's __new__() if it's provided, similar to what builtins.__build__class__() does -- we currently don't, so metaclass support is somewhat half-baked and is limited to post-class-creation use.

Also, in this case you won't be able to do enum_<>().value().value().def() anyway since the full dict of entries is needed at type instantiation point, so it would either have to be enum_<>().value().value().finalize().def(), or enum_<>(...=..., ...=...).def().

@aldanor aldanor force-pushed the feature/py3-enum branch 2 times, most recently from 2fe1849 to a4381aa Compare April 6, 2017 13:44
@aldanor
Copy link
Member Author

aldanor commented Apr 6, 2017

@jagerman In fc629e2 I've collapsed most of py3 enum impl into the type_caster<> which gets rid of the unnecessary map lookup on cast/load. Also, py3_enum_info is templated on the underlying type which gets rid of all long long.

@aldanor aldanor force-pushed the feature/py3-enum branch 2 times, most recently from fc629e2 to d5e39cf Compare April 6, 2017 21:54
@aldanor
Copy link
Member Author

aldanor commented Apr 6, 2017

I've fixed a few things like setting __module__ / __qualname__, so the new enums now behave similar to existing types.

@jagerman I've also tried out a very cheesy way of allowing to define properties and methods on Python 3 enums -- surprisingly it seemed to work, at least for simple cases I've tested -- so I pushed it as WIP in the last commit. If it's too hacky, can always roll it back.

Basically, this seems to work:

py3_enum<Foo>(m, "Foo")
    .value("A", Foo::A)
    .value("B", Foo::B)
    .extend() // requires rvalue *this
    .def_property_readonly("is_a", [](Foo x) { return x == Foo::A; });

@aldanor
Copy link
Member Author

aldanor commented Apr 6, 2017

Darn AppVeyor complaining all the time...

@jagerman
Copy link
Member

jagerman commented Apr 6, 2017

I wonder if we should just turn off C4456 through C4459 warnings within pybind code. I don't really subscribe to the every-variable-must-be-unique idea of MS coding mentality that leads to silly things like prefixing all class members with some random character.

@aldanor
Copy link
Member Author

aldanor commented Apr 6, 2017

I'd be +1 for sure, those warnings are very opinionated.

@wjakob
Copy link
Member

wjakob commented Apr 6, 2017

I wonder if we should just turn off C4456 through C4459 warnings within pybind code.

Sound good to me.

@wjakob
Copy link
Member

wjakob commented Apr 6, 2017

Actually, on second thought: let's try to keep pybind11 clean with respect to these warnings. Other projects may choose to activate a similarly strict set of warnings, and it would be unfortunate if pybind11 causes issues in that context.

@jagerman
Copy link
Member

jagerman commented Apr 6, 2017

It won't cause external issues: the warnings are pushed at the beginning of pybind11.h and popped off at the end.

@wjakob
Copy link
Member

wjakob commented Apr 6, 2017

Right.. I was thinking of /wd...-style parameters, which would be problematic. Pushing via a pragma sounds all good to me.

@dean0x7d
Copy link
Member

dean0x7d commented Apr 6, 2017

I'd keep the warnings. C++ core guidelines / clang-tidy flag the same shadowing issues: https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#es12-do-not-reuse-names-in-nested-scopes

@jagerman
Copy link
Member

jagerman commented Apr 6, 2017

Fair enough.

@aldanor
Copy link
Member Author

aldanor commented Apr 28, 2017

Guys, any ideas on this? I'd be willing to finish it (if anything) but there's a few unclear points.

@jagerman
Copy link
Member

Assuming you mean your original post checklist, I vote for calling it enum34 (matching the Python backport name) if it's going to be distinct from enum_ and public.

But I vote for it being neither distinct nor public. Rather I think that enum_ should continue to be the single public interface, with py3_enum and old_enum (or whatever) as implementation details: enum_ would then map to a py3_enum when available, falling back transparently to old_enum if not available.

I'm not sure it's really worth it to worry about supporting additional definitions and attributes on the bound enum. (Or in other words, I'm perfectly happy for this to be a documented limitation, possibly to be revisited later if someone has a strong use case for it).

@dean0x7d
Copy link
Member

dean0x7d commented Apr 29, 2017

I agree with @jagerman: enum34 seems like an appropriate name for the public interface, but it would be even better if it was rolled into enum_ with a Python >= 3.4 ifdef-- if this can be done in a fully backward-compatible way. Note that py::arithmetic() would seem to switch the binding behavior between enum.Enum and enum.IntEnum.

@aldanor
Copy link
Member Author

aldanor commented Apr 30, 2017

If we roll it into one enum_, should it be compile-time or runtime? The first option being presumably easier.

@dean0x7d: if this can be done in a fully backward-compatible way

The biggest question re: backwards-compatibility here is what to do with .def() on enum types (and potentially all the other class_ stuff)?

Basically, declaration semantics are slightly different as enum34 needs to be "finalised" (currently, the type is fully recreated when you call .value()), so while this should work with the current enum_:

py::enum_<Foo>(m, "Foo").value("A", Foo::A).def("f", ...).value("B", Foo::B).def("g", ...);

-- I can't see how to make it work with enum34 implementation.

@dean0x7d
Copy link
Member

dean0x7d commented May 1, 2017

so while this should work with the current enum_:
py::enum_<Foo>(m, "Foo").value("A", Foo::A).def("f", ...).value("B", Foo::B);

That doesn't actually compile with the current enum_. .def returns class_& which doesn't have .value(). So mixing in class_ functions is already semi-broken and probably isn't there intentionally but just as a consequence of the implementation via class_ base.

I think the more important things for maintaining compatibility are: .export_values(), py::arithmetic and the behavior of the resulting types in Python. If the new implementation can appear simply as an optimization selected at compile-time, that would be great.

@aldanor
Copy link
Member Author

aldanor commented May 2, 2017

Hmm, seems like I've managed to make this work:

py3_enum<Foo>(m, "Foo").value(...).value(...).def(...).def(...)

-- however it required slight changes to class_ (namely, all public methods have been moved out to a separate CRTP). All class_<>::def_*() methods should work, I think (needs a few more tests), which brings it closer to the existing enum_<> behaviour. The first .def() will actually do a by-value conversion and return a by-value class_<> object, and all subsequent calls are lvalues with respect to that. I thought quite a bit about it but couldn't find any other sane way that wouldn't require manual copying and pasting enormous number of definitions.

If this is acceptable, I'll take a look at export_values(), py::arithmetic, renaming to enum34, and other compat stuff.

Re: py::arithmetic, I'd personally like it to be a default option because it really makes most sense in Python as IntEnum is specifically geared for ints whereas Enum is a more generic option, however this differs from the enum_<> default, so I guess we have to go with enum.Enum as default container?..

@jagerman
Copy link
Member

jagerman commented May 2, 2017

Is there a significant value in being able to define custom methods on an enum? It feels a bit like chasing backwards compatibility with a feature which was never really intended or advertised, which I think we'd be reasonably justified to just stop supporting given the complexity that it adds.

@aldanor
Copy link
Member Author

aldanor commented May 2, 2017

Well, idk, there's definitely some value. We actually have quite a few in our pybind11 codebase, it's quite handy -- mostly def_property_readonly, indicating some feature of an enum value, or whether it belong to a certain group/category ("is_this, is_that"). __str__ / __repr__ are also quite reasonable to overload. These are the first things that come to mind, at least.

@dean0x7d
Copy link
Member

dean0x7d commented May 3, 2017

If adding methods to enum_ is a desirable feature, I'd suggest making that official and documented first (preferably in a separate PR for clarity). I feel like the current undocumented way of adding methods is unfortunate because of the broken chaining. Making it explicit and documented would be nice. Something like:

py::enum_<Foo>(m, "Foo")
    .value("First", Foo::First)
    .value("Second", Foo::Second)
    .classify() // `class_` functions permitted after this point; no more `enum_`
    .def("__str__", [](Foo const&) { ... });

The enum_ implementation in this case would go from:

class enum_ : public class_ { ... };

to:

class enum_ {
public:
    class_& clasify() { return cls; }

private:
    class_ cls;
};

Using composition here instead of inheritance would simplify things overall since there would be no need for CRTP.

@aldanor
Copy link
Member Author

aldanor commented May 3, 2017

@dean0x7d That's definitely reasonable and involves less hacks. In fact, that's exactly what I did initially if you look at the commit history for this branch (the only difference was that the method was called .extend()). If we make enum_<> behave this way though, we'll break backwards-compat a bit so there's that.

Note that there are potential edge cases with your suggested implementation, like:

auto e = py::enum_<Foo>(m, "Foo")
    .value("First", Foo::First)
    .value("Second", Foo::Second);
auto c = e.classify()
    .def("__str__", [](Foo const&) { ... });
e.value("Third", Foo::Third);  // ???

To avoid this, I tried marking .extend() with &&, but not sure that's the best solution. Any thoughts?

P.S. .classify() sounds a bit like a machine-learning term; maybe into_class() (in Rust's lingo), or to_class()?

@aldanor
Copy link
Member Author

aldanor commented May 7, 2017

@dean0x7d @jagerman I've posted suggested changes to the legacy enum_<> in #837. If that's all good, I'll backtrack this branch and remove CRTP hacks so it has the same API.

@wjakob
Copy link
Member

wjakob commented Jun 10, 2019

I'll close this PR.: While wrapping Python 3 enums would be nice, it was never quite clear how to do this in a backwards-compatible way that preserves the semantics of the current enums. pybind11 now uses a less bloaty way of creating enums that addresses some of the flaws of how this worked in the past.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

enum_ QoL improvements and enum.IntEnum
4 participants