Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Further relax extract_sequence for the upcoming release 0.18 #2632

Closed
wants to merge 3 commits into from

Conversation

adamreichold
Copy link
Member

@adamreichold adamreichold commented Sep 21, 2022

These are the two less more controversial parts, i.e. turning anything that can be iterated into a Vec whether it calls itself a sequence or not and also accepting str where the changelog entry gives a concrete hint on how to handle the str-as-Vec<String> edge case ergonomically as is common in Python API.

Base automatically changed from relax-extract-sequence-pre-0.17 to main September 21, 2022 10:03
@davidhewitt
Copy link
Member

So I'm still unsure about this. I think the removal of the str special-case handling could be included in 0.17.2 (it was added in 0.17.0). So that could be separated out too if you feel strongly that was incorrect.

The broader relax... is it too much? Passing e.g. a dict will then get a Vec of the keys. Is that helpful? I'm not sure. All opinions on this change would be welcome.

@adamreichold
Copy link
Member Author

So I'm still unsure about this. I think the removal of the str special-case handling could be included in 0.17.2 (it was added in 0.17.0). So that could be separated out too if you feel strongly that was incorrect.

Personally, my primary motivation to change anything here is making NumPy users happy and that we already did. This follow-up is mainly about furthering the discussion on what our approach to type extraction should be.

I like the general approach of just requiring iterability and dislike the str special case. But I will certainly not throw a tantrum if we do not change either of these.

The broader relax... is it too much? Passing e.g. a dict will then get a Vec of the keys. Is that helpful? I'm not sure. All opinions on this change would be welcome.

I would also very much like a overview of the opinions of interested users and/or the other maintainers.

I am cc'ing the people already involved in the original issue trying to extract Vec from NumPy arrays: @mtreinish @ritchie46 @aganders3

@adamreichold
Copy link
Member Author

Passing e.g. a dict will then get a Vec of the keys. Is that helpful? I'm not sure.

I have to admit that I consider that iter(dict) yields an iterator of the keys instead of the key-value pairs to be a historical Python edge case that is similar to the str-can-be-iterated-as-str thing. Getting a Vec of key-value pairs would definitely be useful IMHO but is ruled out the existing Python behaviour.

@mejrs
Copy link
Member

mejrs commented Sep 21, 2022

I haven't really been keeping up with this sequence stuff, but...

Passing e.g. a dict will then get a Vec of the keys. Is that helpful?

...sounds like a big footgun. That's not what I'd expect when asking pyo3 to convert things into a Vec (rather, I'd expect a typeerror)

@adamreichold
Copy link
Member Author

adamreichold commented Sep 21, 2022

rather, I'd expect a typeerror

I think the usefulness of type errors is more nuanced here than for say impl PyTryFrom for PySequence. Constructing a PySequence where later method calls mostly fail because the underlying Python object is not really a sequence is arguably worse than an upfront type error. But in this case, the resulting Vec<T> is perfectly usable as it is a freshly constructed Vec<T> like any other so this reasoning does not apply.

What the type error does yield for FromPyObject is the ability to try another branch of an outer FromPyObject impl which may be a better fit for the underlying Python object. But the order in which these branches are tried is up to the Rust program by e.g. changing the order of enum variants, so one can already try to extract a PyDict before "falling back" to a Vec to handle this case. (Especially since one would have to implement this exactly the same way in pure Python, i.e. first check if the given value is a dict before trying something like list(val) because that works for dictionaries as well.)

EDIT: I did not spell that out above, but I think a top-level type error returned when calling into a Rust extension is not particularly useful because, well, the call didn't work then.

@adamreichold adamreichold force-pushed the relax-extract-sequence-post-0.18 branch from c30db7d to 035af82 Compare November 9, 2022 07:57
@adamreichold adamreichold mentioned this pull request Nov 27, 2022
7 tasks
@aganders3
Copy link
Contributor

I noticed this is being discussed in #2785 and figured I'd give my 2¢.

This behavior makes sense to me as a mostly Python-brained developer, but I understand the controversy. It feels more ergonomic than having to call list(ob) in Python before passing into a function expecting a Vec (which is my understanding of a possible workaround).

My only nitpick (which I kind of brought up in #2615) is with the naming/locaton as extract_sequence would now be more about whether the object is iterable than if it's a sequence. This is tangential to the more important question of whether this is the behavior we want, though.

@davidhewitt
Copy link
Member

davidhewitt commented Dec 23, 2022

I'm still undecided what the right thing to do is. It should be noted that what we do is currently almost the same as pybind11 does, which is a plus for ecosystem consistency in my opinion.

Arguments for:

  • Allowing any Python iterable has potential performance advantages by avoiding the intermediate Python collection.
  • Rust Vec has to be created anyway as part of this conversion, so requiring a concrete Python type also seems wasteful.

Arguments against:

  • It's a breaking change (would make many PyO3 libraries accept more types suddenly).
  • I think it's harder to take this away again later if we decide it was an error to do so. (We saw how much breakage we got just from not accepting numpy arrays by accident; imagine if we stopped accepting all iterables how many callsites could break.)
  • Semantically, Rust's Vec is roughly akin to Python Sequence, so the existing behaviour does seem correct from a philosophical angle.

I wonder if rather than changing the existing behaviour, it's better if we could add a type PyIterable<T> which can then have easy iteration access? Very similar to our existing PyIterator but with type safety.

(I've been wondering for a while about experimenting with a new crate which provides generic variants of existing PyO3 types, e.g. PyList<T>.)

@adamreichold
Copy link
Member Author

Semantically, Rust's Vec is roughly akin to Python Sequence, so the existing behaviour does seem correct from a philosophical angle.

Personally, I think of FromPyObject::extract as constructing a Rust type based on a Python value rather than establishing type level equivalence between Rust and Python. I think that also fits with the result of that operation usually being independent of the Python value it originated from. Although we admittedly have it perform "double duty" for argument conversion purposes. From this point of view, <Vec as FromPyObject>::extract would be similar to Python's list(iterable) constructor.

I wonder if rather than changing the existing behaviour, it's better if we could add a type PyIterable which can then have easy iteration access? Very similar to our existing PyIterator but with type safety.

I think this is almost orthogonal to the question posed here, i.e. if I want to work with Vec<T> in the end, taking vals: &PyAny and calling let vals = vals.iter()?.map(|val| val.extract()).collect<PyResult<Vec<T>>>()?; would almost have the same ergonomics IMHO.

I'm still undecided what the right thing to do is.

All that being said, I think that we should drop the change if there is this much uncertainty attached to it. I fear that it takes up scarce review bandwidth that is needed elsewhere and the status quo certainly serves us insofar there is no stream of bug reports on this behaviour.

@adamreichold adamreichold deleted the relax-extract-sequence-post-0.18 branch December 23, 2022 09:16
@davidhewitt
Copy link
Member

Ok. I am always sorry to see a contribution not be merged, as there is always an underlying motivation as well as time invested.

Although we admittedly have it perform "double duty" for argument conversion purposes. From this point of view, <Vec as FromPyObject>::extract would be similar to Python's list(iterable) constructor.

I think you've hit the nail on the head here, and this perhaps suggests the avenue for future exploration another time. There are different levels of conversion which users can want on function inputs. The current PyO3 semantics are usually to accept the equivalent Python type - which I'll call "narrow" conversion range. The proposal here is more akin to "wide" conversion range. Implicit conversions like those asked about in #2482 also are flavours of "wide" conversions.

I would say pybind11 defaults to "wide" conversions and has a .noconvert() option to reduce to "narrow" mode.

It is easier for PyO3 users to use newtypes and #[pyo3(from_py_with = ...)] to modify conversion behaviour already, so I think there's less urgency to support something like .noconvert(). At the same time, these sorts of proposals do suggest there is a wish for more. We can definitely continue to be open to better designs to offer in the framework.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants