Further relax extract_sequence for the upcoming release 0.18 #2632

adamreichold · 2022-09-21T08:13:45Z

These are the two ~~less~~ more controversial parts, i.e. turning anything that can be iterated into a Vec whether it calls itself a sequence or not and also accepting str where the changelog entry gives a concrete hint on how to handle the str-as-Vec<String> edge case ergonomically as is common in Python API.

davidhewitt · 2022-09-21T10:20:22Z

So I'm still unsure about this. I think the removal of the str special-case handling could be included in 0.17.2 (it was added in 0.17.0). So that could be separated out too if you feel strongly that was incorrect.

The broader relax... is it too much? Passing e.g. a dict will then get a Vec of the keys. Is that helpful? I'm not sure. All opinions on this change would be welcome.

adamreichold · 2022-09-21T10:30:48Z

So I'm still unsure about this. I think the removal of the str special-case handling could be included in 0.17.2 (it was added in 0.17.0). So that could be separated out too if you feel strongly that was incorrect.

Personally, my primary motivation to change anything here is making NumPy users happy and that we already did. This follow-up is mainly about furthering the discussion on what our approach to type extraction should be.

I like the general approach of just requiring iterability and dislike the str special case. But I will certainly not throw a tantrum if we do not change either of these.

The broader relax... is it too much? Passing e.g. a dict will then get a Vec of the keys. Is that helpful? I'm not sure. All opinions on this change would be welcome.

I would also very much like a overview of the opinions of interested users and/or the other maintainers.

I am cc'ing the people already involved in the original issue trying to extract Vec from NumPy arrays: @mtreinish @ritchie46 @aganders3

adamreichold · 2022-09-21T10:33:57Z

Passing e.g. a dict will then get a Vec of the keys. Is that helpful? I'm not sure.

I have to admit that I consider that iter(dict) yields an iterator of the keys instead of the key-value pairs to be a historical Python edge case that is similar to the str-can-be-iterated-as-str thing. Getting a Vec of key-value pairs would definitely be useful IMHO but is ruled out the existing Python behaviour.

mejrs · 2022-09-21T11:53:21Z

I haven't really been keeping up with this sequence stuff, but...

Passing e.g. a dict will then get a Vec of the keys. Is that helpful?

...sounds like a big footgun. That's not what I'd expect when asking pyo3 to convert things into a Vec (rather, I'd expect a typeerror)

adamreichold · 2022-09-21T12:50:49Z

rather, I'd expect a typeerror

I think the usefulness of type errors is more nuanced here than for say impl PyTryFrom for PySequence. Constructing a PySequence where later method calls mostly fail because the underlying Python object is not really a sequence is arguably worse than an upfront type error. But in this case, the resulting Vec<T> is perfectly usable as it is a freshly constructed Vec<T> like any other so this reasoning does not apply.

What the type error does yield for FromPyObject is the ability to try another branch of an outer FromPyObject impl which may be a better fit for the underlying Python object. But the order in which these branches are tried is up to the Rust program by e.g. changing the order of enum variants, so one can already try to extract a PyDict before "falling back" to a Vec to handle this case. (Especially since one would have to implement this exactly the same way in pure Python, i.e. first check if the given value is a dict before trying something like list(val) because that works for dictionaries as well.)

EDIT: I did not spell that out above, but I think a top-level type error returned when calling into a Rust extension is not particularly useful because, well, the call didn't work then.

…:iter actually need the sequence protocol.

…Vec<T> was relaxed.

aganders3 · 2022-11-30T15:28:46Z

I noticed this is being discussed in #2785 and figured I'd give my 2¢.

This behavior makes sense to me as a mostly Python-brained developer, but I understand the controversy. It feels more ergonomic than having to call list(ob) in Python before passing into a function expecting a Vec (which is my understanding of a possible workaround).

My only nitpick (which I kind of brought up in #2615) is with the naming/locaton as extract_sequence would now be more about whether the object is iterable than if it's a sequence. This is tangential to the more important question of whether this is the behavior we want, though.

davidhewitt · 2022-12-23T08:59:13Z

I'm still undecided what the right thing to do is. It should be noted that what we do is currently almost the same as pybind11 does, which is a plus for ecosystem consistency in my opinion.

Arguments for:

Allowing any Python iterable has potential performance advantages by avoiding the intermediate Python collection.
Rust Vec has to be created anyway as part of this conversion, so requiring a concrete Python type also seems wasteful.

Arguments against:

It's a breaking change (would make many PyO3 libraries accept more types suddenly).
I think it's harder to take this away again later if we decide it was an error to do so. (We saw how much breakage we got just from not accepting numpy arrays by accident; imagine if we stopped accepting all iterables how many callsites could break.)
Semantically, Rust's Vec is roughly akin to Python Sequence, so the existing behaviour does seem correct from a philosophical angle.

I wonder if rather than changing the existing behaviour, it's better if we could add a type PyIterable<T> which can then have easy iteration access? Very similar to our existing PyIterator but with type safety.

(I've been wondering for a while about experimenting with a new crate which provides generic variants of existing PyO3 types, e.g. PyList<T>.)

adamreichold · 2022-12-23T09:16:06Z

Semantically, Rust's Vec is roughly akin to Python Sequence, so the existing behaviour does seem correct from a philosophical angle.

Personally, I think of FromPyObject::extract as constructing a Rust type based on a Python value rather than establishing type level equivalence between Rust and Python. I think that also fits with the result of that operation usually being independent of the Python value it originated from. Although we admittedly have it perform "double duty" for argument conversion purposes. From this point of view, <Vec as FromPyObject>::extract would be similar to Python's list(iterable) constructor.

I wonder if rather than changing the existing behaviour, it's better if we could add a type PyIterable which can then have easy iteration access? Very similar to our existing PyIterator but with type safety.

I think this is almost orthogonal to the question posed here, i.e. if I want to work with Vec<T> in the end, taking vals: &PyAny and calling let vals = vals.iter()?.map(|val| val.extract()).collect<PyResult<Vec<T>>>()?; would almost have the same ergonomics IMHO.

I'm still undecided what the right thing to do is.

All that being said, I think that we should drop the change if there is this much uncertainty attached to it. I fear that it takes up scarce review bandwidth that is needed elsewhere and the status quo certainly serves us insofar there is no stream of bug reports on this behaviour.

davidhewitt · 2022-12-26T14:47:37Z

Ok. I am always sorry to see a contribution not be merged, as there is always an underlying motivation as well as time invested.

Although we admittedly have it perform "double duty" for argument conversion purposes. From this point of view, <Vec as FromPyObject>::extract would be similar to Python's list(iterable) constructor.

I think you've hit the nail on the head here, and this perhaps suggests the avenue for future exploration another time. There are different levels of conversion which users can want on function inputs. The current PyO3 semantics are usually to accept the equivalent Python type - which I'll call "narrow" conversion range. The proposal here is more akin to "wide" conversion range. Implicit conversions like those asked about in #2482 also are flavours of "wide" conversions.

I would say pybind11 defaults to "wide" conversions and has a .noconvert() option to reduce to "narrow" mode.

It is easier for PyO3 users to use newtypes and #[pyo3(from_py_with = ...)] to modify conversion behaviour already, so I think there's less urgency to support something like .noconvert(). At the same time, these sorts of proposals do suggest there is a wish for more. We can definitely continue to be open to better designs to offer in the framework.

adamreichold force-pushed the relax-extract-sequence-pre-0.17 branch from 0e20011 to df933c5 Compare September 21, 2022 08:15

adamreichold force-pushed the relax-extract-sequence-post-0.18 branch from c0e6c4b to 11043b3 Compare September 21, 2022 08:15

adamreichold mentioned this pull request Sep 21, 2022

RFC: Fix #2615 by relaxing the type check in extract_sequence. #2620

Closed

adamreichold force-pushed the relax-extract-sequence-post-0.18 branch from 11043b3 to 5df7818 Compare September 21, 2022 09:07

Base automatically changed from relax-extract-sequence-pre-0.17 to main September 21, 2022 10:03

adamreichold force-pushed the relax-extract-sequence-post-0.18 branch from 5df7818 to ed1469c Compare September 21, 2022 10:35

adamreichold force-pushed the relax-extract-sequence-post-0.18 branch from ed1469c to f62163c Compare September 27, 2022 12:12

davidhewitt mentioned this pull request Oct 13, 2022

Automatically wrap str in a vec![] for Vec<&str> and Vec<String> #2500

Merged

adamreichold force-pushed the relax-extract-sequence-post-0.18 branch from f62163c to 8f6cc3c Compare October 13, 2022 20:20

adamreichold mentioned this pull request Oct 14, 2022

Use a TypeError, rather than a ValueError, when refusing to treat a str as a Vec #2685

Merged

adamreichold force-pushed the relax-extract-sequence-post-0.18 branch from 8f6cc3c to c30db7d Compare November 7, 2022 13:22

adamreichold added 3 commits November 9, 2022 08:53

Further relax extract_sequence as neither PyObject::len nor PyObject:…

4620842

…:iter actually need the sequence protocol.

Revert #2500, i.e. impl FromPyObject for Vec<T> will accept str.

18232ba

Add changelog entry detailing how much further impl FromPyObject for …

035af82

…Vec<T> was relaxed.

adamreichold force-pushed the relax-extract-sequence-post-0.18 branch from c30db7d to 035af82 Compare November 9, 2022 07:57

adamreichold mentioned this pull request Nov 27, 2022

0.18 release #2785

Closed

7 tasks

adamreichold closed this Dec 23, 2022

adamreichold deleted the relax-extract-sequence-post-0.18 branch December 23, 2022 09:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Further relax extract_sequence for the upcoming release 0.18 #2632

Further relax extract_sequence for the upcoming release 0.18 #2632

adamreichold commented Sep 21, 2022 •

edited

Loading

davidhewitt commented Sep 21, 2022

adamreichold commented Sep 21, 2022

adamreichold commented Sep 21, 2022

mejrs commented Sep 21, 2022

adamreichold commented Sep 21, 2022 •

edited

Loading

aganders3 commented Nov 30, 2022

davidhewitt commented Dec 23, 2022 •

edited

Loading

adamreichold commented Dec 23, 2022

davidhewitt commented Dec 26, 2022

Further relax extract_sequence for the upcoming release 0.18 #2632

Further relax extract_sequence for the upcoming release 0.18 #2632

Conversation

adamreichold commented Sep 21, 2022 • edited Loading

davidhewitt commented Sep 21, 2022

adamreichold commented Sep 21, 2022

adamreichold commented Sep 21, 2022

mejrs commented Sep 21, 2022

adamreichold commented Sep 21, 2022 • edited Loading

aganders3 commented Nov 30, 2022

davidhewitt commented Dec 23, 2022 • edited Loading

adamreichold commented Dec 23, 2022

davidhewitt commented Dec 26, 2022

adamreichold commented Sep 21, 2022 •

edited

Loading

adamreichold commented Sep 21, 2022 •

edited

Loading

davidhewitt commented Dec 23, 2022 •

edited

Loading