Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Fix #2615 by relaxing the type check in extract_sequence. #2620

Closed
wants to merge 3 commits into from

Conversation

adamreichold
Copy link
Member

I am somewhat at a loss on how to properly test this without adding NumPy as a Python-side dependency and would be grateful for any advice here? (Maybe writing a custom Python class that implements only PySequence_Length and PySequence_Length and PyObject_GetIter?)

I am also not sure whether PyMapping needs a similar treatment but the conversions for HashMap seem to go through PyDict instead?

@adamreichold
Copy link
Member Author

adamreichold commented Sep 13, 2022

After staring at this for a bit longer, I noticed that neither PyObject::len nor PyObject::iter are actually part of PySequence and we could be even more lenient by extracting a Vec from everything that can be turned into an iterator at all. I am not sure if this is covering more types than we actually want it to cover though.

@adamreichold adamreichold linked an issue Sep 13, 2022 that may be closed by this pull request
Base automatically changed from xtask-drop-workspace to main September 15, 2022 05:12
@adamreichold
Copy link
Member Author

I am somewhat at a loss on how to properly test this without adding NumPy as a Python-side dependency and would be grateful for any advice here?

I think a nice and simple way could be to test this in the rust-numpy repository where I usually have a draft PR targeting the current Git version of PyO3 sitting around anyway, c.f. PyO3/rust-numpy#353 which adds a test case of extracting a Vec from a one-dimensional NumPy array which I could merge after this or a similar fix is released on crates.io.

This of course still leaves the question open whether we want extraction to be as relaxed as this PR would make, i.e. not checking for sequences at all but using anything that can be turned into an iterator?

@davidhewitt
Copy link
Member

Sorry for the long delay in reply. Been swamped IRL and just not had a chance to get my head into this one.

It seems ok to add numpy to the noxfile for the pytests crate here, perhaps? I think it's better to add the test in this repo rather than rust-numpy if possible, else we could merge stuff which breaks this again and only notice on future rust-numpy CI runs.

As for relaxing the type check this far... I'm a little unsure, though in practice this is likely to match what the previous implementation did? I am tempted to look at what pybind11 does and see if it makes sense for us to be similar.

@adamreichold
Copy link
Member Author

As for relaxing the type check this far... I'm a little unsure, though in practice this is likely to match what the previous implementation did? I am tempted to look at what pybind11 does and see if it makes sense for us to be similar.

From a quick glance, they are doing what we did before 0.17.0, i.e. they check for a sequence (but not str or bytes) and then iterate to fill a std::vector, c.f. https://github.com/pybind/pybind11/blob/1874f8fa8767179ab8f8c645a6e8c8c4e4a7c486/include/pybind11/stl.h#L194 and https://github.com/pybind/pybind11/blob/1874f8fa8767179ab8f8c645a6e8c8c4e4a7c486/include/pybind11/stl.h#L145 where checking for a sequence seems to be based on PySequence_Check, i.e. https://github.com/pybind/pybind11/blob/1874f8fa8767179ab8f8c645a6e8c8c4e4a7c486/include/pybind11/pytypes.h#L1985

So this would correspond to the first commit pushed here (modulo the extra check for bytes which we lack). So should I drop the second commit and try to add a pytest based on NumPy? And maybe also add the check to not extract bytes into a Vec?

Personally, I am sort of endeared with the generic version with the reasoning that the Rust is the one that fixes the types and the Python side is the malleable one that tries to fit in.

Meaning if the Rust side asks for a Vec, anything on the Python side that can be turned into this at the same level of efficiency should be fair game. If one really wants to have some sort of precedence here (like handling list[char]/Vec<char> and str/PyString differently), I would try to make this explicit on the Rust side using an enum like

#[derive(FromPyObject)]
enum StringOrChars {
  String(PyString),
  Chars(Vec<char>),
}

which should work even if Vec<char> could be extracted from PyString if I understand things correctly.

At least I don't really have an example I mind where making the extraction fail on the Rust side would make something possible that would not be if it succeeded. But then again, that explicit check for str probably had a rationale that I just cannot remember ATM.

@adamreichold
Copy link
Member Author

adamreichold commented Sep 18, 2022

But then again, that explicit check for str probably had a rationale that I just cannot remember ATM.

So by now, I found #2342 and #2500 and I think even here I would argue that the idiomatic Python solution

if isinstance(strings, str):
    strings = [strings]

could be turned into

#[derive(FromPyObject)]
enum OneOrMany<'a> {
  One(&'a PyString),
  Many(Vec<&'a PyString>),
}

impl Into<Vec<&'a PyString>> for OneOrMany {
  fn into(self) -> Vec<&'a PyString> {
    match self {
      Self::One(one) => vec![one],
      Self::Many(many) => many,
    }
  }
}

@adamreichold
Copy link
Member Author

So should I drop the second commit and try to add a pytest based on NumPy?

Added the pytest and verified that both commits would make it green.

@adamreichold adamreichold force-pushed the relax-extract-sequence branch 2 times, most recently from 56022a2 to cdff4bb Compare September 18, 2022 19:27
@adamreichold
Copy link
Member Author

So should I drop the second commit and try to add a pytest based on NumPy?

Added the pytest and verified that both commits would make it green.

There are no binary wheels for NumPy at least on Windows/PyPy, so I restricted that test case to Linux/CPython since one combination should be enough to test this particular functionality.

This allows us to handle types like one-dimensional NumPy arrays which implement
just enough of the sequence protocol to support the extract_sequence function
but are not an instance of the sequence ABC.
@davidhewitt
Copy link
Member

Sorry to have been so slow to reply here. Been busy IRL and needed some time to mull this over.

Personally, I am sort of endeared with the generic version with the reasoning that the Rust is the one that fixes the types and the Python side is the malleable one that tries to fit in.

I think I'm coming round to this argument too. I am flip-flopping on whether #2500 was a mistake. If we revert it, I guess we have to wait until 0.18?

I was thinking about releasing 0.17.2, so should we split this fix in two pieces:

That way I can cherry pick just the first change for 0.17.2.

I'm still a bit murky on whether that's definitely correct, as it'd be nice to get the HashMap and Vec conversions to have "equivalent" behaviour, however in the meanwhile I think going back to pre-0.17 behaviour might be best for users before we make a more deliberate change?

I think the main thing we lost from the 0.17 change was accepting np.ndarray containing objects? Was that so bad? An alternative, just to put it out there, could be to defend the check to be based on the ABC to be correct, and encourage users to have an enum or since kind of new type if they want to accept numpy arrays.

@adamreichold
Copy link
Member Author

I was thinking about releasing 0.17.2, so should we split this fix in two pieces:

Since the change is already split into multiple commits here, I think we could already cherry-pick the second commit which basically restores pre-0.17 behaviour for impl FromPyObject for Vec<T> and does nothing else.

The second commit then further relaxes impl FromPyObject for Vec<T> to work with anything that can be turned into an iterator (whether it passes PySequence_Check or not). It however does touch upon the PyString special case introduced by #2500.

To make things less messy, I will open up two new PR, one containing the test and the first change. And another one containing the second commit and a removal of the PyString special case. Both with appropriate changelog entries.

@adamreichold adamreichold deleted the relax-extract-sequence branch September 21, 2022 08:14
@adamreichold
Copy link
Member Author

These two PR, #2631 and #2632, are now pending review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

val ndarray object cannot be converted to Sequence
2 participants