Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP [internal] Allow going from Rust-CPython pointers to PyO3 pyclass structs #12451

Closed

Conversation

Eric-Arellano
Copy link
Contributor

This will allow us to use the migration strategy suggested by @stuhood at https://github.com/pantsbuild/pants/pull/12437/files#r678512355: we can move the definition of types like AddPrefix from Python into the PyO3 native extension, then have the rust-cpython engine/ crate parse cpython::PyObjects into values like our custom PyAddPrefix and PyPathGlobs.

The benefits are that we can move our parsing code out of nodes.rs and intrinsics.rs into dedicated FFI files, and "would avoid copying from a Python object into a Rust struct by directly creating the relevant types in the constructor."

@Eric-Arellano
Copy link
Contributor Author

This fails with a lifetime issue I'm struggling to grok. Anything obvious stick out to you @stuhood, @mejrs, @davidhewitt ?

❯ ./cargo check
    Checking engine v0.0.1 (/Users/ericarellano/code/pants/src/rust/engine)
error[E0495]: cannot infer an appropriate lifetime for lifetime parameter `'p` due to conflicting requirements
   --> src/externs/mod.rs:427:5
    |
427 |     pyo3::PyAny::from_owned_ptr(py, value.as_ptr() as *mut pyo3::ffi::PyObject)
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
note: first, the lifetime cannot outlive the anonymous lifetime #1 defined on the body at 426:41...
   --> src/externs/mod.rs:426:41
    |
426 |     let pyo3_val = pyo3::Python::with_gil(|py| unsafe {
    |  _________________________________________^
427 | |     pyo3::PyAny::from_owned_ptr(py, value.as_ptr() as *mut pyo3::ffi::PyObject)
428 | |   });
    | |___^
note: ...so that the expression is assignable
   --> src/externs/mod.rs:427:33
    |
427 |     pyo3::PyAny::from_owned_ptr(py, value.as_ptr() as *mut pyo3::ffi::PyObject)
    |                                 ^^
    = note: expected `pyo3::Python<'_>`
               found `pyo3::Python<'_>`
note: but, the lifetime must be valid for the anonymous lifetime defined on the function body at 425:48...
   --> src/externs/mod.rs:425:48
    |
425 | pub(crate) fn from_rust_cpython_to_pyo3(value: &PyObject) -> &pyo3::PyAny {
    |                                                ^^^^^^^^^
note: ...so that reference does not outlive borrowed content
   --> src/externs/mod.rs:431:3
    |
431 |   pyo3_val
    |   ^^^^^^^^

error: aborting due to previous error

For more information about this error, try `rustc --explain E0495`.

@Eric-Arellano
Copy link
Contributor Author

Oh hm..(bare with me, still internalizing Rust). I see now this implementation:

    unsafe fn from_owned_ptr(py: Python<'p>, ptr: *mut ffi::PyObject) -> &'p Self {
        Self::from_owned_ptr_or_panic(py, ptr)
    }

If I read correctly, that means the returned object shares the same lifetime as the Python marker type, meaning the GIL will be held the entire time?

Here, we'd need to take Python as an arg to from_rust_cpython_to_pyo3, so that its lifetime outlives this function?

@mejrs
Copy link

mejrs commented Jul 29, 2021

The lifetime of the &PyAny is bound to that of the py token which originates inside the function. So you can never return the &PyAny from the function. There are some ways around this though.

But you should probably rethink that function - for example mem::forgetting a reference does nothing, because references don't do anything when dropped. It won't leave the reference count incremented, if that is what you want.

@stuhood
Copy link
Member

stuhood commented Jul 29, 2021

But you should probably rethink that function - for example mem::forgetting a reference does nothing, because references don't do anything when dropped. It won't leave the reference count incremented, if that is what you want.

Mmm. Can skip the mem::forget by using from_borrowed_ptr instead of from_owned_ptr here.

@@ -420,3 +421,12 @@ pub enum GeneratorResponse {
Get(Get),
GetMulti(Vec<Get>),
}

pub(crate) fn from_rust_cpython_to_pyo3(value: &PyObject) -> &pyo3::PyAny {
Copy link
Member

@stuhood stuhood Jul 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @mejrs said, the &pyo3::PyAny is only valid while the GIL is held: the simplest option would be to return an owned PyAny (or whatever the pyo3 equivalent is) by cloning it.

More complicated would be to continue to return a reference, but require that the GIL is already held before calling the function by accepting the py token as an argument: then the signature might look something like:

pub(crate) fn from_rust_cpython_to_pyo3<'p, 'o>(py: Python<'p>, value: &'o PyObject) -> &'p pyo3::PyAny {
  ..
}

Which of those is preferable depends on whether the GIL is already likely to be held at callsites? But I'm also not familiar enough with pyo3's API to know.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The owned PyAny is currently bundled up in a smart pointer Py<PyAny>, which also doesn't have a lifetime tied to the GIL.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, I would probably go with &PyAny and require the GIL to be held at callsites. Python::with_gil had some performance overhead so it seems undesirable to do it unconditionally inside a low level conversion function like this. If you took the GIL token as an argument, you could do a bulk conversion of many objects without having to acquire and release the GIL for each one.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@Eric-Arellano
Copy link
Contributor Author

Will reopen when I have time to pick this up.

@Eric-Arellano Eric-Arellano deleted the rust-cpython-to-pyo3 branch August 31, 2021 00:15
@Eric-Arellano
Copy link
Contributor Author

I set up a simplified repository at https://github.com/Eric-Arellano/rust-cpython-pyo3 and got the code to compile, but PyO3 gives a TypeError when trying to do this:

    pyo3_py_person = pyo3_extension.PyPerson("some user", 23)
    cpython_greeting = cpython_extension.greet_py_person(pyo3_py_person)
    print(cpython_greeting)
TypeError("'PyPerson' object cannot be converted to 'PyPerson'"

--

I also am seeing weirdness from doing the conversion entirely within PyO3:

fn from_pyo3_pointer_to_pyo3<'py>(value: &PyAny, py: Python<'py>) -> &'py PyAny {
    // NB: Gives TypeError when using `from_borrowed_ptr` instead of `from_owned_ptr`.
    unsafe { pyo3::PyAny::from_owned_ptr(py, value.as_ptr()) }
}
    pyo3_py_person = pyo3_extension.PyPerson("some user", 23)
    pyo3_greeting = pyo3_extension.greet_py_person(pyo3_py_person)
    print(pyo3_greeting)
    print(pyo3_py_person)

Rather than rendering the PyPerson repr, it renders 'write'. This isn't very surprising. I'm using from_owned_ptr, so it makes sense the object becomes invalid after calling the function.

--

This all leads me to wonder if this approach is viable.

@davidhewitt
Copy link

From a quick look I see two things wrong with the rust-cpython-pyo3 example repository you link:

  • unsafe { pyo3::PyAny::from_owned_ptr(py, value.as_ptr()) } will create a double-free; you should use value.into_ptr() instead or go back to using from_borrowed_ptr as you were before.
  • You're defining a #[pyclass] in a lib crate and then using that in two other final extension-module crates. Unfortunately this will create two separate instantiations of that #[pyclass], because of Rust's static linking model. This is why you see the confusing TypeError - there really are two different PyPerson classes created. (This is basically Sharing pyclasses between multiple Rust packages PyO3/pyo3#1444).

@mejrs
Copy link

mejrs commented Nov 7, 2021

I'm not sure this approach is worthwhile. A gradual transition where you keep everything working at all times sounds like a lot of effort. I would sooner do a big refactor where you change over all at once, fix the errors and sort out any bugs you find. If you're worried, write more tests before you start.

Eric-Arellano added a commit that referenced this pull request Nov 7, 2021
This switches from several free functions in `externs/mod.rs` to instead implement methods on `TypeId`, `Key`, and `Value`. That's more idiomatic. 

The new methods also tend to take `py: Python` now, rather than force-acquiring the GIL. This allows sharing the GIL across several function calls, which the PyO3 maintainers shared can be much more efficient to avoid overhead: #12451 (comment)

Finally, this deduplicates when `Debug` and `Display` had the same implementations.

[ci skip-build-wheels]
@Eric-Arellano
Copy link
Contributor Author

Thank you both @mejrs and @davidhewitt! I agree. I changed my focus this weekend to instead refactor our rust-cpython code so a) it's easier to port, and b) I understand it 😄

Eric-Arellano added a commit that referenced this pull request Nov 7, 2021
…13515)

The PyO3 maintainers recommended that our low-level helpers request `Python` (a lock for the GIL) as an argument, rather than acquiring it directly: #12451 (comment). This is because acquiring the GIL has a non-trivial overhead, and it allows callers to batch the operation. Indeed, this PR has two `map` operations where we now only acquire the GIL once, rather than n times.

This also simplifies our handling of `TypeError`s for less verbose code. As explained in #13515 (comment), MyPy already will handle these issues and they're also not mission-critical. Not worth the boilerplate to duplicate MyPy.
@stuhood
Copy link
Member

stuhood commented Nov 8, 2021

@davidhewitt: If PyO3/pyo3#1444 is unavoidable at the moment, is there any way to expose both Py03 types/methods and rust-cpython out of the same module, such that we could build a single .so file containing both libraries? Maybe by getting a pointer to the module from either Py03 or rust-cpython before it is finalized, and then passing it to the other to define more types in?

I'm not sure this approach is worthwhile. A gradual transition where you keep everything working at all times sounds like a lot of effort. I would sooner do a big refactor where you change over all at once, fix the errors and sort out any bugs you find. If you're worried, write more tests before you start.

Both approaches are tons of effort, but if the incremental cost can be lowered sufficiently, it might be worth it at our size. It allows for casual pushing of the problem here and there, and multiple contributors... rather than huge challenging-to-maintain branches.

@Eric-Arellano
Copy link
Contributor Author

rather than huge challenging-to-maintain branches.

So far, #13526 is going a lot better than expected! It helps that PyO3 and Rust-Cpython share so many APIs. I suspect this port will be much simpler than when you went from CFFI to Rust-Cpython.

Where the port has been causing issues, I'm trying to split out prework refactor PRs to simplify Rust-CPython.

(TBD how painful it will be to port the pyclasses.)

@davidhewitt
Copy link

is there any way to expose both Py03 types/methods and rust-cpython out of the same module, such that we could build a single .so file containing both libraries? Maybe by getting a pointer to the module from either Py03 or rust-cpython before it is finalized, and then passing it to the other to define more types in?

This would probably work. Ultimately a PyModule is just a Python module represented by a pointer, even if the Rust types are different.

Note that the traits being incompatible between the two libraries would mean that the two sets of types probably would not be usable in each other's macros.

Eric-Arellano added a commit that referenced this pull request Nov 14, 2021
See #12110 for the original motivation.

## Why migrate

Beyond the APIs requiring much less boilerplate and being more ergonomic (e.g. the `.call0()` and `call1()` variants of `.call()`), I think the biggest benefit is clearer modeling for when the GIL is held.

With PyO3, the two [core types](https://pyo3.rs/v0.15.0/types.html) are `&PyAny` and `PyObject` (aka `Py<PyAny`). `&PyAny` is always a reference with the same lifetime as the `Python` token, which represents when the GIL is used. Whenever you want to do Python operations in Rust, like `.getattr()`, you use `&PyAny`. `PyObject` and any `Py<T>` are GIL-independent.

In contrast, rust-cpython only had `PyObject` and `&PyObject`. It passed around `Python` as an argument to any Python functions like `.getattr()`.

--

An additional benefit is that using proc macros instead of declarative macros means PyCharm no longer hangs for me when working in the `engine/` crate! PyCharm does much better w/ proc macros, and things feel zippy now like autocomplete working.

## Migration strategy

I tried originally doing an incremental strategy, most recently with #12451, which proved technically infeasible.

This PR completely swaps Rust-Cpython with PyO3, but tries to minimize the diff. It only makes changes when it was too difficult to get working with PyO3. For example, `interface.rs` is far too big, but this keeps the same organization as before.

For now, we still have the `native_engine_pyo3` extension along with `native_engine`. That will be consolidated in a followup.

## Benchmark

Before:
```
❯ hyperfine -w 1 -r 10 './pants --no-pantsd list ::'
Benchmark #1: ./pants --no-pantsd list ::
  Time (mean ± σ):      2.170 s ±  0.025 s    [User: 1.786 s, System: 0.303 s]
  Range (min … max):    2.142 s …  2.227 s    10 runs
```

After:

```
❯ hyperfine -w 1 -r 10 './pants --no-pantsd list ::'
Benchmark #1: ./pants --no-pantsd list ::
  Time (mean ± σ):      2.193 s ±  0.060 s    [User: 1.798 s, System: 0.292 s]
  Range (min … max):    2.144 s …  2.348 s    10 runs
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants