Questions about multithreading and asyncio #1274
Hi @awestlake87, sorry for the slow reply. This is a complicated topic indeed. I've seen a similar error in the past but haven't had time to investigate it. There's an upstream issue which looks relevant: https://bugs.python.org/issue26693

I have a theory that the error is probably related to importing. We might be able to solve this by making it possible to spin up and down a sub-interpreter in a test. Or maybe it's something that can easily be fixed upstream with a bit of willingness. This is all just speculation though...
No worries, I've been able to work around it in the meantime. I think you're right about it having something to do with the main thread. By default `cargo test` runs tests on multiple threads (might be 12 on my computer), but I tried scaling the thread count back down to 1.

At the moment, I don't personally import it anywhere, but I'll try this out by making a binary that imports it first. If this works, it seems straightforward enough to add a workaround.

As for the sub-interpreter idea, I hadn't considered using a sub-interpreter for my purposes before, but if it could help solve some of these issues for testing purposes, I'd be willing to try it out. Am I correct in assuming that this isn't currently possible with pyo3? If not, how complicated would it be to implement? I'm not really familiar with embedding Python in C/C++ and I haven't dug too far into the pyo3 code, but I'd be willing to help out if it's not too complicated.
Ok, I just found out that I can actually override the default test harness to do some main thread initialization, so I'll try going that route and see if it fixes my issue without losing parallel testing.

I also read up a bit on the sub-interpreters, and what I saw was worrying. I don't know if the issues surrounding the proposal have been addressed since the posts I saw, but it looked like the community was fairly divided on it.
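For readers unfamiliar with the trick mentioned here: cargo lets an integration test opt out of the default libtest harness so that its `main()` runs directly on the process main thread. A minimal sketch of the relevant `Cargo.toml` entry (the test-target name is hypothetical):

```toml
# Cargo.toml: replace the default test harness for one integration test
# so we control main-thread setup ourselves before any tests run
[[test]]
name = "python_integration"
harness = false
```

With `harness = false`, the file `tests/python_integration.rs` must provide its own `main()`, which is where main-thread Python initialization could happen before spawning test work.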
On top of that, pyo3 would need quite a few changes to support it. We're not supposed to have any shared state between sub-interpreters, but at the moment we cache our type objects globally. Let me know if the main thread initialization fixes the problem - would be very curious to hear about that!
Well, unfortunately it seems like I was too quick to call it fixed. It seems like it just changed the timing enough that it wasn't happening for that test, and it started happening again as soon as I made some unrelated tweaks to the code. I did try replacing the default test harness to be certain that I was importing on the main thread. I don't know why it stopped happening when I initially scaled the thread count back to 1.
What I would recommend when interacting with both async Rust and async Python is to use tokio's new background event loop feature, keep Python as the main thread regardless of what you do, and make use of it from there.

One thing to point out in your original message: your first error is on 3.8, the second is on 3.6, so it seems to depend on which Python version you're on.
Oh my bad, I should clarify. I have Python 3.8 on my host system (Ubuntu 20.04), but I usually run my tests inside TensorFlow docker containers as well.

**On Ubuntu 20.04**

Most of the time I get this error:

It's rarer now, but I do sometimes see this:

Occasionally it just passes.

**In TensorFlow Docker Containers**

Most of the time it's this error:

Sometimes this one:

I haven't seen it pass inside the container recently. The third error I mentioned in the original issue is one I don't see anymore. I'm assuming that it could still happen, but it's hard to tell. The timing of the application seems to affect which error shows up most, and the timing varies depending on what else I'm doing on my computer, tweaks I'm making to the application, and whether it runs inside the container or not.
Right now I'm pretty tied to this setup. For context, my project originally had a Python websocket server that works with TensorFlow/numpy to run some ML tasks. My websocket client was written in Rust and is used in a larger WASM + JS web app. As my websocket protocol got more complex, I thought it'd be beneficial to have a symmetric implementation for the client/server because I was duplicating a lot of protocol functionality in Rust and Python. So I started replacing bits and pieces of the Python server with Rust.

No matter what, I'll still need to use Python for some of this stuff, so the goal isn't to replace the Python code entirely, it's just to make it so that my client and server share more code and evolve together.

As I see it, I have a few options, but I'm not sure which ones you guys would recommend:
@awestlake87 I just experimented with a solution inside pyo3 which might solve this - I create a new thread internally which becomes the "initialization" (and finalization) thread for the Python interpreter. Can you please give this branch a try in your project?
If it solves your problem, I'll open it up as a PR to merge into pyo3 master.
Sure! I'll have time to try it later this evening.

I tried that branch out, but unfortunately it still gave me the same error. It didn't seem to act any differently.

Hmm. Were you still using the main thread initialization? It shouldn't be needed with the patch I built. Is there any chance you could share a (stripped down if needed) copy of your code which still exhibits the problem, so I could play with it?

It was using the default test harness, so I wasn't using the main thread initialization. I agree though, making a stripped down example is probably our best bet at this point. I'm hoping that I can put one together today, but I had a lot of trouble replicating the problem last time I tried, so it could be a few days (especially with the holidays).
Alright, it took a little while, but here you go: https://github.com/awestlake87/pyo3-issue. The error seems a bit touchy, which is kinda expected from race conditions, so hopefully you see it too. I see it just about every time I run this example. But what's worrying is that it seems like when I make certain tweaks to the code, the problem still exists, but is very hard to replicate. For example, there's a block of code that polls Python's signal handlers:

```rust
task::spawn(async move {
    loop {
        Python::with_gil(|py| py.check_signals().map_err(dump_err(py)).unwrap());
        task::sleep(Duration::from_millis(5)).await;
    }
});
```

If I remove this block, one of my pyo3 server test suites can run 1000 times without failing, but my other test suite ran about 5 times before it failed.
Thanks for the repro; sorry it took me a little while to find a chance to investigate. I had a play with it this morning and didn't come up with a definitive solution, though here are some things I did note:

So I'm confident my branch fixes part of the problem, though clearly not all of it.
No worries, I'm not really in a rush to get this fixed. I guess I can't say for sure whether I was still getting the same error. I'm a bit stumped on this as well, plus I'm getting other issues related to the signal handlers not being called on the main thread. I'm tempted to try the background executor that @ChillFish8 suggested and just surrender the main thread to Python. Python really seems to need control over the main thread in order to finalize and handle signals like CTRL-C properly. Unfortunately, this means I'll probably have to write my own custom test harness, since I'll have to handle the background executor initialization before the tests run. If I find a fairly generic solution, I'll link it to this thread.

I'm kinda curious whether or not this issue could be solved with a custom Rust impl of the Python asyncio event loop. I'm assuming this is probably necessary for first-class pyo3 async/await integration anyway (although I'm definitely not certain). It would definitely be an undertaking though. At the very least it might be nice to expose some helpers for this.
I've done a similar system before, but it tends to be incredibly frustrating; it's made easier now with the recent tokio changes though. My recommendation would be to give Python the main thread and just accept that it won't surrender it. You can start a background runtime with tokio using something like OnceCell or lazy_static, which will give you a global handle for spawning tasks.

My personal recommendation is the single-threaded scheduler, because you can combine a LocalSet with the runtime, giving you the ability to acquire the GIL and still spawn futures that aren't `Send`, but this will change depending on what you need it for. For spawning Rust tasks from async Python, I recommend just creating a future with the loop and returning a clone of it, which gives you a handle for setting the result etc. from Rust via `asyncio.EventLoop.call_soon_threadsafe()`.
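The future-completion pattern described here can be sketched in pure Python, with a worker thread standing in for the Rust side (no pyo3 involved; names here are illustrative):

```python
import asyncio
import threading

def complete_from_worker(loop, fut):
    # Stands in for Rust code completing a Python future from another
    # thread; set_result must be marshalled onto the loop's own thread.
    loop.call_soon_threadsafe(fut.set_result, "hello from rust")

async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    threading.Thread(target=complete_from_worker, args=(loop, fut)).start()
    return await fut

print(asyncio.run(main()))  # prints "hello from rust"
```

The key point is that `fut.set_result` is never called directly from the worker thread; it is always routed through `call_soon_threadsafe` so it executes on the loop's thread.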
Thanks for the tips @ChillFish8! I figured things might get a bit weird with the GIL, so I'll look into the single-threaded scheduler. I've been getting by so far with a glue coroutine like this:

```python
async def rust_coroutine_glue(coroutine, future):
    try:
        result = await coroutine
        future.set_result(result)
    except Exception as e:
        traceback.print_exc(file=sys.stdout)
        raise e
```

where the future is a `#[pyclass]` wrapping a oneshot sender:

```rust
#[pyclass]
struct PyFuture {
    tx: Option<oneshot::Sender<PyObject>>,
}

#[pymethods]
impl PyFuture {
    fn set_result(&mut self, py: Python, result: &PyAny) -> PyResult<()> {
        let result = result.into();
        py.allow_threads(move || {
            self.tx.take().unwrap().send(result).unwrap();
            Ok(())
        })
    }
}
```

Awaiting a rust coroutine from python is a bit easier since I can just use a similar channel. From your comment, it looks like you're doing something similar, so I think we're thinking along the same lines. Is your code hosted publicly somewhere?
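To sanity-check the glue coroutine in isolation, here's a pure-Python stand-in for the Rust `PyFuture` (the real pyclass forwards into a oneshot channel instead; `FakePyFuture` and `answer` are names invented for this sketch):

```python
import asyncio
import sys
import traceback

async def rust_coroutine_glue(coroutine, future):
    try:
        result = await coroutine
        future.set_result(result)
    except Exception as e:
        traceback.print_exc(file=sys.stdout)
        raise e

class FakePyFuture:
    # Stand-in for the Rust PyFuture pyclass; just records the result.
    def __init__(self):
        self.result = None

    def set_result(self, value):
        self.result = value

async def answer():
    return 42

fut = FakePyFuture()
asyncio.run(rust_coroutine_glue(answer(), fut))
assert fut.result == 42
```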
I don't have a direct public set of code for this atm, but I can give the following set of stuff:

**Runtime building**

```rust
use pyo3::prelude::*;
use lazy_static::lazy_static;
use once_cell::sync::OnceCell;
use tokio::runtime::{Builder, Runtime};
use std::thread;

// Example of a multi-threaded runtime
lazy_static! {
    static ref MULTI_RUNTIME: Runtime = {
        Builder::new_multi_thread()
            .enable_all()
            .build()
            .expect("Couldn't build the runtime")
    };
}

// Example of a single-threaded background worker
lazy_static! {
    static ref SINGLE_RUNTIME: Runtime = {
        Builder::new_current_thread()
            .enable_all()
            .build()
            .expect("Couldn't build the runtime")
    };
}

// Both runtimes need to be started with a block_on in a new thread.
#[pyfunction]
fn setup_rt() {
    thread::spawn(move || {
        MULTI_RUNTIME.block_on(futures::future::pending::<()>());
    });
    thread::spawn(move || {
        SINGLE_RUNTIME.block_on(futures::future::pending::<()>());
    });
}
```

**Useful helpers**

```rust
// Some really useful helpers you want in the global scope.
// This setup() function can just be an internal call with
// the pointers to the callbacks passed, which saves a lot of
// calls and hassle trying to get them from rust imo.
static EVENT_LOOP: OnceCell<PyObject> = OnceCell::new();
static CALL_SOON: OnceCell<PyObject> = OnceCell::new();
static CREATE_TASK: OnceCell<PyObject> = OnceCell::new();
static CREATE_FUTURE: OnceCell<PyObject> = OnceCell::new();

#[pyfunction]
fn setup(
    event_loop: PyObject,
    call_soon_thread_safe: PyObject,      // loop.call_soon_threadsafe
    call_coroutine_thread_safe: PyObject, // asyncio.run_coroutine_threadsafe
    create_future: PyObject,              // loop.create_future
) {
    let _ = EVENT_LOOP.get_or_init(|| event_loop);
    let _ = CALL_SOON.get_or_init(|| call_soon_thread_safe);
    let _ = CREATE_TASK.get_or_init(|| call_coroutine_thread_safe);
    let _ = CREATE_FUTURE.get_or_init(|| create_future);
}
```

**Examples**

```rust
// Our completion callback used when we make our python tasks
#[pyclass]
struct EventCompleter {
    s: async_channel::Sender<PyObject>,
}

#[pymethods]
impl EventCompleter {
    #[call]
    #[args(val)]
    fn __call__(&self, val: PyObject) -> PyResult<()> {
        let _ = self.s.try_send(val);
        Ok(())
    }
}

// Some example functions
fn set_fut_result(fut: PyObject, res: PyObject) -> PyResult<()> {
    Python::with_gil(|py: Python| -> PyResult<()> {
        let sr = fut.getattr(py, "set_result")?;

        // Would recommend not using the get_unchecked method
        // unless you have a sure-fire way of confirming that the
        // cell is guaranteed to be initialised. I'm just doing it
        // for ease of demonstration.
        let cb = unsafe { CALL_SOON.get_unchecked() };
        let _ = cb.call1(py, (sr, res))?;
        Ok(())
    })
}

async fn wait_for_py_coro(coro: PyObject) -> PyResult<PyObject> {
    let r = {
        let (s, r) = async_channel::bounded(1);

        // See the note above about get_unchecked.
        let ct = unsafe { CREATE_TASK.get_unchecked() };
        Python::with_gil(|py: Python| -> PyResult<()> {
            let fut = ct.call1(py, (coro,))?;
            let callback = EventCompleter { s };
            fut.call_method1(py, "add_done_callback", (callback,))?;
            Ok(())
        })?;
        r
    };
    r.recv().await.map_err(|_| pyo3::exceptions::PyRuntimeError::new_err("future dropped"))
}
```

**Example of creating a rust task with async python**

```rust
#[pyfunction]
fn some_async(py: Python) -> PyResult<PyObject> {
    // See the note above about get_unchecked.
    let cf = unsafe { CREATE_FUTURE.get_unchecked() };
    let fut = cf.call0(py)?;
    let res_fut: PyObject = fut.clone_ref(py).into();
    SINGLE_RUNTIME.spawn(async move {
        // do some async stuff
        Python::with_gil(|py| {
            let _ = set_fut_result(fut, "hello_world!".into_py(py));
        });
    });
    Ok(res_fut)
}
```
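The `EventCompleter` callback can be exercised with a pure-Python analogue to show what `add_done_callback` actually delivers (the real Rust class pushes the finished task into an `async_channel` rather than a queue; the names below are invented for this sketch):

```python
import asyncio
import queue

done = queue.Queue(maxsize=1)

class EventCompleter:
    # Python stand-in for the Rust EventCompleter pyclass: asyncio calls
    # it with the finished task, and we hand the result off the loop.
    def __init__(self, q):
        self.q = q

    def __call__(self, task):
        self.q.put_nowait(task.result())

async def coro():
    return "finished"

async def main():
    task = asyncio.ensure_future(coro())
    task.add_done_callback(EventCompleter(done))
    await task
    # Give scheduled done-callbacks a turn on the loop before exiting.
    await asyncio.sleep(0)

asyncio.run(main())
assert done.get_nowait() == "finished"
```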
Awesome, I appreciate the snippets! It might be a while before I can actually get everything working with the tokio event loop, although I'm pretty sure this will fix the errors. It could be a few days, but I'll be sure to post an update once I'm finished.
Alright, I have good news and bad news. The good news is that I got my tests working with Python owning the main thread and tokio handling the async Rust in background threads. I managed to create a crate using the snippets from @ChillFish8 that provides a decent interface for the runtime with conversions between Rust futures and Python coroutines. Here's the repo: https://github.com/awestlake87/pyo3-asyncio.

The bad news is that this didn't appear to fix my problems; I'm still seeing errors at shutdown. I'll change my example project https://github.com/awestlake87/pyo3-issue to use this new runtime and hopefully I'll still be able to reproduce the issue like before.
Just as an update, I did try to change https://github.com/awestlake87/pyo3-issue over to the new runtime, but I was unable to reproduce the errors. I'm going to have to mess with it a bit more to get the error to occur. That being said, I don't really think it's anything specific about my project; I think it's just a matter of timing during the finalization step. I'll post another update if I can successfully reproduce it.
Man this issue is frustrating. I'm not even certain I've made any progress, but here's what I'm seeing right now:

The most aggravating part about this error is that I can never be sure I've fixed it. I can run some test suites literally 1000 times on Python 3.8 without failing, only to have them fail with Python 3.6 within 5 attempts. TensorFlow seems to have an effect on certain tests, but I can't be certain whether it's the TensorFlow library itself or if the problem exists in all of the tests and the timing of the finalization when TensorFlow is involved tips it over the edge. Because of this, I'm really not sure if the observations above mean anything. In any case, I have a few more things to try, so I'll keep plugging away at it.
Alright, now I've got better news. I was able to "fix" both of these issues last night and get all of my tests passing consistently, although it's not a good long-term solution. It turns out the problems go away when finalization is skipped.

I've been trying to find a better solution, but I'm just not familiar enough yet with Python's threading quirks to find something better. I actually poked through Boost's Python binding library and found something interesting though: they actually don't call `Py_Finalize`.

I wasn't able to find out why this is the case, but maybe they have problems like this too. It might be a good idea to find out why they don't call it.
Thanks for the continued investigation. Yes, it looks like the `Py_Finalize` call is the likely culprit.
I haven't done much investigation on it after my last comment. Today I poked around a little bit to try and figure out why Boost doesn't call Py_Finalize. In a TODO list (not sure if it's official) I found the following:

Additionally, I found a thread relating to this. I know pyo3 has an atexit hook for Py_Finalize, but maybe something needs to clear the ref-counts of Python objects beforehand?
I don't think we have to clear any ref-counts beforehand, and we won't touch them after this, so I don't think that's it. For now, perhaps it's best to add a new option to opt out of finalization.
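One way such an opt-in could surface on the user's side is a cargo feature flag. The sketch below mirrors the `auto-initialize` feature that later pyo3 releases ship, but treat the exact feature name and version as assumptions, not what was decided in this thread:

```toml
# Hypothetical opt-in: have pyo3 initialize the interpreter on demand
# and skip finalization, rather than doing both unconditionally.
[dependencies]
pyo3 = { version = "0.14", features = ["auto-initialize"] }
```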
It seems like there should be some cleaner way around it, but my tests are behaving perfectly now that I've disabled finalization. I'd love to be able to disable it through a supported option.
#1355 has now merged - we decided to disable finalizing by default because, as we've seen, it can lead to weird crashes like this. Hopefully all your problems should now be gone if you use the pyo3 master? 😄
Works for me! Will this be in 0.14?

Most likely!

Ok awesome. Any thoughts on integrating pyo3-asyncio into the rest of the pyo3 ecosystem? It'd be nice to have it as a crates.io dependency, but I completely understand if you want to wait for a more complete and hashed out solution for it.
@tailhook this is some fascinating stuff to consider
I'd be very happy already to bless it as part of the ecosystem, with you maintaining it.
Sure, I'd be up for that. It's a pretty small crate so maintaining it and writing up the guides / examples should be nbd. Are there any guidelines I should know about for versioning, project structure, etc? Also, any bikesheds about the current API?
Looks pretty good to me! I would advise potentially using fewer macros, and I might be tempted to simplify a signature or two.

Versioning - no real standard. Someone suggested I keep the pythonize versions in sync with PyO3, so that's what it's currently pinned to.
Alright, I updated the documentation with some more details and examples to prep it for release. I hope you don't mind me plagiarizing your Code-of-Conduct and Contributing.md files. In addition, I configured the GitHub Actions pretty much exactly like PyO3 to get the test matrix, code coverage, gh-pages deployment, and dependabot working for that project.

As for the feedback you gave me, I went ahead and made those changes.

With all of that, I think it's pretty much ready to publish. Feel free to look over it and let me know if I should change anything else. Otherwise, just give me the thumbs up and I'll publish it!
Awesome! The crate all looks great.
👍 makes total sense. We don't have any breaking changes merged yet (other than the finalization change, but that's also kind of a bugfix), so maybe I will release 0.13.2 soon to unblock you and then start breaking stuff on the road to 0.14.
Ok, I'm not too picky about it unless 0.14 is gonna be months out. I'm not in a huge rush to get it converted over to crates.io, so I'll leave that up to you. At some point this week I might open up a PR to add an example to the guide. It'll likely be one that already exists in the pyo3-asyncio docs, as I think async sleeps are probably the simplest example, but it'd be nice to mention it in the overall guide since that seems to be the starting place for most pyo3 users.
👍 please also feel welcome to add that.
As there's been a lot of progress with pyo3-asyncio, I think this issue can be closed.
This commit is an initial draft at fixing #147. The error is due to the fact that pyo3 has linked the libpython from the compilation instead of using the libpython that is available in `LD_LIBRARY_PATH`. Currently the only way to disable the linking is to use the `extension-module` flag. This requires packaging the python `runtime-node` in a python library. The python `runtime-node` should also be fully compatible with the other operators in case we want a hybrid runtime.

The issue that I'm facing is that, due to the packaging, I have to deal with the `GIL` that is present from the start of the `dora-runtime` node. This however makes the process single threaded, which is unworkable. So I have to disable it, but when I do, I have a race condition:
```bash
Exception ignored in: <module 'threading' from '/usr/lib/python3.8/threading.py'>
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 1373, in _shutdown
    assert tlock.locked()
AssertionError:
```
The issue is the same as PyO3/pyo3#1274. To fix this, I'm going to look again at the different steps to make this work, but this is my current investigation.
Hey, I've got a rather complicated program that involves mixing Rust async/await with Python asyncio. I know this problem isn't sorted out yet in pyo3 (although I think it's being worked on), but I figured I could get around those limitations by running two event loops, one for Rust and one for Python on its own dedicated thread.
So here's what I'm doing:

- Spawn a dedicated thread for the Python event loop, calling `asyncio.set_event_loop(loop)` and `loop.run_forever()` on that thread
- Run a Rust websocket client with `async-tungstenite` that occasionally interacts with the Python event loop via pyo3
- Use `call_soon_threadsafe` and `asyncio.run_coroutine_threadsafe` to run my application
- On shutdown, call `loop.call_soon_threadsafe(loop.stop)` and join my thread

My tests run exactly as expected with pyo3. The only problem is that when the test exits, I get one of three errors:
If I'm lucky, my test exits successfully with this warning:

If I'm unlucky, I either get this error:

or this one:
Things I've tried:

- Using Rust's `std::thread` to run the event loop
- Using Python's `threading` module to run my event loop
- Calling `pyo3::prepare_freethreaded_python()` before the first call to `Python::with_gil`
Sorry the problem description is so vague. I've tried simplifying my program to narrow the problem down, but it's frustratingly hard to replicate in a smaller codebase. Before I spend too much time on that, I just wanted to check with someone more experienced with embedding Python to see if I'm setting things up properly.
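A minimal pure-Python sketch of the dedicated-loop-thread lifecycle described above, with the Rust side replaced by plain function calls (names invented for this sketch):

```python
import asyncio
import threading

loop = asyncio.new_event_loop()

def run_loop():
    # Dedicated thread that owns the Python event loop.
    asyncio.set_event_loop(loop)
    loop.run_forever()

t = threading.Thread(target=run_loop)
t.start()

async def work():
    return "done"

# What the Rust side would do via pyo3: schedule work thread-safely.
fut = asyncio.run_coroutine_threadsafe(work(), loop)
assert fut.result(timeout=5) == "done"

# Shutdown: stop the loop from its own thread, then join and close.
loop.call_soon_threadsafe(loop.stop)
t.join()
loop.close()
```

If this skeleton runs cleanly while the real program does not, the difference lies in what the embedded interpreter does during finalization rather than in the loop lifecycle itself.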
Questions:

- Should I be calling `pyo3::prepare_freethreaded_python()` before the first call to `Python::with_gil` in this situation?
- Do I need to start my event loop thread with Python's `threading` module, or is it ok to use Rust's `std::thread`?
🌍 Environment

- Your Rust version (`rustc --version`): rustc 1.47.0-nightly (6c8927b0c 2020-07-26)