Questions about multithreading and asyncio #1274

Closed
awestlake87 opened this issue Nov 12, 2020 · 41 comments

@awestlake87
Contributor

awestlake87 commented Nov 12, 2020

Hey, I've got a rather complicated program that involves mixing Rust async/await with Python asyncio. I know this problem isn't sorted out yet in pyo3 (although I think it's being worked on), but I figured I could get around those limitations by running two event loops, one for Rust and one for Python on its own dedicated thread.

So here's what I'm doing (sketched in code just after this list):

  • Spin up an async Rust application with async-std runtime
  • Spawn a thread to act as the main thread for the asyncio event loop and call asyncio.set_event_loop(loop) and loop.run_forever() on that thread
  • Host a websocket server with async-tungstenite that occasionally interacts with the Python event loop via pyo3.
  • Interact with the Python event loop via call_soon_threadsafe and asyncio.run_coroutine_threadsafe to run my application
  • When the application exits, I call loop.call_soon_threadsafe(loop.stop) and join my thread
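
A rough sketch of that setup (assuming a pyo3 0.13-style API — the function name and error handling here are illustrative, not my actual code):

use pyo3::prelude::*;
use std::thread;

// A dedicated thread owns the asyncio event loop while Rust interacts with
// it from other threads via call_soon_threadsafe / run_coroutine_threadsafe.
fn spawn_python_event_loop() -> PyResult<(PyObject, thread::JoinHandle<()>)> {
    let event_loop: PyObject = Python::with_gil(|py| -> PyResult<PyObject> {
        Ok(py.import("asyncio")?.call_method0("new_event_loop")?.to_object(py))
    })?;

    let loop_for_thread = Python::with_gil(|py| event_loop.clone_ref(py));
    let handle = thread::spawn(move || {
        Python::with_gil(|py| {
            let asyncio = py.import("asyncio").unwrap();
            asyncio
                .call_method1("set_event_loop", (loop_for_thread.clone_ref(py),))
                .unwrap();
            // Blocks this thread until loop.stop() is scheduled from another
            // thread via loop.call_soon_threadsafe(loop.stop).
            loop_for_thread.call_method0(py, "run_forever").unwrap();
        });
    });

    Ok((event_loop, handle))
}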

My tests run exactly as expected with pyo3. The only problem is that when the test exits, I get one of three errors:

If I'm lucky, my test exits successfully with this warning:

Exception ignored in: <module 'threading' from '/usr/lib/python3.8/threading.py'>
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 1373, in _shutdown
    assert tlock.locked()
AssertionError: 

If I'm unlucky, I get either this error:

Fatal Python error: This thread state must be current when releasing

Thread 0x00007f7dd1dd8b40 (most recent call first):

Current thread 0x00007f7dcece9700 (most recent call first):
  File "/usr/lib/python3.6/asyncio/selector_events.py", line 148 in _write_to_self
  File "/usr/lib/python3.6/asyncio/base_events.py", line 643 in call_soon_threadsafe

Thread 0x00007f7dcc4a0700 (most recent call first):
  File "/usr/lib/python3.6/selectors.py", line 445 in select
  File "/usr/lib/python3.6/asyncio/base_events.py", line 1415 in _run_once
  File "/usr/lib/python3.6/asyncio/base_events.py", line 438 in run_forever

Thread 0x00007f7dce8e7700 (most recent call first):
error: test failed

or this one:

FATAL: exception not rethrown
error: test failed

Things I've tried:

  • Spawning native rust threads with std::thread to run the event loop
  • Spawning python threads with the threading module to run my event loop
  • Calling pyo3::prepare_freethreaded_python() before the first call to Python::with_gil

Sorry the problem description is so vague. I've tried simplifying my program to narrow the problem down, but it's frustratingly hard to replicate in a smaller codebase. Before I spend too much time on that, I just wanted to check with someone more experienced with embedding Python to see if I'm setting things up properly.

Questions:

  • Am I setting up my threads correctly?
  • Do I need to call pyo3::prepare_freethreaded_python() before the first call to Python::with_gil in this situation?
  • Do I need to use a thread from the Python threading module, or is it ok to use Rust's std::thread?
  • Do I need to use some of the FFI calls to ensure that my Python thread state is correct?
  • Are there any other multi-threading limitations that I should know about with pyo3?

🌍 Environment

  • OS: Ubuntu 20.04
  • Python version: 3.8.5 (installed with distro)
  • Rust version (rustc --version): rustc 1.47.0-nightly (6c8927b0c 2020-07-26)
  • PyO3 version: "0.12.3"
  • I have not tried pyo3 master yet (I think the problem is with my code)
@davidhewitt
Member

Hi @awestlake87, sorry for the slow reply.

This is a complicated topic indeed. I've seen a similar error in the past but not had time to investigate it. There's an upstream issue which looks relevant: https://bugs.python.org/issue26693

I have a theory that the error is probably related to importing threading for the first time from a thread which isn't the main thread, which is why you might be seeing it in your test. (Or maybe it's that the threading module's finalization happens on a different thread from the one in which it was originally imported?)

We might be able to solve this by making it possible to spin up and down a sub-interpreter in a test. Or maybe it's something that can easily be fixed upstream with a bit of willingness.

This is all just speculating though...

@awestlake87
Contributor Author

awestlake87 commented Nov 20, 2020

No worries, I've been able to work around it in the meantime.

I think you're right about it having something to do with the main thread. By default cargo test runs tests on multiple threads (might be 12 on my computer), but scaling it back down to 1 with cargo test -- --test-threads=1 seems to fix this issue.

At the moment, I don't personally import threading in my Python bindings, since it didn't seem to make a difference whether the event loop ran on a Python thread vs. a Rust thread. I am importing other modules that could definitely be importing threading themselves, though. These imports can happen from any thread, so I think it matches up with your theory.

I'll try this out by making a binary that imports threading, then runs the same tests.

If this works, it seems straightforward enough to add a threading import in the main function for a binary, but is there any way to gain control over the main thread for cargo test? It'd be nice to still be able to run my tests in parallel, especially since Rust should be doing most of the heavy lifting underneath the Python.

As for the sub-interpreter idea, I hadn't considered using a sub-interpreter for my purposes before, but if it could help solve some of these issues for testing purposes, I'd be willing to try it out. Am I correct in assuming that this isn't currently possible with pyo3?

If not, how complicated would it be to implement? I'm not really familiar with embedding python in C/C++ and I haven't dug too far into the pyo3 code, but I'd be willing to help out if it's not too complicated.

@awestlake87
Contributor Author

Ok, I just found out that I can actually override the default test harness to do some main thread initialization, so I'll try going that route and see if it fixes my issue without losing parallel testing.
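
A sketch of what I mean — with harness = false set on the [[test]] target in Cargo.toml, a main() like this replaces the default libtest harness, so initialization is guaranteed to happen on the main thread (the body after initialization is just a placeholder):

use pyo3::prelude::*;

fn main() {
    // Runs on the real main thread, before any test threads exist.
    pyo3::prepare_freethreaded_python();

    // Import threading from the main thread before anything else touches it.
    Python::with_gil(|py| {
        py.import("threading").expect("failed to import threading");
    });

    // ... then drive the actual tests from here ...
}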

I also read up a bit on the sub-interpreters and what I saw was worrying. I don't know if the issues surrounding the proposal have been addressed since the posts I saw, but it looked like the community was fairly divided on it. Some numpy developers were especially unhappy with it since they thought numpy would have to undergo massive changes to support it, which could be bad news for me because one of the python modules I'm importing is numpy.

@davidhewitt
Member

numpy would have to undergo massive changes to support it

On top of that, pyo3 would need quite a few changes to support it. We're not supposed to have any shared state between sub-interpreters - but at the moment we cache our type objects in statics!

Let me know if the main thread initialization fixes the problem - would be very curious to hear about that!

@awestlake87
Contributor Author

Well, unfortunately it seems like I was too quick to call it fixed. It seems like it just changed the timing enough that it wasn't happening for that test. It started happening again as soon as I made some unrelated tweaks to the code.

I did try replacing the default test harness to be certain that I was importing threading on the main thread before anything else. It didn't make a difference.

Idk why it stopped happening when I initially scaled it back to --test-threads=1, but it's not passing now.

@ChillFish8

What I would recommend when interacting with both async Rust and async Python is to use tokio's new background event loop feature, keep Python as the main thread regardless of what you do, and make use of async-channel for interactions between the two.

One thing to point out in your original message: your first error is on 3.8 and the second is on 3.6, so it seems to depend on which Python version you're on.

@awestlake87
Contributor Author

Oh my bad, I should clarify. I have Python 3.8 on my host system (Ubuntu 20.04), but I usually run my tests inside tensorflow/tensorflow:latest or tensorflow/tensorflow:latest-gpu, which have Python 3.6. I can also run them on my host system since I have the same packages installed on Ubuntu 20.04, and it has the newer Python version, so I went with that for the GitHub issue. I guess I pasted output from one of the docker tests, but I do see the same errors in both:

On Ubuntu 20.04

Most of the time I get this error:

test result: ok. 9 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

FATAL: exception not rethrown
error: test failed

It's rarer now, but I do sometimes see this:

Exception ignored in: <module 'threading' from '/usr/lib/python3.8/threading.py'>
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 1373, in _shutdown
    assert tlock.locked()
AssertionError: 

Occasionally it just passes.

In Tensorflow Docker Containers

Most of the time it's this error:

test result: ok. 9 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

FATAL: exception not rethrown
error: test failed

Sometimes this one:

test result: ok. 9 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Exception ignored in: <module 'threading' from '/usr/lib/python3.6/threading.py'>
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 1289, in _shutdown
    assert tlock.locked()
AssertionError: 
FATAL: exception not rethrown
error: test failed

I haven't seen it pass inside the container recently.

The third error I mentioned in the original issue is one I don't see anymore. I'm assuming that it could still happen, but it's hard to tell. The timing of the application seems to affect which error shows up most, and the timing varies depending on what else I'm doing on my computer, tweaks I'm making to the application, and whether it runs inside the container or not.

@awestlake87
Contributor Author

awestlake87 commented Nov 24, 2020

Right now I'm pretty tied to the async-std event loop. That's not to say I couldn't switch, but it would be quite a bit of work. There's already a lot of churn in my project as I'm adopting pyo3, so I'm not opposed; I just want to weigh my options a bit.

For context, my project originally had a Python websocket server that works with tensorflow/numpy to run some ML tasks. My websocket client was written in Rust and is used in a larger WASM + JS web app. As my websocket protocol got more complex, I thought it'd be beneficial to have a symmetric implementation for the client / server, because I was duplicating a lot of protocol functionality in Rust and Python. So I started replacing bits and pieces of the Python server with Rust using async-tungstenite, calling into pyo3 for my Python bits.

No matter what, I'll still need to use Python for some of this stuff, so the goal isn't to replace the Python code entirely, it's just to make it so that my client and server share more code and evolve together.

As I see it, I have a few options, but I'm not sure which ones you guys would recommend:

  1. (what I'm currently doing) Ignore the errors for now and continue adopting pyo3 and hope the problem fixes itself as I replace more and more pieces
  2. Create a smaller simplified example of the problem to narrow it down.
  3. Rework the server tests to run Python on the main thread and Rust on a background tokio executor, to see if that fixes it.

@davidhewitt
Member

davidhewitt commented Nov 24, 2020

@awestlake87 I just experimented with a solution inside pyo3 which might solve this - I create a new thread internally which becomes the "initialization" (and finalization) thread for the Python interpreter.

Can you please give this a try in your Cargo.toml and report back:

pyo3 = { git = "https://github.com/davidhewitt/pyo3", branch = "threading-bugfix" }

If it solves your problem, I'll open it up as a PR to merge into pyo3 master.

(master...davidhewitt:threading-bugfix)

@awestlake87
Contributor Author

Sure! I'll have time to try it later this evening.

@awestlake87
Contributor Author

I tried that branch out, but unfortunately it still gave me the same error. It didn't seem to act any differently.

@davidhewitt
Member

Hmm. Were you still using the main thread initialization? It shouldn't be needed with the patch I built.

Is there any chance you could share a (stripped down if needed) copy of your code which still exhibits the problem, so I could play with it?

@awestlake87
Contributor Author

It was using the default test harness, so I wasn't using the main thread initialization.

I agree though, making a stripped down example is probably our best bet at this point. I'm hoping that I can put one together today, but I had a lot of trouble replicating the problem last time I tried, so it could be a few days (especially with the holidays).

@awestlake87
Contributor Author

awestlake87 commented Nov 26, 2020

Alright, it took a little while, but here you go: https://github.com/awestlake87/pyo3-issue.

The error seems a bit touchy, which is kinda expected from race conditions, so hopefully you see it too. I see it just about every time I run this example. But what's worrying is that it seems like when I make certain tweaks to the code, the problem still exists, but is very hard to replicate.

For example, there's a block of code in the init_python fn that spawns an async task that continuously runs the signal handlers for Python:

task::spawn(async move {
    loop {
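        // Give Python a chance to run pending signal handlers (e.g. CTRL-C).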
        Python::with_gil(|py| py.check_signals().map_err(dump_err(py)).unwrap());
        task::sleep(Duration::from_millis(5)).await;
    }
});

If I remove this block, one of my pyo3 server test suites can run 1000 times without failing, but my other test suite ran about 5 times before it failed.

Edit:
BTW, let me know if you have any issues running it. I use docker containers to keep my host environment as clean as possible, so I think as long as you have docker and make installed you should be fine, but you never know. Sometimes the docker stuff can get a bit messy.

@davidhewitt
Member

Thanks for the repro; sorry it took me a little while to find a chance to investigate.

I had a play with it this morning and didn't come up with a definitive solution, though here are some things I noted:

  • the threading-bugfix branch I pushed earlier seems to reliably solve the Python AssertionError.
  • on the threading-bugfix branch I can't repro the FATAL: exception not rethrown issue on Ubuntu 20.04. Inside the tensorflow docker container I can reproduce it fairly easily.

So I'm confident my branch fixes part of the problem (the AssertionError). I'm a bit stumped on the other issue. Maybe a future session can shed some more light on it.

@awestlake87
Contributor Author

awestlake87 commented Dec 17, 2020

No worries, I'm not really in a rush to get this fixed.

I guess I can't say for sure whether I was still getting the AssertionError when I tried that branch out. Back then I said it didn't seem to make much of a difference, but I could have just been getting the FATAL: exception not rethrown error and not the AssertionError.

I'm a bit stumped on this as well, plus I'm getting other issues related to the signal handlers not being called on the main thread. I'm tempted to try the background executor that @ChillFish8 suggested and just surrender the main thread to Python. Python really seems to need control over the main thread in order to finalize and handle signals like CTRL-C properly.

Unfortunately, this means I'll probably have to write my own custom test harness since I'll have to handle the background executor initialization before the tests run. If I find a fairly generic solution, I'll link it to this thread.

I'm kinda curious whether or not this issue could be solved with a custom Rust impl of the Python asyncio event loop. I'm assuming this is probably necessary for first class pyo3 async/await integration anyway (although I'm definitely not certain). It would definitely be an undertaking though.

At the very least it might be nice to expose some pyo3-compatible version of asyncio and Rust executor functions like task::spawn behind a feature gate or an additional crate, so other people don't run into this issue. I don't think this would be an ideal long-term solution, but it might abstract away some nasty glue code and give other people a better starting point for Python's async/await in the meantime. I'm likely going to have to roll something up for that myself, so I might share that too.

@ChillFish8

I've built a similar system like that before, but it tends to be incredibly frustrating. It's made easier now by the recent tokio changes, though.

My recommendation would be to give Python the main thread and just accept that it won't surrender it. You can start a background runtime with tokio using something like once_cell or lazy_static, which will give you a global handle for spawning tasks onto.

My personal recommendation is to use the single-threaded scheduler, because you can combine a LocalSet with the runtime, giving you the ability to acquire the GIL and still spawn futures that aren't Send (see the sketch below) - but this will change depending on what you need it for.

For spawning Rust tasks from async Python, I recommend just creating a future with the loop and returning a clone of it, which gives you a handle for setting the result etc. from Rust via asyncio.EventLoop.call_soon_threadsafe().
That would be my recommendation for this sort of thing, and it's probably the easiest and cleanest way. If you need to await a coroutine from Rust, use asyncio's run_coroutine_threadsafe method and the async-channel crate to let Python send the result back in a sync way to an async receiver.
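
Here's a minimal sketch of the LocalSet idea, for reference (assuming a recent tokio; the function name and printed message are just for illustration):

use tokio::runtime::Builder;
use tokio::task::LocalSet;

// A current-thread runtime driving a LocalSet, so futures that are not Send
// (e.g. ones holding Python references across an .await) can still be spawned.
fn run_local_tasks() {
    let rt = Builder::new_current_thread()
        .enable_all()
        .build()
        .expect("Couldn't build the runtime");
    let local = LocalSet::new();

    // block_on drives the runtime on this thread; inside the LocalSet,
    // spawn_local accepts futures that are not Send.
    local.block_on(&rt, async {
        tokio::task::spawn_local(async {
            println!("running a !Send future on the local set");
        })
        .await
        .unwrap();
    });
}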

@awestlake87
Contributor Author

awestlake87 commented Dec 17, 2020

Thanks for the tips @ChillFish8! I figured things might get a bit weird with the GIL, so I'll look into the single-threaded scheduler. I've been getting by so far with asyncio.EventLoop.call_soon_threadsafe along with some glue code like this:

async def rust_coroutine_glue(coroutine, future):
    try:
        result = await coroutine
        future.set_result(result)

    except Exception as e:
        traceback.print_exc(file=sys.stdout)
        raise e

where the future object is a wrapper around a futures oneshot channel.

#[pyclass]
struct PyFuture {
    tx: Option<oneshot::Sender<PyObject>>,
}

#[pymethods]
impl PyFuture {
    fn set_result(&mut self, py: Python, result: &PyAny) -> PyResult<()> {
        let result = result.into();

        py.allow_threads(move || {
            self.tx.take().unwrap().send(result).unwrap();
            Ok(())
        })
    }
}

Awaiting a Rust coroutine from Python is a bit easier since I can just use an asyncio.Future. I haven't thought much about exceptions yet, though.

From your comment, it looks like you're doing something similar, so I think we're thinking along the same lines. Is your code hosted publicly somewhere?

@ChillFish8

ChillFish8 commented Dec 17, 2020

I don't have a direct public set of code for this at the moment, but I can give the following set of stuff:

Runtime building

use pyo3::prelude::*;

use lazy_static::lazy_static;
use once_cell::sync::OnceCell;

use tokio::runtime::{Builder, Runtime};

use std::thread;


// Example of a multi-threaded runtime
lazy_static! {
    static ref MULTI_RUNTIME: Runtime = {
        Builder::new_multi_thread()
            .enable_all()
            .build()
            .expect("Couldn't build the runtime")
    };
}

// Example of a single-threaded background worker
lazy_static! {
    static ref SINGLE_RUNTIME: Runtime = {
        Builder::new_current_thread()
           .enable_all()
           .build()
           .expect("Couldn't build the runtime")
    };
}

// For both runtimes they need to be started with a block_on in a new thread.
#[pyfunction]
fn setup_rt() {
    thread::spawn(move || {
        MULTI_RUNTIME.block_on(futures::future::pending::<()>());
    });

    thread::spawn(move || {
        SINGLE_RUNTIME.block_on(futures::future::pending::<()>());
    });
}

Useful helpers

// Some really useful helpers you want in the global scope.
// This setup() function can just be an internal call with
// the pointers to the callbacks passed, which saves a lot of
// calls and hassle trying to get them from Rust, imo.
static EVENT_LOOP: OnceCell<PyObject> = OnceCell::new();
static CALL_SOON: OnceCell<PyObject> = OnceCell::new();
static CREATE_TASK: OnceCell<PyObject> = OnceCell::new();
static CREATE_FUTURE: OnceCell<PyObject> = OnceCell::new();

#[pyfunction]
fn setup(
    event_loop: PyObject,
    call_soon_thread_safe: PyObject,  // asyncio.EventLoop.call_soon_threadsafe
    call_coroutine_thread_safe: PyObject,  // asyncio.run_coroutine_threadsafe
    create_future: PyObject,  // asyncio.EventLoop.create_future
) {
    let _ = EVENT_LOOP.get_or_init(|| event_loop);
    let _ = CALL_SOON.get_or_init(|| call_soon_thread_safe);
    let _ = CREATE_TASK.get_or_init(|| call_coroutine_thread_safe);
    let _ = CREATE_FUTURE.get_or_init(|| create_future);
}

Examples

// Our completed callback when we make our python tasks
#[pyclass]
struct EventCompleter {
    s: async_channel::Sender<PyObject>,
}

#[pymethods]
impl EventCompleter {
    #[call]
    #[args(val)]
    fn __call__(&self, val: PyObject) -> PyResult<()> {
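        // add_done_callback passes the completed future itself in as `val`.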
        let _ = self.s.try_send(val);
        Ok(())
    }
}


// Some example functions
fn set_fut_result(fut: PyObject, res: PyObject) -> PyResult<()> {
    Python::with_gil(|py: Python| -> PyResult<()> {
        let sr = fut.getattr(py, "set_result")?;

        // Would recommend not using the get_unchecked method
        // unless you have a sure-fire way of confirming that it is
        // guaranteed to be initialised. I'm just doing it for ease of
        // demonstration.
        let cb = unsafe { CALL_SOON.get_unchecked() };
        let _ = cb.call1(py, (sr, res))?;
        Ok(())
    })
}

async fn wait_for_py_coro(coro: PyObject) -> PyResult<PyObject> {
    let r = {
        let (s, r) = async_channel::bounded(1);

        // Would recommend not using the get_unchecked method
        // unless you have a sure-fire way of confirming that it is
        // guaranteed to be initialised. I'm just doing it for ease of
        // demonstration.
        let ct = unsafe { CREATE_TASK.get_unchecked() };
        let ev = unsafe { EVENT_LOOP.get_unchecked() };

        Python::with_gil(|py: Python| -> PyResult<()> {
            // run_coroutine_threadsafe takes the coroutine *and* the loop,
            // and returns a concurrent.futures.Future.
            let fut = ct.call1(py, (coro, ev.clone_ref(py)))?;

            let callback = EventCompleter { s };

            fut.call_method1(py, "add_done_callback", (callback,))?;
            Ok(())
        })?;

        r
    };

    r.recv().await.map_err(|_| pyo3::exceptions::PyRuntimeError::new_err("future dropped"))
}

Example of creating a Rust task with async Python

#[pyfunction]
fn some_async(py: Python) -> PyResult<PyObject> {
    // Would recommend not using the get_unchecked method
    // unless you have a sure-fire way of confirming that it is
    // guaranteed to be initialised. I'm just doing it for ease of
    // demonstration.
    let cf = unsafe { CREATE_FUTURE.get_unchecked() };

    let fut = cf.call0(py)?;
    let res_fut: PyObject = fut.clone_ref(py).into();

    SINGLE_RUNTIME.spawn(async move {
        // do some async stuff
        Python::with_gil(|py| {
            let _ = set_fut_result(fut, "hello_world!".into_py(py));
        });
    });

    Ok(res_fut)
}

@awestlake87
Contributor Author

Awesome, I appreciate the snippets! It might be a while before I can actually get everything working with the tokio event loop, although I'm pretty sure this will fix the errors. It could be a few days, but I'll be sure to post an update once I'm finished.

@awestlake87
Contributor Author

awestlake87 commented Dec 21, 2020

Alright, I have good news and bad news.

The good news is that I got my tests working with Python owning the main thread and tokio handling the async Rust in background threads. I managed to create a crate using the snippets from @ChillFish8 that provides a decent interface for the runtime, with conversions between Rust futures and Python coroutines. Here's the repo: https://github.com/awestlake87/pyo3-asyncio.

The bad news is that this didn't appear to fix my problems. I have not seen the AssertionError since I converted it over, but I have seen the FATAL: exception not rethrown error in my project as well as a new error (signal: 11, SIGSEGV: invalid memory reference) during finalization. I have seen the FATAL: exception not rethrown error in both my container and Ubuntu 20.04. So unfortunately, it looks like this problem might be a bit more complicated than we thought.

I'll change my example project https://github.com/awestlake87/pyo3-issue to use this new runtime and hopefully I'll still be able to reproduce the issue like before.

Edit:
On the plus side, it wasn't all bad. Changing my runtime to allow Python to control the main thread fixed my issues with signal handling like CTRL-C. At the very least it's still useful to have a runtime like this, it just didn't fix the main issue like I'd hoped.

@awestlake87
Contributor Author

Just as an update, I did try to change https://github.com/awestlake87/pyo3-issue over to the new runtime, but I was unable to reproduce the errors. I'm going to have to mess with it a bit more to get the error to occur.

That being said I don't really think it's anything specific about my project, I think it's just a matter of timing during the finalization step. I'll post another update if I can successfully reproduce it.

@awestlake87
Contributor Author

awestlake87 commented Dec 22, 2020

Man this issue is frustrating. I'm not even certain I've made any progress, but here's what I'm seeing right now:

  • FATAL: exception not rethrown
    • With GDB, I was able to find out that this error actually comes from the tokio thread where I call block_on(future::pending::<()>()).
    • It seems like this error is usually seen when people use the pthread_cancel call.

      It's possible that Py_Finalize sees the tokio thread as a thread it's supposed to clean up and tries to do something to the thread that it shouldn't. However, this doesn't track very well with the following information because TensorFlow seems to have some effect on the error.

    • Occurs when:
      • Python 3.6 is the version installed
      • Python 3.8, but only when TensorFlow gets involved (which is why I saw it on Ubuntu 20.04 as well).

        As soon as I disabled the tests involving TensorFlow, I was no longer able to reproduce the error under Ubuntu 20.04.

        When I based my Docker image off of Ubuntu 20.04 (Python 3.8) instead of 18.04 (Python 3.6), I was no longer able to reproduce the error inside the container with the TensorFlow tests disabled.

  • (signal: 11, SIGSEGV: invalid memory reference)
    • This actually seems to be somewhat unrelated. It occurs inside tests where I'm training ML models and seems to originate from inside the test itself, not the test finalization.

      I need to do some more investigation on this issue to find out what's causing it, but I think the culprit is in my code this time (not that my intuition has been right with this issue at all so far).

    • Occurs when:
      • Either Python 3.6 or Python 3.8 is installed, but only when certain TensorFlow tests are enabled.

For the FATAL: exception not rethrown error, I'm going to see if I can figure out how to clean up the TensorFlow library before finalization in a way that doesn't produce the error. I'm hoping that TensorFlow is involved, but I can't say for certain, especially since the error seems to come from the tokio thread for some reason.

The most aggravating part about this error is that I can never be sure I've fixed it. I can run some test suites literally 1000 times on Python 3.8 without failing only to have them fail with Python 3.6 within 5 attempts. TensorFlow seems to have an effect on certain tests, but I can't be certain that it's the TensorFlow library itself or if the problem exists in all of the tests and the timing of the finalization when TensorFlow is involved tips it over the edge. Because of this, I'm really not sure if the observations above mean anything.

In any case I have a few more things to try, so I'll keep plugging away on it.

@awestlake87
Contributor Author

Alright, now I've got better news. I was able to "fix" both of these issues last night and get all of my tests passing consistently, although it's not a good long-term solution.

It turns out (signal: 11, SIGSEGV: invalid memory reference) was happening in PyThreadState_Clear when I had a long-lasting blocking task (an ML training task) that I wasn't handling properly. I was blocking in the coroutine instead of calling loop.run_in_executor(None, fn, args). I pushed a few changes to https://github.com/awestlake87/pyo3-asyncio to support the ThreadPoolExecutor and perform a graceful shutdown, and then the error went away. I'm not entirely sure I understand how this translated into a SEGV, but I haven't seen the error since I pushed those changes, and I've run the test many times.
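
The change amounts to something like this sketch (assuming the event loop and a blocking Python callable are already in hand; offload_blocking is an illustrative name, not the actual code I pushed):

use pyo3::prelude::*;

// Instead of blocking inside the coroutine, hand the blocking callable to the
// loop's default ThreadPoolExecutor (the None argument) and get back an
// awaitable future.
fn offload_blocking<'p>(
    py: Python<'p>,
    event_loop: &'p PyAny,
    blocking_fn: &'p PyAny,
) -> PyResult<&'p PyAny> {
    // Equivalent to: loop.run_in_executor(None, blocking_fn)
    event_loop.call_method1("run_in_executor", (py.None(), blocking_fn.to_object(py)))
}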

The FATAL: exception not rethrown error is trickier. I didn't actually fix it, I just suppressed it by removing the Py_Finalize call in my pyo3 fork (https://github.com/awestlake87/pyo3/tree/finalization-bugfix). That tracks a bit better with my theory that Py_Finalize is somehow interfering with the tokio thread. I know this is probably not something you want to push upstream, so I think we need a better solution. At the very least, this narrows the problem down and pins it to the Py_Finalize function.

I've been trying to find a better solution, but I'm just not familiar enough yet with Python's threading quirks to come up with one. I actually poked through Boost's Python binding library and found something interesting, though. They actually don't call the Py_Finalize function either:

From https://www.boost.org/doc/libs/1_74_0/libs/python/doc/html/tutorial/tutorial/embedding.html#tutorial.embedding.getting_started

Note that at this time you must not call Py_Finalize() to stop the interpreter. This may be fixed in a future version of boost.python.

I wasn't able to find out why this is the case, but maybe they have problems like this too. It might be a good idea to find out why they don't call Py_Finalize and whether they have any ideas for a fix.

@davidhewitt
Member

Thanks for the continued investigation. Yes, it looks like the finalize issue is definitely related to threading somehow, but I don't have any headway yet either. All the Python documentation seems to imply that Py_Finalize is expected to be called.

@awestlake87
Contributor Author

I haven't done much investigation on it after my last comment. Today I poked around a little bit to try and figure out why Boost doesn't call Py_Finalize. In a TODO list (not sure if it's official) I found the following:

Currently Boost.Python has several global (or function-static) objects whose existence keeps reference counts from dropping to zero until the Boost.Python shared object is unloaded. This can cause a crash because when the reference counts do go to zero, there's no interpreter. In order to make it safe to call PyFinalize() we must register an atexit routine which destroys these objects and releases all Python reference counts so that Python can clean them up while there's still an interpreter. Dirk Gerrits has promised to do this job.

Additionally I found this thread relating to the Py_Finalize issue in Boost Python. It might have some more details. Seems like someone was trying to figure out how to solve it. It's 11 years old though, so I can't be sure any of it is still relevant.

I know pyo3 has an atexit function for Py_Finalize, but maybe something needs to clear the ref-counts of Python objects beforehand?

@davidhewitt
Member

I don't think we have to clear any ref-counts beforehand, and we won't touch them after this, so I don't think that's it.

For now, perhaps it's best to add a new prepare_freethreaded_python_without_finalizer API to just turn off finalization in problem cases - see #1355.

@awestlake87
Contributor Author

It seems like there should be some cleaner way around it, but my tests are behaving perfectly now that I've disabled Py_Finalize, so idk how much time I'm willing to dedicate to finding a better solution right now. Also, if Py_Finalize can cause other issues with the libraries I'm using like tensorflow (#1073) and potentially scipy (#1261) (idk if I'll need scipy or not), then that's just another hurdle to fixing it.

I'd love to be able to disable Py_Finalize without using my pyo3 fork, so the new API call for problem cases sounds great to me.

@davidhewitt
Member

#1355 has now merged - we decided to disable finalizing by default, because as we've seen it can lead to weird crashes like this. Hopefully all your problems should now be gone if you use pyo3 master? 😄

@awestlake87
Contributor Author

Works for me! Will this be in 0.14?

@davidhewitt
Member

Most likely!

@awestlake87
Contributor Author

Ok awesome. Any thoughts on integrating pyo3-asyncio into the rest of the pyo3 ecosystem? It'd be nice to have it as a crates.io dependency, but I completely understand if you want to wait for a more complete and hashed-out solution for it.

@dmgolembiowski

@tailhook this is some fascinating stuff to consider

@davidhewitt
Member

I'd be very happy already to bless pyo3-asyncio as the recommended solution for async Python+Rust, if you're willing to release and maintain the crate for now! We can also add an example to the guide if you're up for writing one. We did pretty much all this for pyo3-log.

Eventually as pyo3-asyncio matures and grows in userbase we can consider moving it into the PyO3 org to bolster maintainer efforts.

@awestlake87
Contributor Author

Sure, I'd be up for that. It's a pretty small crate so maintaining it and writing up the guides / examples should be nbd.

Are there any guidelines I should know about for versioning, project structure, etc? Also, any bikesheds about the current API?

@davidhewitt
Member

Looks pretty good to me! I would advise potentially using fewer .unwrap() calls internally in the lib if you're able to remove those. Panics aren't much fun coming from library code.

I might be tempted to make the signature of get_event_loop be fn get_event_loop(py: Python) -> &PyAny, just because &PyAny is marginally more useful than PyObject imo.
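
Something like this sketch, assuming the event loop is cached in a static OnceCell as in the snippets earlier in this thread (the expect message is illustrative):

use once_cell::sync::OnceCell;
use pyo3::prelude::*;

static EVENT_LOOP: OnceCell<PyObject> = OnceCell::new();

// Hand out a GIL-bound &PyAny borrowed from the cached PyObject, rather than
// cloning a PyObject for every caller.
fn get_event_loop(py: Python) -> &PyAny {
    EVENT_LOOP
        .get()
        .expect("pyo3-asyncio has not been initialized")
        .as_ref(py)
}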

Versioning - no real standard. Someone suggested I keep the pythonize versions in sync with PyO3, so that's currently at version 0.13 also. Up to you if you like this or want to keep to your own semantic versioning.

@awestlake87
Contributor Author

awestlake87 commented Jan 14, 2021

Alright, I updated the documentation with some more details and examples to prep it for release. I hope you don't mind me plagiarizing your Code-of-Conduct and Contributing.md files.

In addition, I configured the GitHub Actions pretty much exactly like PyO3 to get the test matrix, code coverage, gh-pages deployment, and dependabot working for that project.

As for the feedback you gave me, I went ahead and changed the signature of get_event_loop and the version to 0.13 as you suggested. I wasn't really able to get rid of most of the unwrap calls, but I did replace them with expect calls with a message that tells the user that PyO3 Asyncio has not been initialized. There were a few other places where the unwrap calls could be addressed with errors, so I went ahead and fixed those.

With all of that, I think it's pretty much ready to publish. Feel free to look over it and let me know if I should change anything else. Otherwise, just give me the thumbs up and I'll publish it!

Update:
Sometimes the actions will fail because the project still depends on PyO3 0.13, which doesn't have the new finalization API. This poses a problem for publishing, since I don't believe I can depend on PyO3 master in a crates.io dependency. I can either hold off until 0.14 or mention that PyO3 should be patched to master temporarily. Idk how long it might be before 0.14 is released, but waiting for that release to publish this crate would be my preference.
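
(For reference, "patching to master" would be the standard Cargo mechanism — something like this in the downstream Cargo.toml, with the URL illustrative:)

[patch.crates-io]
pyo3 = { git = "https://github.com/PyO3/pyo3" }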

@davidhewitt
Member

Awesome! Crate all looks great.

Idk how long it might be before 0.14 is released, but waiting for that release to publish this crate would be my preference.

👍 makes total sense. We don't have any breaking changes merged yet (other than the finalization change, but that's also kind of a bugfix), so maybe I will release 0.13.2 soon to unblock you and then start breaking stuff on the road to 0.14

@awestlake87
Contributor Author

Ok, I'm not too picky about it unless 0.14 is gonna be months out. I'm not in a huge rush to get it converted over to crates.io, so I'll leave that up to you.

At some point this week I might open up a PR to add an example to the guide. It'll likely be one that already exists in the pyo3-asyncio docs as I think async sleeps are probably the simplest example, but it'd be nice to mention it in the overall guide since that seems to be the starting place for most pyo3 users.

@davidhewitt
Member

👍 please also feel welcome to add pyo3-asyncio to the README (in the tools and libraries section) in the same PR!

@davidhewitt
Member

As there's been a lot of progress with pyo3-asyncio since this discussion, I'm going to close it.

haixuanTao added a commit to dora-rs/dora that referenced this issue Nov 21, 2022
This commit is an initial draft at fixing #147. The error is due to the
fact that pyo3 links against the libpython found at compile time rather
than the libpython available in `LD_LIBRARY_PATH`.

Currently, the only solution to disable the linking is to use the `extension-module` flag.
This requires packaging the Python `runtime-node` as a Python library.
The Python `runtime-node` should also be fully compatible with the other operators in case we want a hybrid runtime.

The issue that I'm facing is that, due to the packaging, I have to deal with the GIL that is held from the start of
the `dora-runtime` node. This, however, makes the process single-threaded, which is unworkable.

So, I have to disable it, but when I do, I have a race condition:
```bash
Exception ignored in: <module 'threading' from '/usr/lib/python3.8/threading.py'>
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 1373, in _shutdown
    assert tlock.locked()
AssertionError:
```
The issue is the same as PyO3/pyo3#1274

To fix this issue, I'm going to look again at the different steps to make this work.

But this is my current investigation.