Threading backend? #12

Open
rcarmo opened this issue Jun 7, 2014 · 23 comments

@rcarmo
Contributor

rcarmo commented Jun 7, 2014

This is great, and much nicer than my homegrown threading solution. How difficult would it be to mix in a threading back-end?

@rgalanakis
Owner

I have thought a bit about a threading backend and haven't implemented it yet for a few reasons:

  1. threading performance is very different from cooperative coroutine performance. A thousand coroutines are fine. A thousand threads are madness. So 'threaded' goless and 'async' goless would have to be coded differently.
  2. I can't figure out a way to back goless with a threadpool without a significant amount of work (each yield or channel op would need to somehow unschedule the current code and put it back in the work queue?). Without this, we can't fix item 1.
  3. Behavior differs between threading and cooperative multitasking: threads are preempted, while cooperative tasks are not. This is not a major problem, and Stackless can also be run so that it interrupts tasklets, so in theory we already have this problem. But it would be more severe, and we would lose some of the predictability of cooperative multitasking.
  4. The backend, since it isn't really a drop-in, would need to be selected through an environment variable, which would make the API more confusing.
  5. If you're serious about using concurrency, you should probably be avoiding Python threads (or threads altogether) in the first place. I'd happily add another backend for an alternative concurrency library.
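
To make the predictability point in item 3 concrete, here is a minimal sketch of cooperative scheduling using plain Python generators on a single thread. It is not goless code: `worker` and `run_round_robin` are hypothetical names invented for this illustration. A task can only lose control at a `yield`, so the interleaving is fully deterministic, and a thousand such tasks cost only a thousand small generator objects.

```python
from collections import deque

def worker(name, steps, log):
    """A cooperative task: it only gives up control at each `yield`."""
    for i in range(steps):
        log.append((name, i))
        yield  # explicit switch point, comparable to a channel operation

def run_round_robin(tasks):
    """Run generator-based tasks on one thread until all of them finish."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task up to its next yield
        except StopIteration:
            continue            # task finished, so drop it
        ready.append(task)      # reschedule at the back of the queue

log = []
run_round_robin([worker("a", 2, log), worker("b", 2, log)])
print(log)  # [('a', 0), ('b', 0), ('a', 1), ('b', 1)]
```

With OS threads the same two workers could interleave in any order, which is the loss of predictability described above.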

I'm curious about your feedback on these issues. If you feel a threading backend has value, it would probably not be much work to add in.

@rgalanakis
Owner

So it turns out this is really difficult. Without any ability to control the interpreter's thread scheduler, there's not much we can do.
The best we could do would be to implement some threading abstraction that would allow us to control the scheduler and use a pool of threads. I'm not sure how this would work, or even if it would work, and it would be a significant amount of work. Unless there's a compelling story here, I don't think we should support a threading-based backend.

@rcarmo
Contributor Author

rcarmo commented Jun 8, 2014

Yeah, well, I've been experimenting with threading + gevent for a while (I have a feed fetcher working atop that). In my experience, Go is stupefyingly performant precisely because its scheduler multiplexes goroutines onto OS threads, and that's why I was looking at offset -- it implements a scheduler like Go's.

@rgalanakis
Owner

But I still don't see what the upside is for Python code? You can't achieve parallelism anyway due to the GIL, so the "juggle goroutines across OS threads" model doesn't help at all. Assuming there's no IO, simply using goless with its stackless backend (Stackless or PyPy) and a single thread is going to give better CPU performance than any sort of scheduler and multiple threads. Switching threads in Python is not cheap. I'm not sure I see the upside of a threading-based backend.
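
A rough way to check the CPU-bound claim on your own machine is sketched below. It assumes a standard CPython build with the GIL, and the helper names (`count_down`, `timed`, `run_sequential`, `run_threaded`) are made up for this example. No timings are quoted here because they vary by interpreter and hardware. On a GIL build the threaded run is not expected to be faster than the sequential one.

```python
import threading
import time

def count_down(n):
    """Pure-Python CPU-bound loop; it needs the GIL for every iteration."""
    while n > 0:
        n -= 1

def timed(fn):
    """Return the wall-clock seconds taken by fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

def run_sequential(n, repeats):
    for _ in range(repeats):
        count_down(n)

def run_threaded(n, repeats):
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(repeats)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

N = 1_000_000  # arbitrary workload size, chosen only to keep the run short
print("sequential: %.3fs" % timed(lambda: run_sequential(N, 4)))
print("threaded:   %.3fs" % timed(lambda: run_threaded(N, 4)))
```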

Which reminds me, we should probably abstract the monkeypatching capabilities of gevent and stackless (stacklesslib) into goless.

@rcarmo
Contributor Author

rcarmo commented Jun 8, 2014

In my testing, the GIL is not really an issue when you're fetching and parsing hundreds of feeds. Python can take advantage of multiple cores just fine, and using threads does provide better performance -- it all depends on your workload and how far each thread can progress independently. Of course, I don't expect Python to outperform Go on this, just saying that threading is useful and does help.

@ctismer
Contributor

ctismer commented Jun 8, 2014

@rcarmo
Where is your testing visible as code?
Where are your timing logs?
Where is the proof of your claims?

"""the GIL is not really an issue when you're fetching and parsing hundreds of feeds."""

If that is true, then you don't need more than one thread, and your whole point is nonsense,
and your testing as well. The GIL is the central reason why Goless never will be able
to really compete with GoLang, unless we do a substantial change to Stackless for CPython.
The claim that threading solves any part of this problem can only make sense if
you heavily depend on code that releases the GIL.

Please don't emit such statements without proof if you are after a real discussion.
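
The case of code that releases the GIL can be sketched with a stand-in for network IO. This is an illustration only: `fake_fetch` is a hypothetical placeholder that uses `time.sleep`, which releases the GIL while waiting, and it is not a measurement of real feed fetching or parsing.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(delay):
    """Stand-in for a network fetch; time.sleep releases the GIL."""
    time.sleep(delay)
    return delay

def fetch_all_sequential(delays):
    return [fake_fetch(d) for d in delays]

def fetch_all_threaded(delays, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_fetch, delays))

delays = [0.05] * 8

start = time.perf_counter()
fetch_all_sequential(delays)
seq_time = time.perf_counter() - start

start = time.perf_counter()
fetch_all_threaded(delays)
thr_time = time.perf_counter() - start

print("sequential: %.3fs, threaded: %.3fs" % (seq_time, thr_time))
```

Here the waits overlap, so the threaded run finishes sooner. The same overlap is available to gevent or Stackless tasks on a single thread, which is why this kind of workload does not by itself argue for a threading backend.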

@rcarmo
Contributor Author

rcarmo commented Jun 8, 2014

My bottle-fever project, for starters, which is stalled on GitHub but evolved (elsewhere) into a larger scale affair. That and a couple of years doing work with Celery, the last of which involved a few tens of millions of task runs across a few dozen cores on multiple machines using a mix of threaded and gevent workers.

So yes, I do know what I'm talking about. And I honestly don't appreciate that kind of retort when I'm discussing an issue, so kindly dial it down... I'm interested in goless for its simplicity and promise, but technical merits are not the only reason one gets involved... or not.


@ctismer
Contributor

ctismer commented Jun 9, 2014

I appreciate your experience with your other projects, also using Celery, ZeroMQ
and so on. But that does not make your claim more credible without facts.
If you have a direct comparison of threading versus tasklet/eventlet performance,
then please post it here, in a reproducible form that can be validated.

Thanks - tismer at stackless.com

@rgalanakis
Owner

@rcarmo I am a bit confused by some claims. I am going to quote each thing I have questions about:

In my testing, the GIL is not really an issue when you're fetching and parsing hundreds of feeds.

No question, because for the most part you are just waiting for IO.

Python can take advantage of multiple cores just fine

I am most confused by this claim. Unless you are working with C extensions that do not need the GIL (or using something like PyParallel), I would need more explanation how this can be the case. A Python-only CPython program cannot use more than one core at a time (though I suspect it can use different cores if you use multiple threads, but only one is running at a time).

and using threads does provide better performance

This cannot be the case. There is overhead with thread scheduling at the interpreter level, not to mention things like context switching. David Beazley has a great presentation about the GIL that you may have seen. If only one thread can run at a time (see last point), multiple threads will always increase overall CPU time.

it all depends on your workload and how far each thread can progress independently.

"Workload" meaning CPU work or IO work? Nevertheless, I don't see how threads can improve performance.

Of course, I don't expect Python to outperform Go on this, just saying that threading is useful and does help.

Actually with PyPy I wonder if it can compete against Go :) At least in a single-threaded environment. And perhaps the goless model will benefit from STM very much, with minimal shared memory. I will add an issue to add some Go benchmarks.

Really, being programmers, the best solution would be to provide some code that demonstrates what you are talking about! (Python using multiple cores, threads improving performance).

@rcarmo
Contributor Author

rcarmo commented Jun 9, 2014

I've had mixed experiences with PyPy, largely because most of what I use Python for is either server-side web stuff (where string manipulation and object access slow it down significantly) or IO-bound stuff where it delivers very little improvement (and where threading is a quick way to leverage multiple cores). When I need proper concurrency, I usually dip into the JVM (or, lately, Go).

@rgalanakis
Owner

Would still love feedback on the rest of my questions, if we want to go any further with this.

@rcarmo
Contributor Author

rcarmo commented Jul 11, 2014

Aaand I'm back. I think you'll be pleased to know that goless is now being used for a few message-handling chores down at my neck of the woods, but getting back to the topic at hand, the recent pypy-stm alpha (which I'm testing) would probably be a nice target for a threading backend.

@rgalanakis
Owner

I am very pleased! 👍 Will do my best to take care of anything that comes up and keep things stable.
Yeah when I saw the pypy-stm release my mind immediately went to a threading backend.
So now the discussion is not if but how to use threads :)

The way I see it there are a few options:

  • Let the PyPy team figure out how to handle this with their stackless (and probably greenlet) implementations. They're smart and can probably figure it out, and it would involve no work on our end.
  • Use a thread-per-goroutine. I'm against this because it is fundamentally inconsistent with the other uses of a goroutine (which are very lightweight). However it could be a decent first attempt. That said, since the semantics of threads are quite different from coroutines, this still involves quite some work. I've tried it quickly a few times and can't get it to work.
  • Create a threadpool that runs the goroutines. This would be a better option but how the hell to implement it? I am not sure if anyone has done something like this at the library level. I'm not quite sure it's possible. Would require scheduling a goroutine to run on a thread, then unscheduling it when it blocks, and scheduling a new goroutine to run. I've only ever seen this implemented at the language level (this is sort of like async/await in .NET I think, writing a single function/coroutine that ends up getting split up and run via a threadpool).

Thoughts?
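
For reference, the thread-per-goroutine option can be sketched with nothing but the standard library. This is not the goless API: `go`, `producer`, and `squares` are hypothetical names, and `queue.Queue(maxsize=1)` is only a rough stand-in for a channel, since it buffers one item instead of providing an unbuffered rendezvous.

```python
import queue
import threading

def go(fn, *args):
    """Hypothetical helper: run fn in its own daemon thread."""
    t = threading.Thread(target=fn, args=args, daemon=True)
    t.start()
    return t

def producer(chan, n):
    for i in range(n):
        chan.put(i)       # blocks while the one-slot buffer is full
    chan.put(None)        # sentinel: no more values

def squares(inp, out):
    while True:
        value = inp.get() # blocks this OS thread until a value arrives
        if value is None:
            out.put(None)
            return
        out.put(value * value)

numbers = queue.Queue(maxsize=1)
results = queue.Queue(maxsize=1)
go(producer, numbers, 5)
go(squares, numbers, results)

collected = []
while True:
    item = results.get()
    if item is None:
        break
    collected.append(item)
print(collected)  # [0, 1, 4, 9, 16]
```

Every blocked `get` or `put` here parks a whole OS thread, which is the cost that makes this option inconsistent with lightweight goroutines.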

@rcarmo
Contributor Author

rcarmo commented Jul 12, 2014

Well, 3) seems a lot like the Go scheduler (and I recall some discussion about that in the offset library presentation -- maybe it would be possible to re-use some of Benoit's work). 1) might happen (no idea what their priorities are besides ironing out bugs in STM at this point), 2) might be a usable first step...
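
A very rough sketch of option 3, under the assumption that every task is written as a generator and yields wherever it would block. This is not how Go, offset, or goless work internally; `run_on_pool` and `counter` are hypothetical names. Workers take a task from a shared ready queue, run it to its next `yield`, and put it back, so any worker may resume it later.

```python
import queue
import threading

def run_on_pool(tasks, num_workers=4):
    """Run generator tasks on a small pool; a task is requeued at each yield."""
    ready = queue.Queue()
    for task in tasks:
        ready.put(task)

    def worker():
        while True:
            task = ready.get()
            if task is None:          # shutdown signal
                ready.task_done()
                return
            try:
                next(task)            # run until the task's next yield
            except StopIteration:
                pass                  # finished: do not reschedule
            else:
                ready.put(task)       # unschedule: back on the ready queue
            finally:
                ready.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    ready.join()                      # returns once no task is queued or running
    for _ in threads:
        ready.put(None)
    for t in threads:
        t.join()

def counter(name, n, out):
    for i in range(n):
        out.append((name, i))         # list.append is atomic in CPython
        yield                         # pretend this is a blocking channel op

out = []
run_on_pool([counter("a", 3, out), counter("b", 3, out)])
print(sorted(out))
```

The limitation is visible right away: a generator can only yield from its own top-level frame, so ordinary nested function calls cannot block this way. That is consistent with the remark above that this kind of scheduling has mostly been done at the language level.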

@AnneTheAgile

Links, FYI. If they get the donations [3], do you think that functionality would be sufficient? The current docs [1] discuss current problems with programs being conflict-prone.
AnneTheAgile

[1] Docs for Software Transactional Memory, PyPy 2.4.0 documentation
http://pypy.readthedocs.org/en/latest/stm.html
[2] How I found this GoLess project, a blog post about GoLang vs Python: "Back to Python", Dave Behnke
http://davebehnke.com/break-to-python.html
[3] Donations requested to achieve improved STM in PyPy (they over-achieved with under-donations on the first call): "PyPy - 2nd Call for donations - Transactional Memory in PyPy"
http://pypy.org/tmdonate2.html

@rgalanakis
Owner

Having an STM implementation wouldn't be enough (though I'm confident the PyPy team will come up with one). You also need a multithreaded program. goless would work for free if a) PyPy had STM and b) they updated their stackless.py module in some way to be multithreaded (or had some other mechanism). In the meantime, if we did 2) above (thread per goroutine), it could use STM.

Go-style concurrency is a good candidate for STM because shared memory (what creates conflicts) is an antipattern there.

I'd be excited for someone to take up this work (forking goless, or just copy/pasting stuff into a new codebase focused on STM). I do not have the time, unfortunately, since I'm not using Python at work currently.

@AnneTheAgile

Thank you @rgalanakis. So am I misreading it? It seems like the proposed pypy-stm does have, or plans to have, multithreading, and stackless.py is part of PyPy, so it should come too, right?

Both the docs and the latest release notes imply that to me. Of course, it's not funded yet, so there's a lot of uncertainty.

Docs:
"pypy-stm, a special in-development version of PyPy which can run multiple independent CPU-hungry threads in the same process in parallel." From Software Transactional Memory, PyPy 2.4.0 documentation, http://pypy.readthedocs.org/en/latest/stm.html

Release:
"Today, the result of this is a PyPy-STM that is capable of running pure Python code on multiple threads in parallel". From the PyPy Status Blog, PyPy-STM: first "interesting" release, Saturday, July 5, 2014, http://morepypy.blogspot.com/2014/07/pypy-stm-first-interesting-release.html
I am not sure where the source is for the July release. There are numerous bitbucket branches with 'stm' in the name.
https://bitbucket.org/pypy/pypy/
stm-gc-2
stm-jit
stm-thread-2
stmgc-c4
stmgc-c7
stmgc-static-barrier

@rgalanakis
Owner

stackless.py is "part" of pypy but not its core. For example it doesn't work in PyPy3. And it (well, continulets) would still need work to take advantage of STM.

My point is, mostly, that a good threading backend with STM support (IMO the only reason to create a threading backend) would require lower-level scheduler work than goless was designed for. Remember that goless isn't rewriting Go in Python, it's really just a wrapper to allow go-like programming on top of Python libraries/runtimes that allow asynchrony.

Once such a scheduler system was built, goless could just have another adapter/backend written for it, like it can swap between gevent/stackless. But this work is quite outside goless (though it could be done with goless in mind).
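
The adapter idea can be sketched as follows. Nothing here is goless's real backend interface: the classes, the `start`/`run` methods, and the `EXAMPLE_BACKEND` environment variable are all hypothetical, chosen only to show how a thread-based scheduler could sit behind the same small surface as another backend.

```python
import os
import threading
from collections import deque

class ThreadBackend:
    """Hypothetical backend: one OS thread per started task."""
    def __init__(self):
        self._threads = []

    def start(self, fn, *args):
        t = threading.Thread(target=fn, args=args)
        t.start()
        self._threads.append(t)

    def run(self):
        for t in self._threads:
            t.join()

class InlineBackend:
    """Hypothetical backend: queue tasks, then run them on the calling thread."""
    def __init__(self):
        self._pending = deque()

    def start(self, fn, *args):
        self._pending.append((fn, args))

    def run(self):
        while self._pending:
            fn, args = self._pending.popleft()
            fn(*args)

BACKENDS = {"thread": ThreadBackend, "inline": InlineBackend}

def pick_backend(environ=os.environ):
    """Choose a backend by name from a (hypothetical) environment variable."""
    return BACKENDS[environ.get("EXAMPLE_BACKEND", "inline")]()

seen = []
backend = pick_backend({"EXAMPLE_BACKEND": "thread"})
backend.start(seen.append, "hello")
backend.run()
print(type(backend).__name__, seen)  # ThreadBackend ['hello']
```

The hard part discussed in this thread is not this surface but the scheduler behind it, which is why that work sits outside goless.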

@rcarmo
Contributor Author

rcarmo commented Nov 27, 2014

Incidentally, I hacked my threaded task queue to support a goless-like syntax:

https://github.com/rcarmo/python-utils/blob/master/taskkit.py

Bear in mind that this was originally something designed to let me run Celery-like code without the hassle of installing a message queue (hence the scheduler thread, priorities and retry support), but it now lets me write goless-like code for systems where I can't install gevent and seems to work fairly well with PyPy (I have a tiny RADIUS server that handles accounting events, sends the interesting ones down a couple of channels to specific workers that do HTTP requests for lookups and a few more chained tasks that take the results and apply further processing). All of it mostly I/O bound and relatively simple, but at least it's now vastly more readable...

@rgalanakis
Owner

Have you tried writing a goless backend for it?

@rcarmo
Contributor Author

rcarmo commented Nov 29, 2014

I don't think it would be a very clean fit as it is. I've never been happy with my scheduler, for starters, and I'm likely to rewrite the whole thing some day and split the "mini-Celery" bits away from the "goish" bits. It's a hack atop another hack, and even if adding go() and the channel shim was an interesting and fun experience, I don't have a clear idea of how to go about implementing select() (for instance).
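
To show why select() is awkward on top of thread queues, here is a deliberately naive sketch that polls several `queue.Queue` objects. It is not taken from taskkit.py or goless; `select_recv` is a hypothetical name, and polling wastes CPU and adds latency compared with a real select that blocks on all channels at once.

```python
import queue
import time

def select_recv(channels, poll_interval=0.001, timeout=None):
    """Return (index, value) for the first channel that has a value ready."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        for index, chan in enumerate(channels):
            try:
                return index, chan.get_nowait()
            except queue.Empty:
                continue              # nothing here, try the next channel
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("no channel became ready")
        time.sleep(poll_interval)     # back off before polling again

a, b = queue.Queue(), queue.Queue()
b.put("hello")
print(select_recv([a, b]))  # (1, 'hello')
```

The standard library offers no way to block on several queues at once, so a non-polling version needs a shared condition variable or a custom channel type.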

Besides, I'm writing a fair amount of Go and Clojure these days...

@pothos

pothos commented Feb 13, 2016

Maybe a threading backend could make use of https://github.com/stuglaser/pychan which supports Python 3 now.

@navytux

navytux commented Dec 3, 2019

For reference, pygolang supports both gevent and thread runtimes (see also #43 for why pygolang was created instead of reusing goless).
