How can Stackless Python be fast for concurrency?

Stackless Python doesn't make good use of multiple cores, so at what point should it be faster than Python threads/multiprocessing?

All the benchmarks compare Stackless Python tasklets against Python threads using locks and queues. That seems unfair, because locks are always inefficient.

After all, a single-threaded program making plain function calls, with no locks, should be just as efficient as Stackless Python.

There are 2 answers

Answer by gahooa (best answer):

Focus on functionality first, and performance second (unless you know you have the need).

Most of the time on a server is spent on I/O, so multiple cores do not help much. If your workload is mostly I/O, multi-threaded Python may be the simplest answer.
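
For illustration, a minimal sketch of that multi-threaded I/O approach (the URLs are placeholders):

    import threading
    import urllib.request

    def fetch(url):
        # Each thread blocks on network I/O; the GIL is released while
        # waiting, so the other threads keep making progress.
        with urllib.request.urlopen(url) as resp:
            print(url, len(resp.read()))

    urls = ["http://example.com", "http://example.org"]
    threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()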

If the server requests are CPU-intensive, then having a parent process (multi-threaded or not) farm the work out to child processes makes a good deal of sense.
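
A hedged sketch of that parent/child-process pattern, using the standard multiprocessing module (the pool size and workload are arbitrary):

    from multiprocessing import Pool

    def crunch(n):
        # CPU-bound work runs in a separate process, so it gets its own
        # core and is not serialized by the parent's GIL.
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        with Pool(processes=4) as pool:           # four worker children
            print(pool.map(crunch, [10**6] * 8))  # parent farms out requests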

If you really want to scale, you could look at a different platform, like Erlang. If you really want to scale and still use Python, you could look at distributed Erlang with Python processes managed as Erlang ports on a distributed cluster.

Lots of options, but unless you are dealing with something big, you can most likely take a simple approach.

Release early, release often.

Answer by Will:

There is this new and trendy thing called asynchronous IO loops and message-passing concurrency, along with a few other trendy terms. It's not at all new, but it is only in the last five years that the mainstream has discovered it.

Stackless Python is a version of Python whose VM has itself been modified to better support this message passing and these IO loops, and its trick is green threads/coroutines (tasklets).
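
As a rough illustration of those tasklets, here is a minimal producer/consumer sketch using Stackless's channel API (it assumes you are running the Stackless interpreter):

    import stackless

    def producer(ch):
        for i in range(3):
            ch.send(i)            # blocks this tasklet, not the OS thread

    def consumer(ch):
        for _ in range(3):
            print("got", ch.receive())

    ch = stackless.channel()      # rendezvous point for message passing
    stackless.tasklet(producer)(ch)
    stackless.tasklet(consumer)(ch)
    stackless.run()               # cooperative scheduler, one OS thread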

There are other Python libraries that do the same thing with different tools, e.g. Twisted and Tornado. You can even run a hybrid, Twisted on Stackless Python, and so on.
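
For a flavour of the callback style those libraries use, here is a tiny Twisted sketch (the one-second delay and the callback are purely illustrative):

    from twisted.internet import reactor

    def say_hello():
        print("hello from the event loop")
        reactor.stop()                   # shut the loop down afterwards

    reactor.callLater(1.0, say_hello)    # schedule the callback 1s out
    reactor.run()                        # hand control to the single-threaded IO loop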

The IO-loop part maps directly onto how Berkeley sockets do asynchronous IO, and with a bit of effort it can be extended to be proactive rather than reactive, and to work with file systems as well as network sockets, e.g. the newest libevent.
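
A bare-bones reactive IO loop in that Berkeley-sockets style might look like this select()-based echo server (the address and buffer size are arbitrary):

    import select
    import socket

    server = socket.socket()
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", 8000))
    server.listen(5)
    server.setblocking(False)

    sockets = [server]
    while sockets:
        # Block here until at least one socket is ready to read.
        readable, _, _ = select.select(sockets, [], [])
        for s in readable:
            if s is server:
                conn, _ = s.accept()      # new client: watch it too
                conn.setblocking(False)
                sockets.append(conn)
            else:
                data = s.recv(4096)
                if data:
                    s.sendall(data)       # echo back
                else:
                    sockets.remove(s)     # client closed the connection
                    s.close()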

To scale sideways and utilise more than one core, there are two approaches: shared state (threads, or shared memory between processes) and message passing (e.g. message queues between processes). It is a general limitation of current architectures that the shared-state threads approach works well for a large number of local cores, whereas message passing overtakes it performance-wise as the number of cores becomes massive or as those cores end up on different machines. And you can make a hybrid approach.
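
A minimal sketch of the message-passing flavour, using multiprocessing queues (the doubling workload and the None sentinel are just for illustration):

    from multiprocessing import Process, Queue

    def worker(inbox, outbox):
        for msg in iter(inbox.get, None):  # None is the shutdown sentinel
            outbox.put(msg * 2)            # no shared state, only messages

    if __name__ == "__main__":
        inbox, outbox = Queue(), Queue()
        p = Process(target=worker, args=(inbox, outbox))
        p.start()
        for i in range(3):
            inbox.put(i)
        inbox.put(None)                    # tell the worker to stop
        print([outbox.get() for _ in range(3)])
        p.join()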

Because of internal design choices in the Python VM (notably the Global Interpreter Lock), it is generally not as efficient at multi-threading as at multi-processing, so you reach for multiple processes with message passing sooner than you might on other platforms.

But generally the message-passing approach is cleaner and easier to get correct.

And there are other languages that build on this same approach with different additional aims and constraints e.g. Erlang, node.js, Clojure, Go.

Of these, Clojure is perhaps the most informative. When you understand how Clojure ticks, and think through the whys, the aims and constraints of the other systems will fall into place...