
RE: lightweight coroutines in C (take 2)



Hi All,

I am a bit astonished that the concept of using co-routines as lightweight
threads even pops up in an occam-oriented group.
Mind you, this is not a reproach, just an observation. The question is why,
because it seems to indicate that some of the original and fundamental ideas
have been lost to history.

If we go back to the origins, we have CSP, and CSP stands for systems defined
as a collection of processes and communication (on channels). And nothing
else. A process is a piece of code with its own private workspace that only
interacts through messages.
Of course, there is built-in composability and hierarchy (a process can
be composed of a million other processes). Pure CSP says nothing about
scheduling. That is a side-effect of the interaction; when adding
scheduling criteria (e.g. priorities), the only observed effect should be a
different scheduling in time. The logic itself remains intact.

I think the observations that equate lightweight threads to CSP stem from
the fact that processes on transputers were inherently lightweight.
Hence, it was not really a problem to define a process with a single
statement, although there was nothing preventing you from putting a million
lines of code in a single process. The reason is that the transputer had
a 'matching' context. There were only three registers to save (at least on
the fixed-point version), and even switching process queues (e.g. to provide
multiple priorities and preemption) took just two registers extra (the
front and back pointers of the process queues). Hence, CSP has become
associated with lightweight threads, but that is because the context was so
light on the transputer in hardware. Try to do the same on any traditional
processor and you have to switch tens of registers and quite some extra
status bits of peripherals. Hence, a context switch on a traditional
processor is heavy, and that more or less prohibits lightweight threads from
remaining useful. In other words, the process granularity is coarse-grained,
hence heavy. The underlying reason is that most CPUs today are designed to
maximize throughput, even assuming the presence of heavy caches, or the
processors risk becoming starved.
Nevertheless, one can approach 'pure' CSP if one only writes processes
above a minimum size, to remain practical.

Co-routines can never provide CSP-like behaviour because the scheduling is
not purely defined by the interaction pattern. The processes must cooperate
(hence there is a hidden enforcement of sequential execution) and the CSP
semantics are lost.

Summary: there is no discussion about the 'cleanness' of the CSP
model/paradigm. But to be usable as a practical programming paradigm, one
must increase the 'grain' size. Or reinvent the transputer, if we really
want a match between the programming model and the implementation. I
am more and more convinced this will be needed, because current practice
lacks scalability, and that, among other things like better formalisation, is
needed to tackle the growing complexity. Twenty years ago we had programs
with 100,000 or so processes running on hundreds of transputers in a
network. This is the kind of complexity one gets in a car today. And as we
see, these cars are not reliable enough. The CSP paradigm can handle it, but
in order to put it into practice we need processors that really support it.

br.

Eric Verhulst



-----Original Message-----
From: owner-occam-com@xxxxxxxxxx [mailto:owner-occam-com@xxxxxxxxxx]On
Behalf Of Tom Locke
Sent: Sunday, March 13, 2005 5:20 AM
To: Allan McInnes
Cc: occam list
Subject: Re: lightweight coroutines in C (take 2)


Hi All,

[Summary: this approach to lightweight threads puts a heavy penalty on
sub-routine calls, and hence is no basis for a CSP kernel]

 From the page:

> For example, we would like to be able to write a function that says
>
>     int function(void) {
>         int i;
>         for (i = 0; i < 10; i++)
>             return i;   /* won't work, but wouldn't it be nice */
>     }
>
> and have ten successive calls to the function return the numbers 0
> through 9.
>
Python now supports this style of programming through "generators":

def zeroToNine():
    for i in range(10):
        yield i

This is called a "generator function"; calling it gets you a "generator
object" (the function body does not run). Each call to the "next" method
causes the function to run until it hits a yield, at which point the
value of the yield is returned to the caller of "next". Call "next"
again, and the function carries on where it left off. The call to "next"
is often implicit, as in:

nums = zeroToNine()
for x in nums:
    print x

C# now has something very similar, although I've only glanced at that.

A number of people have observed that you can use this as a basis to
build a lightweight threads mechanism. The basic idea is that you code
all your threads up as generator functions, and then write a scheduler
that invokes all of these, and calls the next method when it wants a
"thread" to run. When that thread hits a yield, control returns to the
scheduler.
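For what it's worth, that scheduler idea can be sketched in a few lines
(this is my own illustration, in modern Python syntax; "worker" and "run"
are invented names, not from any real kernel):

```python
def worker(name, steps):
    # Each "thread" is a generator function; every yield hands
    # control back to the scheduler.
    for i in range(steps):
        yield "%s%d" % (name, i)

def run(threads):
    # Round-robin scheduler: resume each generator in turn until
    # all of them have finished.
    trace = []
    threads = list(threads)
    while threads:
        for t in list(threads):
            try:
                trace.append(next(t))
            except StopIteration:
                threads.remove(t)  # this "thread" has terminated
    return trace

print(run([worker("a", 2), worker("b", 3)]))
```

The yields from the two workers interleave, which is exactly the
cooperative multitasking effect described above.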

There's a BIG snag. The threads can only yield at the top level. If you
put a yield in a sub-routine, that routine becomes a generator itself. e.g.

def a():
    while True:
        ...do some work
        yield None
        ...do work
        b()
        ...

def b():
    ...do work
    yield None
    ...do work

This doesn't work. "b" was not called by the scheduler, its next method
is never called, so it never runs. To make it work, the call to 'b()' has
to be coded as:

for x in b(): yield x

In other words, 'b' has to yield to 'a' which in turn yields to the
scheduler. In general, if your thread needs to yield control n
sub-routine calls down, that costs you n context switches + the overhead
of n loop iterations.
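A toy illustration of that delegation pattern (my own sketch, mirroring
the 'a' and 'b' above): every level that is not the top of the "thread"
must explicitly forward the yields of the level below it.

```python
def b():
    # ...do work
    yield None
    # ...do work

def a():
    # 'a' cannot simply call b(); it must re-yield everything b yields,
    # paying one extra loop iteration and generator resumption per level.
    for x in b():
        yield x

# The scheduler at the top sees a single yield from a().
print(len(list(a())))
```

With one intermediate level this is merely clumsy; with n levels the cost
of each yield grows linearly, which is the snag.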

I find this totally unsuitable as a basis for implementing a CSP kernel.
As far as I can make out, this C coroutine idiom suffers from the same
problem. If this is what people mean by coroutine, I guess all
implementations do.

A communicating-process approach to programming is such a powerful idea,
that it seems to be inevitable. In ignorance of the right way to do it,
people are just stumbling blindfolded in the general direction.

[Aside: A real lightweight thread mechanism for Python is available in
Stackless Python, although Stackless is a somewhat marginal project. The
PyPy project is building a new VM for Python which will also be
'stackless'. PyPy is a rather more serious project, with European
(academic) funding]

Tom.