
Re: Quote from CACM paper: cost of parallelism



Hi all,

It's a pain having to patch things together from several emails in a text editor; perhaps someone can suggest a better way? I'll start with Ruth's comments, as they were the biggest "downer" and yet, read in reverse, they suggest the solution that has been evading us.

Ruth Ivimey-Cook <ruth.Ivimey-Cook@xxxxxxxxxx> wrote:

> And this is where I get disappointed. While I can see where Larry is coming from (I too appreciated the ability to program on bare metal on the transputer) it is no longer practical with the sorts of systems being designed and the nature of the solutions being demanded because ultimately, you end up having to reinvent the wheel. The OS constructs, in the main, are aimed at (a) supporting legacy apps and (b) providing a kit of parts to make applications easier to write.

So you let them do what they are good at, and isolate them in application-land where the bloat is allowed to be as bloatful as it wishes. This is basically what VMWare and the like do. And, from another point of view, interpreted languages like Python. In that fenced-off area, others can develop ultra-complex apps until their heads explode, but on our side of the wall, we do what we do best. I'm even trying to start a small business doing this.

http://www.LAZM.net

> While I can agree that you can eliminate that in some embedded environments, you can't do so on mainstream desktop OSs, which are the ones feeling the pinch right now.

I think "mainstream desktop OSs" are a lost cause. Let someone else go crazy trying to program them! Even economically, it is becoming harder and harder to justify the upgrades, since people can do all they need to with Windows XP. Moore's Law has simply won. Netbooks are good enough.

>
> From my (these days) very "industrial" view, the place to start is simply where we are - namely, often large apps of mainly C code written by various authors over many years, with tight deadlines and demanding management. Pretending that we have the luxury of redesigning either OS's or the raw silicon is fairy tales, not because we can't see why it might be useful or interesting, but because (in the case of the million-odd line codebase I'm thinking of) it would probably take over 5 years for a competent team of people. We don't have that time to spend, let alone the money to pay people. Moreover, in many cases we are also constrained tightly by our own customers, who say things like "we'd love your code, but it must run on weird processor X using niche compiler Y". We have to be able to write the code using very portable constructs. No gcc-isms here, I'm afraid.

Obviously, if the problem is defined this way, we lose. Therefore pick a different battle. In many cases, the particulars of an "industrial" problem are, at the core, friendly to our approach [see my RJIT patent, US Patent 7,389,507], and you just run lines of communication from the million-odd-line codebase to a small thing that does the real stuff faster and better. Of course it will have to be written in C, which is another problem because of C's bad design, but that is a task for another day.
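A hedged sketch of what such a line of communication might look like, in Python rather than the C it would really need: the million-odd-line legacy side ships work over a pipe, one line of text each way, to a small engine that does one thing fast. Everything here (the engine, the squaring stand-in, the protocol) is invented for illustration.

```python
import subprocess
import sys

# The small, fast side of the wall: a tiny "engine" that does one thing well.
ENGINE_SOURCE = """
import sys
for line in sys.stdin:
    print(int(line) ** 2, flush=True)   # stand-in for "real stuff, faster"
"""

def ask_engine(values):
    """The legacy side: push work across the line of communication
    and collect the answers, one line of text each way."""
    engine = subprocess.run(
        [sys.executable, "-c", ENGINE_SOURCE],
        input="".join(f"{v}\n" for v in values),
        capture_output=True, text=True, check=True)
    return [int(line) for line in engine.stdout.split()]

print(ask_engine([3, 7]))    # -> [9, 49]
```

The point of the pipe is the fence: the engine shares no address space with the legacy code, so the bloat stays on its own side of the wall.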

>
> I was hoping, in posting the original email, that people might be able to say "well, if you do things *this* way..."  but it seems not.
> At present the general computing world is being forced by the scruff of its neck to face parallel programming head on. I believe it won't be long before even the current luxury of true shared memory will be left behind and we'll all be in NUMA land. For CSP to be part of that new world depends on proponents being able to provide usable solutions to the problems real people face.
>
> A year or so ago, I was hopeful that the CSP community would be ready to take up this challenge, but I'm becoming less hopeful now.

A command to gallop off in all directions and solve all problems at once is not a "challenge" but a guaranteed defeat. The CSP community CANNOT take it up. It needs to be broken up into doable tasks. Here's one for starters: a good CSP-based interpreted language in the Python mold. It is much needed. Stackless Python, for example, is now a misnomer: it abandoned being stackless and is thoroughly mired in the OO-tainted constructs typical of standard developments:

From: Richard Tew <richard.m.tew@xxxxxxxxx>
Date: Oct 9, 2009 3:37 PM
Subject: Re: [Stackless] Size of a Tasklet
To: Andrew Francis <andrewfr_ice@xxxxxxxxx>
Cc: Kristján Valur Jónsson <kristjan@xxxxxxxxxxxx>, stackless@xxxxxxxxxxxxx
"Stackless Python was rewritten with a different core implementation (hard switching / soft switching) from the stacklessness that allowed its initial continuations."
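To make that wish concrete, here is a minimal sketch of what CSP-flavoured Python could look like today: processes as threads, run in PAR, talking over channels. The `par` helper and the sentinel protocol are my own inventions for illustration, not any existing library's API, and a real CSP channel would be unbuffered rather than a one-place queue.

```python
import queue
import threading

def par(*processes):
    """Run processes 'in PAR': start them all, wait for them all."""
    threads = [threading.Thread(target=p) for p in processes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def main():
    c = queue.Queue(maxsize=1)   # a one-place channel (true CSP channels
                                 # are unbuffered rendezvous points)
    results = []

    def producer():              # occam: c ! i*i
        for i in range(5):
            c.put(i * i)
        c.put(None)              # sentinel marking end of stream

    def consumer():              # occam: c ? x
        while True:
            x = c.get()
            if x is None:
                break
            results.append(x)

    par(producer, consumer)
    return results

print(main())   # -> [0, 1, 4, 9, 16]
```

Twenty lines of kernel is roughly the scale of effort being asked for; the hard part is the language design around it, not the plumbing.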

Other posters sound discouraged but I think they are on the right track.

Eric Verhulst <eric.verhulst@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> [EVR] No, it was driven by CSP. DOS was not an OS. It was a jump table.

Well, that is what I was trying to say! I did not make myself clear: "driven by" is not the same as "based on"; I mean it in the sense of a driver that pushes problems to a cooperating device.

> [EVR] 
> The only remedy is to show that less code is less power, hence show the "green" value.

Yes, the green thing can be an "in," I think; our big problem is motivating someone to support our efforts. I wish we could be gentlemen farmers like the old Royal Society! The big green fact is that if you drop clock speed by a factor of 2, you can drop power consumption by a factor of up to 8. Run two half-speed cores to keep the same throughput and you still come out ahead by a net factor of 4 - but that requires workable massive parallelism... and here we are.
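The arithmetic behind that claim, under the idealised model that dynamic CMOS power goes as frequency times voltage squared, with supply voltage scaled in proportion to frequency (so P scales as f cubed). A back-of-envelope sketch, not a datasheet:

```python
def relative_power(freq_ratio):
    """Dynamic CMOS power scales roughly as P ~ f * V^2; if voltage is
    scaled down with frequency, that is P ~ f^3 (an idealised model)."""
    return freq_ratio ** 3

# Halve the clock: power drops by a factor of up to 8.
half_speed = relative_power(0.5)   # 1/8 of full power

# To keep the same throughput you need two half-speed cores,
# so the whole machine still runs at 2 * 1/8 = 1/4 of the power:
two_cores = 2 * half_speed         # the claimed net gain of 4

print(half_speed, two_cores)       # -> 0.125 0.25
```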

Espen Suenson <expen@xxxxxxx> wrote:

> Damian Dimmich and others did it: http://www.transterpreter.org/publications/pdfs/a-cell-transterpreter.pdf. The success is limited, however.
>
> I tried to port occam directly to the Cell, but didn't get very far. It's quite complicated to program this processor.

Jon Simpson posted me a copy of Christian Jacobsen's thesis, and it looks as if (correct me if I'm wrong, please) all the new occam-pi constructs are handled with a semaphore. This makes it look as if any decent chip could be made to do it. Can you do a dump, Espen, of what problems you ran into? Perhaps retreating to a subset of the processor's capabilities is called for.
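If the thesis is right that a semaphore is enough, then any chip with a working semaphore primitive can host a CSP channel. Here is a minimal sketch of an unbuffered (rendezvous) channel built from nothing but semaphores and a lock - my own construction for illustration, not the scheme from Jacobsen's thesis:

```python
import threading

class Channel:
    """Unbuffered CSP-style channel: send blocks until a matching receive."""
    def __init__(self):
        self._value = None
        self._ready = threading.Semaphore(0)   # sender signals: value in place
        self._taken = threading.Semaphore(0)   # receiver signals: value taken
        self._senders = threading.Lock()       # serialise concurrent senders

    def send(self, value):
        with self._senders:
            self._value = value
            self._ready.release()              # offer the value
            self._taken.acquire()              # rendezvous: wait for the taker

    def receive(self):
        self._ready.acquire()                  # wait for an offered value
        value = self._value
        self._taken.release()                  # complete the rendezvous
        return value

# Usage: one worker sends, the main thread receives.
if __name__ == "__main__":
    ch = Channel()
    t = threading.Thread(target=lambda: ch.send("hello"))
    t.start()
    print(ch.receive())                        # -> hello
    t.join()
```

The catch, of course, is ALT and everything else occam-pi layers on top; but if even those reduce to a semaphore, the porting surface is small.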

Larry Dickson

On Oct 12, 2009, at 9:03 PM, Espen Suenson wrote:

On Sun, Oct 11, 2009 at 7:53 PM, Ruth Ivimey-Cook <ruth.Ivimey-Cook@xxxxxxxxxx> wrote:
Larry Dickson wrote:
Sorry to be a little slow on this, but...

I think the key task for our side is what the previous poster talked about - the Cell processor (and similar). We all know that it should be + EASY + to program the Cell, because it's just a slightly disguised PC and B008 with 8 transputers ;-)  But none of us that I know of, including myself, has actually done anything about this...
I would agree with Eric; the Cell is an interesting design but far from easy to program for. I do seem to recall that someone at Kent did some work with the transterpreter on the Cell. Maybe my memory is faulty.

Damian Dimmich and others did it: http://www.transterpreter.org/publications/pdfs/a-cell-transterpreter.pdf. The success is limited, however.

I tried to port occam directly to the Cell, but didn't get very far. It's quite complicated to program this processor.


And the key is what Rick says, we start from the wrong place. Namely, the mountain of massive OS constructs and their insistence on hiding the "bare metal". The Transputer was a big technical success because it was driven from DOS, a totally minimalistic non-OS that allowed you to go around it and whack away, in standard code, at things like DMA addresses. Now we have to tiptoe around the whole attic full of exploding OS and driver constructs, never doing a real design (like a classic car), and the effort involved is not only triple or more, but discouragingly senseless.

And this is where I get disappointed. While I can see where Larry is coming from (I too appreciated the ability to program on bare metal on the transputer) it is no longer practical with the sorts of systems being designed and the nature of the solutions being demanded because ultimately, you end up having to reinvent the wheel. The OS constructs, in the main, are aimed at (a) supporting legacy apps and (b) providing a kit of parts to make applications easier to write. While I can agree that you can eliminate that in some embedded environments, you can't do so on mainstream desktop OSs, which are the ones feeling the pinch right now.

From my (these days) very "industrial" view, the place to start is simply where we are - namely, often large apps of mainly C code written by various authors over many years, with tight deadlines and demanding management. Pretending that we have the luxury of redesigning either OS's or the raw silicon is fairy tales, not because we can't see why it might be useful or interesting, but because (in the case of the million-odd line codebase I'm thinking of) it would probably take over 5 years for a competent team of people. We don't have that time to spend, let alone the money to pay people. Moreover, in many cases we are also constrained tightly by our own customers, who say things like "we'd love your code, but it must run on weird processor X using niche compiler Y". We have to be able to write the code using very portable constructs. No gcc-isms here, I'm afraid.

I was hoping, in posting the original email, that people might be able to say "well, if you do things *this* way..."  but it seems not.

At present the general computing world is being forced by the scruff of its neck to face parallel programming head on. I believe it won't be long before even the current luxury of true shared memory will be left behind and we'll all be in NUMA land. For CSP to be part of that new world depends on proponents being able to provide usable solutions to the problems real people face.

A year or so ago, I was hopeful that the CSP community would be ready to take up this challenge, but I'm becoming less hopeful now.

Regards,

Ruth