Re: Ravenscar profile and occam on the transputer

Øyvind TEIG

+47 959 615 06

http://www.teigfam.net/oyvind/home

(iPhone)

On 17 Nov 2017, at 21:05, Lawrence Dickson <tjoccam@xxxxxxxxxxx> wrote:

Hi guys,

We are trying to assemble a legacy system and have obtained a B008, but I am puzzled by a TRAM-like thing shipped with it (nylon-bolted in) in the physically last slot (Slot 1). It is not a TRAM because there is no transputer. It may be labeled RVSI 44751. I will try to attach a picture I found of it. Can anyone enlighten me? It blocks access to slot 1, but I’d like to know what it is before I get rid of it.

Larry Dickson
<WHATS-IN-SLOT_1.JPG>

On Apr 10, 2017, at 9:32 AM, Lawrence Dickson <tjoccam@xxxxxxxxxxx> wrote:

All,

I am happy about this lively discussion, because “my sponsors” may need to have some of this stuff really implemented in a language, and we are hearing from a lot of people who are likely to be able to do it! I am behind the curve in a lot of this, especially in respect of Xmos, but I think I can respond to Øyvind’s question - inserted below. . .

On Apr 10, 2017, at 4:04 AM, Øyvind Teig <oyvind.teig@xxxxxxxxxxx> wrote:

All

Roger wrote:
Secondly, in terms of XC I was very unhappy about the restrictions it placed on the use of channels (no repetition) in ALTs and in TIME ? guards. To be clear, the Occam rules could have been implemented by the xcore - it was not a limitation of the core. The Occam rules are much nicer and useful in a number of cases (some of which you can address in XC but, I would argue, in a less clear way than in Occam).

Whether these points are still true, I don’t know. As I said it is a long time since I’ve looked at XC.

I have done some xC to try to find out now. This compiles:

#include <platform.h>

#define NUM_CLIENTS 2
#define NUM_TIMERS 2

typedef interface my_interface_t {
  void rpc(const int n);
} my_interface_t;

[[combinable]] void PS1 (server my_interface_t my_interface[NUM_CLIENTS]) {
  int   time[NUM_TIMERS];
  timer tmr[NUM_TIMERS];
  int   guard = 0;

  for (int index_of_timer=0; index_of_timer< NUM_TIMERS; index_of_timer++) {
    tmr[index_of_timer] :> time[index_of_timer];
  }

  while (1) {
    select {
      case (guard==1) => tmr[int index_of_timer] when timerafter(time[index_of_timer]) :> void: {
        time[index_of_timer] += 1000;
        guard = 0;
      } break;
      case my_interface[int index_of_client].rpc(const int n): {
        guard = 1;
      } break;
    }
  }
}

void PC1 (client my_interface_t my_interface) {
  my_interface.rpc(1);
}

void T1 (void) {
  // ...
}

int main(void) {
  interface my_interface_t my_interface[NUM_CLIENTS];
  par {
    on tile[0]: PC1 (my_interface[0]);
    on tile[0]: PC1 (my_interface[1]);
    on tile[0].core[0]: PS1 (my_interface);
    par {
      on tile[0]: T1();
    }
  }
  return 0;
}

Since I made PS1 [[combinable]], I might as well place it on a tile core. When I did that, I also had to qualify the others with on tile. [[combinable]] means that several tasks can run on the same hardware thread (one per logical core), sharing the same select loop underneath; since I only have one such task, it wasn’t really necessary. But I wanted to decorate as much as possible for you guys. The code has replicated timer and interface-call cases with 2 clients. I didn’t use channels, but they are treated like interfaces here. I didn’t make any session in the interface, which would be decorated with [[notification]] and [[clears_notification]]. If I had put a guard on the rpc call, I would have had to write [[guarded]] for it in the interface. (References to the somewhat outdated literature are in http://www.teigfam.net/oyvind/home/technology/141-xc-is-c-plus-x/)

I don’t know the fork and join logic here, except that I think it is like in occam.

--------------------------------------
Peter Morris wrote (added into this thread):
When I suggested polling I did mean busy polling, but only for systems in which polling could meet the timing requirements.

Provided that is the case, I don't think it matters if the ALT is busy or non-busy

Hmm. Do you mean busy polling of a single ALT doing nothing else, or busy polling that may be preempted? I guess the latter; otherwise, it would starve everything else? Great to hear from you, Peter!

--------------------------------------
Larry wrote:
What takes occam further is occam + Transputer priority,

Since I am also learning about XMOS: from the XMOS Programming Guide (https://www.xmos.com/download/private/XMOS-Programming-Guide-%28documentation%29%28B%29.pdf) I read that [[ordered]] can make a prioritised select. However, when code is placed on logical cores whose cycles are shared round-robin, and cycle counts give the guarantees, I guess we won’t need task priority in that situation?

That last document also shows replicated cases, with an example of a timer array using a somewhat different syntax.

What I am getting at would take it either way - could be interpreted as a subset of occam (i.e. occam + restrictions) or something that takes occam further. I have actually used this successfully and thought it rather obvious.

Are you saying that in your opinion we would need an occam Ravenscar subset? Would it compile with occam 2 or would you need N priorities etc? I think it only had max two?

The N-priority question is answered in my US-only patent

http://www.google.com/patents/US20140101663

- which by the way everyone is welcome to develop on, as I am only claiming hardware, not software, and only in the USA. Basically, more than 2 priority levels are OK but only a few (say up to 6 or 7) separated by wide ratios in natural times, so that higher looks like interrupts (stolen cycles) to lower, and lower looks like idle time to higher. Then you can design as if to a two-priority Transputer. Think keyboard interrupt as a high-priority process. If you lean on a key it only fires about 30 times a second, and the hi-pri keystroke calculates quickly and goes into a FIFO (another hi-pri process) then both are blocked - the kb process till next keystroke (30,000 microseconds later) and the FIFO till lo-pri process draws out keystroke. You can build with several interrupt services and design so response is fast and lo-pri is never starved. You have to avoid what might in CSP analogous terms be called “quasi-divergence” where hi-pri stuff starts talking to other hi-pri stuff and pushes the lo-pri response out for a long time.
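The keyboard example above can be put into numbers. This is only a rough sketch: the 50 microsecond service time and the minimum ratio of 50 are assumptions chosen for illustration, the function names are hypothetical, and nothing here reproduces the criteria in the patent.

```python
def stolen_fraction(rate_hz, service_us):
    """Fraction of CPU time a high-priority process steals from lower priority.

    rate_hz    -- how often the high-priority process fires per second
    service_us -- assumed worst-case run time per firing, in microseconds
    """
    period_us = 1_000_000 / rate_hz
    return service_us / period_us


def widely_separated(natural_times_us, min_ratio):
    """Check that adjacent priority levels differ by at least min_ratio.

    natural_times_us -- one characteristic time per priority level
    min_ratio        -- assumed threshold for "wide" separation
    """
    ordered = sorted(natural_times_us)  # shortest time = highest priority
    return all(slow >= min_ratio * fast
               for fast, slow in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    # Key repeat at about 30 Hz, handler assumed to take 50 us per keystroke.
    print(f"stolen: {stolen_fraction(30, 50):.4%}")
    # Three hypothetical levels: 10 us, 1 ms and 100 ms natural times.
    print(widely_separated([10, 1_000, 100_000], min_ratio=50))
```

With numbers like these the high-priority keystroke handler looks like a small amount of stolen cycles to the low-priority process, which is the situation the two-priority design argument relies on.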

So since occam 2 with Transputer priorities is certainly a subset of this, I see no problem with compiling it, but the restrictions as described above have to be imposed, and then you get a pretty easy proof of response times in any particular case. I imagine an Xmos-like design with lots of cores would be a good fit, because some cores could be high-priority Transputers with few processes in effect, handling all the quick-response stuff, and other cores could be low-priority-with-FIFO in effect, and by doing a chained response analysis you could get the result you need. (I actually did something like this for Ford Motor Company in the 1990s, using multi-T2 TRAMs of my own design to decimate incoming radar data which was very fast and impatient.)

We certainly need true occam on the Xmos or similar device (like the Adapteva Parallella?). If ALT is implemented by a select() we have to groan and bear it. Even if polling is required, that is just cycle theft balanced against response latency. But it has to be able to be absolutely analogous to Transputer behavior.

Larry

Øyvind
=====================

I am adding Peter Morris’ reply here:

Hi Øyvind and all,
All (+ Peter Morris)

On 7 Apr 2017, at 18:46, Lawrence Dickson <tjoccam@xxxxxxxxxxx> wrote:

All,

In [2], Peter Morris says,

And once one has implemented CSP channels, one can implement ALTs by polling ready flags in the channels.

Does “polling" mean what it customarily means, trying again and again until an OK is found? If so, I would disagree that the described approach is a substitute for occam ALT, which does not poll.

In the context of «timing analysis» I don’t think he would have thought of busy polling. I have added Peter Morris on the list. Peter, are you there? Øyvind
Yes, I am here. Thanks for remembering me !

When I suggested polling I did mean busy polling, but only for systems in which polling could meet the timing requirements.

Provided that is the case, I don't think it matters if the ALT is busy or non-busy.
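For readers following the polling question, here is a minimal sketch of an ALT built by polling ready flags, with a bounded number of polls standing in for the timing requirement. It is a single-threaded toy model, not occam or transputer behaviour: the Channel class, offer, take and polled_alt are all hypothetical names, and a real sender would block until its value is taken.

```python
class Channel:
    """Toy channel: one value slot plus a ready flag set by the sender."""

    def __init__(self):
        self.ready = False
        self.value = None

    def offer(self, value):
        # Sender side: publish a value and raise the ready flag.
        # (A real CSP sender would now block until the value is taken.)
        self.value = value
        self.ready = True

    def take(self):
        # Receiver side: consume the value and clear the ready flag.
        value, self.value, self.ready = self.value, None, False
        return value


def polled_alt(channels, max_polls):
    """Busy-poll the ready flags in index order.

    Returns (index, value) for the first ready channel found, or None if
    nothing became ready within max_polls sweeps (the timing budget).
    """
    for _ in range(max_polls):
        for index, channel in enumerate(channels):
            if channel.ready:
                return index, channel.take()
    return None


if __name__ == "__main__":
    chans = [Channel(), Channel(), Channel()]
    chans[1].offer("hello")
    print(polled_alt(chans, max_polls=3))
```

Scanning in index order makes this behave like a prioritised ALT; the cost is that the polling process burns cycles while no channel is ready, which is the point under discussion.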

Regards,
Peter

On 9 Apr 2017, at 20:30, Roger Shepherd <roger.shepherd@xxxxxxxxx> wrote:

Larry, Øyvind,

Very interesting.

As an aside, regarding the Xmos processor and XC. It is some years since I took a look in detail but I remember a couple of important things.

Firstly, in terms of the scheduler, the Xcore has a more primitive processor than the transputer. That is, the operations provided are more primitive than the operations in the transputer, but enable you to build the same (or at least a very similar) model as the transputer. XC does this but there are some subtle differences which I suspect have little or no practical impact.

One I remember is the implementation of Fork and Join. The explanation of the difference is made difficult by the sparseness of the identity of “process" in the transputer. The transputer implementation of

SEQ
  A
  PAR
    B
    C
    D
  E

has a process which first executes A, spawns (“startp”) C and D, and continues as B. The join is implemented as an “endp” executed at the end of B, C and D. The last of these to execute “endp” (i.e. the last one to join) continues and executes E.
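The join described above can be modelled with a simple counter. This is a sequential toy model under my own assumptions, not the transputer's actual instruction semantics: JoinCounter and run_seq_par are hypothetical names, and the order in which B, C and D finish is supplied by the caller rather than by a scheduler.

```python
class JoinCounter:
    """Counts the PAR branches that have not yet executed 'endp'."""

    def __init__(self, branches):
        self.remaining = branches

    def endp(self):
        """Register one branch ending; True only for the last one to join."""
        self.remaining -= 1
        return self.remaining == 0


def run_seq_par(finish_order):
    """Model SEQ(A, PAR(B, C, D), E).

    finish_order -- the order in which B, C and D reach their 'endp'.
    Returns a trace of what happened, ending with who continued as E.
    """
    trace = ["A"]                    # the original process executes A
    join = JoinCounter(branches=3)   # B, C and D must all join
    trace.append("startp C")         # spawn C
    trace.append("startp D")         # spawn D; the original continues as B
    for name in finish_order:
        trace.append(f"endp {name}")
        if join.endp():
            # Whichever branch joins last carries on and executes E.
            trace.append(f"E (continued by {name})")
    return trace


if __name__ == "__main__":
    for step in run_seq_par(["C", "B", "D"]):
        print(step)
```

The point the model makes is that only two processes are spawned, and E is run by whichever branch happens to finish last, not necessarily by the process that executed A.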

I can’t remember the implementation of the fork in XC - the point of uncertainty is whether there were 3 or 4 processes which implement the PAR. I think there are, in effect, 4: as if three processes (B, C, D) were spawned at the fork, with the forking process waiting. B, C and D then execute, and the join restarts the forking process (now as E) when the third process terminates. [The alternative I have in mind is that, as in the transputer implementation, two processes are spawned, but the join effectively causes the original process to continue, rather than the final process as in the transputer.]

Secondly, in terms of XC I was very unhappy about the restrictions it placed on the use of channels (no repetition) in ALTs and in TIME ? guards. To be clear, the Occam rules could have been implemented by the xcore - it was not a limitation of the core. The Occam rules are much nicer and useful in a number of cases (some of which you can address in XC but, I would argue, in a less clear way than in Occam).

Whether these points are still true, I don’t know. As I said it is a long time since I’ve looked at XC.

Roger

For example, the implementation of Fork and Join (the beginning and end of PAR) is different.

On 9 Apr 2017, at 18:58, Larry Dickson <tjoccam@xxxxxxxxxxx> wrote:

All,

Øyvind and I had an exchange off line, and I think I see where a mis-communication creeps in. Øyvind said

======
Perhaps you mean that a subset of occam would be needed to get meaningful cycle count? That occam as it is won’t give us this now. I believe that the XMOS architecture takes it further. Is XC a subset of occam on the transputer?
======

What I am getting at would take it either way - could be interpreted as a subset of occam (i.e. occam + restrictions) or something that takes occam further. I have actually used this successfully and thought it rather obvious.

What takes occam further is occam + Transputer priority, and the subset of occam in question is the high-priority subset of occam, with the restriction that each high-priority process is restricted to a known maximum cycle count before it deschedules. In addition there must be a round-robin scheduler. The consequence of this is that wait time before response of a high-priority process (when ready) is less than the sum of the maxima for all the high-priority processes (you can subtract the time of the responding process itself). In the case of an ALT, this would hold for a unique winner, because output is ready and the communication on the ALT side after disabling will therefore be immediate with no descheduling. IT DOES NOT MATTER IF THE OTHER SIDE OF THE COMMUNICATION IS LOW PRIORITY, though the low-priority process’s response to the communication might be delayed if other high-priority processes “spin” to the exclusion of low priority. So to make it robust against that, you have to impose a maximum cycle count before high priority becomes dependent on low priority, and enforce spacing of external stimuli. (I have a US-only patent that deals with stuff like that, which Øyvind says I should not be ashamed to mention!)
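The response bound stated above is simple arithmetic, and a small sketch may make it concrete. The process names and cycle counts below are invented for illustration, and worst_case_wait is a hypothetical helper; it assumes round-robin scheduling at high priority and a known maximum cycle count per process before it deschedules.

```python
def worst_case_wait(max_cycles, responder):
    """Upper bound, in cycles, on the wait before `responder` gets to run.

    max_cycles -- maps each high-priority process name to its maximum
                  cycle count before it deschedules
    responder  -- the ready high-priority process whose wait we bound

    With round-robin scheduling, every other high-priority process can run
    at most once ahead of the responder, so the bound is the sum of all
    the maxima minus the responder's own.
    """
    if responder not in max_cycles:
        raise KeyError(responder)
    return sum(cycles for name, cycles in max_cycles.items()
               if name != responder)


if __name__ == "__main__":
    # Hypothetical high-priority processes and their cycle maxima.
    hi_pri = {"keyboard": 120, "fifo": 80, "radar": 300}
    for name in hi_pri:
        print(name, worst_case_wait(hi_pri, name))
```

Adding a new high-priority process raises the bound for every other one by its maximum, which is why the maxima have to be kept small and known.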

This is all standard interrupt stuff, done ad hoc on other processors. The provable result is due to the beautiful design of the Transputer, which must (and, at the bare metal, can) be emulated on other processors to get the result. This does not so much have to do with CSP, I am learning, as it operates on a kind of sub-CSP level.

Larry

On Apr 7, 2017, at 11:40 AM, Øyvind Teig <oyvind.teig@xxxxxxxxxxx> wrote:

All (+ Peter Morris)

On 7 Apr 2017, at 18:46, Lawrence Dickson <tjoccam@xxxxxxxxxxx> wrote:

All,

In [2], Peter Morris says,

And once one has implemented CSP channels, one can implement ALTs by polling ready flags in the channels.

Does “polling" mean what it customarily means, trying again and again until an OK is found? If so, I would disagree that the described approach is a substitute for occam ALT, which does not poll.

In the context of «timing analysis» I don’t think he would have thought of busy polling. I have added Peter Morris on the list. Peter, are you there? Øyvind

Therefore, I disagree that an “occam Ravenscar” profile is superfluous. However, for schedulability it must be occam + priority. Given absolute high priority as found on the Transputer, and cycle-counted limits on high-priority code between deschedulings, a round-robin scheduler can guarantee response within a known count of cycles.

It’s either one or the other, isn’t it: 1.) occam Ravenscar or 2.) schedulability analysis and guaranteed response time? Or is there a possibility of both in a case? Øyvind

Larry

On Apr 7, 2017, at 1:52 AM, Teig, Oyvind UTC CCS <Oyvind.Teig@xxxxxxxxxxxxxxxx> wrote:

All

The Ravenscar profile is for safety-critical systems written in Ada. It basically takes the rendezvous away. This opens the way for schedulability analysis. [1]

I have wondered about this in [2] (the “Computer scientist” quoted is close to the Ravenscar profile..)

Finally, I have now exchanged mail with Roger Shepherd about how the transputer tackled this, and whether there would also be any reason to make an “occam Ravenscar” profile. From what he answered, I think the answer is no. [3]

Any comments are welcome. Please state whether your comment here may be quoted by name in the blog note. Unless I see this stated explicitly, I will not copy and paste anything. (But I cannot promise that I won’t mail you and ask, in case I think you may have forgotten…)

For any of you who are extremely knowledgeable about the XMOS processors and XC, I would be delighted to see this viewed also from that perspective.

[1] https://en.wikipedia.org/wiki/Ravenscar_profile
[2] http://oyvteig.blogspot.fr/2011/12/035-channels-and-rendezvous-vs-safety.html
[3] http://www.teigfam.net/oyvind/home/technology/138-determined-about-buffers-and-bit-arrays/

Best regards
Øyvind Teig
Pensioner and blogger from June 2017
http://www.teigfam.net/oyvind/home/technology/
Senior Development Engineer, M.Sc.
Autronica Fire and Security AS
Research and Development
UTC Building and Industrial Systems
Phone: +47 95961506
E-mail:  oyvind.teig@xxxxxxxxxxxxxxxx, web: www.autronicafire.no


Øyvind TEIG
+47 959 615 06
oyvind.teig@xxxxxxxxxxx
http://www.teigfam.net/oyvind/home
(Mac)
