
Occam channel based on polling techniques!



Dear all,

There is a small rectification to the Channel class I wrote with ComsTime. 
The Channel type is NOT a memory-less channel as in Occam but a memory 
channel. Peter noticed this... thanks.  
In the search for a channel with the best performance I ended up with 
this one-place memory channel. I call it a fast channel (APPROACH 4). 
This channel is derived from several other Occam-like channels. 

APPROACH 1: memory-less channel with wait() and notify() (failure)
APPROACH 2: memory-less channel with yield() context switching
APPROACH 3: modification of APPROACH 2 with in renamed to wread and out
            renamed to wwrite, plus new methods sread and swrite to
            construct the PAR.
APPROACH 4: modification of APPROACH 3, but as a memory channel for better
            performance (implemented in the channel shipped with ComsTime).

(In the following code exception handling is omitted)

/* APPROACH 1 */

The first approach was based on the monitor methods wait() and notify(). 
Waiting clients are placed on a wait queue when communication is not 
ready and returned to the run queue when communication is ready. 

Thread writer, reader;
int buffer;
boolean writeready = false;

public synchronized void out(int value)            
{
  buffer = value;
  writeready = true;
  writer = Thread.currentThread();  // must be outside this method
  reader.notify();
  writer.wait();
}

public synchronized int in()           
{
 if (!writeready)
  {
    reader = Thread.currentThread(); // must be outside this method
    writer.wait();
  }
  writeready = false;
  reader.notify();

  return buffer;
}

This approach has four problems:
1. it is slow and fragile.
2. a context switch before writer.wait() or reader.wait() can cause 
   deadlock.
3. constructing a PAR is only possible with input/output threads (Oyvind 
   Teig named it a kicker). Making a kicker was also my idea, but I found 
   this complex, and the fear of overhead made me search for another 
   solution.
4. the writer = Thread.currentThread() and reader = Thread.currentThread() 
   statements must be moved outside in and out, or else it won't work. 
   This makes the channel difficult to use.

Because I ran into these problems I tried to find another solution, based 
on context switching. I have no test results; I only tested this in a 
simple program.

Peter's CHAN_OF_INT is the right solution here.
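For comparison, a correct monitor-based rendezvous channel waits on the channel object itself rather than on the Thread objects, which removes problems 2 and 4. This is only a sketch of the idea (I do not know the internals of Peter's CHAN_OF_INT, which may differ):

```java
// Sketch of a correct memory-less (rendezvous) channel using monitors.
// Waiting happens on the channel object, so there is no deadlock if a
// context switch occurs between the flag update and the wait().
class MonitorChannel {
    private int buffer;
    private boolean writeready = false;

    public synchronized void out(int value) throws InterruptedException {
        while (writeready) wait();   // wait until a previous value is taken
        buffer = value;
        writeready = true;
        notifyAll();                 // wake a waiting reader
        while (writeready) wait();   // rendezvous: wait until value is read
    }

    public synchronized int in() throws InterruptedException {
        while (!writeready) wait();  // wait until a value is written
        int value = buffer;
        writeready = false;
        notifyAll();                 // wake the waiting writer
        return value;
    }
}
```

Note the while loops around wait(): a woken thread re-checks the flag, so spurious or out-of-order wakeups are harmless.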

The second approach was based on the idea of scheduling by context 
switching on channel communication. In Java, Thread.yield() causes a 
context switch. 

/* APPROACH 2 */

public void out(int value)
{
  while (writeready) Thread.yield();    // wait until a previous value is read

  buffer[0]  = value;
  writeready = true;

  while (writeready) Thread.yield();    // wait until this value is read
}

public void in(int value[])
{
  while (!writeready) Thread.yield();   // wait until a value is written

  value[0]   = buffer[0];
  writeready = false;
}

This channel performed much better (in ComsTime between 2.7 and 2.1 ms) 
and it is simpler than the previous one. The yield method is implemented 
in native code, which could be why it is faster than wait() and notify().
If the currently running thread yields the CPU, the scheduler implements 
a simple non-preemptive round-robin scheduling order.
Approach 3 is an update of the previous one: I renamed in to wread and 
out to wwrite, because I added sread and swrite to construct a PAR.

In wwrite the first yield spin is deleted.

/* APPROACH 3 */

public boolean wwrite(int value)
{
  buffer[0]  = value;
  writeready = true;

  while (writeready) Thread.yield();

  return true;
}

public boolean wread(int value[])
{
  while (!writeready) Thread.yield();

  value[0]   = buffer[0];
  writeready = false;

  return true;
}

The sread and swrite methods are used for polling. Using sread and swrite 
makes the channel memory-less.

public boolean swrite(int value)
{
  if (writeready) return false;

  buffer[0]  = value;
  writeready = true;

  while (writeready) Thread.yield();    // delete this for the fast channel

  return true;
}

public boolean sread(int value[])
{
  if (!writeready) return false;

  value[0]   = buffer[0];
  writeready = false;

  return true;
}

The performance is between 2 and 1.1 ms. I got better performance with a 
memory channel of one element (a one-place FIFO), which I used in the 
channel class of ComsTime (between 1.5 and 0.6 ms); see below.

/* APPROACH 4 */

public boolean wwrite(int value)
{
  while (writeready) Thread.yield();

  buffer[0]  = value;
  writeready = true;

  return true;
}

Methods wread, sread and swrite stay the same.
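Putting the pieces together, the complete fast channel might look like the sketch below. The volatile modifier is my addition: on modern JVMs it is needed so that the writeready flag (and, via its ordering guarantees, the buffer contents) is visible across threads. A single reader and a single writer are assumed, as everywhere in this note.

```java
// The complete fast channel (APPROACH 4): a one-place buffer polled with
// Thread.yield(). All threads must run at the same priority for yield()
// to hand over the CPU.
class FastChannel {
    private final int[] buffer = new int[1];
    private volatile boolean writeready = false; // true while a value is pending

    public boolean wwrite(int value) {      // blocking write
        while (writeready) Thread.yield();  // spin until the buffer is free
        buffer[0] = value;
        writeready = true;
        return true;
    }

    public boolean wread(int[] value) {     // blocking read
        while (!writeready) Thread.yield(); // spin until a value is present
        value[0] = buffer[0];
        writeready = false;
        return true;
    }

    public boolean swrite(int value) {      // non-blocking (polling) write
        if (writeready) return false;       // buffer full
        buffer[0] = value;
        writeready = true;
        return true;
    }

    public boolean sread(int[] value) {     // non-blocking (polling) read
        if (!writeready) return false;      // buffer empty
        value[0] = buffer[0];
        writeready = false;
        return true;
    }
}
```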

The return flag lets the client know whether the read/write succeeded. 
Method swrite returns false if the channel has not been read (memory-less 
channel) or if the buffer is full (memory channel), else it returns true. 
Method sread returns false if no value has been written to the channel 
(memory-less channel) or if the buffer is empty (memory channel). The 
swrite and sread methods are used in a polling mechanism to construct 
parallel input and/or output behavior. 

The PAR-out construct is:

done1 = done2 = ... = donei = false;
do
{
  if (!done1) done1 = swrite(value1);
  if (!done2) done2 = swrite(value2);
  ..
  if (!donei) donei = swrite(valuei);
  if (!(done1 & done2 & .. & donei)) Thread.yield();
} while (!(done1 & done2 & .. & donei));

The PAR-in construct is:

done1 = done2 = ... = donei = false;
do
{
  if (!done1) done1 = sread(value1[]);
  if (!done2) done2 = sread(value2[]);
  ..
  if (!donei) donei = sread(valuei[]);
  if (!(done1 & done2 & .. & donei)) Thread.yield();
} while (!(done1 & done2 & .. & donei));

Both PAR-constructs can be mixed.

The PAR-in-out is:

done1 = done2 = ... = donei = false;
do
{
  if (!done1) done1 = sread(value1[]);
  if (!done2) done2 = swrite(value2);
  if (!done3) done3 = swrite(value3);
  if (!done4) done4 = sread(value4[]);
  ..
  if (!donei) donei = swrite(valuei);
  if (!(done1 & done2 & .. & donei)) Thread.yield();
} while (!(done1 & done2 & .. & donei));

The PAR-constructs are "busy-loops" of only one cycle. The 
"if (!(done1 & .. & donei)) Thread.yield();" forces a context switch when 
one of the done flags is false. If the loop has run once and one or more 
done flags are false, another run does not change the done flags; the 
context switch prevents this idle spinning. If the loop is large, every 
run can take time, which can decrease performance. As long as the PAR is 
short, this technique performs well.
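As a concrete illustration of the PAR-out busy-loop, here is a minimal self-contained sketch in Java. The class and method names are mine, and the channel is the one-place polled buffer of APPROACH 4, repeated so the example compiles on its own:

```java
// Minimal one-place polled channel (as in APPROACH 4), single reader and
// single writer assumed.
class PollChannel {
    private final int[] buffer = new int[1];
    private volatile boolean writeready = false;

    boolean swrite(int v) {
        if (writeready) return false;       // buffer full: write refused
        buffer[0] = v;
        writeready = true;
        return true;
    }

    boolean sread(int[] v) {
        if (!writeready) return false;      // buffer empty: read refused
        v[0] = buffer[0];
        writeready = false;
        return true;
    }
}

class ParOut {
    // PAR-out over two channels: keep offering value1 on c1 and value2 on
    // c2 until both writes have been accepted, yielding while incomplete.
    static void parOut(PollChannel c1, int value1, PollChannel c2, int value2) {
        boolean done1 = false, done2 = false;
        do {
            if (!done1) done1 = c1.swrite(value1);
            if (!done2) done2 = c2.swrite(value2);
            if (!(done1 && done2)) Thread.yield();
        } while (!(done1 && done2));
    }
}
```

The same skeleton gives PAR-in by replacing swrite with sread, exactly as in the constructs above.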

The variables value1[] and value4[] carry the value returned by sread. 
Because Java does not allow pointers and has no by-reference modifier, a 
one-element array is used to return the value.
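A minimal sketch of the idiom (the names are mine):

```java
// A one-element array serves as Java's substitute for an "out" parameter:
// the callee writes into value[0] and the caller sees the result.
class OutParam {
    static boolean readInto(int[] value) {  // hypothetical callee
        value[0] = 42;                      // "returns" 42 by reference
        return true;                        // success flag, as with sread
    }
}
```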

The PRI-PAR, FAIR-ALT and PRI-ALT can also be constructed with sread and 
swrite. I wrote more about this in the paper Concurrent Programming in 
Java (draft). This paper describes new ideas and will be updated from 
time to time.
(ftp://ftp.rt.el.utwente.nl/pub/Java/ComsTime/comstime.ps)

A nice example of sread and swrite is the advanced delta building block 
advDelta. advDelta can create parallel outputs when passed the number of 
outputs as an argument together with an array of channels. Extension at 
run-time could also be a feature.

class advDelta extends Thread
{
  protected boolean keepRunning = true;
  protected Channel chanin, chanout[];
  int n;

  public advDelta(Channel ci, Channel co[], int cn)
  {
    chanin  = ci;
    chanout = co;
    n = cn;
  }

  public void run()
  {
    int x[] = new int[1];
    boolean done[] = new boolean[n];
    boolean notalldone;
    int i;
    
    while (keepRunning)
    {
      chanin.wread(x);

      for (i = 0; i < n; i++) done[i] = false;
      do
      {
        notalldone = false;
        for (i = 0; i < n; i++)
        {
          if (!done[i]) done[i] = chanout[i].swrite(x[0]);
          notalldone = notalldone | !done[i];  // true while any output is pending
        }
        if (notalldone) Thread.yield();
      } while (notalldone);
    }
  }
}


In the same way you could make an advPlus, advMultiplexer, 
advDemultiplexer, or a neuron with many input and output channels. This 
construction only works for channel input and output. Using a kicker 
would, I believe, perform more slowly. However, a kicker could run 
formulas in parallel. 

PAR
  b := 2*a
  c := k*d*3.14

Running formulas in parallel is, I believe, something to be careful with. 
Processors with instruction pipelining perform better on sequential 
formulas (statements) than on formulas running in random order. And there 
is no scheduler overhead.

SEQ
  b := 2*a
  c := k*d*3.14

The compiler optimizes these statements for the processor and cache. Nops 
in an instruction pipeline or extra cache flushes are time-consuming.


DYNAMIC CHANNEL BEHAVIOR

The channel that came with ComsTime has three different behaviors (fast, 
fifo, lifo), but the code is hard to read and difficult to maintain. This 
is because I experimented with this channel to get the best performance. 

Now I am writing a channel with more object-oriented features, which 
makes the channel extensible with new behaviors. My opinion is that a 
channel can have more behaviors than just an Occam channel. The behavior 
should be easy to configure at run-time. For certain applications a 
particular channel could perform better, or it could make the program 
behave more dynamically. You have to be careful with this; I think it is 
research worth doing.

Imagine a channel with a buffer where the buffer size can range between 
zero and the available memory: buffersize = n where 0 <= n <= m, m = 
available memory (not 1 <= n <= m). The buffer size determines the 
communication behavior. When buffersize = 0 the channel behavior is fully 
synchronized: the channel is memory-less and behaves as an Occam channel. 
When buffersize = n and 1 <= n <= k (k is small, e.g. k = 2), the channel 
is quasi-synchronized: if the channel is empty or full the communication 
is synchronized; otherwise it is asynchronous. Which case applies depends 
on the reader or writer process. When buffersize >> 1 (large enough) and 
the empty and full states are never reached, the channel behavior is 
fully asynchronous. The buffer can also be of FIFO or LIFO type; the 
buffer is FIFO by default. The LIFO type can be used as a stack 
mechanism. The channel can be configured as a memory-less or as a memory 
channel.
        
Declaration of a channel is as follows. Here the Channel declaration is 
for integer data types. Channel configurations are given by arguments. 
Examples are:

        Channel ch1 = new Channel();        // memory-less channel
                                            // (Occam channel)
        Channel ch1 = new Channel(0);       // same as previous
        Channel ch1 = new Channel(2);       // memory channel with two
                                            // buffer elements (FIFO)
        Channel ch1 = new Channel(2,FIFO);  // same as previous
        Channel ch1 = new Channel(2,LIFO);  // memory channel with two
                                            // buffer elements (LIFO) where
                                            // data is read in reverse order

The behavior of channels can be adjusted at run-time by statements such 
as:

        ch1.setBufferSize(1);               // reconfigure buffer size to 1
                                            // (solve deadlock situation)

There are other applications, for example: if the buffer is too small, 
make it larger.
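A minimal sketch of such a configurable channel (all names hypothetical, single reader and single writer assumed, polling style as above; buffersize = 0 would additionally need the rendezvous spin of APPROACH 2, so this sketch covers n >= 1 only):

```java
// Sketch of a channel whose buffer size and FIFO/LIFO behavior are
// configured by constructor arguments, with run-time resizing.
class BufferedChannel {
    private final java.util.ArrayDeque<Integer> buffer = new java.util.ArrayDeque<>();
    private int capacity;
    private final boolean lifo;

    BufferedChannel(int capacity, boolean lifo) {
        this.capacity = capacity;
        this.lifo = lifo;
    }

    void setBufferSize(int n) { capacity = n; }       // reconfigure at run-time

    boolean swrite(int v) {
        if (buffer.size() >= capacity) return false;  // buffer full
        buffer.addLast(v);
        return true;
    }

    boolean sread(int[] v) {
        if (buffer.isEmpty()) return false;           // buffer empty
        v[0] = lifo ? buffer.removeLast()             // stack order
                    : buffer.removeFirst();           // queue order
        return true;
    }
}
```

The blocking wwrite/wread variants would spin on these with Thread.yield(), exactly as in the one-place channel.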

There are more features to build in. Think of debugging and monitoring 
channels at run-time.

Now I have four behaviors:

1. rendezvous channel (Occam channel)
2. fast channel       (one-element buffer)
3. fifo channel       (n-element queue buffer)
4. lifo channel       (n-element stack buffer)

The same wread/wwrite and sread/swrite methods can be used for all the 
different channels. No radical changes in the program are necessary.

IMPORTANT

There are some limits to this technique. Method synchronization is 
impossible because Thread.yield() does not release the monitor. Therefore 
a shared buffer with multiple readers and writers is impossible to 
implement. In addition, threads can only yield the CPU to other threads 
of the same priority; attempts to yield to a lower priority are ignored.

This conclusion is interesting. 

The performance and the flexibility of this polling technique drove me in 
this direction.


Regards,

Gerald.