[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Mobile variables



Mobile Variables
================

This is so obvious that it must have been suggested before?

A common objection to message passing in shared memory is that it involves
redundant copying. A large buffer in shared memory passed across a channel
will be copied. To avoid that, we might turn off usage checking, allow
concurrent access, and impose CREW by passing an access token across a
channel. Or we might pass an ARRAY pointer. Such techniques are well known
in the occam community.

But a simple language extension puts this on a sound footing and makes code
transparent and elegant. Introduce channels that transport *variables*.

Doing this in a very general way can present problems with static memory
allocation, aliasing, usage checking and transportation of variables out
of their scope.

But the slightly restricted version below seems to overcome those problems.

If a variable x is sent on a channel c, it is as if it is taken out of
scope. So after   c ! VAR x , x is no longer accessible.

After the matching c ? VAR x, it is as if x has come into scope.
"As if" is used above in case "scope" is taken to be static and textual.
So this "dynamic scoping" is really a dynamic usage rule.
Notice that we require the input to use the same name x: we must avoid
aliasing. And it conforms to static textual scoping: x remains in scope
in the existing sense.

Consider:

--{{{  example
MOBILE INT x:    -- "MOBILE" is redundant but avoids the compiler needing
                 -- to search forward. "SHARED" is an alternative.
CHAN OF VAR x c: -- must reference existing variable(s) in scope
                 -- (This alone is enough to imply that x is MOBILE)

CLAIM x        -- like a PAR, but claims exclusive access for first process
  SEQ
    x := 42
    c ! VAR x  -- export access rights, in effect exporting the variable
  SEQ
    c ? VAR x  -- receive exclusive access right to x
    ...        --  presumably use x for something here

-- x is implicitly released here at the end of the scope of the CLAIM.
-- Thus x is now subject to normal usage rules unless another CLAIM is
-- made.
--}}}

The idea is that the movable variable (x above) is declared in outer scope.
This restricts the construct to a shared memory, although the motivation
was to permit static memory allocation.

The declaration
MOBILE INT x:
is designed to allow the compiler to determine that the variable x will be
used in a VAR channel (or at least CLAIMed) without having to look ahead.
Since the idea has similarities to a SHARED channel, perhaps this should be
SHARED INT x:    ?
But SHARED seems to have exactly the wrong connotations here.
This special declaration is intended to facilitate one possible
implementation strategy where the compiler reserves storage for the variable
together with a flag. Perhaps "MOBILE/SHARED" is unnecessary?


The declaration
CHAN OF VAR x c:     -- does this need brackets: CHAN OF (VAR x) c : ?
which specifies the variable(s) explicitly rather than
CHAN OF VAR INT c:
provides a simple way to avoid variables being exported out of their scope.
However this needs care when abbreviations are involved including formal
parameters of procedures.

The CLAIM syntax is really a special sort of PAR (with a PRI CLAIM
matching a PRI PAR). It allows the compiler to determine which sort
of usage check to apply.

In the scope of
CLAIM v1,v2,v3
the variables v1,v2 and v3 must be held exclusively by a single branch.
If any of v1,v2 and v3 were not already CLAIMed, then they are
owned by the first component until any are moved by a VAR communication.
At the end of the CLAIM, any of the  variables still held are released to
the enclosing environment. So a CLAIM has two functions:-
a) to determine the usage rules;  and
b) to make the initial allocation of ownership.

The compiler applies two usage rules:
1) the usual CREW for components of any embedded (PRI) PARs augmented
   to exclude any VAR communications;
2) the new dynamic exclusive rule which will usually be implemented by
   code generation which applies to the components of a CLAIM.


When applying rule 1) to an embedded PAR, the compiler regards a CLAIM v
as equivalent to a local declaration
(type) v:
This permits the compiler to use the existing static usage checks at compile
time. Run time code determines whether v is held: if it is not, any
reference is an error and behaves as STOP.

Although a optimising compiler might be able to dispense with a
CLAIM v1,v2,v3
altogether when applying rule 2 to simple cases, rule 2 will usually be
implemented at run time by the code generated with minimal overhead. Thus
most compilers can simply omit any static usage checks on CLAIMed
variables not appearing in an embedded PAR.

Outside the CLAIM, rule 1  applies as usual.
So a variable can be accessed under both sharing schemes: otherwise
there might still be some residual redundant copying in certain
applications.

VAR communications might be implemented by passing
a pointer to the MOBILE variable, or a token of the access rights.

The VARs  in   c ! VAR x  and c ? VAR x  are strictly redundant: the
channel type determines that the objects must be variables. But the
explicit VARs make it easier to read and understand the code.

PAR
  c1 ! VAR x
  c2 ! VAR x
is banned for obvious reasons. But so is
PAR
  c ! VAR x
  y := 5
because the communicating branch does not have exclusive access to x.
This should be
CLAIM x
  c ! VAR x  -- an error if the CLAIM was not successful
  y := 5
which shows why embedded CLAIMs on the same variable are permitted.
VAR communications are only permitted where there is exclusive access
obtained by a local CLAIM: local here means that the closest enclosing
parallel constructor is a (PRI) CLAIM of the relevant variable.
Likewise
CLAIM x
  c ? VAR x  -- this one is an error if the CLAIM succeeded
  y := 5
is valid. If the CLAIM on x succeeds, then c ? VAR x is an error and
behaves as STOP. But the CLAIM on x is expected to fail: its purpose here
is to inform the compiler to generate mutually exclusive code in any
reference to x.


In general the compiler cannot track the dynamic movement of a variable
by static analysis: attempting to output a variable that is not held at
run time is an error, and compiles as STOP.
If a program needs to test for access rights, it can use code like:
IF
  x = x           -- the compiler compiles access test
    c ! VAR x     -- which it needs anyway here, so it might conflate?
  TRUE
    ...  whatever
Or is that too obscure? The alternative seems to be yet another key word,
OWN x, maybe?  :-(  But it would give the compiler a much better opportunity
to optimize code like that above.


At the end of a CLAIM block, if the variable is "held" it is passed up
to the next enclosing CLAIM if any. Otherwise, the shared variable reverts
to its normal usage matching the textual scope.

Both CLAIMs and CHANs can use lists of variables, including arrays.
Examples:

CHAN OF VAR x1,x2,x3,x4 c :  -- allows c to move only these
CLAIM p,q,r,s,t              -- CLAIMs 5 variables

Inputs on VAR channels involving several variables use variant protocol
syntax. Notice that there is no need for an implementation to employ
tags. If access rights are passed by token, these are equivalent to tags.
If variable address/pointers are passed, these also are their own
identifiers. The second example at the end of this note illustrates
the CASE syntax.

--{{{  Semantics

An informal account can be given in terms of passing the address/pointer
to the shared variable.
Outside a claim the pointer/address of the variable is kept in outer scope.
A CLAIM moves that pointer into the "local scope"/workspace  of the first
branch. It may be moved on an appropriate VAR channel: the sender may
not retain a copy. At the end of the CLAIM, the pointer is moved back to
outer scope. Such a scheme might use a Not_a_Pointer value for empty
slots. Such a run time strategy means that the compiler need do no usage
checking within CLAIMs.

Another implementation could involve access tokens, especially if
addresses were expensive to move.
A MOBILE variable might have a boolean stored with it: if TRUE,
the variable is "present": subject to the usual rules: CREW usage checking
in PARs.
However, a CLAIM results in the associated value being FALSE which
means that some process has exclusive access to the variable.
The boolean becomes TRUE again only when the variable is RELEASEd.

--}}}

The example below illustrates several points, including the
possible need to include extra type specifications in CHAN OF VAR formal
parameters.
Care needs to be taken to avoid unintentional aliasing of MOBILE
variable names: recall that the meaning of formal parameters is defined
by abbreviation.

--{{{  Another example
[2][rows][cols]INT image:

--{{{  produce
PROC produce(CHAN OF VAR [2][rows][cols]INT image[0],
                         [2][rows][cols]INT image[1] in,out)
  INT ping,pong:
  SEQ
    ping := 1
    WHILE TRUE
      SEQ
        pong := ping
        ping := ping >< 1
        CLAIM image[ping]
          SEQ
            ...  fill image[ping] from camera
            out ! VAR image[ping]
          CLAIM image[pong]
            in ? CASE VAR image[pong]
:
--}}}
--{{{  consume
PROC consume(CHAN OF VAR [2][rows][cols]INT image[0],
                         [2][rows][cols]INT image[1] in,out:)
  INT ping,pong:
  SEQ
    ping := 1
    WHILE TRUE
      SEQ
        pong := ping
        ping := ping >< 1
        CLAIM image[pong}
          out ! VAR image[pong]
          CLAIM image[ping]
            SEQ
              in ? CASE VAR image[ping]
              ...  process
:
--}}}

CHAN OF VAR image[0],image[1] up,down:

CLAIM image[0]        -- we expect this to eliminate an indirection
  produce(up,down)
  CLAIM image[1]
    consume(up,down)
    SKIP

-- If there were more than 2 buffers, or if we wished to make things more
-- symmetrical, we could have a process that CLAIMed all the buffers, and
-- then distributed the variables down VAR channels to the components...
-- Or produce(..) could CLAIM both, and send one on initially. Roll your
-- own.
--}}}

--{{{  Questions

1) What have I overlooked? What pathologies can arise?

2) Should  c ! VAR x   and  c ? VAR x be written as c ! x and c ? x.
   Do we need to flag the nature of the communication to make the code
   easier to understand?

3) Is the typing right?

4) Using explicit variable names in the VAR CHAN protocol solves some
   problems, but makes it very easy to introduce aliasing in the
   formal parameters of PROCedures. Is there a better way?

5) Is implementation feasible and simple?

6) Has something like this or better already been suggested?
--}}}

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These are thoughts to get discussion going. I have changed my mind several
times today alone about some aspects, so I might want to retract or modify.
And I have probably overlooked something or other. 

But has anything of this sort been done before? Is there anything similar
in other languages? Is this interesting/useful/promising?

Adrian
-- 
A E Lawrence, MA., DPhil.  	adrian.lawrence@xxxxxxxxxxxxxx
MicroProcessor Unit, 13, Banbury Road, Oxford. OX2 6NN. UK.                
Voice: (+44)-1865-273274,  Fax: (+44)-1865-273275