[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Something for the weekend ...



In the `something-for-the-weekend' posting (March 19th, 1999), I noted that
we had managed to deadlock the JavaPP ALT algorithm:

  http://www.hensa.ac.uk/parallel/groups/wotug/java/discussion/4.html

that was published some 2.5 years earlier and had survived rather well until
then.

It was left as a challenge to the reader to find the flaw (a race hazard) and
correct it, saying I would publish my version at WoTUG-22.  Unfortunately,
there being so many other exciting things happening, I never got around to
doing this - despite the exceedingly late nights ;-)

Somewhat belatedly then, here is the `corrected' algorithm currently used
in the JCSP library ... `correct', that is, unless Jeremy's FDR check tells
us otherwise!!!

Note: the `corrections' to the original version are:

  o removing the global synchronized from the Alternative.select method and
    imposing it only around the ALT WAIT operation;

  o making the Channel.disable synchronized.

That may *sound* simple but it took over 4 days full-time thinking ... :-(.
There were numerous ever-more-complex-and-desperate attempts to secure the
race hazard before this relatively simple `solution' was realised.  In fact,
this new algorithm is closer to the transputer's micro-coded ALT instructions
than the original :-) ...

One point about this new algorithm: during the ENABLE sequence, there can be
parallel *writes* to the same variable.  The ALTing process may write `ready'
to the `state' variable (currently holding `enabling') and so might a process
writing to a channel that had been enabled earlier in the sequence.  The latter
write is synchronized on the `alt' object (see the `schedule' method) but the
former is not - so both could commit to doing this.  That could be prevented
by synchronising the ALTing process' write on the `alt' object as well.  But
I don't believe that is necessary - parallel writes of the same value ought
to be safe ...

I could explain the original race hazard that lead to deadlock and why this
new version avoids that ... but no time now.  I have a fairly comfortable
feeling, based on informal reasoning, that it is now correct ... but I had
that before with the old one!  Anyway, it passes the prolonged stress tests
that broke the original.  Hopefully, Jeremy will now be able to verify it.

This logic is similar to the things Kevin Vella had to get to grips with for
the SMP version of KRoC occam.  It's very hard.  Anyone working directly with
Java threads at the level of synchronized/wait/notify *has* to get involved
with this stuff ... and the complexity seems to grow exponentially with the
number of synchronized/wait/notifies that are mutually dependent.  I'm hoping
not to have to do that any more ;-) ...

Cheers,

Peter.

PS.   Jeremy: note that the enclosed Channel is actually a many-1 channel.
      It may simplify things, initially, to make it a 1-1 channel by removing
      the write_monitor.

PPS.  Jeremy: note also that the `read' method below dispenses with the
      `local' variable used in the paranoia section on page 274 of WoTUG-21.
      Because I now know that JVMs do *not* release the object lock after
      invoking a notify (which I find a little peculiar), the concern
      expressed in the second paragraph on that page disappears and we
      can get rid of `local'.  The CSP model of Java notify reflects this,
      so you should be able to re-verify the 1-1/many-1 channels with that
      simplification :-)

      The first paragraph, though, still describes a real problem - so we
      still need the extra notify/wait in the case when the reader is first
      to the channel.  It would be interesting to apply FDR to the code on
      page 273 and see it fail ...

PPS.  Jeremy: please hold off verifying this ALT until I've replied to your
      posting from yesterday!  I want to argue some more about the merits
      of the LEFT-only model for a JCSPCHANNEL versus the LEFT-RIGHT one.

==============================================================================

public class Alternative {

  private static final int enabling = 0;
  private static final int waiting = 1;
  private static final int ready = 2;
  private static final int inactive = 3;

  private int state = inactive;

  public int select (Channel[] c) throws InterruptedException {

    int selected = -999999;    // this value should *never* be returned!
    int i;

    state = enabling;                              // ALT START

    for (i = 0; i < c.length; i++) {
      if (c[i].enable (this)) {                    // ENABLE CHANNEL
        state = ready;
        selected = i;
        break;
      }
    }

    synchronized (this) {
      if (state == enabling) {                     // ALT WAIT
        state = waiting;
        wait ();
        state = ready;
      }
    }

    // assert : state == ready

    for (i--; i >= 0; i--) {
      if (c[i].disable ()) {                       // DISABLE CHANNEL
        selected = i;
      }
    }

    state = inactive;

    return selected;                               // ALT END

  }

  synchronized void schedule () {
    switch (state) {
      case enabling:
        state = ready;
      break;
      case waiting:
        state = ready;
        notify ();
      break;
      // case ready: case inactive:
      // break
    }
  }

}



///////////////////////////////////////////////////////////////////////////



public class Channel {

  private int channel_hold;                // buffer (not detectable to users)

  private boolean channel_empty = true;    // synchronisation flag

  private Object write_monitor =           // all writers multiplex through this
    new Object ();

  private Alternative alt;                 // state of reader

  public synchronized int read () throws InterruptedException {
    if (channel_empty) {
      channel_empty = false;               // first to the rendezvous
      wait ();                             // wait for the writer thread
      notify ();                           // schedule the writer to finish
    } else {
      channel_empty = true;                // second to the rendezvous
      notify ();                           // schedule the waiting writer thread
    }
    return channel_hold;
  }

  public void write (int n) throws InterruptedException {
    synchronized (write_monitor) {
      synchronized (this) {
        channel_hold = n;
        if (channel_empty) {
          channel_empty = false;           // first to the rendezvous
          if (alt != null) {               // the reader is ALTing on this Channel
            alt.schedule ();
          }
          wait ();                         // wait for the reader thread
        } else {
          channel_empty = true;            // second to the rendezvous
          notify ();                       // schedule the waiting reader thread
          wait ();                         // let the reader regain this monitor
        }
      }
    }
  }

  synchronized boolean enable (Alternative alt) {
    if (channel_empty) {
      this.alt = alt;
      return false;
    } else {
      return true;
    }
  }

  synchronized boolean disable () {
    alt = null;
    return !channel_empty;
  }

}