Initial revision

ceriel 1987-02-26 10:26:19 +00:00
parent 00b8ca7b58
commit 690cd32023
12 changed files with 785 additions and 0 deletions

17
doc/occam/Makefile Normal file

@ -0,0 +1,17 @@
EMHOME=../..
FILES= p0 p1 p2 p3 p4 p5 p6 p7 p8 p9
PIC=pic
EQN=eqn
TBL=tbl
../occam.doc: $(FILES) channel.h.t channel.c.t
	cat $(FILES) | $(PIC) | $(TBL) | $(EQN) > $@
channel.h.t: $(EMHOME)/h/ocm_chan.h
	ctot <$(EMHOME)/h/ocm_chan.h >channel.h.t
channel.c.t: channel.c
	ctot <channel.c >channel.c.t
channel.c: $(EMHOME)/lang/occam/lib/tail_ocm.a
	arch x $(EMHOME)/lang/occam/lib/tail_ocm.a channel.c

8
doc/occam/ctot Executable file

@ -0,0 +1,8 @@
sed 's/^$/.sp 0.5/
s/\\/\\e/g
s/^ $/.ft\
.DE\
.bp\
.DS\
.ft 5\
.ta 0.65i 1.3i 1.95i 2.6i 3.25i 3.9i 4.55i 5.2i 5.85i 6.5i/'

21
doc/occam/p0 Normal file

@ -0,0 +1,21 @@
.pl 11.7i
.ND
.de PT
.if \\n%>0 .if e .tl '\fB%\fP'''
.if \\n%>1 .if o .tl '''\fB%\fP'
..
.TL
An Occam Compiler
.AU
Kees Bot
Edwin Scheffer
.AI
Vrije Universiteit
Amsterdam, The Netherlands
.AB
This document describes the implementation of an \fBOccam\fP to \fBEM\fP
compiler. The lexical analysis is done using \fBLex\fP.
For the semantic analysis the extended LL(1) parser generator \fBLLgen\fP is
used. To handle Occam-specific features such as channels and parallelism, some
library routines are required.
.AE

87
doc/occam/p1 Normal file

@ -0,0 +1,87 @@
.NH
Introduction
.PP
Occam [1] is a programming language which is based on the concepts of
concurrency and communication. These concepts enable today's applications of
microprocessors and computers to be implemented more effectively.
.PP
An Occam program consists of a (dynamically determined) number
of processes communicating through channels.
To communicate with the outside world some predefined channels are needed.
A channel has only one writer and one reader; it carries machine words and
bytes, at the reader/writer's discretion. The process with its communication
in Occam replaces the procedure with parameters in other languages (there are
no procedures in Occam).
.PP
In addition to the normal assignment statement, Occam has two more
information-transfer statements, the input and the output:
.DS
.ft 5
chan1 ? x -- reads a value from chan1 into x
chan2 ! x -- writes the value of x onto chan2
.ft
.DE
Both the outputting and the inputting processes wait until the other is there.
Channels are declared and given names. Arrays of channels are possible.
.PP
Processes come in 5 varieties: sequential, parallel, alternative,
conditional and repetitive. A process starts with a reserved word telling
its nature, followed by an indented list of other processes. (Indentation
is used to indicate block structure.) It may be preceded by declarations.
The processes in a sequential/parallel process are executed sequentially/in
parallel. The processes in an alternative process have guards based on the
availability of input; the first to be ready is executed (this is waiting
for multiple input). The conditional and repetitive processes are normal
\fBIF\fPs and \fBWHILE\fPs.
.PP
\fIProducer-consumer example:\fP
.DS
.ft 5
.nf
CHAN buffer:                    -- declares the channel buffer
PAR
    WHILE TRUE                  -- the producer
        VAR x:                  -- a local variable
        SEQ
            produce(x)          -- in some way
            buffer ! x          -- and send it
    WHILE TRUE                  -- the consumer
        VAR x:
        SEQ
            buffer ? x          -- get a value
            consume(x)          -- in some way
.ft
.fi
.DE
.bp
.PP
Processes can be replicated from a given template; this combines
with arrays of variables and/or channels.
.PP
\fIExample: 19 window-sorters in series, connected by 20 channels:\fP
.DS
.ft 5
.nf
CHAN s[20]:                     -- 20 channels
PAR i = [ 0 FOR 19 ]            -- 19 processes
    WHILE TRUE
        VAR v1, v2:
        SEQ
            s[i] ? v1; v2       -- wait for 2 variables from s[i]
            IF
                v1 <= v2        -- ok
                    s[i+1] ! v1; v2
                v1 > v2         -- reorder
                    s[i+1] ! v2; v1
.fi
.ft
.DE
.PP
A process may wait for a condition, which must include a comparison
with \fBNOW\fP, the present clock value.
.PP
Processes may be distributed over several processors; all processes
under a \fBVAR\fP declaration must run on the same processor. Concurrency can be
improved by avoiding \fBVAR\fP declarations, and replacing them by \fBCHAN\fP
declarations. Processes can be allocated explicitly on named processors and
channels can be connected to physical ports.

151
doc/occam/p2 Normal file

@ -0,0 +1,151 @@
.NH
The Compiler
.PP
The compiler is written in \fBC\fP using LLgen and Lex and compiles
Occam programs to EM code, using the procedural interface as defined for EM.
In the following sub-sections we describe the LLgen parser generator and
the handling of indentation.
.NH 2
The LLgen Parser Generator
.PP
LLgen accepts a context-free grammar extended with the operators `\f5*\fP', `\f5?\fP' and `\f5+\fP',
which have effects similar to those in regular expressions.
The `\f5*\fP' is the closure operator (zero or more repetitions, without an upper bound); `\f5+\fP' is the positive
closure operator (one or more repetitions, without an upper bound); `\f5?\fP' is the optional operator;
`\f5[\fP' and `\f5]\fP' can be used for grouping.
For example, a comma-separated list of expressions can be described as:
.DS
.ft 5
expression_list:
        expression [ ',' expression ]*
;
.ft
.DE
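.LP
The closure in this rule corresponds to a simple loop in a hand-written
recursive-descent parser. The C fragment below is only a sketch of that idea
and not the code LLgen generates; the variable \f5pos\fP and the one-digit
\f5expression\fP are assumptions made to keep the example self-contained.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical parser state: a position in a NUL-terminated input string. */
static const char *pos;

/* expression: a single decimal digit (an assumption made for brevity). */
static int expression(void)
{
    if (!isdigit((unsigned char)*pos))
        return 0;               /* syntax error */
    pos++;
    return 1;
}

/*
 * expression_list: expression [ ',' expression ]* ;
 * Returns the number of expressions parsed, or -1 on a syntax error.
 */
int expression_list(const char *input)
{
    int count = 0;

    assert(input != NULL);
    pos = input;
    if (!expression())
        return -1;
    count++;
    while (*pos == ',') {       /* the closure: decided on one input symbol */
        pos++;
        if (!expression())
            return -1;
        count++;
    }
    return *pos == '\0' ? count : -1;
}
```

Calling \f5expression_list("1,2,3")\fP in this sketch yields 3, while a
trailing comma as in \f5"1,"\fP is rejected with -1.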
.LP
Alternatives must be separated by `\f5|\fP'.
C code (``actions'') can be inserted at all points between the colon and the
semicolon.
Variables global to the complete rule can be declared just in front of the
colon enclosed in the brackets `\f5{\fP' and `\f5}\fP'. All other declarations are local to
their actions.
Nonterminals can have parameters to pass information.
A more mature version of the above example would be:
.DS
.ft 5
expression_list(expr *e;)       { expr e1, e2; } :
        expression(&e1)
        [ ',' expression(&e2)
                                { e1=append(e1, e2); }
        ]*
                                { *e=e1; }
;
.ft
.DE
As LLgen generates a recursive-descent parser without backtracking, it must at all
times be able to determine what to do, based on the current input symbol.
Unfortunately, this cannot be done for all grammars. Two kinds of conflicts
are possible, viz. the \fBalternation\fP and \fBrepetition\fP conflict.
An alternation conflict arises if two sides of an alternation can start with the
same symbol. E.g.
.DS
.ft 5
plus: '+' | '+' ;
.ft
.DE
The parser doesn't know which `\f5+\fP' to choose (neither do we).
Such a conflict can be resolved by putting an \fBif-condition\fP in front of
the first conflicting production. It consists of a \fB``%if''\fP followed by a
C-expression between parentheses.
If a conflict occurs (and only if it does) the C-expression is evaluated, and
parsing continues along the production carrying the condition if the result is
non-zero. Example:
.DS
.ft 5
plus:
        %if (some_plusses_are_more_equal_than_others())
        '+'
|
        '+'
;
.ft
.DE
A repetition conflict arises when the parser cannot decide whether
``\f5productionrule\fP'' in e.g. ``\f5[ productionrule ]*\fP'' must be chosen
once more, or whether it should continue with what follows the repetition.
This kind of conflict can be resolved by putting a \fBwhile-condition\fP right
after the opening bracket. It consists of a \fB``%while''\fP
followed by a C-expression between parentheses. As an example, we can look at
the \fBcomma-expression\fP in C. The comma may only be used for the
comma-expression if the total expression is not part of another comma-separated
list:
.DS
.nf
.ft 5
comma_expression:
        sub_expression
        [ %while (not_part_of_comma_separated_list())
                ',' sub_expression
        ]*
;
.ft
.fi
.DE
Again, the \fB``%while''\fP is only used in case of a conflict.
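.LP
The effect of such a while-condition can be pictured as an extra test in the
loop of a hand-written parser. The following C fragment is a sketch under
stated assumptions, not LLgen output: the flag \f5inside_list\fP, the variable
\f5cursor\fP and the one-digit \f5sub_expression\fP are all hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *cursor;      /* hypothetical input position */
static int inside_list;         /* hypothetical: nonzero inside another list */

/* The condition named in the grammar; here it merely tests a flag. */
static int not_part_of_comma_separated_list(void)
{
    return !inside_list;
}

/* sub_expression: a single decimal digit (an assumption made for brevity). */
static int sub_expression(void)
{
    if (!isdigit((unsigned char)*cursor))
        return 0;
    cursor++;
    return 1;
}

/* Returns the number of sub-expressions consumed, or -1 on a syntax error. */
int comma_expression(const char *input, int in_list)
{
    int count = 0;

    assert(input != NULL);
    cursor = input;
    inside_list = in_list;
    if (!sub_expression())
        return -1;
    count++;
    /* The condition is consulted only when a ',' makes both choices possible. */
    while (*cursor == ',' && not_part_of_comma_separated_list()) {
        cursor++;
        if (!sub_expression())
            return -1;
        count++;
    }
    return count;
}
```

With \f5in_list\fP equal to zero the sketch consumes all three digits of
\f5"1,2,3"\fP; with \f5in_list\fP non-zero it stops before the first comma and
leaves it to the enclosing list.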
.LP
Error recovery is done almost completely automatically. All you have to do
is to write a routine called \fILLmessage\fP to give the necessary error
messages and supply information about terminals found missing.
.NH 2
Indentation
.PP
The way conflicts can be resolved is of great use to Occam. The use of
indentation to group statements leads to many conflicts, because the spaces
used for indentation are just token separators to the lexical analyzer, i.e.
``white space''. The lexical analyzer could be instructed to generate `BEGIN' and
`END' tokens at each indentation change, but that leads to great difficulties
as expressions may occupy several lines, thus leading to indentation changes
at the strangest moments. So we decided to resolve the conflicts by looking
at the indentation ourselves. The lexical analyzer puts the current indentation
level in the global variable \fIind\fP for use by the parser. The best example
is the \fBSEQ\fP construct, which exists in two flavors, one with a replicator
and a single process:
.DS
.nf
.ft 5
seq i = [ 1 for str[byte 0] ]
    out ! str[byte i]
.ft
.fi
.DE
and one without a replicator and several processes:
.DS
.nf
.ft 5
seq
    in ? c
    out ! c
.ft
.fi
.DE
The LLgen skeleton grammar to handle these two is:
.DS
.nf
.ft 5
SEQ                     { line=yylineno; oind=ind; }
[       %if (line==yylineno)
                replicator
                process
|
        [ %while (ind>oind) process ]*
]
.ft
.fi
.DE
This shows clearly that a replicator must be on the same line as the \fBSEQ\fP,
and that new processes are collected as long as the indentation level of each process
is greater than the indentation level of \fBSEQ\fP (with appropriate checks on this
indentation).
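.LP
The role of the comparison between \fIind\fP and \fIoind\fP can be illustrated
outside the parser. The C fragment below is a sketch, not compiler code: it
assumes the indentation of every line is already available in an array, and
the function name \f5body_length\fP is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the lines belonging to the construct that starts at line `start':
 * all directly following lines that are indented deeper than the construct
 * itself, mirroring the `%while (ind>oind)' test.
 */
size_t body_length(const int *indent, size_t nlines, size_t start)
{
    size_t i;
    int oind;

    assert(indent != NULL && start < nlines);
    oind = indent[start];                       /* indentation of the keyword */
    for (i = start + 1; i < nlines && indent[i] > oind; i++)
        ;                                       /* still inside the construct */
    return i - start - 1;
}
```

For lines indented 0, 4, 8, 8, 4 and 0 columns, the construct on the first line
owns the next four lines, and the one on the second line owns two.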
.PP
Different indentation styles are accepted, as long as the same number of spaces
is used for each indentation shift. The ASCII tab character advances the indentation
to the next eight-space boundary. The first indentation shift found in a file
is the unit against which all other indentation levels are measured.
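.LP
These rules can be sketched in a few lines of C. This is an illustration of
the rules as stated here, not the code of the lexical analyzer; the function
names \f5indent_width\fP and \f5indent_level\fP are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Width, in columns, of the leading white space of `line'. */
int indent_width(const char *line)
{
    int col = 0;

    assert(line != NULL);
    for (; *line == ' ' || *line == '\t'; line++) {
        if (*line == '\t')
            col = (col / 8 + 1) * 8;    /* advance to the next multiple of 8 */
        else
            col++;
    }
    return col;
}

/*
 * Indentation level for a given width, where `unit' is the width of the
 * first indentation shift seen in the file.  Returns -1 if the width is
 * not a whole number of units.
 */
int indent_level(int width, int unit)
{
    assert(unit > 0 && width >= 0);
    return width % unit == 0 ? width / unit : -1;
}
```

A line starting with two spaces and a tab has width 8 in this sketch, which is
level 2 when the unit is four spaces.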

337
doc/occam/p3 Normal file

@ -0,0 +1,337 @@
.NH
Implementation
.PP
It is now time to describe the implementation of some of the Occam-specific
features such as channels and \fBNOW\fP. The way communication with
UNIX\(dg is performed is described as well.
.FS
\(dg UNIX is a trademark of Bell Laboratories
.FE
For a thorough description of the library routines to simulate parallelism,
which are e.g. used by the channel routines and by the \fBPAR\fP construct
in Appendix B, see [6].
.NH 2
Channels
.PP
There are currently two types of channels (see Figure 1), indicated by the type
field of a channel variable:
.IP -
An interprocess communication channel with two additional fields:
.RS
.IP -
A synchronization field to hold the state of an interprocess communication
channel.
.IP -
An integer variable to hold the value to be sent.
.RE
.IP -
An outside world communication channel. This is a member of an array of
channels connected to UNIX files. Its additional fields are:
.RS
.IP -
A flags field holding a readahead flag and a flag that tells if this channel
variable is currently connected to a file.
.IP -
A preread character, if readahead is done.
.IP -
An index field to find the corresponding UNIX file.
.RE
.LP
.PS
box ht 3.0 wid 3.0
box ht 0.75 wid 0.75 with .nw at 1st box.nw + (0.5, -0.5) "Process 1"
box ht 0.75 wid 0.75 with .ne at 1st box.ne + (-0.5, -0.5) "Process 2"
box ht 0.75 wid 0.75 with .sw at 1st box.sw + (0.5, 0.5) "Process 3"
box ht 0.75 wid 0.75 with .se at 1st box.se + (-0.5, 0.5) "Process 4"
line right from 5/12 <2nd box.ne, 2nd box.se> to 3rd box
line right from 7/12 <2nd box.ne, 2nd box.se> to 3rd box
line right from 5/12 <4th box.ne, 4th box.se> to 5th box
line right from 7/12 <4th box.ne, 4th box.se> to 5th box
line down from 5/12 <2nd box.sw, 2nd box.se> to 4th box
line down from 7/12 <2nd box.sw, 2nd box.se> to 4th box
line down from 5/12 <3rd box.sw, 3rd box.se> to 5th box
line down from 7/12 <3rd box.sw, 3rd box.se> to 5th box
line right 1.0 from 5/12 <5th box.ne, 5th box.se>
line right 1.0 from 7/12 <5th box.ne, 5th box.se>
line left 1.0 from 5/12 <2nd box.nw, 2nd box.sw>
line left 1.0 from 7/12 <2nd box.nw, 2nd box.sw>
.PE
.DS C
\fIFigure 1. Interprocess and outside world communication channels\fP
.DE
The basic channel handling is done by \f5chan_in\fP and \f5chan_out\fP. All
other routines are based on them. The routine \f5chan_any\fP only checks if
there's a value available on a given channel. (It does not read this value!)
\f5c_init\fP initializes an array of interprocess communication channels.
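.LP
The two channel layouts described above could be declared roughly as follows.
This is a sketch only: the real declarations are those of \f5ocm_chan.h\fP in
Appendix A, and every name used here (\f5chan_sketch\fP, \f5CHAN_T_IPC\fP,
\f5SYNC_FREE\fP, \f5init_sketch\fP and so on) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum chan_type { CHAN_T_IPC, CHAN_T_FILE };        /* hypothetical names */
enum chan_sync { SYNC_FREE, SYNC_ANY, SYNC_ACK };  /* hypothetical names */

typedef struct {
    enum chan_type type;        /* selects one of the two layouts below */
    union {
        struct {                /* interprocess communication channel */
            enum chan_sync sync;    /* synchronization field */
            long value;             /* the value to be sent */
        } ipc;
        struct {                /* outside world communication channel */
            unsigned flags;         /* readahead and connected flags */
            int preread;            /* preread character, if any */
            int index;              /* index of the corresponding file */
        } file;
    } u;
} chan_sketch;

/* Put `z' interprocess channels, starting at `c', in their ground state. */
void init_sketch(chan_sketch *c, unsigned z)
{
    assert(c != NULL || z == 0);
    while (z-- > 0) {
        c->type = CHAN_T_IPC;
        c->u.ipc.sync = SYNC_FREE;
        c->u.ipc.value = 0;
        c++;
    }
}
```

A single channel is then initialized with a count of 1 and an array of
channels with its length, in the same way the table below calls \f5c_init\fP.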
.LP
The following table shows Occam statements paired with the routines used to
execute them.
.TS H
center, box;
c | c | c
lf5 | lf5 | lf5.
Occam statement Channel handling routine Called as
=
.sp 0.5
.TH
T{
.nf
CHAN c:
CHAN c[z]:
.fi
T} T{
.nf
c_init(c, z)
chan *c; unsigned z;
.fi
T} T{
.nf
c_init(&c, 1);
c_init(&c, z);
.fi
T}
.sp 0.5
_
.sp 0.5
T{
.nf
c ? v
.fi
T} T{
.nf
chan_in(v, c)
long *v; chan *c;
.fi
T} T{
.nf
chan_in(&v, &c);
.fi
T}
.sp 0.5
T{
.nf
c ? b[byte i]
.fi
T} T{
.nf
cbyte_in(b, c)
char *b; chan *c;
.fi
T} T{
.nf
cbyte_in(&b[i], &c);
.fi
T}
.sp 0.5
T{
.nf
c ? a[i for z]
.fi
T} T{
.nf
c_wa_in(a, z, c)
long *a; unsigned z; chan *c;
.fi
T} T{
.nf
c_wa_in(&a[i], z, &c);
.fi
T}
.sp 0.5
T{
.nf
c ? a[byte i for z]
.fi
T} T{
.nf
c_ba_in(a, z, c)
long *a; unsigned z; chan *c;
.fi
T} T{
.nf
c_ba_in(&a[i], z, &c);
.fi
T}
.sp 0.5
_
.sp 0.5
T{
.nf
c ! v
.fi
T} T{
.nf
chan_out(v, c)
long *v; chan *c;
.fi
T} T{
.nf
chan_out(&v, &c);
.fi
T}
.sp 0.5
T{
.nf
c ! a[i for z]
.fi
T} T{
.nf
c_wa_out(a, z, c)
long *a; unsigned z; chan *c;
.fi
T} T{
.nf
c_wa_out(&a[i], z, &c);
.fi
T}
.sp 0.5
T{
.nf
c ! a[byte i for z]
.fi
T} T{
.nf
c_ba_out(a, z, c)
long *a; unsigned z; chan *c;
.fi
T} T{
.nf
c_ba_out(&a[i], z, &c);
.fi
T}
.sp 0.5
_
.sp 0.5
T{
.nf
alt
c ? ....
....
.fi
T} T{
.nf
int chan_any(c)
chan *c;
.fi
T} T{
.nf
deadlock=0;
for(;;) {
if (chan_any(&c)) {
....
....
.fi
T}
.sp 0.5
.TE
The code of \f5c_init\fP, \f5chan_in\fP, \f5chan_out\fP and \f5chan_any\fP
can be found in Appendix A.
.NH 3
Synchronization on interprocess communication channels
.PP
The synchronization field can hold three different values indicating the
state the channel is in:
.IP "- \fBC\(ulS\(ulFREE\fP:" 15
Ground state, channel not in use.
.IP "- \fBC\(ulS\(ulANY\fP:" 15
Channel holds a value, the sending process is waiting for an acknowledgement
about its receipt.
.IP "- \fBC\(ulS\(ulACK\fP:" 15
Channel data has been removed by a receiving process, the sending process can
set the channel free now.
.LP
A sending process cannot simply wait until the channel changes from state C\(ulS\(ulANY
to state C\(ulS\(ulFREE before it continues. A third state is needed to prevent
a third process from using the channel before our sending process is
acknowledged. Note, however, that using a channel for input
or output in more than one parallel process is not allowed. As this is too difficult to check
in practice, the third state at least softens the consequences of breaking this rule.
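.LP
The three-state protocol can be written down as a small state machine. The C
fragment below is a sketch, not the library code of Appendix A: the real
routines wait by switching to another process, whereas these hypothetical
functions simply report whether a step was possible. The names \f5ST_FREE\fP,
\f5ST_ANY\fP and \f5ST_ACK\fP stand in for the three states above.

```c
#include <assert.h>
#include <stddef.h>

enum sync_state { ST_FREE, ST_ANY, ST_ACK };

struct ipc_chan {
    enum sync_state sync;
    long value;
};

/* Sender, step 1: deposit a value if the channel is free. */
int try_put(struct ipc_chan *c, long v)
{
    assert(c != NULL);
    if (c->sync != ST_FREE)
        return 0;               /* channel in use, sender has to wait */
    c->value = v;
    c->sync = ST_ANY;
    return 1;
}

/* Receiver: remove the value if there is one and acknowledge it. */
int try_take(struct ipc_chan *c, long *v)
{
    assert(c != NULL && v != NULL);
    if (c->sync != ST_ANY)
        return 0;               /* nothing there yet */
    *v = c->value;
    c->sync = ST_ACK;
    return 1;
}

/* Sender, step 2: once acknowledged, set the channel free again. */
int try_release(struct ipc_chan *c)
{
    assert(c != NULL);
    if (c->sync != ST_ACK)
        return 0;               /* receipt not acknowledged yet */
    c->sync = ST_FREE;
    return 1;
}
```

Between \f5try_take\fP and \f5try_release\fP the channel is neither free nor
holding a value, so a second \f5try_put\fP by another process fails until the
original sender has seen the acknowledgement.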
.NH 2
NOW
.PP
\fBNOW\fP evaluates to the current time returned by the time(2) system call.
The code is simply:
.DS
.ft 5
.nf
long now()
{
deadlock=0;
return time((long *) 0);
}
.fi
.ft
.DE
The assignment ``deadlock=0'' prevents a deadlock from being reported while a
process is using the clock.
.NH 2
UNIX interface
.PP
To handle the communication with the outside world the following channels are
defined:
.IP -
\fBinput\fP, which corresponds to the standard input file,
.IP -
\fBoutput\fP, which corresponds to the standard output file,
.IP -
\fBerror\fP, which corresponds to the standard error file,
.IP -
\fBfile\fP, an array of channels that can be subscripted with an index
obtained from the builtin named process ``\f5open\fP''. Note that
\fBinput\fP=\fBfile\fP[0], \fBoutput\fP=\fBfile\fP[1] and
\fBerror\fP=\fBfile\fP[2].
.LP
Builtin named processes to open and close files are defined as
.DS
.nf
.ft 5
proc open(var index, value name[], mode[]) = ..... :
proc close(value index) = ..... :
.fi
.ft
.DE
Opening a file `junk', writing nonsense onto it, and closing it goes as follows:
.DS
.ft 5
.nf
var i:
seq
    open(i, "junk", "w")
    file[i] ! nonsense
    close(i)
.fi
.ft
.DE
Errors opening a file are reported by a negative index, which is the
negative value of the error number (called \fIerrno\fP in UNIX).
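.LP
The error convention can be sketched in C as follows. This is not the code of
the builtin \f5open\fP: the function \f5open_sketch\fP, the table
\f5file_table\fP and its size are hypothetical, and the sketch assumes a
POSIX system, where a failing \f5fopen\fP sets \fIerrno\fP.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_FILES 20                    /* hypothetical table size */

/* Slots 0, 1 and 2 are left for input, output and error. */
static FILE *file_table[MAX_FILES];

/* Returns an index into the file table, or a negated error number. */
int open_sketch(const char *name, const char *mode)
{
    int i;
    FILE *fp;

    assert(name != NULL && mode != NULL);
    for (i = 3; i < MAX_FILES && file_table[i] != NULL; i++)
        ;                               /* look for a free slot */
    if (i == MAX_FILES)
        return -EMFILE;                 /* too many open files */
    errno = 0;
    fp = fopen(name, mode);
    if (fp == NULL)
        return errno != 0 ? -errno : -EIO;
    file_table[i] = fp;
    return i;
}
```

A caller can tell success from failure by the sign of the result alone, and
recover the error number by negating a negative result.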
.LP
Bytes read from or written onto these channels are taken from Occam variables.
As these variables can hold more than 256 values, some negative values are used
to control channels. These values are:
.IP "- \fBEOF\fP" 9
(-1): End of file on a file channel is read as -1.
.IP "- \fBTEXT\fP" 9
(-2): A -2 written onto any channel connected to a terminal puts this
terminal in the normal line oriented mode (i.e. characters typed are echoed
and lines are buffered before they are read).
.IP "- \fBRAW\fP" 9
(-3): A -3 written onto any channel connected to a terminal puts it in raw mode
(i.e. no echoing of typed characters and no line buffering).
.LP
To exit an Occam program, e.g. after an error, a builtin named process
\f5exit\fP is available that takes an exit code as its argument.
.NH 2
Replicators and slices
.PP
Both the base and the count of replicators, as in
.DS
.ft 5
par i = [ base for count ]
.ft
.DE
may be arbitrary expressions. The count in array slices, as in
.DS
.ft 5
c ? A[ base for count ]
.ft
.DE
must be a constant expression, however; the base is again free.

42
doc/occam/p4 Normal file

@ -0,0 +1,42 @@
.NH
Particular details
.NH 2
Lower case/Upper case
.PP
Keywords must be either fully written in lower case or in upper case, thus
\fBPAR\fP is equivalent to \fBpar\fP but \fBPar\fP is not a keyword. Identifiers
may be of mixed case. Different styles are used in our examples just to indicate
what's accepted by the compiler.
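.LP
The case rule for keywords can be sketched in C as follows. This is an
illustration, not the code of the lexical analyzer; the function name
\f5is_keyword\fP is hypothetical and the keyword table is only a partial list.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A partial keyword list, in lower case. */
static const char *const keywords[] = {
    "par", "seq", "alt", "if", "while", "var", "chan", "proc", "def"
};

/* Non-zero if `word' is a keyword written fully in one case. */
int is_keyword(const char *word)
{
    char folded[16];
    int upper = 0, lower = 0;
    size_t i, n;

    assert(word != NULL);
    n = strlen(word);
    if (n == 0 || n >= sizeof folded)
        return 0;
    for (i = 0; i < n; i++) {
        unsigned char ch = (unsigned char)word[i];

        if (isupper(ch))
            upper = 1;
        else if (islower(ch))
            lower = 1;
        else
            return 0;                   /* digits etc.: not a keyword */
        folded[i] = (char)tolower(ch);
    }
    folded[n] = '\0';
    if (upper && lower)
        return 0;                       /* mixed case: an identifier */
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(folded, keywords[i]) == 0)
            return 1;
    return 0;
}
```

In this sketch both \f5PAR\fP and \f5par\fP are recognized, while \f5Par\fP
falls through and is treated as an identifier.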
.NH 2
File inclusion
.PP
The C preprocessor is applied to the input file before
compilation, so that files containing useful \fBPROC\fP and \fBDEF\fP
declarations can be used in your program by using the \fB#include\fP-directive
of the preprocessor.
.NH 2
Substitution
.PP
Named processes are not textually substituted; a procedure call is used instead.
The semantics of Occam substitution permit this, since a global variable
(i.e. one not declared inside the named process' body) is resolved where the named
process is defined and not where it is substituted.
.NH 2
ANY
.PP
According to the Occam syntax the \fBANY\fP keyword may be the only argument of
an input or output process. Thus,
.DS
.ft 5
c ? ANY; x
.ft
.DE
is not allowed. Because it was easy to add, and it was used by some programs,
our compiler allows it. (If you prefer portability you are advised not to make
use of it.)
.NH 2
Configuration
.PP
The special configuration keywords like \fBPLACED\fP, \fBALLOCATE\fP, \fBPORT\fP
and \fBLOAD\fP are not implemented. Only \fBPRI\fP is accepted, because \fBPAR\fP and
\fBALT\fP behave the same with or without it.

18
doc/occam/p5 Normal file

@ -0,0 +1,18 @@
.NH
Conclusions
.PP
Writing the compiler was very straightforward using the LLgen parser generator.
Its extended grammar and its way of resolving conflicts were of great use to us;
in particular,
the indentation handling could be implemented quite easily. The automatic
error recovery given by LLgen took a great weight off our shoulders.
.PP
A set of parallelism simulation routines makes implementing \fBPAR\fP constructs
very simple. We consider it a necessity to have such a layer to shield the
compiler writer from these details.
.PP
The translation to EM code was fairly direct; no great tricks were needed to
make things work. Only the different sizes of words and pointers, which are given
as parameters to the compiler, must be carefully watched. Variables or pointers
must sometimes be handled with double-word instructions, depending on the word or
pointer sizes.

5
doc/occam/p6 Normal file

@ -0,0 +1,5 @@
.NH
Acknowledgement
.PP
We want to thank Dick Grune for his description of Occam which is used
in the introduction.

23
doc/occam/p7 Normal file

@ -0,0 +1,23 @@
.bp
.NH
References
.LP
.IP [1]
INMOS limited, \fIOCCAM Programming manual\fP, Prentice-Hall, 1984.
.IP [2]
C. J. H. Jacobs, \fISome Topics in Parser Generation\fP,
Informatica Rapport IR-105, Vrije Universiteit, Amsterdam, October 1985.
.IP [3]
B. W. Kernighan and D. M. Ritchie, \fIThe C Programming Language\fP,
Prentice-Hall, 1978.
.IP [4]
M. E. Lesk, \fILex - A Lexical Analyser Generator\fP, Comp. Sci. Tech. Rep.
No. 39, Bell Laboratories, Murray Hill, New Jersey, October 1975.
.IP [5]
A. S. Tanenbaum, H. van Staveren, E. G. Keizer, J. W. Stevenson,
\fIDescription of a Machine Architecture for use with Block Structured
Languages\fP, Informatica Rapport IR-81, Vrije Universiteit, Amsterdam, 1983.
.IP [6]
K. Bot and E. Scheffer, \fIA set of multi-process primitives for stack based
machines\fP, Vrije Universiteit, Amsterdam, 1986.
.LP

16
doc/occam/p8 Normal file

@ -0,0 +1,16 @@
.bp
.NH
Appendix A: Implementation of the channel routines
.DS L
.ft 5
.ta 0.65i 1.3i 1.95i 2.6i 3.25i 3.9i 4.55i 5.2i 5.85i 6.5i
.so channel.h.t
.ft
.DE
.bp
.DS L
.ft 5
.ta 0.65i 1.3i 1.95i 2.6i 3.25i 3.9i 4.55i 5.2i 5.85i 6.5i
.so channel.c.t
.ft
.DE

60
doc/occam/p9 Normal file

@ -0,0 +1,60 @@
.bp
.NH
Appendix B: Translation of a \fBPAR\fP construct to EM code using the library
routines to simulate parallelism
.PP
Translation of the parallel construct:
.DS
.ft 5
par
    P0
    par i = [ 1 for n ]
        P(i)
.ft
.DE
is
.TS
center;
lf5 lf5.
lal -20 ; Assume 20 bytes of local variables at this moment
cal $parbegin ; Set up a process group
asp 4 ; Assume pointersize = 4
cal $parfork ; Split stack in two from local -20
lfr 4 ; Assume wordsize = 4
zne *23 ; One end jumps to second process, other continues here
lor 0 ; Static link
cal $P0
asp 4
bra *24 ; Jump to the outer parend
23
cal $parfork ; Fork off `par i = ...' process
lfr 4
zne *25 ; One end jumps to end of outer par
lal -20 ; Place break just above i
cal $parbegin ; Set up another process group for the P(i)
loc 1
stl -24 ; i:=1
lol n ; Assume n can be addressed this simply
stl -28 ; A nameless counter
bra *26 ; Branch to counter test
27
cal $parfork ; Fork off one P(i)
lfr 4
zne *28 ; One jumps away to increment i, the other calls P(i)
lol -24
lor 0
cal $P
asp 8
bra *29
28
inl -24 ; i:=i+1
del -28 ; counter:=counter-1
26
lol -28
zgt *27 ; while counter>0 repeat loop
29
cal $parend ; Wait for the P(i) to finish, then delete group
bra *24 ; Jump to the higher up meeting place with P0
25 ; Note that the bra will be optimized away
24
cal $parend ; Wait for both processes to end, then delete group
.TE