Initial revision

1984-06-29 11:21:50 +00:00 · 1984-06-29 11:21:50 +00:00 · 59c2380f85
commit 59c2380f85
parent 71c9695eae
11 changed files with 7326 additions and 0 deletions
--- a/doc/Makefile
+++ b/doc/Makefile
@@ -0,0 +1,39 @@
+SUF=pr
+PRINT=cat
+RESFILES=cref.$(SUF) pcref.$(SUF) val.$(SUF) v7bugs.$(SUF) install.$(SUF)\
+ack.$(SUF) cg.$(SUF) regadd.$(SUF) peep.$(SUF) toolkit.$(SUF)
+NROFF=nroff
+
+cref.$(SUF):        cref.doc
+		tbl $? | $(NROFF) >$@
+v7bugs.$(SUF):      v7bugs.doc
+		$(NROFF) -ms $? >$@
+ack.$(SUF):         ack.doc
+		$(NROFF) -ms $? >$@
+cg.$(SUF):		cg.doc
+		$(NROFF) -ms $? >$@
+regadd.$(SUF):		regadd.doc
+		$(NROFF) -ms $? >$@
+install.$(SUF):     install.doc
+		$(NROFF) -ms $? >$@
+pcref.$(SUF):       pcref.doc
+		$(NROFF) $? >$@
+peep.$(SUF):	peep.doc
+		$(NROFF) -ms $? >$@
+val.$(SUF):         val.doc
+		$(NROFF) $? >$@
+toolkit.$(SUF):	toolkit.doc
+		$(NROFF) -ms $? >$@
+
+install cmp:
+
+pr:
+		@make "SUF=$(SUF)" "NROFF=$(NROFF)" "PRINT=$(PRINT)" $(RESFILES) \
+			>make.pr.out 2>&1
+		@$(PRINT) $(RESFILES)
+
+opr:
+		make pr | opr
+
+clean:
+		-rm -f *.old $(RESFILES) *.t
--- a/doc/ack.doc
+++ b/doc/ack.doc
@@ -0,0 +1,419 @@
+.nr LL 7.5i
+.tr ~
+.nr PD 1v
+.TL
+Ack Description File
+.br
+Reference Manual
+.AU
+Ed Keizer
+.AI
+Wiskundig Seminarium
+Vrije Universiteit
+Amsterdam
+.NH
+Introduction
+.PP
+The program \fIack\fP(I) internally maintains a table of
+possible transformations and a table of string variables.
+The transformation table contains one entry for each possible
+transformation of a file.
+Which transformations are used depends on the suffix of the
+source file.
+Each transformation table entry tells which input suffixes are
+allowed and what suffix/name the output file has.
+When the output file does not already satisfy the request of the
+user, with the flag \fB-c.suffix\fP, the table is scanned
+starting with the next transformation in the table for another
+transformation that has as input suffix the output suffix of
+the previous transformation.
+A few special transformations are recognized, among them is the
+combiner,
+a program combining several files into one.
+When no stop suffix was specified (flag \fB-c.suffix\fP), \fIack\fP
+stops after executing the combiner with the (possibly
+transformed) input files and libraries as arguments.
+\fIAck\fP will only perform the transformations in the order in
+which they are presented in the table.
+.LP
+The string variables are used while creating the argument list
+and program call name for
+a particular transformation.
+.NH
+Which descriptions are used
+.PP
+\fIAck\fP always uses two description files: one to define the
+front-end transformations and one for the machine dependent
+back-end transformations.
+Each description has a name.
+This section first describes how the names of the
+descriptions needed are determined.
+.PP
+When the shell environment variable ACKFE is set \fIack\fP uses
+that to determine the front-end table name, otherwise it uses
+\fBfe\fP.
+.PP
+The way the backend table name is determined is more
+convoluted.
+.br
+First, when the last component of the program call name is not
+one of \fIack\fP, \fIcc\fP, \fIacc\fP, \fIpc\fP or \fIapc\fP,
+this component is used as the backend description name.
+Second, when the \fB-m\fP flag is present, the \fB-m\fP is chopped off
+the flag and the rest is used as the backend description name.
+Third, when both fail, the shell environment variable ACKM is
+used.
+Last, when ACKM is not set either, the default backend is
+used, determined by the definition of ACKM in h/local.h.
+The presence and value of the definition of ACKM is
+determined at compile time of \fIack\fP.
+.PP
+Now, we have the names, but that is only the first step.
+\fIAck\fP stores a few descriptions at compile time.
+These descriptions are simply files read in at compile time.
+At the moment of writing this document, the descriptions
+included are: pdp, fe, i86, m68k2, vax2 and int.
+The name of a description is first searched for internally,
+then in the directory lib/ack and finally in the current
+directory of the user.
+.NH
+Using the description file
+.PP
+Before starting on a narrative of the description file,
+the introduction of a few terms is necessary.
+All these terms are used to describe the scanning of zero
+terminated strings, thereby producing another string or
+sequence of strings.
+.IP Backslashing 5
+.br
+All characters preceded by \e are modified to prevent
+recognition at further scanning.
+This modification is undone before a string is passed to the
+outside world as argument or message.
+When reading the description files the
+sequences \e\e, \e# and \e<newline> have a special meaning.
+\e\e translates to a single \e, \e# translates to a single #
+that is not
+recognized as the start of comment, but can be used in
+recognition and finally, \e<newline> translates to nothing at
+all, thereby allowing continuation lines.
+.nr PD 0
+.IP "Variable replacement"
+.br
+The scan recognizes the sequences {{, {NAME} and {NAME?text},
+where NAME can be any combination of characters excluding ? and
+} and text may be anything excluding }.
+(~\e} is allowed of course~)
+The first sequence produces an unescaped single {.
+The second produces the contents of the variable NAME; variables are
+defined by \fIack\fP and in description files.
+When the NAME is not defined an error message is produced on
+the diagnostic output.
+The last sequence produces the contents of NAME if it is
+defined and text otherwise.
+.PP
+.IP "Expression replacement"
+.br
+Syntax:  (\fIsuffix sequence\fP:\fIsuffix sequence\fP=\fItext\fP)
+.br
+Example: (.c.p.e:.e=tail_em)
+.br
+If the two suffix sequences have a common member -~\&.e in this
+case~- the text is produced.
+When no common member is present the empty string is produced.
+Thus the example given is a constant expression.
+Normally, one of the suffix sequences is produced by variable
+replacement.
+\fIAck\fP sets three variables while performing the various
+transformations: HEAD, TAIL and RTS.
+All three variables depend on the properties \fIrts\fP and
+\fIneed\fP from the transformations used.
+Whenever a transformation is used for the first time,
+the text following the \fIneed\fP is appended to both the HEAD and
+TAIL variable.
+The value of the variable RTS is determined by the first
+transformation used with a \fIrts\fP property.
+.LP
+Two runtime flags have effect on the value of one or more of
+these variables.
+The flag \fB-.suffix\fP has the same effect on these three variables
+as if a file with that \fBsuffix\fP was included in the argument list
+and had to be translated.
+The flag \fB-r.suffix\fP only has that effect on the TAIL
+variable.
+The program call names \fIacc\fP and \fIcc\fP have the effect
+of an automatic \fB-.c\fP flag.
+\fIApc\fP and \fIpc\fP have the effect of an automatic \fB-.p\fP flag.
+.IP "Line splitting"
+.br
+The string is transformed into a sequence of strings by replacing
+the blank space by string separators (nulls).
+.IP "IO replacement"
+.br
+The > in the string is replaced by the output file name.
+The < in the string is replaced by the input file name.
+When multiple input files are present the string is duplicated
+for each input file name.
+.nr PD 1v
+.LP
+Each description is a sequence of variable definitions followed
+by a sequence of transformation definitions.
+Each variable definition occupies one line; transformation
+definitions consist of a sequence of lines.
+Empty lines are discarded, as are lines with nothing but
+comment.
+Comment is started by a # character, and continues to the end
+of the line.
+Three special two-character sequences exist: \e#, \e\e and
+\e<newline>.
+Their effect is described under 'backslashing' above.
+Each - nonempty - line starts with a keyword, possibly
+preceded by blank space.
+The keyword can be followed by a further specification.
+The two are separated by blank space.
+.PP
+Variable definitions use the keyword \fIvar\fP and look like this:
+.DS X
+   var NAME=text
+.DE
+The name can be any identifier, the text may contain any
+character.
+Blank space before the equal sign is not part of the NAME.
+Blank space after the equal sign is considered part of the text.
+The text is scanned for variable replacement before it is
+associated with the variable name.
+.br
+.sp 2
+The start of a transformation definition is indicated by the
+keyword \fIname\fP.
+The last line of such a definition contains the keyword
+\fIend\fP.
+The lines in between associate properties to a transformation
+and may be presented in any order.
+The identifier after the \fIname\fP keyword determines the name
+of the transformation.
+This name is used for debugging and by the \fB-R\fP flag.
+The keywords are used to specify which input suffixes are
+recognized by that transformation,
+the program to run, the arguments to be handed to that program
+and the name or suffix of the resulting output file.
+Two keywords are used to indicate which run-time startoffs and
+libraries are needed.
+The possible keywords are:
+.IP \fIfrom\fP
+.br
+followed by a sequence of suffixes.
+Each file with one of these suffixes is allowed as input file.
+Preprocessor transformations, those with the \fBP\fP property
+after the \fIprop\fP keyword, do not need the \fIfrom\fP
+keyword. All other transformations do.
+.nr PD 0
+.IP \fIto\fP
+.br
+followed by the suffix of the output file name or in the case of a
+linker -~indicated by C option after the \fIprop\fP keyword~-
+the output file name.
+.IP \fIprogram\fP
+.br
+followed by the name of the load file of the program; this pathname most likely
+starts with either / or {EM}.
+This keyword must be
+present; the remainder of the line
+is subject to backslashing and variable replacement.
+.IP \fImapflag\fP
+.br
+The mapflags are used to grab flags given to \fIack\fP and
+pass them on to a specific transformation.
+This feature uses a few simple pattern matching and replacement
+facilities.
+Multiple occurrences of this keyword are allowed.
+The text following the keyword is
+subjected to backslashing.
+The keyword is followed by a match expression and a variable
+assignment separated by blank space.
+As soon as both description files are read, \fIack\fP looks
+at all transformations in these files to find a match for the
+flags given to \fIack\fP.
+The flags \fB-m\fP, \fB-o\fP,
+\fB-O\fP, \fB-r\fP, \fB-v\fP, \fB-g\fP, \fB-c\fP, \fB-t\fP,
+\fB-k\fP, \fB-R\fP and \fB-.\fP are specific to \fIack\fP and
+not handed down to any transformation.
+The matching is performed in the order in which the entries
+appear in the definition.
+The scanning stops after the first match is found.
+When a match is found, the variable assignment is executed.
+A * in the match expression matches any sequence of characters,
+a * in the right hand part of the assignment is
+replaced by the characters matched by
+the * in the expression.
+The right hand part is also subject to variable replacement.
+The variable will probably be used in the program arguments.
+The \fB-l\fP flags are special,
+the order in which they are presented to \fIack\fP must be
+preserved.
+The identifier LNAME is used in conjunction with the scanning of
+\fB-l\fP flags.
+The value assigned to LNAME is used to replace the flag.
+The example further on shows the use of all this.
+.IP \fIargs\fP
+.br
+The keyword is followed by the program call arguments.
+It is subject to backslashing, variable replacement, expression
+replacement, line splitting and IO replacement.
+The variables assigned to by \fImapflag\fP will probably be
+used here.
+The flags not recognized by \fIack\fP or any of the transformations
+are passed to the linker and inserted before all other arguments.
+.IP \fIprop\fP
+.br
+This -~optional~- keyword is followed by a sequence of options,
+each option is indicated by one character
+signifying a special property of the transformation.
+The possible options are:
+.DS X
+   <            the input file will be read from standard input
+   >            the output file will be written on standard output
+   p            the input files must be preprocessed
+   m            the input files must be preprocessed when starting with #
+   O            this transformation is an optimizer and may be skipped
+   P            this transformation is the preprocessor
+   C            this transformation is the linker
+.DE
+.IP \fIrts\fP
+.br
+This -~optional~- keyword indicates that the rest of the line must be
+used to set the variable RTS, if it was not already set.
+Thus the variable RTS is set by the first transformation
+executed with such a property or as a result of \fIack\fP's program
+call name (acc, cc, apc or pc) or by the \fB-.suffix\fP flag.
+.IP \fIneed\fP
+.br
+This -~optional~- keyword indicates that the rest of the line must be
+concatenated to the NEEDS variable.
+This is done once for every transformation used or indicated
+by one of the program call names mentioned above or indicated
+by the \fB-.suffix\fP flag.
+.br
+.nr PD 1v
+.NH
+Conventions used in description files
+.PP
+\fIAck\fP reads two description files.
+A few of the variables defined in the machine specific file
+are used by the descriptions of the front-ends.
+Other variables, set by \fIack\fP, are of use to all
+transformations.
+.PP
+\fIAck\fP sets the variable EM to the home directory of the
+Amsterdam Compiler Kit.
+The variable SOURCE is set to the name of the argument that is currently
+being processed; this is useful for debugging.
+.br
+The variable M indicates the
+directory in mach/{M}/lib/tail_..... and NAME is the string to
+be defined by the preprocessor with -D{NAME}.
+The definitions of {w}, {s}, {l}, {d}, {f} and {p} indicate
+EM_WSIZE, EM_SSIZE, EM_LSIZE, EM_DSIZE, EM_FSIZE and EM_PSIZE
+respectively.
+.br
+The variable INCLUDES is used as the last argument to \fIcpp\fP,
+it is currently used to add the directory {EM}/include to
+the list of directories containing #include files.
+{EM}/include contains a few files used by the library routines
+for part III from the
+.UX
+manual.
+These routines are included in the kit.
+.PP
+The variables HEAD, TAIL and RTS are set by \fIack\fP and used
+to compose the arguments for the linker.
+.NH
+Example
+.sp 1
+description for front-end
+.DS X
+name cpp                        # the C-preprocessor
+        # no from, it's governed by the P property
+        to .i                   # result files have suffix i
+        program {EM}/lib/cpp    # pathname of loadfile
+        mapflag -I* CPP_F={CPP_F?} -I*          # grab -I.. -U.. and
+        mapflag -U* CPP_F={CPP_F?} -U*          # -D.. to use as arguments
+        mapflag -D* CPP_F={CPP_F?} -D*          # in the variable CPP_F
+        args {CPP_F?} {INCLUDES?} -D{NAME} -DEM_WSIZE={w} -DEM_PSIZE={p} \
+-DEM_SSIZE={s} -DEM_LSIZE={l} -DEM_FSIZE={f} -DEM_DSIZE={d} <
+                                # The arguments are: first the -[IUD]...
+                                #  then the include directories for this machine
+                                #  then the NAME and size values finally
+                                #  followed by the input file name
+        prop >P                 # Output on stdout, is preprocessor
+end
+name cem                        # the C-compiler proper
+        from .c                 # used for files with suffix .c
+        to .k                   # produces compact code files
+        program {EM}/lib/em_cem # pathname of loadfile
+        mapflag -p CEM_F={CEM_F?} -Xp   # pass -p as -Xp to cem
+        mapflag -L CEM_F={CEM_F?} -l    # pass -L as -l to cem
+        args -Vw{w}i{w}p{p}f{f}s{s}l{l}d{d} {CEM_F?}
+                                # the arguments are the object sizes in
+                                # the -V... flag and possibly -l and -Xp
+        prop <>p                # input on stdin, output on stdout, use cpp
+        rts .c                  # use the C run-time system
+        need .c                 # use the C libraries
+end
+name decode                     # make human readable files from compact code
+        from .k.m               # accept files with suffix .k or .m
+        to .e                   # produce .e files
+        program {EM}/lib/em_decode      # pathname of loadfile
+        args <                  # the input file name is the only argument
+        prop >                  # the output comes on stdout
+end
+.DE
+
+.DS X
+Example of a backend, in this case the EM assembler/loader.
+
+var w=2                         # wordsize 2
+var p=2                         # pointersize 2
+var s=2                         # short size 2
+var l=4                         # long size 4
+var f=4                         # float size 4
+var d=8                         # double size 8
+var M=int                       # Unused in this example
+var NAME=int22                  # for cpp (NAME=int results in #define int 1)
+var LIB=mach/int/lib/tail_      # part of file name for libraries
+var RT=mach/int/lib/head_       # part of file name for run-time startoff
+var SIZE_FLAG=-sm               # default internal table size flag
+var INCLUDES=-I{EM}/include     # use {EM}/include for #include files
+name asld                       # Assembler/loader
+        from .k.m.a             # accepts compact code and archives
+        to e.out                # output file name
+        program {EM}/lib/em_ass         # load file pathname
+        mapflag -l* LNAME={EM}/{LIB}*   # e.g. -ly becomes
+                                        #   {EM}/mach/int/lib/tail_y
+        mapflag -+* ASS_F={ASS_F?} -+*  # recognize -+ and --
+        mapflag --* ASS_F={ASS_F?} --*
+        mapflag -s* SIZE_FLAG=-s*       # overwrite old value of SIZE_FLAG
+        args {SIZE_FLAG} \
+                ({RTS}:.c={EM}/{RT}cc) ({RTS}:.p={EM}/{RT}pc) -o > < \
+                (.p:{TAIL}={EM}/{LIB}pc) \
+                (.c:{TAIL}={EM}/{LIB}cc.1s {EM}/{LIB}cc.2g) \
+                (.c.p:{TAIL}={EM}/{LIB}mon)
+                # -s[sml] must be first argument
+                # the next line contains the choice for head_cc or head_pc
+                # and the specification of in- and output.
+                # the last three args lines choose libraries
+        prop C  # This is the final stage
+end
+.DE
+
+The command "ack -mint -v -v -I../h -L -ly prog.c"
+would result in the following
+calls (with exec(II)):
+.DS X
+1)  /usr/em/lib/cpp -I../h -I/usr/em/include -Dint22 -DEM_WSIZE=2 -DEM_PSIZE=2
+      -DEM_SSIZE=2 -DEM_LSIZE=4 -DEM_FSIZE=4 -DEM_DSIZE=8 prog.c
+2)  /usr/em/lib/em_cem -Vw2i2p2f4s2l4d8 -l
+3)  /usr/em/lib/em_ass -sm /usr/em/mach/int/lib/head_cc -o e.out prog.k
+      /usr/em/mach/int/lib/tail_y /usr/em/mach/int/lib/tail_cc.1s
+      /usr/em/mach/int/lib/tail_cc.2g /usr/em/mach/int/lib/tail_mon
+.DE
--- a/doc/cg.doc
+++ b/doc/cg.doc
--- a/doc/cref.doc
+++ b/doc/cref.doc
@@ -0,0 +1,317 @@
+.ll 72
+.nr ID 4
+.de hd
+'sp 2
+'tl ''-%-''
+'sp 3
+..
+.de fo
+'bp
+..
+.tr ~
+.               TITLE
+.de TL
+.sp 15
+.ce
+\\fB\\$1\\fR
+..
+.               AUTHOR
+.de AU
+.sp 15
+.ce
+by
+.sp 2
+.ce
+\\$1
+..
+.               DATE
+.de DA
+.sp 3
+.ce
+( Dated \\$1 )
+..
+.               INSTITUTE
+.de VU
+.sp 3
+.ce 4
+Wiskundig Seminarium
+Vrije Universiteit
+De Boelelaan 1081
+Amsterdam
+..
+.               PARAGRAPH
+.de PP
+.sp
+.ti +\n(ID
+..
+.nr CH 0 1
+.               CHAPTER
+.de CH
+.nr SH 0 1
+.bp
+.in 0
+\\fB\\n+(CH.~\\$1\\fR
+.PP
+..
+.               SUBCHAPTER
+.de SH
+.sp 3
+.in 0
+\\fB\\n(CH.\\n+(SH.~\\$1\\fR
+.PP
+..
+.               INDENT START
+.de IS
+.sp
+.in +\n(ID
+..
+.               INDENT END
+.de IE
+.in -\n(ID
+.sp
+..
+.de PT
+.ti -\n(ID
+.ta \n(ID
+.fc " @
+"\\$1@"\c
+.fc
+..
+.               DOUBLE INDENT START
+.de DS
+.sp
+.in +\n(ID
+.ll -\n(ID
+..
+.               DOUBLE INDENT END
+.de DE
+.ll +\n(ID
+.in -\n(ID
+.sp
+..
+.               EQUATION START
+.de EQ
+.sp
+.nf
+..
+.               EQUATION END
+.de EN
+.fi
+.sp
+..
+.               ITEM
+.de IT
+.sp
+.in 0
+\\fB~\\$1\\fR
+.ti +5
+..
+.de CS
+.br
+~-~\\
+..
+.br
+.fi
+.TL "Ack-C reference manual"
+.AU "Ed Keizer"
+.DA "September 12, 1983"
+.VU
+.wh 0 hd
+.wh 60 fo
+.CH "Introduction"
+The C frontend included in the Amsterdam Compiler Kit
+translates UNIX-V7 C into compact EM code [1].
+The language accepted is described in [2] and [3].
+This document describes which implementation dependent choices were
+made in the Ack-C frontend and
+some restrictions and additions.
+.CH "The language"
+.PP
+Under the same heading as used in [2] we describe the
+properties of the Ack-C frontend.
+.IT "2.2 Identifiers"
+External identifiers are unique up to 7 characters and allow
+both upper and lower case.
+.IT "2.4.3 Character constants"
+The ASCII-mapping is used when a character is converted to an
+integer.
+.IT "2.4.4 Floating constants"
+To prevent loss of precision the compiler does not perform
+floating point constant folding.
+.IT "2.6 Hardware characteristics"
+The size of objects of the several arithmetic types and the two
+pointer types depends on the EM-implementation used.
+The ranges of the arithmetic types depend on the sizes used;
+the C-frontend assumes two's complement representation for the
+integral types. All sizes are multiples of bytes.
+The calling program \fIack\fP[4] passes information about the
+size of the types to the compiler proper.
+.br
+However, a few general remarks must be made:
+.sp 1
+.IS
+.PT (a)
+Two different pointer types exist: pointers to data and
+pointers to functions.
+The latter type is twice as large as the former.
+Pointers to functions use the same format as Pascal procedure
+parameters, thereby allowing C to use Pascal procedure
+parameters and vice-versa.
+The extra information passed indicates the scope level of the
+procedure.
+.PT (b)
+The size of pointers to data is a multiple of
+(or equal to) the size of an \fIint\fP.
+.PT (c)
+The following relations exist for the sizes of the types
+mentioned:
+.br
+.ti +5
+\fIchar<=short<=int<=long\fP
+.PT (d)
+Objects of type \fIchar\fP use one 8-bit byte of storage,
+although several bytes are allocated sometimes.
+.PT (e)
+All sizes are in multiples of bytes.
+.PT (f)
+Most EM implementations use 4 bytes for floats and 8 bytes
+for doubles, but exceptions to this rule occur.
+.IE
+.IT "6.1 Characters and integers"
+Objects of type \fIchar\fP are unsigned and do not cause
+sign-extension when converted to \fIint\fP.
+The range of character values is from 0 to 255.
+.IT "6.3 Floating and integral"
+Floating point numbers are truncated towards zero when
+converted to the integral types.
+.IT "6.4 Pointers and integers"
+When a \fIlong\fP is added to or subtracted from a pointer and
+longs are larger than data pointers, the \fIlong\fP is converted to an
+\fIint\fP before the operation is performed.
+.IT "8.5 Structure and union declarations"
+The only type allowed for fields is \fIint\fP.
+Fields with exactly the size of \fIint\fP are signed,
+all other fields are unsigned.
+.br
+The size of any single structure must be less than 4096 bytes.
+.IT "8.6 Initialization"
+Initialization of structures containing bit fields is not
+allowed.
+There is one restriction when using an 'address expression' to initialize
+an integral variable.
+The integral variable must have the size of a data pointer.
+Conversions altering the size of the address expression are not allowed.
+.IT "10.1 External function definitions"
+The total amount of storage used for parameters
+in any function must be less than 4096 bytes.
+The same holds for the total amount of storage occupied by the
+automatic variables declared inside any function.
+.sp
+Using formal parameters whose size is smaller than the size of an int
+is less efficient on several machines.
+At procedure entry these parameters are converted from integer to the
+declared type, because the compiler doesn't know where the least
+significant bytes are stored in the int.
+.IT "11.2 Scope of externals"
+Most C compilers are rather lax in enforcing the restriction
+that only one external definition without the keyword
+\fIextern\fP is allowed in a program.
+The Ack-C frontend is very strict in this.
+The only exception is that declarations of arrays with a
+missing first array bound expression are regarded as having an
+explicit keyword \fIextern\fP.
+.IT "14.4 Explicit pointer conversions"
+Pointers may be larger than ints; thus assigning a pointer to an
+int and back will not always result in the same pointer.
+The process mentioned above works with integral types
+of the same size as or larger than pointers in all EM implementations
+having such integral types.
+Note that pointers to functions have
+twice the size of pointers to data.
+When converting data pointers to an integral type or vice versa,
+the pointer is treated as an unsigned integer with the same size as a data pointer.
+When converting function pointers to anything else, the static link part
+of the pointer is discarded and
+the resulting value is treated as if it were a data pointer.
+When converting a data pointer or object of integral type to a function pointer
+a static link with the value 0 is added to complete the function pointer.
+.br
+EM guarantees that any object can be placed at a word boundary;
+this allows the C-programs to use \fIint\fP pointers
+as pointers to objects of any type not smaller than an \fIint\fP.
+.CH "Frontend options"
+The C-frontend has a few options; these are controlled
+by flags:
+.IS
+.PT -V
+This flag is followed by a sequence of letters, each followed by a
+positive integer. Each letter indicates a
+certain type, the integer following it specifies the size of
+objects of that type. One letter indicates the wordsize used.
+.IS
+.sp 1
+.TS
+center tab(:);
+l l16 l l.
+letter:type:letter:type
+
+w:wordsize:i:int
+s:short:l:long
+f:float:d:double
+p:pointer::
+.TE
+.sp 1
+All existing implementations use an integer size equal to the
+wordsize.
+.IE
+The calling program \fIack\fP[4] provides the frontend with
+this flag, with values depending on the machine used.
+.sp 1
+.PT -l
+The frontend normally generates code to keep track of the line
+number and source file name at runtime for debugging purposes.
+Currently a pointer to a
+string containing the filename is stored at a fixed place in
+memory at each function
+entry and the line number at the start of every expression.
+At the return from a function these memory locations are not reset to
+the values they had before the call.
+Most library routines do not use this feature and thus do not
+ruin the current line number and filename when called.
+However, you are really unlucky when your program crashes due
+to a bug in such a library function, because the line number
+and filename do not indicate that something went wrong inside
+the library function.
+.br
+Providing the flag -l to the frontend tells it not to generate
+the code updating line number and file name.
+This is, for example, used when translating the stdio library.
+.br
+When \fIack\fP[4] is called with the -L flag, it provides
+the frontend with this flag.
+.sp 1
+.PT -Xp
+When this flag is present the frontend generates a call to
+the function \fBprocentry\fP at each function entry and a
+call to \fBprocexit\fP at each function exit.
+Both functions are provided with one parameter,
+a pointer to a string containing the function name.
+.br
+When \fIack\fP is called with the -p flag it provides the
+frontend with this flag.
+.IE
+.CH References
+.IS
+.PT [1]
+A.S. Tanenbaum, Hans van Staveren, Ed Keizer and Johan
+Stevenson, \fIDescription of a machine architecture for use with
+block structured languages\fP, Informatica report IR-81.
+.sp 1
+.PT [2]
+B.W. Kernighan and D.M. Ritchie, \fIThe C Programming
+Language\fP, Prentice-Hall, 1978.
+.PT [3]
+D.M. Ritchie, \fIC Reference Manual\fP.
+.sp
+.PT [4]
+UNIX manual ack(I).
--- a/doc/install.doc
+++ b/doc/install.doc
@@ -0,0 +1,622 @@
+.nr LL 7.5i
+.nr PD 1v
+.TL
+Amsterdam Compiler Kit installation guide
+.AU
+Ed Keizer
+.AI
+Wiskundig Seminarium
+Vrije Universiteit
+Amsterdam
+.NH
+Introduction
+.PP
+This document
+describes the process of installing the Amsterdam Compiler Kit.
+How hard it will be to install the kit depends on your
+combination of hardware and software.
+This description is intended for a PDP 11/44 running
+.UX
+Version 7.
+Installation on other PDP 11s should be easy, as long
+as they have separate instruction and data space.
+Installation on machines without this feature, such as the PDP 11/34
+and PDP 11/60, requires extensive surgery on some programs and is
+considered impossible.
+See chapter 6 for installation on other systems.
+.NH
+Restoring tree
+.PP
+The process of installing the Amsterdam Compiler Kit is quite simple.
+It is important that the original Amsterdam Compiler Kit
+distribution tree structure is restored.
+Proceed as follows:
+.IP "  -" 10
+Create a directory, for example /usr/em, on a device
+with at least 20000 blocks left.
+.IP "  -"
+Change to that directory (cd ...); it will be the working directory.
+.IP "  -"
+Extract all files from the distribution medium, for instance
+magtape:
+\fBtar x\fP.
+.IP "  -"
+Keep a copy of the original distribution to be able to repeat the process
+of installation in case of disasters.
+This copy is also useful as a reference point for diff-listings.
+.LP
+The directories in the tree contain the following information:
+.nr PD 1v
+.IP "lib" 14
+.br
+almost all binaries and shell files used by commands and
+library em_data.a from misc/data
+.IP "lib/ack"
+.br
+The command descriptor files used by the program ack.
+.nr PD 0
+.IP "bin"
+.br
+the few utilities that knot things together
+.IP "etc"
+.br
+The MAIN description of EM sits here.
+It contains files (e.g. em_table) describing
+the opcodes and pseudos in use,
+the operands allowed, the effect on the stack, etc.
+Make in this directory creates most of the files in h.
+.IP "include"
+.br
+More or less system independent include files needed by modules
+in the C library from lang/cem/libcc.
+Especially needed for "stdio".
+.IP "h"
+.br
+The #include files for:
+.nf
+as_spec.h    Used by EM assembler and interpreters.
+em_abs.h     Contains trap numbers and addresses for lin and fil
+em_flag.h    Definition of bits in array em_flag in lib/em_data.a
+             Describes the effect of parameters on the flow of instructions
+em_mes.h     Definition of names for mes pseudo numbers
+em_mnem.h    instruction => compact mapping.
+em_pseu.h    pseudo instruction => compact mapping
+em_ptyp.h    Useful for compact code reading/writing,
+             defines classes of parameters
+em_spec.h    Definition of constants used in compact code
+local.h      Various definitions for local versions
+pc_err.h     Definitions of error numbers in Pascal
+pc_file.h    Macros used in file handling in Pascal
+em_path.h    Pathnames used by \fIack\fP, intended
+             for all utilities
+pc_size.h    Sizes of objects used by Pascal compiler and
+             run-time system.
+em_reg.h     Definition of names for register types.
+.IP "doc"
+.br
+Documentation
+.nf
+cg.doc          Use and internal specification of the backend.
+.br
+regadd.doc      Update for cg.doc concerning register variables
+.br
+regadd.doc      Description of steps to add register variables.
+.br
+ack.doc         Layout of description files needed for each machine.
+.br
+cref.doc        C reference manual, addendum
+.br
+install.doc     Ack Installation Guide
+.br
+pcref.doc       Pascal reference manual, addendum
+.br
+peep.doc        Description of the peephole optimizer
+.br
+em.doc          EM reference manual
+.br
+toolkit.doc     A general overview of the toolkit
+.br
+v7bugs.doc      Bugs in the standard V7 system
+.br
+val.doc         Pascal validation suite version 3 report
+.nf
+.IP "doc/em.doc"
+.br
+The EM-manual IR-81
+.IP "doc/em.doc/int"
+.br
+The EM interpreter written in Pascal
+.IP "mkun"
+.br
+The PUBMAC macro package for nroff/troff from the Katholieke Universiteit at
+Nijmegen.
+It is used for the EM reference manual;
+the Makefile installs the macro package in
+/usr/lib/tmac/tmac.mkun*.
+This package is in the public domain.
+.IP "mach"
+.br
+Just there to group the directories for all machines.
+These directories have subdirectories named:
+.nf
+  as      the assembler ( *.s + libraries => a.out )
+  cg      the new backend   ( *.m => *.s )
+  lib     the libraries for all run-time systems
+          these libraries are used by the assembler.
+  libpc   Used to create Pascal run-time system in 'lib'
+  libcc   Used to create C run-time system in 'lib'
+  libem   Sources for EM runtime system, result sits in 'lib'
+  test    Various tests
+  dl      Down-load programs
+  int     Source for an interpreter
+The available machines are:
+    PMDS II 68000, wordsize 2, ptrsize 4
+        mach/m68k2
+        mach/m68k2/as
+        mach/m68k2/cg
+        mach/m68k2/libem
+        mach/m68k2/lib
+        mach/m68k2/dl
+        mach/m68k2/libpc
+        mach/m68k2/libcc
+        mach/m68k2/libsys
+    bare 6809
+        mach/6809
+        mach/6809/as
+    8080, wordsize 2, ptrsize 2
+        mach/8080
+        mach/8080/as
+        mach/8080/test
+        mach/8080/libcc
+        mach/8080/lib
+    bare 8086, wordsize 2, ptrsize 2
+        mach/i86
+        mach/i86/as
+        mach/i86/lib
+        mach/i86/libcc
+        mach/i86/dl
+        mach/i86/libem
+        mach/i86/libpc
+        mach/i86/saio  (library for stand-alone EM on 86/12A )
+    pdp 11, UNIX/V7, wordsize 2, ptrsize 2
+        mach/pdp
+        mach/pdp/test
+        mach/pdp/libem
+        mach/pdp/lib
+        mach/pdp/libcc
+        mach/pdp/libpc
+        mach/pdp/cg
+        mach/pdp/int         -PDP 11/44 EM interpreter
+    vax 780, UNIX V7, wordsize 4, ptrsize 4
+        mach/vax4
+        mach/vax4/cg
+        mach/vax4/lib
+        mach/vax4/libcc
+        mach/vax4/libem
+        mach/vax4/libpc
+    z80, CP/M, wordsize 2, ptrsize 2
+        mach/z80
+        mach/z80/as
+        mach/z80/libem
+        mach/z80/lib
+        mach/z80/libcc
+        mach/z80/libpc
+        mach/z80/int         -Z80 EM interpreter
+    z80, nascom
+        mach/z80a
+        mach/z80a/dl
+    vax 11/780, Berkeley UNIX, wordsize 2, ptrsize 4
+        mach/vax2
+        mach/vax2/cg
+        mach/vax2/lib
+        mach/vax2/libpc
+        mach/vax2/libem
+    bare 6500, wordsize 2, ptrsize 2
+        mach/6500
+        mach/6500/as
+        mach/6500/dl
+        mach/6500/libem
+        mach/6500/lib
+    bare 6800, wordsize 2, ptrsize 2
+        mach/6800
+        mach/6800/as
+    EM virtual machine code, wordsize 2, ptrsize 2
+        mach/int
+        mach/int/libcc
+        mach/int/libpc
+        mach/int/lib
+        mach/int/test
+    The directory proto contains files used by most machines,
+    e.g. makefiles for libraries for C and Pascal.
+        mach/proto
+        mach/proto/libg
+.fi
+.IP "emtest"
+.br
+Contains a prototype of the EM test set.
+.IP "man"
+.br
+Manual pages for various utilities
+.IP "lang"
+.br
+Just there to group the directories for all front-ends.
+.IP "lang/pc"
+.br
+Pascal front-end
+.IP "lang/pc/libpc"
+.br
+Source of Pascal run-time system ( in EM or C )
+.IP "lang/pc/test"
+.br
+Some test programs written in Pascal
+.IP "lang/pc/pem"
+.br
+The compiler proper
+.IP "lang/cem"
+.br
+C front-end
+.IP "lang/cem/libcc"
+.br
+Directories with sources of C runtime system, libraries (in EM or C)
+.IP "lang/cem/libcc/gen"
+.br
+Sources for routines in chapter III of the UNIX Programmer's Manual,
+excluding STDIO
+.IP "lang/cem/libcc/stdio"
+.br
+STDIO sources
+.IP "lang/cem/libcc/mon"
+.br
+Sources for routines in chapter II, written in EM
+.IP "lang/cem/comp"
+.br
+The compiler proper
+.IP "lang/cem/ctest"
+.br
+C test set
+.IP "lang/cem/ctest/cterr"
+.br
+Programs developed for pinpointing previous errors
+.IP "lang/cem/ctest/ct*"
+.br
+The test programs.
+.IP "util"
+.br
+Contains directories with various utilities
+.IP "util/opt"
+.br
+EM peephole optimizer (*.k => *.m)
+.IP "util/misc"
+.br
+Decode (*.[km] => *.e) + encode (*.e => *.k)
+.IP "util/data"
+.br
+The C code for lib/em_data.a.
+These sources are created by the Makefile in etc.
+.IP "util/ass"
+.br
+The EM assembler ( *.[km] + libraries => e.out )
+.IP "util/arch"
+.br
+The archiver to be used for ALL EM utilities
+.IP "util/cgg"
+.br
+A program needed for compiling backends.
+.IP "util/cpp"
+.br
+The V7 C preprocessor.
+.LP
+All pathnames mentioned in the text of this document are relative to the
+working directory, unless they start with '/'.
+.PP
+The person doing the installation needs permission to write in the
+directories of the Amsterdam Compiler Kit distribution tree.
+Preferably you should log in as sys (uid=3,gid=0).
+.NH
+Pathnames
+.PP
+Absolute pathnames are concentrated in "h/em_path.h".
+Only the Pascal runtime system and the utility \fIack\fP use
+absolute pathnames to access files in the kit.
+The tree is distributed with /usr/em as the working
+directory.
+The definition of EM_HOME in em_path.h should be altered to
+specify the root
+directory for the Compiler Kit distribution on your system.
+The trailing " in the definition of EM_HOME is intentionally
+missing!
+Em_path.h also specifies which directory should be used for
+temporary files.
+Most programs from the kit do indeed use that directory
+although some remain stubborn and use /tmp or /usr/tmp.
+.LP
+The shape of the tree should not be altered lightly because
+most Makefiles and the
+utility \fIack\fP know the shape of the ACK tree.
+All pathnames in all Makefiles are relative, that is, they do not
+have "/" as the first character.
+The knowledge of the utility \fIack\fP about the shape of the tree is
+concentrated in the files in the directory lib/ack.
+.NH
+Commands
+.PP
+The kit is distributed with all available commands in the bin
+directory.
+The commands distributed are:
+.IP "\fIack\fP, \fIacc\fP, \fIapc\fP and their links"
+.br
+They are used to compile Pascal, C and other programs.
+.IP \fIarch\fP
+.br
+The archiver used for the EM- and universal assembler.
+.IP "\fIem\fP and \fIeminform\fP"
+.br
+The EM interpreter for the PDP-11 and the program to unravel
+its post-mortem information.
+.LP
+We currently make the kit available to our users by telling
+them that they should include the bin directory of the kit in
+their PATH shell variable.
+The programs will still work when moved to a different
+directory.
+The copying should preferably be done with tar, since links are
+heavily used.
+Renaming of the programs linked to \fIack\fP will not always
+produce the desired result.
+This program uses its call name as an argument.
+Any call name not being \fIcc\fP, \fIacc\fP, \fIpc\fP or \fIapc\fP will be
+interpreted as the name of a 'machine description' and the
+program will try to find a description file with that name.
+All recompilations will only touch the utilities in the bin
+directory, not your own copies.
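+.PP
+The call-name rule described above can be sketched in C as follows.
+This is a minimal sketch under stated assumptions, not the code of \fIack\fP itself:
+the function name machine_for_callname is hypothetical, and the
+string "pdp" merely stands in for the ACKM definition in h/local.h.

```c
#include <string.h>

/* Hypothetical stand-in for ACKM from h/local.h (distributed as pdp). */
#define ACKM "pdp"

/* Return the machine name that would be used for a given argv[0]. */
const char *machine_for_callname(const char *argv0)
{
    /* Call names that select the default machine. */
    static const char *const defaults[] = { "ack", "acc", "cc", "apc", "pc" };
    const char *base = strrchr(argv0, '/');
    size_t i;

    base = base ? base + 1 : argv0;     /* strip leading directories */
    for (i = 0; i < sizeof defaults / sizeof defaults[0]; i++)
        if (strcmp(base, defaults[i]) == 0)
            return ACKM;
    /* Any other call name is taken as the name of a machine description. */
    return base;
}
```

+Called through a link named acc this sketch yields the default machine,
+while a link named m68k2 selects the m68k2 description file.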
+.NH
+Options
+.PP
+There is one important option in h/local.h.
+The utility \fIack\fP uses a default machine name when called
+as \fIacc\fP, \fIcc\fP, \fIapc\fP, \fIpc\fP or \fIack\fP.
+The machine name used for default is determined by the
+definition of ACKM in h/local.h.
+The current definition is \fIpdp\fP.
+.PP
+The distribution is tailored to one specific operating system per CPU type.
+For some of these CPUs it is possible to tailor the distribution to another
+operating system.
+The steps to be taken are described in READ_ME (or README) files in the
+subdirectories of the directory in EM_HOME/mach for that particular machine.
+For example, the vax2 distribution is tailored to BSD4.1, but has #defines
+for BSD4.1c and BSD4.2.
+For the names and places of these #defines, look in EM_HOME/mach/vax2/cg and
+EM_HOME/mach/vax2/libem.
+.NH
+Recompilation
+.PP
+The kit comes with binaries in the directories \fBbin\fP and
+\fBlib\fP.
+Some directories among mach/*/lib contain archives with object files,
+notably mach/pdp/lib.
+The binaries and object files are for a PDP 11/44 with floating
+point running UNIX V7.
+.PP
+Almost all directories contain a "Makefile" or a shell command file called
+"make".
+Apart from commands applying to that specific directory these
+files all recognize a few special commands.
+When called with one of these they will apply the command to
+their own directory and all subdirectories.
+The special commands are:
+.IP "install" 20
+recompile and install all binaries and libraries.
+.br
+Some Makefiles allow errors to occur in the programs they call.
+They ignore such errors and notify the user with the message
+"~....... error code n: ignored".
+Whenever such a message appears in the output you can ignore it
+too.
+.br
+The installation of the PUBMAC macro package is not done
+automatically from the higher level directory.
+.IP "cmp"
+recompile all binaries and libraries and compare them to the
+ones already installed.
+.IP pr
+print the sources and documentation on the standard output.
+.IP opr
+make pr | opr
+.br
+Opr should be an off-line printer daemon.
+On some systems it exists under another name e.g. lpr.
+The easiest way to call such a spooler is using a shell script
+with the name opr that calls lpr.
+This script should be placed in /usr/bin or EM_HOME/bin or
+one of the directories in your PATH.
+.IP clean
+remove all files not needed for day-to-day use,
+that is binaries not in bin or lib, object files etc.
+.LP
+Example:
+.nf
+.sp 1
+        make install
+.sp 1
+.fi
+given as command in the home directory will cause
+recompilation of all programs in the kit.
+.LP
+Recompilation of the complete kit takes about 9 hours on a PDP
+11/44.
+.NH 2
+Recompilation on a different machine.
+.PP
+Installation on other systems will often require recompilation
+of all programs.
+The presence of a C compiler is essential for recompilation.
+Except for the Pascal compiler proper, all programs are written in C.
+Some modules are derived from \fIyacc\fP sources.
+Retranslating these programs from that yacc source is not
+necessary, although it might improve performance.
+Some versions of \fIyacc\fP 'know' that the resulting C programs will
+run on a 32-bit int machine.
+C modules produced by such a \fIyacc\fP are not portable and
+should not be used to (cross)compile programs for 16-bit machines.
+We assume a version of UNIX which, apart from the C compiler,
+contains most normal utilities, like ed, sed, grep, make, the
+Bourne shell etc.
+All Makefiles use the system C-compiler.
+The existence of a backend for your system is of course essential
+if you wish to produce executable files for that system.
+When the backend exists it is also possible to boot the Pascal
+compiler,
+which is written in Pascal itself.
+The kit contains the compact code files for the 2/2 and 2/4
+versions of the Pascal compiler.
+The current version of this compiler can only be used on machines
+with a 16-bit word size and 16- or 32-bit pointers.
+The Makefile automatically tries to boot the Pascal compiler
+from one of these compact code files, if the compiler proves
+unable to compile itself.
+.PP
+The native assemblers and loaders are used on PDP-11 and VAX.
+The description files in lib/ack for other systems use our
+universal assembler.
+The load file produced by this assembler is not directly
+usable in any system known to us,
+but has to be converted before it can be put to use.
+The \fIdl\fP programs present for some machines unravel
+these load files and transmit commands to load memory
+to a microprocessor over a serial line.
+The PDP-11 version of our universal assembler is supplied
+with a conversion program.
+The file man/a.out.5 contains a description of the format of
+the universal assembler load file,
+it might be useful to those who wish or need to write their
+own conversion programs.
+.br
+Berkeley UNIX for the VAX has (at least) three different
+versions: BSD4.1a, BSD4.1c and BSD4.2. The READ_ME files in the
+directories mach/vax2/cg, mach/vax2/libem, mach/vax4/cg and
+mach/vax4/libem tell you how to adapt the vax2 and vax4 backends
+to these versions.
+.NH 2
+Recompiling libraries
+.PP
+The kit contains sources for sections II and III of the C library, except
+for the math functions. They were taken from our V7 system and sometimes
+altered in an EM-dependent way, or replaced altogether when the original
+was written in assembly.
+These files can be used to make libraries for the Ack C compiler.
+The recompilation process uses a few include files.
+The include directory in the EM home directory contains a few more
+or less system independent include files.
+The system dependent include files are fetched from /usr/include
+on the system you use to recompile.
+This may lead to several problems.
+Sometimes the system differs so much from V7 that certain manifest constants
+do not exist any more.
+At other times these include files were written for a compiler without
+a restriction on name length.
+In that case - I've seen it happen - people tend to use differing
+identifiers that are identical in the first eight characters.
+All these problems you have to solve yourself;
+the libraries are only included as an extra and are too system
+dependent to give any guarantees.
+.NH
+Fixes to the UNIX V7 system
+.PP
+UNIX System V7 has a few bugs that prevent a part of or the whole kit
+from working properly.
+To be honest, we do not know which of the following changes are
+essential to the functioning of our kit.
+.PP
+The file "doc/v7bugs.doc" gives for each of the following bugs
+a small test program and a diff listing of the source files that have to be
+modified.
+.IP 1
+Bug in the C optimizer for unsigned comparison
+.nr PD 0
+.IP 2
+The loader 'ld' fails for large data and text portions
+.IP 3
+Floating point registers are not saved if more memory is needed.
+.IP 4
+Floating point registers are not copied to child in fork().
+.nr PD 1v
+.LP
+Use the test programs to see if the errors are present in your system
+and to check if the modifications are effective.
+.NH
+Testing
+.PP
+Test sets are available in Pascal, C and EM assembly.
+.IP em 8
+.br
+The directory emtest contains a few EM test programs.
+The EM assembly files in these tests must be transformed into
+load files without passing them through the EM optimizer.
+These tests use the LIN and NOP instructions to mark the passing of each
+test.
+The NOP instruction prints the current line number during the
+test phase.
+Each test notifies its correctness by calling LIN with a unique
+number followed by a NOP which prints this line number.
+The test finishes normally with 0 as the last number printed.
+In all other cases a bug has shown its
+existence.
+.IP Pascal
+.br
+The directory lang/pc/test contains a few pascal test programs.
+All these programs print the number of errors found and an
+identification of these errors.
+.IP C
+.br
+The sub-directories in lang/cem/ctest contain C test programs.
+The idea behind these tests is:
+when you have a program called xx.c, compile it into xx.cem.
+Run it with standard output to xx.cem.r, compare this file to
+xx.cem.g, a file containing the 'ideal' output.
+Any differences will point to implementation differences or
+bugs.
+Giving the command "run gen" or plain "run" starts this
+process.
+The differences will be presented on standard output.
+The contents of the result files depend on the wordsize,
+the xx.cem.g files on the distribution are intended for a
+16-bit machine.
+.NH
+Documentation
+.PP
+Manual pages for the Amsterdam Compiler Kit can be copied
+to "/usr/man/man?" by the
+following commands:
+.DS
+cd man
+make install
+.DE
+.LP
+Several documents are provided:
+.DS
+doc/toolkit.doc: a general overview
+doc/pcref.doc: the Pascal-frontend reference manual
+doc/val.doc: the results of running the Pascal Validation Suite
+doc/cref.doc: the C-frontend manual
+doc/em.doc: a description of the EM machine architecture
+doc/peep.doc: internal documentation for the peephole optimizer
+doc/cg.doc: documentation for backend writers and maintainers
+doc/regadd.doc: addendum to previous document describing register variables
+doc/install.doc: this document
+.DE
+.LP
+The Validation Suite is a collection of more than 200 Pascal programs,
+designed by Brian Wichmann and Arthur Sale to test Pascal compilers.
+We are not allowed to distribute it, but you may
+request a copy from
+.DS
+Richard J. Cichelli
+A.N.P.A.
+1350 Sullivan Trail
+P.O. Box 598
+Easton, Pennsylvania 18042
+USA
+.DE
+.LP
+Good luck.
--- a/doc/pcref.doc
+++ b/doc/pcref.doc
--- a/doc/peep.doc
+++ b/doc/peep.doc
@ -0,0 +1,505 @@
+.TL
+Internal documentation on the peephole optimizer
+.br
+from the Amsterdam Compiler Kit
+.NH 1
+Introduction
+.PP
+Part of the Amsterdam Compiler Kit is a program to do
+peephole optimization on an EM program.
+The optimizer scans the program for patterns from a table;
+when it finds one, it makes the replacement given in the table
+and then tries to find yet another optimization in the result,
+continuing until no more optimizations are found.
+.PP
+For historical reasons it also performs some optimizations that
+are not really peephole optimizations,
+such as branch chaining and the deletion of unreachable code.
+.PP
+The peephole optimizer consists of three parts:
+.IP 1)
+A driving table
+.IP 2)
+A program translating the table to internal format
+.IP 3)
+C code compiled with the table to make the optimizer proper
+.PP
+In this document the table format, the internal format and the
+data structures in the optimizer will be explained,
+plus a hint on what the code does where this might not be obvious.
+For the most part it is a simple program.
+.NH 1
+Table format
+.PP
+The driving table consists of pattern/replacement pairs,
+in principle one per line,
+although a line starting with white space is considered
+a continuation of the previous line.
+The general format is:
+.DS
+optimization : pattern ':' replacement '\en'
+.sp
+pattern : EMlist optional_boolean_expression
+.sp
+replacement : EM_plus_operand_list
+.DE
+A simple example:
+.DS
+loc stl $1==0 : zrl $2
+.DE
+There is no real limit for the length of the pattern or the replacement,
+the replacement might even be longer than the pattern,
+and expressions can be made arbitrarily complicated.
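+.PP
+To show what the example pattern means operationally, here is a minimal
+sketch that applies loc stl $1==0 : zrl $2 to an array of instructions.
+The enum values and struct instr are hypothetical; the real optimizer uses
+the table-driven matcher and the linked line records described below.

```c
/* Hypothetical opcode codes; real EM numbering lives in h/em_mnem.h. */
enum op { OP_LOC, OP_STL, OP_ZRL, OP_LOL };

struct instr {
    enum op op;
    long arg;               /* $n: the operand of this instruction */
};

/* Apply "loc stl $1==0 : zrl $2" in place to code[0..n-1].
   Returns the new number of instructions. */
int apply_zrl(struct instr *code, int n)
{
    int in, out = 0;

    for (in = 0; in < n; in++) {
        if (in + 1 < n && code[in].op == OP_LOC && code[in + 1].op == OP_STL
            && code[in].arg == 0) {         /* the expression $1==0 holds */
            code[out].op = OP_ZRL;          /* replacement: zrl $2 */
            code[out].arg = code[in + 1].arg;
            out++;
            in++;                           /* skip the matched stl */
        } else {
            code[out++] = code[in];         /* no match: copy unchanged */
        }
    }
    return out;
}
```

+Storing zero into local -2 thus becomes a single zrl -2, while
+loc 5 followed by stl -4 is left alone because $1 is not 0.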
+.PP
+The expressions in the table are made of the following pieces:
+.IP -
+Integer constants
+.IP -
+$\fIn\fP, standing for the operand of the \fIn\fP'th EM
+instruction in the pattern,
+undefined if that instruction has no operand.
+.IP -
+w, standing for the wordsize of the code optimized.
+.IP -
+p, for the pointersize.
+.IP -
+defined(expr), true if expression is defined
+.IP -
+samesign(expr,expr), true if expressions have the same sign.
+.IP -
+sfit(expr,expr), ufit(expr,expr),
+true if the first expression fits signed or unsigned in the number
+of bits given in the second expression.
+.IP -
+rotate(expr,expr),
+first expression rotated left the number of bits given by the second expression.
+.IP -
+notreg(expr),
+true if the local with the expression as number is not a candidate to put
+in a register.
+.IP -
+rom(\fIn\fP,expr), contents of the rom descriptor at index expr that
+is associated with the global label that should be the argument of
+the \fIn\fP'th EM instruction.
+Undefined if such a thing does not exist.
+.PP
+The usual arithmetic operators may be used on integer values,
+if any operand is undefined the expression is undefined,
+except for the defined() function above.
+An undefined expression used for its truth value is false.
+All arithmetic on local label operands is forbidden;
+the only thing allowed is a test for equality.
+Arithmetic on global labels makes sense,
+i.e. one can add a global label and a constant,
+but not two global labels.
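+.PP
+The rules for undefined operands can be modelled by carrying a
+defined flag with every value.
+The sketch below is hypothetical (struct value and the function names
+are ours) and assumes two's complement sfit/ufit semantics with
+bit counts between 1 and 30.

```c
/* Hypothetical expression result: a value plus a defined flag. */
struct value {
    int isdef;
    long v;
};

static struct value undef(void) { struct value r = { 0, 0 }; return r; }
static struct value num(long v) { struct value r = { 1, 0 }; r.v = v; return r; }

/* Arithmetic: the result is undefined if any operand is undefined. */
struct value add(struct value a, struct value b)
{
    return (a.isdef && b.isdef) ? num(a.v + b.v) : undef();
}

/* defined(expr) is the one function that accepts an undefined operand. */
struct value is_defined(struct value a) { return num(a.isdef); }

/* sfit(x,n): x fits in n bits as a two's complement signed number. */
struct value sfit(struct value x, struct value n)
{
    if (!x.isdef || !n.isdef) return undef();
    return num(x.v >= -(1L << (n.v - 1)) && x.v < (1L << (n.v - 1)));
}

/* ufit(x,n): x fits in n bits as an unsigned number (negatives never fit). */
struct value ufit(struct value x, struct value n)
{
    if (!x.isdef || !n.isdef) return undef();
    return num(x.v >= 0 && x.v < (1L << n.v));
}

/* An undefined expression used for its truth value is false. */
int truth(struct value a) { return a.isdef && a.v != 0; }
```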
+.PP
+In the table one can use five additional EM instructions in patterns.
+These are:
+.IP lab
+Stands for a local label
+.IP LLP
+Load Local Pointer, translates into a 
+.B lol
+or into a 
+.B ldl
+depending on the relationship between wordsize and pointersize.
+.IP LEP
+Load External Pointer, translates into a 
+.B loe
+or into a 
+.B lde .
+.IP SLP
+Store Local Pointer,
+.B stl
+or 
+.B sdl .
+.IP SEP
+Store External Pointer,
+.B ste
+or
+.B sde .
+.PP
+There is only one peephole optimizer for all word and pointer sizes,
+so the substitutions to be made for the last four instructions
+are made at run time, before the first optimizations are made.
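+.PP
+The substitution can be sketched as a small lookup, assuming the
+pointer size is either one or two words.
+The function name substitute is hypothetical; the optimizer applies the
+same choice to its internal opcode table rather than to strings.

```c
#include <string.h>

/* Map a table pseudo-instruction to a real EM mnemonic for word size w
   and pointer size p. Assumes p is either w or 2*w. */
const char *substitute(const char *pseudo, int w, int p)
{
    int one = (p == w);     /* a pointer fits in one word */

    if (strcmp(pseudo, "LLP") == 0) return one ? "lol" : "ldl";
    if (strcmp(pseudo, "LEP") == 0) return one ? "loe" : "lde";
    if (strcmp(pseudo, "SLP") == 0) return one ? "stl" : "sdl";
    if (strcmp(pseudo, "SEP") == 0) return one ? "ste" : "sde";
    return pseudo;          /* ordinary EM instruction: unchanged */
}
```

+With wordsize 2 and pointersize 4, for example, LLP becomes ldl and
+SEP becomes sde.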
+.NH 1
+Internal format
+.PP
+The translating program,
+.I mktab
+converts the table into an array of bytes in which all
+patterns follow each other unaligned.
+The format of a pattern is:
+.IP 1)
+One byte holding the high byte of the hash value,
+explained later on.
+.IP 2)
+Two bytes for the index of the next pattern in a chain.
+.IP 3)
+An integer\u*\d,
+.FS
+* An integer is encoded as a byte when less than 255,
+otherwise as a byte containing 255 followed by two
+bytes with the real value.
+.FE
+pattern length.
+.IP 4)
+The list of pattern opcodes, one per byte.
+.IP 5)
+An integer expression index, 0 if not used.
+.IP 6)
+An integer, replacement length.
+.IP 7)
+A list of pairs consisting of a one byte opcode and an integer
+expression index.
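+.PP
+The integer encoding from the footnote can be sketched as a pair of
+encode and decode routines.
+This is a minimal sketch: the names are hypothetical, and the byte order
+of the two-byte form (high byte first) is an assumption, since the text
+does not specify it.

```c
/* Decode one integer at *pp and advance *pp past it.
   Assumes the two-byte form stores the high byte first. */
unsigned decode_int(const unsigned char **pp)
{
    const unsigned char *p = *pp;
    unsigned v;

    if (p[0] < 255) {           /* small value: a single byte */
        v = p[0];
        p += 1;
    } else {                    /* escape byte 255, then two bytes */
        v = ((unsigned)p[1] << 8) | p[2];
        p += 3;
    }
    *pp = p;
    return v;
}

/* Encode v into buf; returns the number of bytes written (1 or 3). */
int encode_int(unsigned v, unsigned char *buf)
{
    if (v < 255) {
        buf[0] = (unsigned char)v;
        return 1;
    }
    buf[0] = 255;
    buf[1] = (unsigned char)(v >> 8);
    buf[2] = (unsigned char)(v & 0xFF);
    return 3;
}
```

+Note that 255 itself needs the three-byte form, because a leading
+byte of 255 always means that two more bytes follow.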
+.PP
+The expressions are kept in an array of triples,
+implementing a binary tree.
+The
+.I mktab
+program tries to minimize the number of triples by reusing
+duplicates and even reverses the operands of commutative operators
+when doing so would spare a triple.
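+.PP
+The reuse of duplicate triples, including the reversed form for
+commutative operators, can be sketched as a linear lookup before a new
+triple is appended.
+The triple layout, operator codes and function name are hypothetical;
+only the reuse rule is taken from the description above.

```c
/* Hypothetical triple: an operator and two operand indices. */
struct triple {
    int op;
    int left, right;
};

#define MAXTRIPLES 1000

struct triple triples[MAXTRIPLES];
int ntriples;

/* Hypothetical operator codes; ADD and MUL are treated as commutative. */
enum { OP_ADD = 1, OP_SUB, OP_MUL };

static int commutative(int op) { return op == OP_ADD || op == OP_MUL; }

/* Return the index of triple (op, l, r), reusing an exact duplicate or,
   for a commutative op, one with swapped operands. Returns -1 if full. */
int lookup_triple(int op, int l, int r)
{
    int i;

    for (i = 0; i < ntriples; i++) {
        struct triple *t = &triples[i];
        if (t->op != op) continue;
        if (t->left == l && t->right == r) return i;
        if (commutative(op) && t->left == r && t->right == l) return i;
    }
    if (ntriples == MAXTRIPLES) return -1;
    triples[ntriples].op = op;
    triples[ntriples].left = l;
    triples[ntriples].right = r;
    return ntriples++;
}
```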
+.NH 1
+A tour through the sources
+.PP
+Now we will walk through the sources and note things of interest.
+.NH 2
+The header files
+.PP
+The header files are the place where data structures and options reside.
+.NH 3
+alloc.h
+.PP
+In the header file alloc.h several defines can be used to select various
+kinds of core allocation schemes.
+This is important on small machines like the PDP-11 since a complete
+procedure must be in core at the same time,
+and the peephole optimizer should not be the limiting factor in
+determining the maximum size of procedures if possible.
+Options are:
+.IP -
+USEMALLOC, standard malloc() and free() are used instead of the
+optimizer's own core allocation package.
+Not recommended unless that package does not work on some bizarre
+machine.
+.IP -
+COREDEBUG, prints large amounts of information about core management.
+Better not define it unless you change the code and it stops working.
+.IP -
+SEPID, if you define this you will get an extra procedure that will
+go through a lot of work to scrape the last bytes together if the
+system won't provide more.
+This is not a good idea if memory is scarce and code and data reside
+in the same space, since the room used by the procedure might well
+be more than the room saved.
+.IP -
+STACKROOM, number of shorts used in stack space.
+This is used if memory is scarce and stack space and data space are
+different.
+On the PDP-11 a UNIX process starts with an 8K stack segment which
+cannot be transferred to the data segment.
+Under these conditions one can use a lot of the stack space for storage.
+.NH 3
+assert.h
+.PP
+Just defines the assert macro.
+When compiled with -DNDEBUG all asserts will be off.
+.NH 3
+ext.h
+.PP
+Gives external definitions of variables used by more than one module.
+.NH 3
+line.h
+.PP
+Defines the structures used to keep instructions,
+one structure per line of EM code,
+and the structure to keep arguments of pseudos,
+one structure per argument.
+Both structures essentially contain a pointer to the next,
+a type,
+and a union containing information depending on the type.
+Core is allocated only for the part of the union used.
+.PP
+The 
+.I
+struct line
+.R
+has a very compact encoding for small integers,
+they are encoded in the type field.
+On the PDP-11 this gives a line structure of only 4 bytes for most
+instructions.
+.NH 3
+lookup.h
+.PP
+Contains the definition of the struct used for symbol table management;
+global labels and procedure names are kept in one table.
+.NH 3
+optim.h
+.PP
+If one defines the DIAGOPT option in this header file,
+for every optimization performed a number is written on stderr.
+The number gives the number of the pattern in the table
+or one of the four special numbers in this header file.
+.NH 3
+param.h
+.PP
+Contains one settable option,
+LONGOFF.
+If this is not defined the optimizer can only optimize programs
+with wordsize 2 and pointersize 2.
+Leave it undefined only if the optimizer must run on a Z80 or something pathetic like that.
+.PP
+Other defines here should not be touched.
+.NH 3
+pattern.h
+.PP
+Contains defines of indices in a pattern,
+definition of the expression triples,
+definitions of the various expression operators
+and definition of the result struct where expression results are put.
+.PP
+This header file is the main one that is also included by
+.I mktab .
+.NH 3
+proinf.h
+.PP
+This one contains definitions 
+for the local label table structs
+and for the struct where all information for one procedure is kept.
+This is in one struct so it can be saved easily when recursive
+procedures have to be resolved.
+.NH 3
+types.h
+.PP
+Collection of typedefs to be used by almost all modules.
+.NH 2
+The C code itself.
+.PP
+The C code will now be the center of our attention.
+We will make a walk through the sources and we will try
+to follow the sources in a logical order.
+So we will start at
+.NH 3
+main.c
+.PP
+The main.c module contains the main() function.
+Nothing spectacular happens here;
+the only thing of interest is the handling of flags:
+.IP -L
+This is an instruction to the peephole optimizer to perform
+one of its auxiliary functions, the generation of a library module.
+This makes the peephole optimizer write its output to a temporary file
+and, at the end, produce the real output by first generating a list
+of exported symbols and then copying the temporary file behind it.
+.IP -n
+Disables all optimization.
+The only things the optimizer still does are filling in the blank after the
+.I END
+pseudo and resolving recursive procedures.
+.PP
+The place where main() is left is the call to getlines() which brings
+us to
+.NH 3
+getline.c
+.PP
+This module reads the EM code and constructs a list of 
+.I
+struct line
+.R
+records,
+linked together backwards,
+i.e. the first instruction read is the last in the list.
+Pseudos are handled here too.
+For most pseudos this just means that a chain of argument records
+is linked into the line list, but some pseudos get special attention:
+.IP exc
+This pseudo is acted upon right away.
+The lines read so far are shuffled around as the pseudo instructs.
+.IP mes
+Some messages are acted upon.
+These are:
+.RS
+.IP ms_err 8
+The input is drained, just in case it is a pipe.
+After that the optimizer exits.
+.IP ms_opt
+The do not optimize flag is set.
+Acts just like -n on the command line.
+.IP ms_emx
+The word and pointer sizes are read;
+the optimizer complains if it is not able to handle them.
+.IP ms_reg
+We take notice of the offset of this local.
+See also the comments in the description of peephole.c.
+.RE
+.IP pro
+A new procedure starts.
+If we are already in one, its status is saved; otherwise the collected input is processed.
+Information about this procedure is collected, and if we were already in a procedure,
+getlines() is called recursively.
+.IP end
+Process collected input.
+.PP
+The phrase "process collected input" is used twice,
+which brings us to
+.NH 3
+process.c
+.PP
+This module contains the entry point process() which is called at any
+time the collected input must be processed.
+It calls a variety of other routines to get the real work done.
+The routines in this module, in chronological order, are:
+.IP symknown 12
+Marks all symbols seen until now as known,
+i.e. it is now known whether their scope is local or global.
+This information is used again during output.
+.IP symvalue
+Runs through the chain of pseudos to give values to data labels.
+This needs an extra pass.
+It cannot be done during the getlines pass, since an
+.B exc
+pseudo could destroy things.
+Nor can it be done during the backward pass since it is impossible
+to do good fragment numbering backward.
+.IP checklocs
+Checks whether all local labels referenced are defined.
+It needs to be sure about this since otherwise the
+semi global optimizations made cannot work.
+.IP relabel
+This routine finds the final destination for each label in the procedure.
+Labels followed by unconditional branches or other labels are marked during
+the peephole phase, and this leads to chains of identical labels.
+These chains are followed here, and after this routine has run, each label
+in the local label table has its replacement label associated with it.
+Care is taken in this routine to prevent a loop in the program from
+causing the optimizer to loop.
+.IP cleanlocals
+This routine empties the local label table after everything
+is processed.
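+.PP
+Following a label chain while guarding against loops, as relabel() must,
+can be sketched like this.
+The table chain[] and the function final_label are hypothetical, and the
+step counter is just one simple guard; the real routine may detect
+cycles differently.

```c
/* Hypothetical local label table: chain[l] is the label that l was found
   equivalent to during the peephole phase, or 0 if none.
   Label numbers start at 1. */
#define NLABELS 100

int chain[NLABELS];

/* Follow the chain from l to its final destination. Without a cycle a
   chain has fewer than NLABELS steps, so counting steps stops the walk
   on a program loop such as "1: bra 1". */
int final_label(int l)
{
    int steps = 0;

    while (chain[l] != 0 && steps < NLABELS) {
        l = chain[l];
        steps++;
    }
    return l;
}
```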
+.PP
+But before this can all be done,
+the backward linked list of instructions first has to be reversed,
+so here comes
+.NH 3
+backward.c
+.PP
+The routine backward has a number of functions:
+.IP -
+It reverses the backward linked list, making two forward linked lists,
+one for the instructions and one for the pseudos.
+.IP -
+It notes the last occurrence of data labels in the backward linked list
+and puts it in the global symbol table.
+This is of course the first occurrence in the procedure.
+This information is needed to decide whether the symbols are global
+or local to this module.
+.IP -
+It decides about the fragment boundaries of data blocks.
+Fragments are numbered backwards starting at 3.
+This is done to be able to make the type of an expression
+containing a symbol equal to its fragment.
+This type can then not clash with the types integer and local label.
+.IP -
+It allocates a rom buffer to every data label with a rom behind
+it, if that rom contains only plain integers at the start.
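+.PP
+The first of these functions, turning the backward list built by
+getlines() into a forward one, is an in-place reversal of a singly
+linked list.
+The sketch below uses a hypothetical, stripped-down struct line and
+produces a single list; backward() itself also splits instructions
+from pseudos into two lists.

```c
#include <stddef.h>

/* Hypothetical minimal line record; the real struct line in line.h also
   holds a type field and a union of operand data. */
struct line {
    struct line *next;
    int instr;
};

/* Reverse a singly linked list in place and return the new head.
   getlines() builds the list backwards, so this restores program order. */
struct line *reverse(struct line *head)
{
    struct line *prev = NULL;

    while (head != NULL) {
        struct line *rest = head->next;
        head->next = prev;      /* point this record at its predecessor */
        prev = head;
        head = rest;
    }
    return prev;
}
```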
+.PP
+The first thing done after process() has called backward() and some
+of its own little routines is a call to the real routine,
+the one that does the work the program was written for
+.NH 3
+peephole.c
+.PP
+The first routines in peephole.c 
+implement a linked list for the offsets of local variables
+that are candidates for a register implementation.
+Several patterns use the notreg() function,
+since it is forbidden to combine a load of that variable
+with the load of another and
+it is not allowed to take the address of that variable.
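+.PP
+The list of register candidates and the notreg() test can be sketched
+as follows.
+The struct and the function add_regcand are hypothetical; only the
+meaning of notreg() is taken from the text above.

```c
#include <stdlib.h>

/* Hypothetical list of local offsets announced as register candidates
   (for instance by ms_reg messages). */
struct regcand {
    struct regcand *next;
    long offset;
};

static struct regcand *regcands;

/* Record that the local at this offset may be put in a register.
   Returns 0 if no memory is available. */
int add_regcand(long offset)
{
    struct regcand *r = malloc(sizeof *r);

    if (r == NULL) return 0;
    r->offset = offset;
    r->next = regcands;
    regcands = r;
    return 1;
}

/* notreg(offset): true if the local is NOT a register candidate. */
int notreg(long offset)
{
    struct regcand *r;

    for (r = regcands; r != NULL; r = r->next)
        if (r->offset == offset)
            return 0;
    return 1;
}
```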
+.PP
+The routine peephole() hashes the patterns the first time it is called,
+after which it does little more than call optimize().
+But first, hashpatterns().
+.PP
+The patterns are hashed at run time of the optimizer because of
+the
+.B LLP ,
+.B LEP ,
+.B SLP 
+and
+.B SEP
+instructions added to the instruction set in this optimizer.
+These are first replaced everywhere in the table by the correct
+replacement after which the first three instructions of the
+pattern are hashed and the pattern is linked into one of the
+256 linked lists.
+There is a define CHK_HASH in this module that you
+can set if you do not trust the randomness of the hashing
+function.
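+.PP
+How a 16-bit hash splits into a chain index and the high byte stored in
+each pattern can be sketched as below.
+The hash function itself is a hypothetical choice, and the use of the low
+byte to select one of the 256 chains is our reading of the text,
+since only the high byte is stored in the pattern.

```c
/* Hypothetical hash of three opcodes (the first three of a pattern, or
   three consecutive instructions); the optimizer's real function differs. */
unsigned hash3(int op1, int op2, int op3)
{
    unsigned h = (unsigned)op1 * 31u;
    h = (h + (unsigned)op2) * 31u;
    h = (h + (unsigned)op3) * 31u;
    return h & 0xFFFF;              /* keep 16 bits */
}

/* Low byte selects one of 256 chains; high byte is stored in the pattern. */
unsigned chain_of(unsigned h)  { return h & 0xFF; }
unsigned stored_hi(unsigned h) { return (h >> 8) & 0xFF; }

/* A pattern on the selected chain is tried only if its stored high
   byte also matches, i.e. the full hash value is equal. */
int full_match(unsigned h, unsigned char pattern_hi)
{
    return stored_hi(h) == pattern_hi;
}
```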
+.PP
+The attention now shifts to optimize().
+This routine calls basicblock() for every piece of code between two labels.
+It also notes which labels have another label or a branch behind them
+so the relabel() routine from process.c can do something with that.
+.PP
+Basicblock() keeps making passes over its basic block
+until no more optimizations are found.
+This might be inefficient if there is a long basicblock with some
+deep recursive optimization in one part of it.
+The entire basic block is then scanned a lot of times just for
+that one piece.
+The alternative is backing up after making an optimization and running
+through the same code again, but that is difficult
+in a singly linked list.
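+.PP
+The repeat-until-stable strategy of basicblock() can be sketched with a
+single toy rewrite, merging adp a; adp b into adp a+b.
+The instruction struct and both functions are hypothetical, and the toy
+rule is chosen for illustration only; the point is that one pass can
+create a new match that only the next pass finds.

```c
/* Hypothetical toy instruction: is_adp marks EM "adp n" (add n to a pointer). */
struct instr {
    int is_adp;
    long arg;
};

/* One pass: merge each "adp a; adp b" pair into "adp a+b".
   Returns the new length and sets *changed if anything was merged. */
static int one_pass(struct instr *code, int n, int *changed)
{
    int in, out = 0;

    for (in = 0; in < n; in++) {
        if (in + 1 < n && code[in].is_adp && code[in + 1].is_adp) {
            code[out].is_adp = 1;
            code[out].arg = code[in].arg + code[in + 1].arg;
            out++;
            in++;                   /* the pair is consumed */
            *changed = 1;
        } else {
            code[out++] = code[in];
        }
    }
    return out;
}

/* Keep making passes over the block until a pass finds nothing. */
int basic_block(struct instr *code, int n, int *passes)
{
    int changed;

    *passes = 0;
    do {
        changed = 0;
        n = one_pass(code, n, &changed);
        (*passes)++;
    } while (changed);
    return n;
}
```

+Four consecutive adp instructions need three passes here: two merges,
+one more merge, and a final pass that finds nothing.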
+.PP
+It hashes instructions and calls trypat() for every pattern that has
+a full hash value match,
+i.e. lower byte and upper byte equal.
+Longest pattern is tried first.
+.PP
+Trypat() checks the length and the opcodes of the pattern.
+If these match, it fills the iargs[] array with the argument values
+and evaluates the pattern's expression.
+If that is also true, the work shifts to tryrepl().
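+.PP
+A minimal C sketch of the checks made by trypat() follows.
+The struct pat layout, the use of a C function for the pattern's
+expression, and the example pattern (adding the word constant 0,
+assuming a two-byte word) are assumptions made for illustration;
+the real table encodes its expressions differently.
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAXPAT 6

enum { LOC, ADI, STL };         /* stand-in opcodes */

/* Hypothetical table entry: opcodes plus a condition on the operands. */
struct pat {
	int len;
	int ops[MAXPAT];
	bool (*expr)(const long *iargs);   /* NULL means "always true" */
};

/* "LOC ADI  ($1 = 0) and ($2 = 2) ==> (nothing)": adding the
   word constant 0 does nothing. iargs[0] is $1, iargs[1] is $2. */
static bool zero_add(const long *iargs)
{
	return iargs[0] == 0 && iargs[1] == 2;
}

/* ops[] and args[] describe the n instructions starting at the
   current position. Check the length and the opcodes, fill iargs[],
   then evaluate the expression. */
static bool trypat(const struct pat *p, const int *ops, const long *args,
                   int n, long *iargs)
{
	if (n < p->len)
		return false;
	for (int i = 0; i < p->len; i++)
		if (ops[i] != p->ops[i])
			return false;
	for (int i = 0; i < p->len; i++)
		iargs[i] = args[i];
	return p->expr == NULL || p->expr(iargs);
}
```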
+.PP
+Tryrepl() generates the list of replacement instructions,
+links it into the list and returns true.
+Why then the name tryrepl() if it always succeeds?
+Well, there is a mechanism in the optimizer,
+unused so far, that makes it possible to do optimizations that cannot
+be described by the table.
+It is possible to give a number as a replacement, which will cause the
+optimizer to call a routine special() to do some work.
+This routine might decide not to do the optimization and return false.
+.PP
+The last routine that is called from process() is putline()
+to write the optimized code, bringing us to
+.NH 3
+putline.c
+.PP
+The major part of putline.c is the standard set of routines
+that produce EM compact code.
+The extra functions performed are:
+.IP -
+For every occurrence of a global symbol it might be necessary to
+output an
+.B exa ,
+.B exp ,
+.B ina
+or
+.B inp
+pseudo instruction.
+This is done here.
+.IP -
+The
+.B lin
+instructions are optimized here:
+.B lni
+instructions are substituted for
+.B lin
+instructions where possible, and superfluous
+.B lin
+instructions are deleted.
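+.PP
+The line number handling might look like the following C sketch.
+It is a hypothetical reconstruction, not the code of putline.c:
+a lin that repeats the current line, or that is directly followed by
+another lin, is treated as superfluous, and a lin for the next line
+becomes an lni.
+The opcodes are stand-ins, and the sketch ignores labels and branches,
+which a real implementation would have to take into account.
```c
#include <assert.h>

/* Stand-in opcodes: LIN n sets the current source line, LNI
   increments it by one, OTHER is any other instruction. */
enum { LIN, LNI, OTHER };

struct ins { int op; long arg; };

/* Copy in[0..n) to out[], optimizing line number instructions;
   returns the number of instructions written. */
static int opt_lin(const struct ins *in, int n, struct ins *out)
{
	long line = -1;         /* current line, unknown at the start */
	int m = 0;

	for (int i = 0; i < n; i++) {
		if (in[i].op != LIN) {
			out[m++] = in[i];
			continue;
		}
		if (i + 1 < n && in[i + 1].op == LIN)
			continue;       /* no code for this line: superfluous */
		if (in[i].arg == line)
			continue;       /* repeats the current line: superfluous */
		if (line >= 0 && in[i].arg == line + 1)
			out[m++] = (struct ins){LNI, 0};
		else
			out[m++] = in[i];
		line = in[i].arg;
	}
	return m;
}
```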
+
--- a/doc/regadd.doc
+++ b/doc/regadd.doc
@ -0,0 +1,131 @@
+.TL
+Addition of register variables to an existing table.
+.NH 1
+Introduction
+.PP
+This is a short description of the newest feature in the
+table driven code generator for the Amsterdam Compiler Kit.
+It describes how to add register variables to an existing table,
+and assumes you have the distribution of October 1983 or later.
+It is not yet clear whether register variables are best added
+when starting a table for a new machine,
+or only after the table has been well debugged.
+.NH 1
+Modifications to the table itself.
+.NH 2
+Register section
+.PP
+Just before the properties of a register, you can add one
+of the following:
+.IP - 2
+regvar
+.IP -
+regvar ( pointer )
+.IP -
+regvar ( loop )
+.IP -
+regvar ( float )
+.LP
+All register variables of one type must be of the same size,
+and they may have no subregisters.
+.NH 2
+Code section
+.PP
+.IP - 2
+Two pseudo functions are added to the list allowed inside expressions:
+.RS
+.IP 1) 3
+inreg ( expr ) takes as its parameter the offset of a local
+and returns 0, 1 or 2:
+.RS
+.IP 2: 3
+if the variable is in a register.
+.IP 1:
+if the variable could be in a register but isn't.
+.IP 0:
+if the variable cannot be in a register.
+.RE
+.IP 2)
+regvar ( expr ) returns the register associated with the variable.
+The result is undefined if the variable is not in a register,
+so regvar ( expr ) is defined if and only if inreg ( expr ) == 2.
+.RE
+.IP -
+It is now possible to remove() a register expression;
+this is of course needed for a store into a register local.
+.IP -
+The return from a procedure may now involve register restores,
+so the special word 'return' in the table will invoke a user defined
+function (see regreturn() below).
+.NH 1
+Modifications to mach.c
+.PP
+If register variables are used in a table, the program
+.I cgg
+will define the word REGVARS during compilation of the sources.
+Therefore the functions described here should be enclosed
+in #ifdef REGVARS and #endif.
+.IP - 2
+regscore(off,size,typ,freq,totyp) long off;
+.br
+This function should assign a score to a register variable;
+the score should preferably be the estimated number of bytes
+gained when it is put in a register.
+Off and size are the offset and size of the variable,
+typ is its type (reg_any, reg_pointer, reg_loop or reg_float),
+freq is the number of times it occurs statically, and totyp
+is the type of the register it is planned to go into.
+.br
+Keep in mind that the gain should be net: the cost of the
+register save/restore sequences and, for parameters, the cost of
+initialisation should already be taken into account.
+.IP -
+i_regsave()
+.br
+This function is called at the start of a procedure, just before
+register saves are done.
+It can be used to initialise some variables if needed.
+.IP -
+f_regsave()
+.br
+This function is called at the end of the register save sequence.
+It can be used to do the real saving if the machine has instructions
+that move several registers at once.
+.IP -
+regsave(regstr,off,size) char *regstr; long off;
+.br
+Should either do the real saving or set up a table so that
+f_regsave can do it.
+Note that the initialisation of parameters should also be done,
+or planned, here.
+.IP -
+regreturn()
+.br
+Should restore saved registers and return.
+The function result is already in the function return area by now.
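+.PP
+The following C sketch shows how these functions could fit together
+for an imaginary machine with two-byte registers.
+Everything in it is hypothetical: the byte costs in regscore(), the
+stand-in values for reg_any and the other types, the PDP 11 style
+mnemonics and the frame register name lb.
+Here regsave() merely records each register, f_regsave() emits the saves
+followed by the initialisation of registers that hold parameters
+(recognised by a non-negative offset, as in EM),
+and regreturn() restores the registers in reverse order.
+Output is collected in a string instead of being written to a file.
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins; the real values come from the code generator's headers. */
enum { reg_any, reg_pointer, reg_loop, reg_float };

#define SAVE_RESTORE_COST 4     /* hypothetical bytes to save and restore */
#define PARAM_INIT_COST   4     /* hypothetical bytes to load a parameter */

/* Net estimated gain in bytes of putting the variable in a register. */
static int regscore(long off, int size, int typ, int freq, int totyp)
{
	if (size != 2)
		return -1;      /* imaginary target: 2-byte registers only */
	int per_use = (typ == reg_pointer && totyp == reg_pointer) ? 3 : 2;
	int score = freq * per_use - SAVE_RESTORE_COST;
	if (off >= 0)           /* a parameter must also be loaded */
		score -= PARAM_INIT_COST;
	return score;
}

#define MAXREGS 8
static struct { const char *reg; long off; } saved[MAXREGS];
static int nsaved;
static char out[256];           /* emitted code, one line per instruction */

static void emit(const char *line)
{
	size_t len = strlen(out);
	snprintf(out + len, sizeof out - len, "%s\n", line);
}

static void i_regsave(void) { nsaved = 0; }

/* Only record the register here; f_regsave() emits the code. */
static void regsave(const char *regstr, long off, int size)
{
	(void)size;
	assert(nsaved < MAXREGS);
	saved[nsaved].reg = regstr;
	saved[nsaved].off = off;
	nsaved++;
}

static void f_regsave(void)
{
	char buf[48];

	for (int i = 0; i < nsaved; i++) {
		snprintf(buf, sizeof buf, "mov %s,-(sp)", saved[i].reg);
		emit(buf);
		if (saved[i].off >= 0) {        /* parameter: initialise it */
			snprintf(buf, sizeof buf, "mov %ld(lb),%s",
			         saved[i].off, saved[i].reg);
			emit(buf);
		}
	}
}

/* Restore in reverse order and return; the function result is
   already in the function return area. */
static void regreturn(void)
{
	char buf[48];

	for (int i = nsaved - 1; i >= 0; i--) {
		snprintf(buf, sizeof buf, "mov (sp)+,%s", saved[i].reg);
		emit(buf);
	}
	emit("rts pc");
}
```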
+.NH 1
+Examples
+.PP
+Here are some examples from the PDP 11 table:
+.DS
+lol inreg($1)==2| |		| regvar($1)			| |
+
+lil inreg($1)==2| |		| {regdef2, regvar($1)}		| |
+
+stl inreg($1)==2| xsource2 |
+			remove(regvar($1))
+			move(%[1],regvar($1))              |       | |
+
+inl inreg($1)==2| |     remove(regvar($1))
+			"inc %(regvar($1)%)"
+			setcc(regvar($1))          |       | |
+.DE
+.NH 1
+Afterthoughts.
+.PP
+At the time of this writing the tables for the PDP 11, the M68000 and
+the VAX have been converted, in all cases the versions with a two byte word size.
+No big problems have occurred, but experience has shown that it is
+necessary to check your table carefully for all patterns with locals in them:
+if you forget one, the code generated by that one code rule
+will use the memory slot of the local, although the local lives in a register.
+
--- a/doc/toolkit.doc
+++ b/doc/toolkit.doc
@ -0,0 +1,896 @@
+.RP
+.ND
+.nr LL 78m
+.tr ~
+.ds as *
+.TL
+A Practical Tool Kit for Making Portable Compilers
+.AU
+Andrew S. Tanenbaum
+Hans van Staveren
+E. G. Keizer
+Johan W. Stevenson
+.AI
+Mathematics Dept.
+Vrije Universiteit
+Amsterdam, The Netherlands
+.AB
+The Amsterdam Compiler Kit is an integrated collection of programs designed to
+simplify the task of producing portable (cross) compilers and interpreters.
+For each language to be compiled, a program (called a front end) 
+must be written to
+translate the source program into a common intermediate code.
+This intermediate code can be optimized and then either directly interpreted
+or translated to the assembly language of the desired target machine.
+The paper describes the various pieces of the tool kit in some detail, as well
+as discussing the overall strategy.
+.sp
+Keywords: Compiler, Interpreter, Portability, Translator
+.sp
+CR Categories: 4.12, 4.13, 4.22
+.sp 12
+Authors' present addresses:
+  A.S. Tanenbaum, H. van Staveren, E.G. Keizer: Mathematics
+     Dept., Vrije Universiteit, Postbus 7161, 1007 MC Amsterdam,
+     The Netherlands
+
+  J.W. Stevenson: NV Philips, S&I, T&M, Building TQ V5, Eindhoven,
+     The Netherlands
+.AE
+.NH 1
+Introduction
+.PP
+As more and more organizations acquire many micro- and minicomputers,
+the need for portable compilers is becoming more and more acute.
+The present situation, in which each hardware vendor provides its own
+compilers -- each with its own deficiencies and extensions, and none of them
+compatible -- leaves much to be desired.
+The ideal situation would be an integrated system containing a family
+of (cross) compilers, each compiler accepting a standard source language and
+producing code for a wide variety of target machines.
+Furthermore, the compilers should be compatible, so programs written in
+one language can call procedures written in another language.
+Finally, the system should be designed so as to make adding new languages
+and new machines easy.
+Such an integrated system is being built at the Vrije Universiteit.
+Its design and implementation are the subject of this article.
+.PP
+Our compiler building system, which is called the "Amsterdam Compiler Kit"
+(ACK), can be thought of as a "tool kit."
+It consists of a number of parts that can be combined to form compilers
+(and interpreters) with various properties.
+The tool kit is based on an idea (UNCOL) that was first suggested in 1960
+[7], but which never really caught on then.
+The problem which UNCOL attempts to solve is how to make a compiler for
+each of
+.I N
+languages on
+.I M
+different machines without having to write 
+.I N
+x
+.I M
+programs.
+.PP
+As shown in Fig. 1, the UNCOL approach is to write
+.I N
+"front ends," each
+of which translates one source language to a common intermediate language,
+UNCOL (UNiversal Computer Oriented Language), and
+.I M
+"back ends," each
+of which translates programs in UNCOL to a specific machine language.
+Under these conditions, only
+.I N
++
+.I M
+programs must be written to provide all
+.I N
+languages on all
+.I M
+machines, instead of 
+.I N
+x
+.I M
+programs.
+.PP
+Various researchers have attempted to design a suitable UNCOL
+[2,8], but none of these have become popular.
+It is our belief that previous attempts have failed because they have been
+too ambitious, that is, they have tried to cover all languages
+and all machines using a single UNCOL.
+Our approach is more modest: we cater only to algebraic languages
+and machines whose memory consists of 8-bit bytes, each with its own address.
+Typical languages that could be handled include
+Ada, ALGOL 60, ALGOL 68, BASIC, C, FORTRAN,
+Modula, Pascal, PL/I, PL/M, PLAIN, and RATFOR,
+whereas COBOL, LISP, and SNOBOL would be less efficient.
+Examples of machines that could be included are the Intel 8080 and 8086,
+Motorola 6800, 6809, and 68000, Zilog Z80 and Z8000, DEC PDP-11 and VAX,
+and IBM 370 but not the Burroughs 6700, CDC Cyber, or Univac 1108 (because
+they are not byte-oriented).
+With these restrictions, we believe the old UNCOL idea can be used as the
+basis of a practical compiler-building system.
+.KF
+.sp 15P
+.ce 1
+Fig. 1.  The UNCOL model.
+.sp
+.KE
+.NH 1
+An Overview of the Amsterdam Compiler Kit
+.PP
+The tool kit consists of eight components:
+.sp
+  1. The preprocessor.
+  2. The front ends.
+  3. The peephole optimizer.
+  4. The global optimizer.
+  5. The back end.
+  6. The target machine optimizer.
+  7. The universal assembler/linker.
+  8. The utility package.
+.sp
+.PP
+A fully optimizing compiler,
+depicted in Fig. 2, has seven cascaded phases.
+Conceptually, each component reads an input file and writes a
+transformed output file to be used as input to the next component.
+In practice, some components may use temporary files to allow multiple
+passes over the input or internal intermediate files.
+.KF
+.sp 12P
+.ce 1
+Fig. 2.  Structure of the Amsterdam Compiler Kit.
+.sp
+.KE
+.PP
+In the following paragraphs we will briefly describe each component.
+After this overview, we will look at all of them again in more detail.
+A program to be compiled is first fed into the (language independent)
+preprocessor, which provides a simple macro facility
+and similar textual facilities.
+The preprocessor's output is a legal program in one of the programming
+languages supported, whereas the input is a program possibly augmented
+with macros, etc.
+.PP
+This output goes into the appropriate front end, whose job it is to
+produce intermediate code.
+This intermediate code (our UNCOL) is the machine language for a simple
+stack machine called EM (Encoding Machine).
+A typical front end might build a parse tree from the input, and then
+use the parse tree to generate EM code, which is similar to reverse Polish.
+In order to perform this work, the front end has to maintain tables of
+declared variables, labels, etc., determine where to place the
+data structures in memory, and so on.
+.PP
+The EM code generated by the front end is fed into the peephole optimizer,
+which scans it with a window of a few instructions, replacing certain
+inefficient code sequences by better ones.
+Such a search is important because EM contains instructions to handle
+numerous important special cases efficiently
+(e.g., incrementing a variable by 1).
+It is our strategy to relieve the front ends of the burden of hunting for
+special cases because there are many front ends and only one peephole
+optimizer.
+By handling the special cases in the peephole optimizer, 
+the front ends become simpler, easier to write and easier to maintain.
+.PP
+Following the peephole optimizer is a global optimizer [5], which,
+unlike the peephole optimizer, examines the program as a whole.
+It builds a data flow graph to make possible a variety of 
+global optimizations,
+among them, moving invariant code out of loops, avoiding redundant
+computations, live/dead analysis and eliminating tail recursion.
+Note that the output of the global optimizer is still EM code.
+.PP
+Next comes the back end, which differs from the front ends in a
+fundamental way.
+Each front end is a separate program, whereas the back end is a single
+program that is driven by a machine dependent driving table.
+The driving table for a specific machine tells how the EM code is mapped
+onto the machine's assembly language.
+Although a simple driving table might just macro expand each EM instruction
+into a sequence of target machine instructions, a much more sophisticated
+translation strategy is normally used, as described later.
+For speed, the back end does not actually read in the driving table at run time.
+Instead, the tables are compiled along with the back end in advance, resulting
+in one binary program per machine.
+.PP
+The output of the back end is a program in the assembly language of some
+particular machine.
+The next component in the pipeline reads this program and performs peephole
+optimization on it.
+The optimizations performed here involve idiosyncrasies
+of the target machine that cannot be performed in the machine-independent
+EM-to-EM peephole optimizer.
+Typically these optimizations take advantage of special instructions or special
+addressing modes.
+.PP
+The optimized target machine assembly code then goes into the final
+component in the pipeline, the universal assembler/linker.
+This program assembles the input to object format, extracting routines from
+libraries and including them as needed.
+.PP
+The final component of the tool kit is the utility package, which contains
+various test programs, interpreters for EM code, 
+EM libraries, conversion programs, and other aids for the implementer and
+user.
+.NH 1
+The Preprocessor
+.PP
+The function of the preprocessor is to extend all the programming languages
+by adding certain generally useful facilities to them in a uniform way.
+One of these is a simple macro system, in which the user can give names to
+character strings.
+The names can be used in the program, with the knowledge that they will be
+macro expanded prior to being input to the front end.
+Macros can be used for named constants, expanding short "procedures"
+in line, etc.
+.PP
+Another useful facility provided by the preprocessor is the ability to
+include compile-time libraries.
+On large projects, it is common to have all the declarations and definitions
+gathered together in a few files that are textually included in the programs
+by instructing the preprocessor to read them in, thus fooling the front end
+into thinking that they were part of the source program.
+.PP
+A third feature of the preprocessor is conditional compilation.
+The input program can be split up into labeled sections.
+By setting flags, some of the sections can be deleted by the preprocessor,
+thus allowing a family of slightly different programs to be conveniently stored
+on a single file.
+.NH 1
+The Front Ends
+.PP
+A front end is a program that converts input in some source language to a
+program in EM.
+At present, front ends 
+exist or are in preparation for Pascal, C, and Plain, and are being considered
+for Ada, ALGOL 68, FORTRAN 77, and Modula 2.
+Each of the present front ends is independent of all the other ones,
+although a general-purpose, table-driven front end is conceivable, provided
+one can devise a way to express the semantics of the source language in the
+driving tables.
+The Pascal front end uses a top-down parsing algorithm (recursive descent),
+whereas the C and Plain front ends are bottom-up.
+.PP
+All front ends, independent of the language being compiled,
+produce a common intermediate code called EM, which is
+the assembly language for a simple stack machine.
+The EM machine is based on a memory architecture
+containing a stack for local variables, a (static) data area for variables
+declared in the outermost block and global to the whole program, and a heap
+for dynamic data structures.
+In some ways EM resembles P-code [6], but is more general, since it is
+intended for a wider class of languages than just Pascal.
+.PP
+The EM instruction set has been described elsewhere
+[9,10,11]
+so we will only briefly summarize it here.
+Instructions exist to:
+.sp
+  1. Load a variable or constant of some length onto the stack.
+  2. Store the top item on the stack in memory.
+  3. Add, subtract, multiply, divide, etc. the top two stack items.
+  4. Examine the top one or two stack items and branch conditionally.
+  5. Call procedures and return from them.
+.sp
+.PP
+Loads and stores come in several variations, corresponding to the most common
+programming language semantics, for example, constants, simple variables,
+fields of a record, elements of an array, and so on.
+Distinctions are also made between variables local to the current block
+(i.e., stack frame), those in the outermost block (static storage), and those
+at intermediate lexical levels, which are accessed by following the
+static chain at run time.
+.PP
+All arithmetic instructions have a type (integer, unsigned, real,
+pointer, or set) and an
+operand length, which may either be explicit or may be popped from the stack
+at run time.
+Monadic branch instructions pop an item from the stack and branch if it is
+less than zero, less than or equal to zero, etc.
+Dyadic branch instructions pop two items, compare them, and branch accordingly.
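+.PP
+As an illustration of this stack discipline, the C sketch below
+interprets a four-instruction subset of EM and runs the sequence
+LOL~I; LOC~7; ADI~2; STL~K for the statement K~:=~I~+~7.
+It is a toy, not one of the EM interpreters of the kit:
+locals are addressed by slot number instead of byte offset, and the
+word size is fixed at 2 bytes.
```c
#include <assert.h>

/* A toy subset of EM with 2-byte words. */
enum op { LOL, LOC, ADI, STL };

struct em { enum op op; int arg; };

static short locals[8];         /* local variables, indexed by slot */

static void run(const struct em *code, int n)
{
	short stack[16];
	int sp = 0;             /* number of items on the stack */

	for (int pc = 0; pc < n; pc++) {
		switch (code[pc].op) {
		case LOL:       /* push a local */
			stack[sp++] = locals[code[pc].arg];
			break;
		case LOC:       /* push a constant */
			stack[sp++] = (short)code[pc].arg;
			break;
		case ADI:       /* pop two, push the sum; only size 2 is modelled */
			sp--;
			stack[sp - 1] = (short)(stack[sp - 1] + stack[sp]);
			break;
		case STL:       /* pop into a local */
			locals[code[pc].arg] = stack[--sp];
			break;
		}
	}
}
```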
+.PP
+In addition to these basic EM instructions, there is a collection of special
+purpose instructions (e.g., to increment a local variable), which are typically
+produced from the simple ones by the peephole optimizer.
+Although the complete EM instruction set contains nearly 150 instructions,
+only about 60 of them are really primitive; the rest are simply abbreviations
+for commonly occurring EM instruction sequences.
+.PP
+Of particular interest is the way object sizes are parametrized.
+The front ends allow the user to indicate how many bytes an integer, real, etc.
+should occupy.
+Given this information, the front ends can allocate memory, determining 
+the placement of variables within the stack frame.
+Sizes for primitive types are restricted to 8, 16, 32, 64, etc. bits.
+The front ends are also parametrized by the target machine's word length
+and address size so they can tell, for example, how many "load" instructions
+to generate to move a 32-bit integer.
+In the examples used henceforth,
+we will assume a 16-bit word size and 16-bit integers.
+.PP
+Since only byte-addressable target machines are permitted,
+it is nearly
+always possible to implement any requested sizes on any target machine.
+For example, the designer of the back end tables for the Z80 should provide
+code for 8-, 16-, and 32-bit arithmetic.
+In our view, the Pascal, C, or Plain programmer specifies what lengths 
+are needed,
+without reference to the target machine,
+and the back end provides it.
+This approach greatly enhances portability.
+While it is true that doing all arithmetic using 32-bit integers on the Z80
+will not be terribly fast, we feel that if that is what the programmer needs,
+it should be possible to implement it.
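+.PP
+For example, a back end for an 8-bit machine can provide 32-bit addition
+by adding byte by byte and propagating the carry.
+The C sketch below models this technique; it is an illustration only,
+since a back end table would emit the corresponding target instructions
+instead.
```c
#include <assert.h>
#include <stdint.h>

/* Add two 32-bit integers stored as 4 bytes, least significant first,
   using only 8-bit additions and a carry. */
static void add32(const uint8_t a[4], const uint8_t b[4], uint8_t sum[4])
{
	unsigned carry = 0;

	for (int i = 0; i < 4; i++) {
		unsigned t = (unsigned)a[i] + b[i] + carry;   /* at most 511 */
		sum[i] = (uint8_t)(t & 0xFFu);
		carry = t >> 8;
	}
}
```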
+.PP
+Like all assembly languages, EM has not only machine instructions, but also
+pseudoinstructions.
+These are used to indicate the start and end of each procedure, allocate
+and initialize storage for data, and similar functions.
+One particularly important pseudoinstruction is the one that is used to
+transmit information to the back end for optimization purposes.
+It can be used to suggest variables that are good candidates to assign to
+registers, delimit the scope of loops, indicate that certain variables 
+contain a useful value (next operation is a load) or not (next operation is
+a store), and various other things.
+.NH 1
+The Peephole Optimizer
+.PP
+The peephole optimizer reads in unoptimized EM programs and writes out
+optimized ones.
+Both the input and output are expressed in a highly compact code, rather than
+in ASCII, to reduce the i/o time, which would otherwise dominate the CPU
+time.
+The program itself is table driven, and is, by and large, ignorant of the
+semantics of EM.
+The knowledge of EM is contained in a
+language- and machine-independent table consisting of about 400
+pattern-replacement pairs.
+We will briefly describe the kinds of optimizations it performs below;
+a more complete discussion can be found in [9].
+.PP
+Each line in the driving table describes one optimization, consisting of a
+pattern part and a replacement part.
+The pattern part is a series of one or more EM instructions and a boolean
+expression.
+The replacement part is a series of EM instructions with operands.
+A typical optimization might be:
+.sp
+  LOL  LOC  ADI  STL  ($1 = $4) and ($2 = 1) and ($3 = 2) ==> INL $1
+.sp
+where the text prior to the ==> symbol is the pattern and the text after it is
+the replacement.
+LOL loads a local variable onto the stack, LOC loads a constant onto the stack,
+ADI is integer addition, and STL is store local.
+The pattern specifies that four consecutive EM instructions are present, with
+the indicated opcodes, and that furthermore the operand of the first 
+instruction (denoted by $1) and the fourth instruction (denoted by $4) are the
+same, the constant pushed by LOC is 1, and the size of the integers added by
+ADI is 2 bytes.
+(EM instructions have at most one operand, so it is not necessary to specify
+the operand number.)
+Under these conditions, the four instructions can be replaced by a single INL
+(increment local) instruction whose operand is equal to that of LOL.
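+.PP
+The effect of this table line can be mimicked in C.
+The sketch below is hypothetical, since the real optimizer is table
+driven and hashes its patterns: it scans an array of instructions and
+replaces every occurrence of the pattern whose operands satisfy the
+expression by a single INL.
```c
#include <assert.h>

enum { LOL, LOC, ADI, STL, INL };   /* stand-in opcodes */

struct ins { int op; long arg; };

/* Replace each "LOL x; LOC 1; ADI 2; STL x" in code[0..n) by "INL x",
   compacting the array in place; returns the new length. */
static int fold_inl(struct ins *code, int n)
{
	int out = 0;

	for (int i = 0; i < n; ) {
		if (i + 3 < n &&
		    code[i].op == LOL && code[i + 1].op == LOC &&
		    code[i + 2].op == ADI && code[i + 3].op == STL &&
		    code[i].arg == code[i + 3].arg &&   /* $1 = $4 */
		    code[i + 1].arg == 1 &&             /* $2 = 1 */
		    code[i + 2].arg == 2) {             /* $3 = 2 */
			long x = code[i].arg;
			code[out].op = INL;
			code[out].arg = x;
			out++;
			i += 4;
		} else {
			code[out++] = code[i++];
		}
	}
	return out;
}
```
+The second group in the usage below is left alone because its LOL and
+STL refer to different locals.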
+.PP
+Although the optimizations cover a wide range, the main ones
+can be roughly divided into the following categories.
+\fIConstant folding\fR
+is used to evaluate constant expressions, such as 2*3~+~7 at
+compile time instead of run time.
+\fIStrength reduction\fR
+is used to replace one operation, such as multiply, by
+another, such as shift.
+\fIReordering of expressions\fR
+helps in cases like -K/5, which can be better
+evaluated as K/-5, because the former requires
+a division and a negation, whereas the latter requires only a division.
+\fINull instructions\fR
+include resetting the stack pointer after a call with 0 parameters,
+offsetting zero bytes to access the
+first element of a record, or jumping to the next instruction.
+\fISpecial instructions\fR
+are those like INL, which deal with common special cases
+such as adding one to a variable or comparing something to zero.
+\fIGroup moves\fR
+are useful because a sequence
+of consecutive moves can often be replaced with EM code
+that allows the back end to generate a loop instead of in line code.
+\fIDead code elimination\fR
+is a technique for removing unreachable statements, possibly made unreachable
+by previous optimizations.
+\fIBranch chain compression\fR
+can be applied when a branch instruction jumps to another branch instruction.
+The first branch can jump directly to the final destination instead of
+indirectly.
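+.PP
+Branch chain compression, for instance, amounts to following each
+branch through any unconditional branches at its destination.
+In the C sketch below, which is an illustration rather than the
+optimizer's code, branch targets are instruction indices instead of
+labels, BRA is the unconditional branch and BEQ stands for any
+conditional one; a step limit guards against a cycle of branches.
```c
#include <assert.h>

enum { OTHER, BRA, BEQ };       /* BEQ stands for any conditional branch */

struct ins { int op; int target; };  /* target: index of destination */

/* Make every branch jump straight to the end of the chain of
   unconditional branches at its destination. At most n steps are
   taken per branch, so a cycle of branches cannot hang the loop. */
static void compress_chains(struct ins *code, int n)
{
	for (int i = 0; i < n; i++) {
		if (code[i].op != BRA && code[i].op != BEQ)
			continue;
		int t = code[i].target;
		for (int steps = 0; steps < n && code[t].op == BRA; steps++)
			t = code[t].target;
		code[i].target = t;
	}
}
```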
+.PP
+The last two optimizations logically belong in the global optimizer but are
+in the local optimizer for historical reasons (meaning that the local
+optimizer has been the only optimizer for many years and the optimizations were
+easy to do there).
+.NH 1
+The Global Optimizer
+.PP
+In contrast to the peephole optimizer, which examines the EM code a few lines
+at a time through a small window, the global optimizer examines the 
+program's large scale structure.
+Three distinct types of optimizations can be found here:
+.sp
+  1. Interprocedural optimizations.
+  2. Intraprocedural optimizations.
+  3. Basic block optimizations.
+.sp
+We will now look at each of these in turn.
+.PP
+Interprocedural optimizations are those spanning procedure boundaries.
+The most important one is deciding to expand procedures in line,
+especially short procedures that occur in loops and pass several parameters.
+If it takes more time or memory to pass the parameters than to do the work,
+the program can be improved by eliminating the procedure.
+The inverse optimization -- discovering long common code sequences and
+turning them into a procedure -- is also possible, but much more difficult.
+Like much of the global optimizer's work, the decision to make or not make
+a certain program transformation is a heuristic one, based on knowledge of
+how the back end works, how most target machines are organized, etc.
+.PP
+The heart of the global optimizer is its analysis of individual
+procedures.
+To perform this analysis, the optimizer must locate the basic blocks,
+instruction sequences which can be entered only at the top and exited
+only at the bottom.
+It then constructs a data flow graph, with the basic blocks as nodes and
+jumps between blocks as arcs.
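+.PP
+Locating the basic blocks is usually done by marking the leaders,
+the instructions that start a block: the first instruction, every
+branch target, and every instruction that follows a branch or return.
+The C sketch below assumes branch targets are given as instruction
+indices and uses stand-in opcodes; it shows the textbook method, not
+necessarily the one used in the global optimizer.
```c
#include <assert.h>
#include <stdbool.h>

enum { OTHER, BRA, BEQ, RET };  /* stand-in opcodes */

struct ins { int op; int target; };  /* target: index of destination */

/* Set leader[i] for every instruction that starts a basic block;
   returns the number of basic blocks. */
static int find_leaders(const struct ins *code, int n, bool *leader)
{
	int blocks = 0;

	for (int i = 0; i < n; i++)
		leader[i] = (i == 0);
	for (int i = 0; i < n; i++) {
		int op = code[i].op;
		if (op == BRA || op == BEQ)
			leader[code[i].target] = true;
		if ((op == BRA || op == BEQ || op == RET) && i + 1 < n)
			leader[i + 1] = true;
	}
	for (int i = 0; i < n; i++)
		blocks += leader[i];
	return blocks;
}
```
+Each block then runs from a leader up to the next one, and the arcs of
+the graph follow from the branches at the block ends and from falling
+through into the next block.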
+.PP
+From the data flow graph, many important properties of the program can be
+discovered and exploited.
+Chief among these is the presence of loops, indicated by cycles in the graph.
+One important optimization is looking for code that can be moved outside the
+loop, either prior to it or subsequent to it.
+Such code motion saves execution time, although it does not save memory.
+Unrolling loops is also possible and desirable in some cases.
+.PP
+Another area in which global analysis of loops is especially important is
+in register allocation. 
+While it is true that EM does not have any registers to allocate,
+the optimizer can easily collect information to allow the
+back end to allocate registers wisely.
+For example, the global optimizer can collect static frequency-of-use
+and live/dead information about variables.
+(A variable is dead at some point in the program if its current value is
+not needed, i.e., the next reference to it overwrites it rather than
+reading it; if the current value will eventually be used, the variable is
+live.)
+If two variables are never simultaneously live over some interval of code
+(e.g., the body of a loop), they can be packed into a single variable,
+which, if used often enough, may warrant being assigned to a register.
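+.PP
+Within a single basic block, liveness can be computed in one backward
+pass: a variable is live before a statement if the statement reads it,
+or if it is live afterwards and the statement does not overwrite it.
+The C sketch below uses bit masks for sets of at most 32 variables and
+tests whether two variables conflict.
+Besides the simultaneous liveness described above, it applies the usual
+extra rule that storing into one variable while the other is live is a
+conflict, so that a store whose value is never used cannot clobber the
+other variable.
+It is an illustration only; the global optimizer works on the whole
+data flow graph, not on one block.
```c
#include <assert.h>

/* Variables read (use) and written (def) by one statement, as bit masks. */
struct stmt { unsigned use, def; };

/* Returns 1 if variables a and b cannot share one location in the
   block s[0..n); live_out is the set live after the block. */
static int interfere(const struct stmt *s, int n, unsigned live_out, int a, int b)
{
	unsigned ma = 1u << a, mb = 1u << b;
	unsigned live = live_out;       /* live after statement i */

	if ((live & ma) && (live & mb))
		return 1;
	for (int i = n - 1; i >= 0; i--) {
		if (((s[i].def & ma) && (live & mb)) ||
		    ((s[i].def & mb) && (live & ma)))
			return 1;       /* a store while the other is live */
		live = (live & ~s[i].def) | s[i].use;   /* live before i */
		if ((live & ma) && (live & mb))
			return 1;
	}
	return 0;
}
```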
+.PP
+Many loops involve arrays: this leads to other optimizations.
+If an array is accessed sequentially, with each iteration using the next
+higher numbered element, code improvement is often possible.
+Typically, a pointer to the bottom element of each array can be set up
+prior to the loop.
+Within the loop the element is accessed indirectly via the pointer, which is
+also incremented by the element size on each iteration.
+If the target machine has an autoincrement addressing mode and the pointer
+is assigned to a register, an array access can often be done in a single
+instruction.
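+.PP
+At the source level the transformation corresponds to the two C
+functions below, which compute the same sum.
+The second is what the optimized code amounts to; on a machine with
+autoincrement addressing, such as the PDP-11, the loop body could then
+become a single add instruction with an autoincremented operand.
+This is an illustration, not output of the global optimizer, which
+works on EM code.
```c
#include <assert.h>

/* Straightforward version: each iteration indexes from the base. */
static long sum_indexed(const int a[], int n)
{
	long s = 0;

	for (int i = 0; i < n; i++)
		s += a[i];
	return s;
}

/* Transformed version: a pointer set up before the loop is
   dereferenced and advanced by one element per iteration. */
static long sum_pointer(const int a[], int n)
{
	long s = 0;
	const int *p = a;
	const int *end = a + n;

	while (p < end)
		s += *p++;      /* access and increment in one step */
	return s;
}
```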
+.PP
+Other intraprocedural optimizations include removing tail recursion
+(last statement is a recursive call to the procedure itself),
+topologically sorting the basic blocks to minimize the number of branch
+instructions, and common subexpression recognition.
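+.PP
+Tail recursion removal can be shown at the source level with the two C
+functions below.
+The recursive call, which is the last action of the first function, is
+replaced in the second by assigning the new parameter values and jumping
+back to the start, written here as a loop.
+The optimizer performs the corresponding change on EM code; this is
+only an illustration.
```c
#include <assert.h>

/* Greatest common divisor; the recursive call is the last action. */
static unsigned gcd_rec(unsigned a, unsigned b)
{
	if (b == 0)
		return a;
	return gcd_rec(b, a % b);       /* tail call */
}

/* After removal: new parameter values, then back to the start. */
static unsigned gcd_loop(unsigned a, unsigned b)
{
	while (b != 0) {
		unsigned t = a % b;
		a = b;
		b = t;
	}
	return a;
}
```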
+.PP
+The third general class of optimizations done by the global optimizer is
+improving the structure of a basic block.
+For the most part these involve transforming arithmetic or boolean
+expressions into forms that are likely to result in better target code.
+As a simple example, A~+~B*C can be converted to B*C~+~A.
+The latter can often
+be handled by loading B into a register, multiplying the register by C, and
+then adding in A, whereas the former may involve first putting A into a
+temporary, depending on the details of the code generation table.
+Another example of this kind of basic block optimization is transforming
+-B~+~A~<~0 into the equivalent, but simpler, A~<~B.
+.NH 1
+The Back End
+.PP
+The back end reads a stream of EM instructions and generates assembly code
+for the target machine.
+Although the algorithm itself is machine independent, for each target
+machine a machine dependent driving table must be supplied.
+The driving table effectively defines the mapping of EM code to target code.
+.PP
+It will be convenient to think of the EM instructions being read as a
+stream of tokens.
+For didactic purposes, we will concentrate on two kinds of tokens:
+those that load something onto the stack, and those that perform some operation
+on the top one or two values on the stack.
+The back end maintains at compile time a simulated stack whose behavior
+mirrors what the stack of a hardware EM machine would do at run time.
+If the current input token is a load instruction, a new entry is pushed onto
+the simulated stack.
+.PP
+Consider, as an example, the EM code produced for the statement K~:=~I~+~7.
+If K and I are
+2-byte local variables, it will normally be LOL I; LOC 7; ADI~2; STL K.
+Initially the simulated stack is empty.
+After the first token has been read and processed, the simulated stack will
+contain a stack token of type MEM with attributes telling that it is a local,
+giving its address, etc.
+After the second token has been read and processed, the top two tokens on the
+simulated stack will be CON (constant) on top and MEM directly underneath it.
+.PP
+At this point the back end reads the ADI~2 token and
+looks in the driving table to find a line or lines that define the
+action to be taken for ADI~2.
+For a typical multiregister machine, instructions will exist to add constants
+to registers, but not to memory.
+Consequently, the driving table will not contain an entry for ADI~2 with stack
+configuration CON, MEM.
+.PP
+The back end is now faced with the problem of how to get from its
+current stack configuration, CON, MEM, which is not listed, to one that is
+listed.
+The table will normally contain rules (which we call "coercions")
+for converting between CON, REG, MEM, and similar tokens.
+Therefore the back end attempts to "coerce" the stack into a configuration
+that
+.I is
+present in the table.
+A typical coercion rule might tell how to convert a MEM into
+a REG, namely by performing the actions of allocating a
+register and emitting code to move the memory word to that register.
+Having transformed the compile-time stack into a configuration allowed for
+ADI~2, the rule can be carried out.
+A typical rule 
+for ADI~2 might have stack configuration REG, MEM
+and would emit code to add the MEM to the REG, leaving the stack
+with a single REG token instead of the REG and MEM tokens present before the
+ADI~2.
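+.PP
+The following C sketch plays through this example with a simulated
+stack of CON, MEM and REG tokens.
+It is a toy, not the table-driven back end: its only rule for ADI~2
+wants a REG on top of a MEM, so it always coerces the constant into a
+register, and the target instructions, the register names and the frame
+register fp are hypothetical.
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum kind { CON, MEM, REG };

/* A compile-time stack token: a constant value, the offset of a
   local, or a register number, depending on kind. */
struct token { enum kind kind; int val; };

static struct token stk[8];
static int sp, nextreg;
static char out[256];           /* emitted code */

static void emit(const char *line)
{
	size_t len = strlen(out);
	snprintf(out + len, sizeof out - len, "%s\n", line);
}

/* Coercion: load the top token into a freshly allocated register. */
static void coerce_top_to_reg(void)
{
	char buf[40];
	struct token *t = &stk[sp - 1];

	if (t->kind == REG)
		return;
	if (t->kind == CON)
		snprintf(buf, sizeof buf, "mov $%d,r%d", t->val, nextreg);
	else
		snprintf(buf, sizeof buf, "mov %d(fp),r%d", t->val, nextreg);
	emit(buf);
	t->kind = REG;
	t->val = nextreg++;
}

static void lol(int off) { stk[sp++] = (struct token){MEM, off}; }
static void loc(int c)   { stk[sp++] = (struct token){CON, c}; }

/* ADI 2: the only rule in this toy table needs REG on top of MEM. */
static void adi2(void)
{
	char buf[40];

	coerce_top_to_reg();
	assert(stk[sp - 2].kind == MEM);
	snprintf(buf, sizeof buf, "add %d(fp),r%d", stk[sp - 2].val, stk[sp - 1].val);
	emit(buf);
	stk[sp - 2] = stk[sp - 1];      /* one REG token replaces two */
	sp--;
}

static void stl(int off)
{
	char buf[40];

	coerce_top_to_reg();
	snprintf(buf, sizeof buf, "mov r%d,%d(fp)", stk[sp - 1].val, off);
	emit(buf);
	sp--;
}
```
+Calling lol(-2), loc(7), adi2() and stl(-4) in turn, with I at offset
+-2 and K at offset -4, leaves the stack empty and produces three
+instructions: load 7 into r0, add I to r0, and store r0 into K.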
+.PP
+In general, there will be more than one possible coercion path.
+Assuming reasonable coercion rules for our example,
+we might be able to convert
+CON MEM into CON REG by loading the variable I into a register.
+Alternatively, we could coerce CON to REG by loading the constant into a register.
+The first coercion path does the add by first loading I into a register and
+then adding 7 to it.
+The second path first loads 7 into a register and then adds I to it.
+On machines with a fast LOAD IMMEDIATE instruction for small constants
+but no fast ADD IMMEDIATE, or vice
+versa, one code sequence will be preferable to the other.
+.PP
+In fact, we actually have more choices than suggested above.
+In both coercion paths a register must be allocated.
+On many machines, not every register can be used in every operation, so the
+choice may be important.
+On some machines, for example, the operand of a multiply must be in an odd
+register.
+To summarize, from any state (i.e., token and stack configuration), a
+variety of choices can be made, leading to a variety of different target
+code sequences.
+.PP
+To decide which of the various code sequences to emit, the back end must have
+some information about the time and memory cost of each one.
+To provide this information, each rule in the driving table, including
+coercions, specifies both the time and memory cost of the code emitted when
+the rule is applied.
+The back end can then simply try each of the legal possibilities (including all
+the possible register allocations) to find the cheapest one.
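Trying every legal possibility, including every register allocation, amounts to an exhaustive minimum search. The sketch below is hypothetical throughout: each candidate rule has a code cost and may require an odd register (as in the multiply example above), and allocating an occupied register carries an invented extra cost for saving its contents.

```c
#include <assert.h>

#define NREGS 4

/* Hypothetical candidate code sequence for one EM instruction. */
struct rule {
    int cost;          /* cost of the emitted code itself */
    int needs_odd;     /* nonzero if the register must be odd */
};

/* Extra cost of allocating register r: 0 if free, otherwise the
 * (invented) cost of saving its current contents first. */
static const int alloc_cost[NREGS] = { 0, 3, 0, 2 };

struct choice { int rule; int reg; int cost; };

/* Try every legal (rule, register) pair and keep the cheapest. */
static struct choice cheapest(const struct rule *rules, int nrules)
{
    struct choice best = { -1, -1, 0 };

    for (int i = 0; i < nrules; i++)
        for (int r = 0; r < NREGS; r++) {
            if (rules[i].needs_odd && r % 2 == 0)
                continue;                  /* illegal allocation */
            int c = rules[i].cost + alloc_cost[r];
            if (best.rule < 0 || c < best.cost) {
                best.rule = i;
                best.reg = r;
                best.cost = c;
            }
        }
    return best;
}
```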
+.PP
+This situation is similar to that found in a chess or other game-playing
+program, in which from any state a finite number of moves can be made.
+Just as in a chess program, the back end can look at all the "moves" that can
+be made from each state reachable from the original state, and thus find the
+sequence that gives the minimum cost to a depth of one.
+More generally, the back end can evaluate all paths corresponding to accepting
+the next
+.I N
+input tokens, find the cheapest one, and then make the first move along
+that path, precisely the way a chess program would.
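A depth-limited version of this search, with costs on the arcs as in the back end, might look like the following. The state graph is an invented example chosen so that a depth of one and a depth of two pick different first moves; the names search, states and moves are hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define MAXMOVES 2

/* Hypothetical search space: each state offers up to MAXMOVES moves,
 * each with a cost (of the code it emits) and a successor state. */
struct move { int cost; int next; };
struct state { int nmoves; struct move moves[MAXMOVES]; };

static const struct state states[] = {
    { 2, { { 1, 1 }, { 3, 2 } } },    /* 0: start */
    { 2, { { 10, 3 }, { 9, 3 } } },   /* 1 */
    { 1, { { 1, 3 } } },              /* 2 */
    { 0, { { 0, 0 } } },              /* 3: all input consumed */
};

/* Minimum cumulative cost of any path of at most `depth` moves from s.
 * *first receives the index of the first move on the cheapest path
 * (-1 if no move is possible). */
static int search(int s, int depth, int *first)
{
    *first = -1;
    if (depth == 0 || states[s].nmoves == 0)
        return 0;
    int best = INT_MAX;
    for (int i = 0; i < states[s].nmoves; i++) {
        const struct move *m = &states[s].moves[i];
        int dummy;
        int c = m->cost + search(m->next, depth - 1, &dummy);
        if (c < best) {
            best = c;
            *first = i;
        }
    }
    return best;
}
```

With depth 1 the cheap first move (cost 1) wins, but with depth 2 it leads to an expensive continuation, so the search picks the other first move; a back end would make that move and then search again from the new state.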
+.PP
+Since the back end is analogous to both a parser and a chess playing program,
+some clarifying remarks may be helpful.
+First, chess programs and the back end must do some look ahead, whereas the
+parser for a well-designed grammar can usually make do with one token of
+look ahead, because grammars are supposed to be unambiguous.
+In contrast, many legal mappings
+from a sequence of EM instructions to target code may exist.
+Second, like a parser but unlike a chess program, the back end has perfect
+information -- it does not have to contend with an unpredictable opponent's
+moves.
+Third, chess programs normally make a static evaluation of the board and
+label the
+.I nodes
+of the tree with the resulting scores.
+The back end, in contrast, associates costs with
+.I arcs
+(moves) rather than nodes (states).
+However, the difference is not essential, since the back end could
+also label each node with the cumulative cost from the root to that node.
+.PP
+As mentioned above, the cost field in the table contains
+.I both
+the time and memory costs for the code emitted.
+It should be clear that the back end could use either one
+or some linear combination of them as the scoring function for evaluating moves.
+A user can instruct the compiler to optimize for time or for memory or
+for, say, 0.3 x time + 0.7 x memory.
+Thus the same compiler can provide a wide range of performance options to
+the user.
+The writer of the back end table can take advantage of this flexibility by
+providing several code sequences with different tradeoffs for each EM
+instruction (e.g., in-line code vs. a call to a run-time routine).
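A scoring function of this kind is a weighted sum of the two costs stored with each rule. In this sketch the weights are given in tenths so that the arithmetic stays exact (3 and 7 stand for 0.3 x time + 0.7 x memory); the function names and the cost figures for in-line code and a run-time call are invented.

```c
#include <assert.h>

/* Each table rule records both costs of the code it emits. */
struct cost { int time; int memory; };

/* Weighted score; weights are in tenths, so (3, 7) means
 * 0.3 x time + 0.7 x memory scaled by 10. */
static int score(struct cost c, int wtime, int wmem)
{
    return wtime * c.time + wmem * c.memory;
}

/* Index of the cheapest of n alternatives under the given weights. */
static int pick(const struct cost *alts, int n, int wtime, int wmem)
{
    int best = 0;

    for (int i = 1; i < n; i++)
        if (score(alts[i], wtime, wmem) < score(alts[best], wtime, wmem))
            best = i;
    return best;
}
```

With alternatives such as { time 4, memory 20 } for in-line code and { time 30, memory 4 } for a run-time call, optimizing purely for time selects the in-line code, while optimizing for memory or for the 0.3/0.7 mix selects the call.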
+.PP
+In addition to the time-space tradeoffs, by specifying the depth of search
+parameter,
+.I N ,
+the user can also trade off compile time against object
+code quality, for whatever code metric has been chosen.
+In summary, by combining the properties of a parser and a game playing program,
+it is possible to make a code generator that is table driven,
+highly flexible, and has the ability to produce good code from a
+stack machine intermediate code.
+.NH 1
+The Target Machine Optimizer
+.PP
+In the model of Fig.~2, the peephole optimizer comes before the global
+optimizer.
+It may happen that the code produced by the global optimizer can also
+be improved by another round of peephole optimization.
+Conceivably, the system could have been designed to iterate peephole and
+global optimizations until no more of either could be performed.
+.PP
+However, both of these optimizations are done on the machine independent
+EM code.
+Neither is able to take advantage of the peculiarities and idiosyncrasies with
+which most target machines are well endowed.
+It is the function of the final 
+optimizer to do any (peephole) optimizations that still remain.
+.PP
+The algorithm used here is the same as in the EM peephole optimizer.
+In fact, if it were not for the differences between EM syntax, which is
+very restricted, and target assembly language syntax,
+which is less so, precisely the same program could be used for both.
+Nevertheless, the same ideas apply concerning patterns and replacements, so
+our discussion of this optimizer will be restricted to one example.
+.PP
+To see what the target optimizer might do, consider the
+PDP-11 instruction sequence sub #2,r0;  mov (r0),x.
+First 2 is subtracted from register 0, then the word pointed to by it
+is moved to x.
+The PDP-11 happens to have an addressing mode to perform this sequence in
+one instruction: mov -(r0),x.
+Although it is conceivable that this instruction could be included in the
+back end driving table for the PDP-11, it is awkward to do so because it
+can occur in so many contexts.
+It is much easier to catch things like this in a separate program.
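The effect of such a rule can be sketched in C. This is a hypothetical, hand-coded matcher for the single pattern above, not the table-driven target optimizer: instructions are held as text, one per array element, and the function name peephole is an assumption. It rewrites sub #2,rN followed by mov (rN),DST only when both instructions use the same register.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Replace "sub #2,rN" + "mov (rN),DST" by "mov -(rN),DST" in place.
 * a[] holds n instructions of at most 31 characters each.
 * Returns the new number of instructions. */
static int peephole(char a[][32], int n)
{
    int out = 0;

    for (int i = 0; i < n; i++) {
        int r1, r2;
        char dst[32];

        if (i + 1 < n
            && sscanf(a[i], "sub #2,r%d", &r1) == 1
            && sscanf(a[i + 1], "mov (r%d),%31s", &r2, dst) == 2
            && r1 == r2) {
            snprintf(a[out++], 32, "mov -(r%d),%s", r1, dst);
            i++;                        /* both instructions consumed */
        } else if (out != i) {
            strcpy(a[out++], a[i]);     /* shift unmatched instruction */
        } else {
            out++;                      /* already in place */
        }
    }
    return out;
}
```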
+.NH 1
+The Universal Assembler/Linker
+.PP
+Although assembly languages for different machines may appear very different
+at first glance, they have a surprisingly large intersection.
+We have been able to construct an assembler/linker that is almost entirely
+independent of the assembly language being processed.
+To tailor the program to a specific assembly language, it is necessary to
+supply a table giving the list of instructions, the bit patterns required for
+each one, and the language syntax.
+The machine independent part of the assembler/linker is then compiled with the
+table to produce an assembler and linker for a particular target machine.
+Experience has shown that writing the necessary table for a new machine can be
+done in less than a week.
+.PP
+To enforce a modicum of uniformity, we have chosen to use a common set of
+pseudoinstructions for all target machines.
+They are used to initialize memory, allocate uninitialized memory, determine the
+current segment, and similar functions found in most assemblers.
+.PP
+The assembler is also a linker.
+After assembling a program, it checks to see if there are any
+unsatisfied external references.
+If so, it begins reading the libraries to find the necessary routines, including
+them in the object file as it finds them.
+This approach requires libraries to be maintained in assembly language form,
+but eliminates the need for inventing a language to express relocatable
+object programs in a machine independent way.
+It also simplifies the assembler, since producing absolute object code is
+easier than producing relocatable object code.
+Finally, although assembly language libraries may be somewhat larger than
+relocatable object module libraries, the loss in speed due to having more
+input may be more than compensated for by not having to pass an intermediate
+file between the assembler and linker.
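The library search described above is essentially a worklist loop. The sketch is an assumption about how such a loop could be organized, not the toolkit's code: the module table, the routine names and the function resolve are hypothetical, and copying a selected routine into the output is only marked by a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXSYM 32

/* Hypothetical library entry: one routine (kept in assembly form)
 * and the external names it references in turn. */
struct module {
    const char *defines;
    const char *refs[4];            /* NULL-terminated */
};

static int member(const char **set, int n, const char *name)
{
    for (int i = 0; i < n; i++)
        if (strcmp(set[i], name) == 0)
            return 1;
    return 0;
}

/* Satisfy the program's undefined references from the library.
 * Included modules may add references of their own.
 * Returns the number of modules included; *unresolved receives the
 * number of names that no module defines. */
static int resolve(const struct module *lib, int nlib,
                   const char **undef, int nundef, int *unresolved)
{
    const char *work[MAXSYM], *seen[MAXSYM];
    int nwork = 0, nseen = 0, included = 0;

    *unresolved = 0;
    for (int i = 0; i < nundef; i++) {
        assert(nwork < MAXSYM);
        work[nwork++] = undef[i];
    }
    while (nwork > 0) {
        const char *name = work[--nwork];
        if (member(seen, nseen, name))
            continue;                   /* already handled */
        assert(nseen < MAXSYM);
        seen[nseen++] = name;
        const struct module *m = NULL;
        for (int i = 0; i < nlib && m == NULL; i++)
            if (strcmp(lib[i].defines, name) == 0)
                m = &lib[i];
        if (m == NULL) {
            (*unresolved)++;
            continue;
        }
        included++;                     /* copy module into output here */
        for (int j = 0; m->refs[j] != NULL; j++) {
            assert(nwork < MAXSYM);
            work[nwork++] = m->refs[j];
        }
    }
    return included;
}
```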
+.NH 1
+The Utility Package
+.PP
+The utility package is a collection of programs designed to aid the
+implementers of new front ends or new back ends.
+The most useful ones are the test programs.
+For example, one test set, EMTEST, systematically checks out a back end by
+executing an ever larger subset of the EM instructions.
+It starts out by testing LOC, LOL and a few of the other essential instructions.
+If these appear to work, it then tries out new instructions one at a time,
+adding them to the set of instructions "known" to work as they pass the tests.
+.PP
+Each instruction is tested with a variety of operands chosen from values 
+where problems can be expected.
+For example, on target machines which have 16-bit index registers but only
+allow 8-bit displacements, a fundamentally different algorithm may be needed
+for accessing
+the first few bytes of local variables and those with offsets of thousands.
+The test programs have been carefully designed to thoroughly test all relevant
+cases.
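Operands "chosen from values where problems can be expected" cluster around the limits of each field. A hypothetical helper for a signed displacement field of a given width is sketched below (the names fits and boundary_operands are assumptions); for the 8-bit case it yields -129, -128, -1, 0, 1, 127 and 128.

```c
#include <assert.h>

/* Does x fit in a signed field `bits` wide? (assumes 1 < bits < 32) */
static int fits(long x, int bits)
{
    long max = (1L << (bits - 1)) - 1;
    return x >= -max - 1 && x <= max;
}

/* Fill v[] with operands around the limits of such a field, where a
 * back end is likely to switch addressing strategy.
 * Returns the number of values written (always 7). */
static int boundary_operands(int bits, long v[7])
{
    long max = (1L << (bits - 1)) - 1;
    long min = -max - 1;
    long vals[7] = { min - 1, min, -1, 0, 1, max, max + 1 };

    for (int i = 0; i < 7; i++)
        v[i] = vals[i];
    return 7;
}
```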
+.PP
+In addition to EMTEST, test programs in Pascal, C, and other languages are also
+available.
+A typical test is:
+.sp
+   i := 9; \fBif\fP i + 250 <> 259 \fBthen\fP error(16);
+.sp
+Like EMTEST, the other test programs systematically exercise all features of the
+language being tested, and do so in a way that makes it possible to pinpoint
+errors precisely.
+While it has been said that testing can only demonstrate the presence of errors
+and not their absence, our experience is that 
+the test programs have been invaluable in debugging new parts of the system
+quickly.
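A C analogue of this style of test is sketched below. report_error stands in for the test programs' error routine and is hypothetical, as is the second check near the 16-bit limit. Note that 9 + 250 = 259 no longer fits in a byte, which is presumably why those operands were chosen.

```c
#include <assert.h>
#include <stdio.h>

static int nerrors;             /* number of failed checks */
static int last_error;          /* number of the most recent failure */

/* Every check carries its own number, so a failure points at one
 * precise construct. */
static void report_error(int n)
{
    nerrors++;
    last_error = n;
    fprintf(stderr, "error %d\n", n);
}

static void test_integer_add(void)
{
    /* volatile keeps the compiler from folding the checks at compile
     * time, so the generated add instructions are really exercised */
    volatile int i = 9;

    if (i + 250 != 259)         /* the Pascal example, in C */
        report_error(16);
    i = 32766;                  /* hypothetical extra check near 2**15 */
    if (i + 1 != 32767)
        report_error(17);
}
```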
+.PP
+Other utilities include programs to convert
+the highly compact EM code produced by front ends to ASCII and vice versa,
+programs to build various internal tables from human writable input formats,
+a variety of libraries written in or compiled to EM to make them portable,
+an EM assembler, and EM interpreters for various machines.
+.PP
+Interpreting the EM code instead of translating it to target machine language
+is useful for several reasons.
+First, the interpreters provide extensive run time diagnostics including
+an option to list the original source program (in Pascal, C, etc.) with the
+execution frequency or execution time for each source line printed in the
+left margin.
+Second, since an EM program is typically about one-third the size of a
+compiled program, large programs can be executed on small machines.
+Third, running the EM code directly makes it easier to pinpoint errors in 
+the EM output of front ends still being debugged.
+.NH 1
+Summary and Conclusions
+.PP
+The Amsterdam Compiler Kit is a tool kit for building
+portable (cross) compilers and interpreters.
+The main pieces of the kit are the front ends, which convert source programs
+to EM code, optimizers, which improve the EM code, and back ends, which convert
+the EM code to target assembly language.
+The kit is highly modular, so writing one front end
+(and its associated runtime routines)
+is sufficient to implement
+a new language on a dozen or more machines, and writing one back end table
+and one universal assembler/linker table is all that is needed to bring up all
+the previously implemented languages on a new machine.
+In this manner, the contents, and hopefully the usefulness, of the toolkit
+will increase in time.
+.PP
+We believe the principal lesson to be learned from our work is that the old
+UNCOL idea is basically a sound way to produce compilers, provided suitable
+restrictions are placed on the source languages and target machines.
+We also believe that although compilers produced by this technology may not
+be equal to the very best handcrafted compilers,
+in terms of object code quality, they are certainly
+competitive with many existing compilers.
+However, when one factors in the cost of producing the compiler,
+the possible slight loss in performance may be more than compensated for by the
+large decrease in production cost.
+As a consequence of our work and similar work by other researchers [1,3,4],
+we expect integrated compiler building kits to become increasingly popular
+in the near future.
+.PP
+The toolkit is now available for various computers running the
+.UX
+operating system.
+For information, contact the authors.
+.NH 1
+References
+.LP
+.nr r 0 1
+.in +4
+.ti -4
+\fB~\n+r.\fR Graham, S.L.
+Table-Driven Code Generation.
+.I "Computer~13" ,
+8 (August 1980), 25-34.
+.PP
+A discussion of systematic ways to do code generation,
+in particular, the idea of having a table with templates that match parts of
+the parse tree and convert them into machine instructions.
+.sp 2
+.ti -4
+\fB~\n+r.\fR Haddon, B.K., and Waite, W.M.
+Experience with the Universal Intermediate Language Janus.
+.I "Software Practice & Experience~8" ,
+5 (Sept.-Oct. 1978), 601-616.
+.PP
+An intermediate language for use with ALGOL 68, Pascal, etc. is described.
+The paper discusses some problems encountered and how they were dealt with.
+.sp 2
+.ti -4
+\fB~\n+r.\fR Johnson, S.C.
+A Portable Compiler: Theory and Practice.
+.I "Ann. ACM Symp. Prin. Prog. Lang." ,
+Jan. 1978.
+.PP
+A cogent discussion of the portable C compiler.
+Particularly interesting are the author's thoughts on the value of
+computer science theory.
+.sp 2
+.ti -4
+\fB~\n+r.\fR Leverett, B.W., Cattell, R.G.G., Hobbs, S.O., Newcomer, J.M.,
+Reiner, A.H., Schatz, B.R., and Wulf, W.A.
+An Overview of the Production-Quality Compiler-Compiler Project.
+.I Computer~13 ,
+8 (August 1980), 38-49.
+.PP
+PQCC is a system for building compilers similar in concept but differing in
+details from the Amsterdam Compiler Kit.
+The paper describes the intermediate representation used and the code generation
+strategy.
+.sp 2
+.ti -4
+\fB~\n+r.\fR Lowry, E.S., and Medlock, C.W.
+Object Code Optimization.
+.I "Commun.~ACM~12" ,
+1 (Jan. 1969), 13-22.
+.PP
+A classic paper on global object code optimization.
+It covers data flow analysis, common subexpressions, code motion, register
+allocation and other techniques.
+.sp 2
+.ti -4
+\fB~\n+r.\fR Nori, K.V., Ammann, U., Jensen, K., and Nageli, H.
+The Pascal P Compiler Implementation Notes.
+Eidgen. Tech. Hochschule, Zurich, 1975.
+.PP
+A description of the original P-code machine, used to transport the Pascal-P
+compiler to new computers.
+.sp 2
+.ti -4
+\fB~\n+r.\fR Steel, T.B., Jr. UNCOL: the Myth and the Fact. In
+.I "Ann. Rev. Auto. Prog."
+Goodman, R. (ed.), vol. 2 (1960), 325-344.
+.PP
+An introduction to the UNCOL idea by its originator.
+.sp 2
+.ti -4
+\fB~\n+r.\fR Steel, T.B., Jr.
+A First Version of UNCOL.
+.I "Proc. Western Joint Comp. Conf." ,
+(1961), 371-377.
+.PP
+The first detailed proposal for an UNCOL.  By current standards it is a
+primitive language, but it is interesting for its historical perspective.
+.sp 2
+.ti -4
+\fB~\n+r.\fR Tanenbaum, A.S., van Staveren, H., and Stevenson, J.W.
+Using Peephole Optimization on Intermediate Code.
+.I "ACM Trans. Prog. Lang. and Sys.~3" ,
+1 (Jan. 1982), 21-36.
+.PP
+A detailed description of a table-driven peephole optimizer.
+The driving table provides a list of patterns to match as well as the
+replacement text to use for each successful match.
+.sp 2
+.ti -4
+\fB\n+r.\fR Tanenbaum, A.S., Stevenson, J.W., Keizer, E.G., and van Staveren, H.
+Description of an Experimental Machine Architecture for use with Block
+Structured Languages.
+Informatica Rapport 81, Vrije Universiteit, Amsterdam, 1983.
+.PP
+The defining document for EM.
+.sp 2
+.ti -4
+\fB\n+r.\fR Tanenbaum, A.S.
+Implications of Structured Programming for Machine Architecture.
+.I "Comm. ACM~21" ,
+3 (March 1978), 237-246.
+.PP
+The background and motivation for the design of EM.
+This early version emphasized the idea of interpreting the intermediate
+code (then called EM-1) rather than compiling it.
--- a/doc/v7bugs.doc
+++ b/doc/v7bugs.doc
@ -0,0 +1,302 @@
+.wh 0 hd
+.wh 60 fo
+.de hd
+'sp 5
+..
+.de fo
+'bp
+..
+.nr e 0 1
+.de ER
+.br
+.ne 20
+.sp 2
+.in 5
+.ti -5
+ERROR \\n+e:
+..
+.de PS
+.sp
+.nf
+.in +5
+..
+.de PE
+.sp
+.fi
+.in -5
+..
+.sp 3
+.ce
+UNIX version 7 bugs
+.sp 3
+This document describes the UNIX version 7 errors fixed at the
+Vrije Universiteit, Amsterdam.
+Several of these were discovered at the VU.
+Others are quoted from a list of bugs distributed by Bell Labs.
+.sp
+For each error the differences between the original and modified
+source files are given,
+as well as a test program.
+.ER
+C optimizer bug for unsigned comparison
+.sp
+The following C program causes an IOT trap, although it should not,
+when compiled with 'cc -O prog.c':
+.PS
+unsigned	i = 0;
+
+main() {
+	register j;
+
+	j = -1;
+	if (i > 40000)
+		abort();
+}
+.PE
+Bell Labs suggests making the following patch in c21.c:
+.PS
+/* modified /usr/src/cmd/c/c21.c */
+
+189		if (r==0) {
+190	/* next 2 lines replaced as indicated by
+191	 * Bell Labs bug distribution ( v7optbug )
+192			p->back->back->forw = p->forw;
+193			p->forw->back = p->back->back;
+194	  End of lines changed */
+195			if (p->forw->op==CBR
+196			  || p->forw->op==SXT
+197			  || p->forw->op==CFCC) {
+198				p->back->forw = p->forw;
+199				p->forw->back = p->back;
+200			} else {
+201				p->back->back->forw = p->forw;
+202				p->forw->back = p->back->back;
+203			}
+204	/* End of new lines */
+205			decref(p->ref);
+206			p = p->back->back;
+207			nchange++;
+208		} else if (r>0) {
+.PE
+Use the previous program to test before and after the modification.
+.ER
+The loader fails for large data or text portions
+.sp
+The loader 'ld' produces a "local symbol botch" error
+for the following C program.
+.PS
+int	big1[10000] = {
+	1
+};
+int	big2[10000] = {
+	2
+};
+
+main() {
+	printf("loader is fine\\n");
+}
+.PE
+We have made the following fix:
+.PS
+/* original /usr/src/cmd/ld.c */
+
+113	struct {
+114		int	fmagic;
+115		int	tsize;
+116		int	dsize;
+117		int	bsize;
+118		int	ssize;
+119		int	entry;
+120		int	pad;
+121		int	relflg;
+122	} filhdr;
+
+/* modified /usr/src/cmd/ld.c */
+
+113	/*
+114	 * The original Version 7 loader had problems loading large
+115	 * text or data portions.
+116	 * Why not include <a.out.h> ???
+117	 * then they would be declared unsigned
+118	 */
+119	struct {
+120		int	fmagic;
+121		unsigned	tsize;		/* not int !!! */
+122		unsigned	dsize;		/* not int !!! */
+123		unsigned	bsize;		/* not int !!! */
+124		unsigned	ssize;		/* not int !!! */
+125		unsigned	entry;		/* not int !!! */
+126		unsigned	pad;		/* not int !!! */
+127		unsigned	relflg;		/* not int !!! */
+128	} filhdr;
+.PE
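Why the declaration matters can be seen in a small sketch, assuming a 16-bit int as on the PDP-11 (modeled with int16_t and uint16_t); the helper names are hypothetical and this is not code from ld.c. The two arrays in the test program hold 10000 two-byte ints each, about 40000 bytes of initialized data, which exceeds 32767 and therefore reads back as negative through a signed field.

```c
#include <assert.h>
#include <stdint.h>

/* Original declaration: a 16-bit int field. Sizes above 32767 come
 * out negative (conversion wraps on two's-complement machines). */
static long dsize_as_int(uint16_t raw)
{
    return (int16_t)raw;
}

/* Fixed declaration: a 16-bit unsigned field. */
static long dsize_as_unsigned(uint16_t raw)
{
    return raw;
}
```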
+.ER
+Floating point registers
+.sp
+When a process is swapped out to disk because it needs more memory,
+its floating point registers are not saved, so it
+may find different register contents when it is restarted.
+A small assembly program demonstrates this for the floating point status register.
+It keeps pushing words onto the stack, so the process keeps asking for more memory.
+If the error is not fixed, the program stops with an IOT trap as soon as the
+status register has changed;
+if all is fine, the stack grows until the program dies with a "memory fault".
+.PS
+start:	ldfps	$7400
+1:	stfps	r0
+	mov	r0,-(sp)
+	cmp	r0,$7400
+	beq	1b
+	4
+.PE
+You have to dig into the kernel to fix it.
+The following patch will do:
+.PS
+/* original /usr/sys/sys/slp.c */
+
+563		a2 = malloc(coremap, newsize);
+564		if(a2 == NULL) {
+565			xswap(p, 1, n);
+566			p->p_flag |= SSWAP;
+567			qswtch();
+568			/* no return */
+569		}
+
+/* modified /usr/sys/sys/slp.c */
+
+590		a2 = malloc(coremap, newsize);
+591		if(a2 == NULL) {
+592	#ifdef FPBUG
+593			/*
+594			 * copy floating point register and status,
+595			 * but only if you must switch processes
+596			 */
+597			if(u.u_fpsaved == 0) {
+598				savfp(&u.u_fps);
+599				u.u_fpsaved = 1;
+600			}
+601	#endif
+602			xswap(p, 1, n);
+603			p->p_flag |= SSWAP;
+604			qswtch();
+605			/* no return */
+606		}
+.PE
+.ER
+Floating point registers after fork
+.sp
+A similar problem arises when a process forks.
+The child gets random floating point register contents, as
+demonstrated by the following assembly language program.
+The child process dies with an IOT trap and the parent prints
+the message "child failed".
+.PS
+exit	= 1.
+fork	= 2.
+write	= 4.
+wait	= 7.
+
+start:	ldfps	$7400
+	sys	fork
+	br	child
+	sys	wait
+	tst	r1
+	bne	bad
+	stfps	r2
+	cmp	r2,$7400
+	beq	start
+	4
+child:	stfps	r2
+	cmp	r2,$7400
+	beq	ex
+	4
+bad:	clr	r0
+	sys	write;mess;13.
+ex:	clr	r0
+	sys	exit
+
+	.data
+mess:	<child failed\\n>
+.PE
+The same file slp.c should be patched as follows:
+.PS
+/* original /usr/sys/sys/slp.c */
+
+499		/*
+500		 * When the resume is executed for the new process,
+501		 * here's where it will resume.
+502		 */
+503		if (save(u.u_ssav)) {
+504			sureg();
+505			return(1);
+506		}
+507		a2 = malloc(coremap, n);
+508		/*
+509		 * If there is not enough core for the
+510		 * new process, swap out the current process to generate the
+511		 * copy.
+512		 */
+
+/* modified /usr/sys/sys/slp.c */
+
+519		/*
+520		 * When the resume is executed for the new process,
+521		 * here's where it will resume.
+522		 */
+523		if (save(u.u_ssav)) {
+524			sureg();
+525			return(1);
+526		}
+527	#ifdef FPBUG
+528		/* copy the floating point registers and status to child */
+529		if(u.u_fpsaved == 0) {
+530			savfp(&u.u_fps);
+531			u.u_fpsaved = 1;
+532		}
+533	#endif
+534		a2 = malloc(coremap, n);
+535		/*
+536		 * If there is not enough core for the
+537		 * new process, swap out the current process to generate the
+538		 * copy.
+539		 */
+.PE
+.ER
+/usr/src/libc/v6/stat.c
+.sp
+Some system calls changed between version 6 and version 7.
+A library of system call entries that makes version 6 UNIX look like
+a version 7 system is provided, so that some
+useful version 7 utilities, such as 'tar', can run on UNIX-6.
+The entry for 'stat' contained two bugs:
+the 24-bit file size was incorrectly converted to 32 bits
+(sign extension of bit 15)
+and the uid/gid fields suffered from sign extension.
+.sp
+Transferring your files from version 6 to version 7 using 'tar'
+will fail for all files whose size has bit 15 set, that is, for which
+.sp
+	( (size & 0100000) != 0 )
+.sp
+These two errors are fixed if stat.c is modified as follows:
+.PS
+/* original /usr/src/libc/v6/stat.c */
+
+11		char  os_size0;
+12		short os_size1;
+13		short os_addr[8];
+
+49		buf->st_nlink = osbuf.os_nlinks;
+50		buf->st_uid = osbuf.os_uid;
+51		buf->st_gid = osbuf.os_gid;
+52		buf->st_rdev = 0;
+
+/* modified /usr/src/libc/v6/stat.c */
+
+11		char  os_size0;
+12		unsigned os_size1;
+13		short os_addr[8];
+
+49		buf->st_nlink = osbuf.os_nlinks;
+50		buf->st_uid = osbuf.os_uid & 0377;
+51		buf->st_gid = osbuf.os_gid & 0377;
+52		buf->st_rdev = 0;
+.PE
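Both sign-extension errors can be reproduced in miniature. The sketch assumes the 24-bit size is formed as (high byte << 16) + low word and treats the high byte as unsigned for simplicity; the helper names are hypothetical and not taken from stat.c.

```c
#include <assert.h>
#include <stdint.h>

/* Original: the low word is a short, sign-extended when bit 15 is set. */
static long size_signed_low(unsigned char size0, int16_t size1)
{
    return ((long)size0 << 16) + size1;
}

/* Fixed: the low word is unsigned, so it is zero-extended. */
static long size_unsigned_low(unsigned char size0, uint16_t size1)
{
    return ((long)size0 << 16) + size1;
}

/* Version 6 uid/gid are single bytes; with a signed char, ids above
 * 127 come out negative unless masked with 0377. */
static int id_unmasked(signed char c) { return c; }
static int id_masked(signed char c)   { return c & 0377; }
```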
--- a/doc/val.doc
+++ b/doc/val.doc
@ -0,0 +1,752 @@
+.ll 72
+.wh 0 hd
+.wh 60 fo
+.de hd
+'sp 5
+..
+.de fo
+'bp
+..
+.tr ~
+.               PARAGRAPH
+.de PP
+.sp
+..
+.               CHAPTER
+.de CH
+.br
+.ne 15
+.sp 3
+.in 0
+\\fB\\$1\\fR
+.in 5
+.PP
+..
+.               SUBCHAPTER
+.de SH
+.br
+.ne 10
+.sp
+.in 5
+\\fB\\$1\\fR
+.in 10
+.PP
+..
+.               INDENT START
+.de IS
+.sp
+.in +5
+..
+.               INDENT END
+.de IE
+.in -5
+.sp
+..
+.               DOUBLE INDENT START
+.de DS
+.sp
+.in +5
+.ll -5
+..
+.               DOUBLE INDENT END
+.de DE
+.ll +5
+.in -5
+.sp
+..
+.               EQUATION START
+.de EQ
+.sp
+.nf
+..
+.               EQUATION END
+.de EN
+.fi
+.sp
+..
+.               TEST
+.de TT
+.ti -5
+Test~\\$1:~
+.br
+..
+.               IMPLEMENTATION 1
+.de I1
+.br
+Implementation~1:
+..
+.               IMPLEMENTATION 2
+.de I2
+.br
+Implementation~2:
+..
+.de CS
+.br
+~-~\\
+..
+.br
+.fi
+.sp 5
+.ce
+\fBPascal Validation Suite Report\fR
+.CH "Pascal processor identification"
+The ACK-Pascal compiler produces code for an EM machine
+as defined in [1].
+It is up to the implementor of the EM machine whether errors like
+integer overflow, undefined operand and range bound error are recognized or not.
+Therefore it depends on the EM machine implementation whether these errors
+are recognized in Pascal programs or not.
+The validation suite results of all known implementations are given.
+.PP
+There does not (yet) exist a hardware EM machine.
+Therefore, EM programs must be interpreted, or translated into
+instructions for a target machine.
+The following implementations currently exist:
+.IS
+.I1
+an interpreter running on a PDP-11 (using UNIX).
+The normal mode of operation for this interpreter is to check
+for undefined integers, overflow, range errors etc.
+.sp
+.I2
+a translator into PDP-11 instructions (using UNIX).
+Fewer checks are performed than in the interpreter, because the translator
+is intended to speed up the execution of well-debugged programs.
+.IE
+.CH "Test Conditions"
+Tester: E.G. Keizer
+.br
+Date: October 1983
+.br
+Validation Suite version: 3.0
+.PP
+The final test run is made with a slightly
+modified validation suite.
+.SH "Erroneous programs"
+Some tests did not conform to the standard proposal of February 1979.
+It is this version of the standard proposal that is used
+by the authors of the validation suite.
+.IS
+.TT 6.6.3.7-4
+The semicolon between high and integer on line 17 is replaced
+by a colon.
+.sp
+.TT 6.7.2.2-13
+The div operator on line 14 is replaced by mod.
+.IE
+.CH "Conformance tests"
+Number of tests passed = 150
+.br
+Number of tests failed = 6
+.SH "Details of failed tests"
+.IS
+.TT 6.1.2-1
+Character sequences starting with the 8 characters 'procedur'
+or 'function' are
+erroneously classified as the word-symbols 'procedure' and 'function'.
+.sp
+.TT 6.1.3-2
+Identifiers identical in the first eight characters, but
+differing in the ninth or a later character, are treated as
+identical.
+.sp
+.TT 6.5.1-1
+ACK-Pascal requires all formal program parameters to be
+declared with type \fIfile\fP.
+.sp
+.TT 6.6.6.5-1
+Gives the run-time error 'eof seen' at a call to eoln.
+I suspect that this is an error in the suite.
+.sp
+.TT 6.6.4.1-1
+Redefining the names of some standard procedures leads to incorrect
+behaviour of the runtime system.
+In this case it crashes without a sensible error message.
+.sp
+.TT 6.9.3.5.1-1
+This test cannot be translated by our compiler because two
+non-identical variables are used in the same block with the same first eight
+characters.
+The test passed after replacement of one of those names.
+.IE
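Tests 6.1.3-2 and 6.9.3.5.1-1 both follow from the eight-character limit reported under test 6.1.3-3. A sketch of the resulting comparison is given below; the name same_identifier is hypothetical, and the compiler presumably achieves the same effect by truncating names when it stores them.

```c
#include <assert.h>
#include <string.h>

#define SIGCHARS 8   /* significant characters in an identifier */

/* Two identifiers count as the same name whenever they agree in
 * their first SIGCHARS characters. */
static int same_identifier(const char *a, const char *b)
{
    return strncmp(a, b, SIGCHARS) == 0;
}
```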
+.CH "Deviance tests"
+Number of deviations correctly detected = 120
+.br
+Number of tests not detecting deviations = 20
+.SH "Details of deviations"
+The following tests are compiled without a proper error
+indication although they do
+not conform to the standard.
+.IS
+.TT 6.1.6-5
+ACK-Pascal allows labels in the range 0..32767.
+A warning is produced when testing for deviations from the
+standard.
+.sp
+.TT 6.1.8-5
+A missing space between a number and a word symbol is not
+detected.
+.sp
+.TT 6.2.2-8
+.TT 6.3-6
+.TT 6.4.1-3
+.TT 6.6.1-3
+.TT 6.6.1-4
+Undetected scope error. The scope of an identifier should start at the
+beginning of the block in which it is declared.
+In the ACK-Pascal compiler the scope starts just after the declaration,
+however.
+.sp
+.TT 6.4.3.3-7
+The values of fields from one variant are accessible from
+another variant.
+The correlation is exact.
+.sp
+.TT 6.6.3.3-4
+The passing as a variable parameter of the selector of a
+variant part is not detected.
+A runtime error is produced because the variant selector is not
+initialized.
+.sp
+.TT 6.8.2.4-2
+.TT 6.8.2.4-3
+.TT 6.8.2.4-4
+.TT 6.8.2.4-5
+.TT 6.8.2.4-6
+The ACK-Pascal compiler does not restrict the places from where
+you may jump to a label by means of a goto-statement.
+.sp
+.TT 6.8.3.9-5
+.TT 6.8.3.9-6
+.TT 6.8.3.9-7
+.TT 6.8.3.9-16
+No errors are produced for assignments to a variable
+in use as the control variable of a for-statement.
+.sp
+.TT 6.8.3.9-8
+.TT 6.8.3.9-9
+Use of a control variable after leaving the loop without
+intervening initialization is not detected.
+.IE
+.CH "Error handling"
+The results depend on the EM implementation.
+.sp
+Number of errors correctly detected =
+.in +5
+.I1
+32
+.I2
+17
+.in -5
+Number of errors not detected =
+.in +5
+.I1
+21
+.I2
+36
+.in -5
+Number of errors incorrectly detected =
+.in +5
+.I1
+2
+.I2
+2
+.in -5
+.SH "Details of errors not detected"
+The following test fails because the ACK-Pascal compiler only
+generates a warning, which does not prevent the test from being run.
+.IS
+.TT 6.6.2-8
+A warning is produced if there is no assignment to a function-identifier.
+.IE
+With this test the ACK-Pascal compiler issues an error message for a legal
+construct not directly related to the error to be detected.
+.IS
+.TT 6.5.5-2
+Program does not compile.
+Buffer variable of text file is not allowed as variable
+parameter.
+.IE
+The following errors are not detected at all.
+.IS
+.TT 6.2.1-11
+.I2
+The use of an undefined integer is not caught as an error.
+.sp
+.TT 6.4.3.3-10
+.TT 6.4.3.3-11
+.TT 6.4.3.3-12
+.TT 6.4.3.3-13
+The notion of 'current variant' is not implemented, not even if a tagfield
+is present.
+.sp
+.TT 6.4.5-15
+.TT 6.4.6-9
+.TT 6.4.6-10
+.TT 6.4.6-11
+.TT 6.5.3.2-2
+.I2
+Subrange bounds are not checked.
+.sp
+.TT 6.4.6-12
+.TT 6.4.6-13
+.TT 6.7.2.4-4
+If the base-type of a set is a subrange, then the set elements are not checked
+against the bounds of the subrange.
+Only the host-type of this subrange-type is relevant for ACK-Pascal.
+.sp
+.TT 6.5.4-1
+.I2
+Nil pointers are not detected.
+.sp
+.TT 6.5.4-2
+.I2
+Undefined pointers are not detected.
+.sp
+.TT 6.5.5-3
+Changing the file position while the window is in use as actual variable
+parameter or as an element of the record variable list of a with-statement
+is not detected.
+.sp
+.TT 6.6.2-9
+An undefined function result is not detected,
+because it is never used in an expression.
+.sp
+.TT 6.6.5.3-6
+.TT 6.6.5.3-7
+Disposing a variable while it is in use as actual variable parameter or
+as an element of the record variable list of a with-statement is not detected.
+.sp
+.TT 6.6.5.3-8
+.TT 6.6.5.3-9
+.TT 6.6.5.3-10
+It is not detected that a record variable, created with the variant form
+of new, is used as an operand in an expression or as the variable in an
+assignment or as an actual value parameter.
+.sp
+.TT 6.6.5.3-11
+Use of a variable that is not reinitialized after a dispose is
+not detected.
+.sp
+.TT 6.6.6.4-4
+.TT 6.6.6.4-5
+.TT 6.6.6.4-7
+.I2
+There are no range checks for pred, succ and chr.
+.sp
+.TT 6.6.6.5-6
+ACK-Pascal considers a rewrite of a file as a defining
+occurrence.
+.sp
+.TT 6.7.2.2-8
+.TT 6.7.2.2-9
+.TT 6.7.2.2-10
+.TT 6.7.2.2-12
+.I2
+Division by 0 or integer overflow is not detected.
+.sp
+.TT 6.8.3.9-18
+The use of the same control variable in two nested for-statements
+is not detected.
+.sp
+.TT 6.8.3.9-19
+Access of a control variable after leaving the loop results in
+the final-value, although an error should be produced.
+.sp
+.TT 6.9.3.2-3
+The program stops with a file not open error.
+The rewrite before the write is missing in the program.
+.sp
+.TT 6.9.3.2-4
+.TT 6.9.3.2-5
+Illegal FracDigits values are not detected.
+.IE
+.CH "Implementation dependence"
+Number of tests run = 14
+.br
+Number of tests incorrectly handled = 0
+.SH "Details of implementation dependence"
+.IS
+.TT 6.1.9-5
+Alternate comment delimiters are implemented.
+.sp
+.TT 6.1.9-6
+The equivalent symbols @ for ^, (. for [ and .) for ] are not
+implemented.
+.sp
+.TT 6.4.2.2-10
+Maxint = 32767
+.sp
+.TT 6.4.3.4-5
+Only elements with non-negative ordinal value are allowed in sets.
+.sp
+.TT 6.6.6.1-1
+Standard procedures and functions are not allowed as parameters.
+.sp
+.TT 6.6.6.2-11
+Details of the machine characteristics regarding real numbers:
+.IS
+.nf
+beta =       2
+t =         56
+rnd =        1
+ngrd =       0
+machep =   -56
+negep =    -56
+iexp =       8
+minexp =  -128
+maxexp =   127
+eps =     1.387779e-17
+epsneg =  1.387779e-17
+xmin =    2.938736e-39
+xmax =    1.701412e+38
+.fi
+.IE
+.sp
+.TT 6.7.2.3-3
+.TT 6.7.2.3-4
+All operands of boolean expressions are evaluated.
+.sp
+.TT 6.8.2.2-1
+.TT 6.8.2.2-2
+The expression in an assignment statement is evaluated
+before the variable selection if this involves pointer
+dereferencing or array indexing.
+.sp
+.TT 6.8.2.3-2
+Actual parameters are evaluated in reverse order.
+.sp
+.TT 6.9.3.2-6
+The default widths for integer, Boolean and real are 6, 5 and 13, respectively.
+.sp
+.TT 6.9.3.5.1-2
+The number of digits written in an exponent is 2.
+.sp
+.TT 6.9.3.6-1
+The representations of true and false are (~true) and (false).
+The parentheses serve to indicate the width.
+.IE
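The real-number characteristics reported under test 6.6.6.2-11 are mutually consistent powers of beta = 2. The names match the parameters computed by Cody's MACHAR routine; assuming its relations eps = beta**machep, epsneg = beta**negep, xmin = beta**minexp and xmax = (1 - epsneg) x beta**maxexp, the sketch below recomputes the printed values without a math library. The helper names are hypothetical.

```c
#include <assert.h>

/* Reported characteristics (test 6.6.6.2-11). */
enum { MACHEP = -56, NEGEP = -56, MINEXP = -128, MAXEXP = 127 };

/* 2 raised to an integer power; doubling and halving are exact in
 * binary floating point. */
static double pow2(int e)
{
    double x = 1.0;

    for (; e > 0; e--)
        x *= 2.0;
    for (; e < 0; e++)
        x /= 2.0;
    return x;
}

/* Agreement with a value printed to 7 significant digits. */
static int close_to(double x, double printed)
{
    double r = x / printed;
    return r > 0.999999 && r < 1.000001;
}
```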
+.CH "Quality measurement"
+Number of tests run = 60
+.br
+Number of tests handled incorrectly = 1
+.SH "Results of tests"
+Several tests perform operations on reals and indicate the error
+introduced by these operations.
+For each of these tests the following two quality measures are extracted:
+.sp
+.in +5
+maxRE:~~maximum relative error
+.br
+rmsRE:~~root-mean-square relative error
+.in -5
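Assuming the usual definitions, with $\hat f(x_i)$ the computed and $f(x_i)$ the exact value at each of $n$ sample arguments, the two measures are as follows. The report gives each as a power of two, so "maxRE: 2 ** -55.50" means $\mathrm{maxRE} = 2^{-55.50}$.

```latex
r_i = \frac{\lvert \hat f(x_i) - f(x_i) \rvert}{\lvert f(x_i) \rvert},
\qquad
\mathrm{maxRE} = \max_{1 \le i \le n} r_i,
\qquad
\mathrm{rmsRE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} r_i^{2}}
```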
+.sp 2
+.IS
+.TT 1.2-1
+.I1
+25 thousand Whetstone instructions per second.
+.I2
+169 thousand Whetstone instructions per second.
+.sp
+.TT 1.2-2
+The value of (TRUEACC-ACC)*2^56/100000 is 1.4.
+This is well within the bounds specified in [3].
+.br
+The GAMM measure is:
+.I1
+238 microseconds
+.I2
+26.3 microseconds.
+.sp
+.TT 1.2-3
+The number of procedure calls calculated in this test exceeds
+the maximum integer value.
+The program stops indicating overflow.
+.sp
+.TT 6.1.3-3
+The number of significant characters for identifiers is 8.
+.sp
+.TT 6.1.5-8
+There is no maximum to the line length.
+.sp
+.TT 6.1.5-9
+The error message "too many digits" is given for numbers larger
+than maxint.
+.sp
+.TT 6.1.5-10
+.TT 6.1.5-11
+.TT 6.1.5-12
+Normal values are allowed for real constants and variables.
+.sp
+.TT 6.1.7-14
+A reasonably large number of strings is allowed.
+.sp
+.TT 6.1.8-6
+No warning is given for possibly unclosed comments.
+.sp
+.TT 6.2.1-12
+.TT 6.2.1-13
+.TT 6.2.1-14
+.TT 6.2.1-15
+.TT 6.5.1-2
+Large lists of declarations are possible in each block.
+.sp
+.TT 6.4.3.2-6
+An 'array[integer] of' is not allowed.
+.sp
+.TT 6.4.3.2-7
+.TT 6.4.3.2-8
+Large values are allowed for arrays and indices.
+.sp
+.TT 6.4.3.3-14
Large numbers of case-constant values are allowed in variants.
+.sp
+.TT 6.4.3.3-15
Large numbers of record sections can appear in the fixed part of
a record.
+.sp
+.TT 6.4.3.3-16
Large numbers of variants are allowed in a record.
.sp
+.TT 6.4.3.4-4
+Size and speed of Warshall's algorithm depend on the
+implementation of EM:
+.IS
+.I1
+.br
+size: 122 bytes
+.br
+speed: 5.2 seconds
+.sp
+.I2
+.br
+size: 196 bytes
+.br
+speed: 0.7 seconds
+.IE
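.sp
Warshall's algorithm computes the transitive closure of a directed
graph held as a Boolean adjacency matrix, using three nested loops
over the n vertices.
The sketch below assumes that representation; the test program itself
is not reproduced here, so its matrix size and loop details may differ.
The name warshall is hypothetical.
.nf
```python
def warshall(adj):
    # Transitive closure of a directed graph given as a square Boolean
    # adjacency matrix.  Returns a new matrix; adj is not modified.
    n = len(adj)
    reach = [row[:] for row in adj]
    for k in range(n):              # allow vertex k as an intermediate
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


g = [[False, True, False],
     [False, False, True],
     [False, False, False]]
closure = warshall(g)   # adds the path 0 -> 2 via vertex 1
```
.fi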
+.TT 6.5.3.2-3
+Deep nesting of array indices is allowed.
+.sp
+.TT 6.5.3.2-4
+.TT 6.5.3.2-5
+Arrays can have at least 8 dimensions.
+.sp
+.TT 6.6.1-8
Deep static nesting of procedures is allowed.
+.sp
+.TT 6.6.3.1-6
Large numbers of formal parameters are allowed.
+.sp
+.TT 6.6.5.3-12
+Dispose is fully implemented.
+.sp
+.TT 6.6.6.2-6
+Test sqrt(x): no errors.
+The error is within acceptable bounds.
+.in +5
+maxRE:~~2~**~-55.50
+.br
+rmsRE:~~2~**~-57.53
+.in -5
+.sp
+.TT 6.6.6.2-7
+Test arctan(x): may cause underflow or overflow errors.
+The error is within acceptable bounds.
+.in +5
+.br
+maxRE:~~2~**~-55.00
+.br
+rmsRE:~~2~**~-56.36
+.in -5
+.sp
+.TT 6.6.6.2-8
+Test exp(x): may cause underflow or overflow errors.
+The error is not within acceptable bounds.
+.in +5
+maxRE:~~2~**~-50.03
+.br
+rmsRE:~~2~**~-51.03
+.in -5
+.sp
+.TT 6.6.6.2-9
+Test sin(x): may cause underflow errors.
+The error is not within acceptable bounds.
+.in +5
+maxRE:~~2~**~-38.20
+.br
+rmsRE:~~2~**~-43.68
+.in -5
+.sp
+Test cos(x): may cause underflow errors.
+The error is not within acceptable bounds.
+.in +5
+maxRE:~~2~**~-41.33
+.br
+rmsRE:~~2~**~-46.62
+.in -5
+.sp
+.TT 6.6.6.2-10
+Test ln(x):
+The error is not within acceptable bounds.
+.in +5
+maxRE:~~2~**~-54.05
+.br
+rmsRE:~~2~**~-55.77
+.in -5
+.sp
+.TT 6.7.1-3
+.TT 6.7.1-4
+.TT 6.7.1-5
+Complex nested expressions are allowed.
+.sp
+.TT 6.7.2.2-14
+Test real division:
+The error is within acceptable bounds.
+.in +5
+maxRE:~~0
+.br
+rmsRE:~~0
+.in -5
+.sp
+.TT 6.7.2.2-15
Operations on reals in the integer range are exact.
+.sp
+.TT 6.7.3-1
+.TT 6.8.3.2-1
+.TT 6.8.3.4-2
+.TT 6.8.3.5-15
+.TT 6.8.3.7-4
+.TT 6.8.3.8-3
+.TT 6.8.3.9-20
+.TT 6.8.3.10-7
Deep static nesting of function calls,
+compound statements, if statements, case statements, repeat
+loops, while loops, for loops and with statements is possible.
+.sp
+.TT 6.8.3.2-2
Large numbers of statements are allowed in a compound
statement.
+.sp
+.TT 6.8.3.5-12
+The compiler requires case constants to be compatible with
+the case selector.
+.sp
+.TT 6.8.3.5-13
+.TT 6.8.3.5-14
+Large case statements are possible.
+.sp
+.TT 6.9-2
+Recursive IO on the same file is well-behaved.
+.sp
+.TT 6.9.1-6
+The reading of real values from a text file is done with
+sufficient accuracy.
+.in +5
+maxRE:~~2~**~-54.61
+.br
+rmsRE:~~2~**~-56.32
+.in -5
+.sp
+.TT 6.9.1-7
+.TT 6.9.2-2
+.TT 6.9.3-3
+.TT 6.9.4-2
Read, readln, write and writeln may have large numbers of
parameters.
+.sp
+.TT 6.9.1-8
+The loss of precision for reals written on a text file and read
+back is:
+.in +5
+maxRE:~~2~**~-53.95
+.br
+rmsRE:~~2~**~-55.90
+.in -5
+.sp
+.TT 6.9.3-2
+File IO buffers without trailing marker are correctly flushed.
+.sp
+.TT 6.9.3.5.2-2
+Reals are written with sufficient accuracy.
+.in +5
+maxRE:~~0
+.br
+rmsRE:~~0
+.in -5
+.IE
+.CH "Level 1 conformance tests"
Number of tests passed = 4
+.br
+Number of tests failed = 1
+.SH "Details of failed tests"
+.IS
+.TT 6.6.3.7-4
A parenthesized expression whose
value is a conformant array is not allowed.
+.IE
+.CH "Level 1 deviance tests"
+Number of deviations correctly detected = 4
+.br
+Number of tests not detecting deviations = 0
+.IE
+.CH "Level 1 error handling"
+The results depend on the EM implementation.
+.sp
+Number of errors correctly detected =
+.in +5
+.I1
+1
+.I2
+0
+.in -5
+Number of errors not detected =
+.in +5
+.I1
+0
+.I2
+1
+.in -5
+.SH "Details of errors not detected"
+.IS
+.TT 6.6.3.7-9
+.I2
+Subrange bounds are not checked.
+.IE
+.CH "Level 1 quality measurement"
+Number of tests run = 1
+.SH "Results of test"
+.IS
+.TT 6.6.3.7-10
+Large conformant arrays are allowed.
+.IE
+.CH "Extensions"
+Number of tests run = 3
.SH "Details of tests failed"
+.IS
+.TT 6.1.9-7
+The alternative relational operators are not allowed.
+.sp
+.TT 6.1.9-8
+The alternative symbols for colon, semicolon and assignment are
+not allowed.
+.sp
+.TT 6.8.3.5-16
+The otherwise selector in case statements is not allowed.
+.IE
+.CH "References"
+.ti -5
+[1]~~\
A.S. Tanenbaum, E.G. Keizer, J.W. Stevenson, Hans van Staveren,
+"Description of a machine architecture for use with block structured
+languages",
+Informatica rapport IR-81.
+.ti -5
+[2]~~\
+ISO standard proposal ISO/TC97/SC5-N462, dated February 1979.
+The same proposal, in slightly modified form, can be found in:
A.M. Addyman et al., "A draft description of Pascal",
Software: Practice and Experience, May 1979.
+An improved version, received March 1980,
+is followed as much as possible for the
+current ACK-Pascal.
+.ti -5
+[3]~~\
B.A. Wichmann and J. du Croz,
"A program to calculate the GAMM measure", Computer Journal,
+November 1979.