.\" $Header$
.RP
.ND
.nr LL 78m
.tr ~
.ds as *
.TL
A Practical Tool Kit for Making Portable Compilers
.AU
Andrew S. Tanenbaum
Hans van Staveren
E. G. Keizer
Johan W. Stevenson
.AI
Mathematics Dept.
Vrije Universiteit
Amsterdam, The Netherlands
.AB
The Amsterdam Compiler Kit is an integrated collection of programs designed to
simplify the task of producing portable (cross) compilers and interpreters.
For each language to be compiled, a program (called a front end)
must be written to
translate the source program into a common intermediate code.
This intermediate code can be optimized and then either directly interpreted
or translated to the assembly language of the desired target machine.
The paper describes the various pieces of the tool kit in some detail, as well
as discussing the overall strategy.
.sp
Keywords: Compiler, Interpreter, Portability, Translator
.sp
CR Categories: 4.12, 4.13, 4.22
.sp 12
Author's present addresses:
  A.S. Tanenbaum, H. van Staveren, E.G. Keizer: Mathematics
     Dept., Vrije Universiteit, Postbus 7161, 1007 MC Amsterdam,
     The Netherlands

  J.W. Stevenson: NV Philips, S&I, T&M, Building TQ V5, Eindhoven,
     The Netherlands
.AE
.NH 1
Introduction
.PP
As more and more organizations acquire many micro- and minicomputers,
the need for portable compilers is becoming more and more acute.
The present situation, in which each hardware vendor provides its own
compilers -- each with its own deficiencies and extensions, and none of them
compatible -- leaves much to be desired.
The ideal situation would be an integrated system containing a family
of (cross) compilers, each compiler accepting a standard source language and
producing code for a wide variety of target machines.
Furthermore, the compilers should be compatible, so programs written in
one language can call procedures written in another language.
Finally, the system should be designed so as to make adding new languages
and new machines easy.
Such an integrated system is being built at the Vrije Universiteit.
Its design and implementation are the subject of this article.
.PP
Our compiler building system, which is called the "Amsterdam Compiler Kit"
(ACK), can be thought of as a "tool kit."
It consists of a number of parts that can be combined to form compilers
(and interpreters) with various properties.
The tool kit is based on an idea (UNCOL) that was first suggested in 1960
[7], but which never really caught on then.
The problem which UNCOL attempts to solve is how to make a compiler for
each of
.I N
languages on
.I M
different machines without having to write
.I N
x
.I M
programs.
.PP
As shown in Fig. 1, the UNCOL approach is to write
.I N
"front ends," each
of which translates one source language to a common intermediate language,
UNCOL (UNiversal Computer Oriented Language), and
.I M
"back ends," each
of which translates programs in UNCOL to a specific machine language.
Under these conditions, only
.I N
+
.I M
programs must be written to provide all
.I N
languages on all
.I M
machines, instead of
.I N
x
.I M
programs.
.PP
Various researchers have attempted to design a suitable UNCOL
[2,8], but none of these have become popular.
It is our belief that previous attempts have failed because they have been
too ambitious, that is, they have tried to cover all languages
and all machines using a single UNCOL.
Our approach is more modest: we cater only to algebraic languages
and machines whose memory consists of 8-bit bytes, each with its own address.
Typical languages that could be handled include
Ada, ALGOL 60, ALGOL 68, BASIC, C, FORTRAN,
Modula, Pascal, PL/I, PL/M, PLAIN, and RATFOR,
whereas COBOL, LISP, and SNOBOL would be less efficient.
Examples of machines that could be included are the Intel 8080 and 8086,
Motorola 6800, 6809, and 68000, Zilog Z80 and Z8000, DEC PDP-11 and VAX,
and IBM 370 but not the Burroughs 6700, CDC Cyber, or Univac 1108 (because
they are not byte-oriented).
With these restrictions, we believe the old UNCOL idea can be used as the
basis of a practical compiler-building system.
.KF
.sp 15P
.ce 1
Fig. 1.  The UNCOL model.
.sp
.KE
.NH 1
An Overview of the Amsterdam Compiler Kit
.PP
The tool kit consists of eight components:
.sp
  1. The preprocessor.
  2. The front ends.
  3. The peephole optimizer.
  4. The global optimizer.
  5. The back end.
  6. The target machine optimizer.
  7. The universal assembler/linker.
  8. The utility package.
.sp
.PP
A fully optimizing compiler,
depicted in Fig. 2, has seven cascaded phases.
Conceptually, each component reads an input file and writes a
transformed output file to be used as input to the next component.
In practice, some components may use temporary files to allow multiple
passes over the input or internal intermediate files.
.KF
.sp 12P
.ce 1
Fig. 2.  Structure of the Amsterdam Compiler Kit.
.sp
.KE
.PP
In the following paragraphs we will briefly describe each component.
After this overview, we will look at all of them again in more detail.
A program to be compiled is first fed into the (language independent)
preprocessor, which provides a simple macro facility,
and similar textual facilities.
The preprocessor's output is a legal program in one of the programming
languages supported, whereas the input is a program possibly augmented
with macros, etc.
.PP
This output goes into the appropriate front end, whose job it is to
produce intermediate code.
This intermediate code (our UNCOL) is the machine language for a simple
stack machine called EM (Encoding Machine).
A typical front end might build a parse tree from the input, and then
use the parse tree to generate EM code, which is similar to reverse Polish.
In order to perform this work, the front end has to maintain tables of
declared variables, labels, etc., determine where to place the
data structures in memory, and so on.
.PP
The EM code generated by the front end is fed into the peephole optimizer,
which scans it with a window of a few instructions, replacing certain
inefficient code sequences by better ones.
Such a search is important because EM contains instructions to handle
numerous important special cases efficiently
(e.g., incrementing a variable by 1).
It is our strategy to relieve the front ends of the burden of hunting for
special cases because there are many front ends and only one peephole
optimizer.
By handling the special cases in the peephole optimizer,
the front ends become simpler, easier to write and easier to maintain.
.PP
Following the peephole optimizer is a global optimizer [5], which,
unlike the peephole optimizer, examines the program as a whole.
It builds a data flow graph to make possible a variety of
global optimizations,
among them moving invariant code out of loops, avoiding redundant
computations, performing live/dead analysis, and eliminating tail recursion.
Note that the output of the global optimizer is still EM code.
.PP
Next comes the back end, which differs from the front ends in a
fundamental way.
Each front end is a separate program, whereas the back end is a single
program that is driven by a machine dependent driving table.
The driving table for a specific machine tells how the EM code is mapped
onto the machine's assembly language.
Although a simple driving table might just macro expand each EM instruction
into a sequence of target machine instructions, a much more sophisticated
translation strategy is normally used, as described later.
For speed, the back end does not actually read in the driving table at run time.
Instead, the tables are compiled along with the back end in advance, resulting
in one binary program per machine.
.PP
The output of the back end is a program in the assembly language of some
particular machine.
The next component in the pipeline reads this program and performs peephole
optimization on it.
The optimizations performed here involve idiosyncrasies
of the target machine that cannot be performed in the machine-independent
EM-to-EM peephole optimizer.
Typically these optimizations take advantage of special instructions or special
addressing modes.
.PP
The optimized target machine assembly code then goes into the final
component in the pipeline, the universal assembler/linker.
This program assembles the input to object format, extracting routines from
libraries and including them as needed.
.PP
The final component of the tool kit is the utility package, which contains
various test programs, interpreters for EM code,
EM libraries, conversion programs, and other aids for the implementer and
user.
.NH 1
The Preprocessor
.PP
The function of the preprocessor is to extend all the programming languages
by adding certain generally useful facilities to them in a uniform way.
One of these is a simple macro system, in which the user can give names to
character strings.
The names can be used in the program, with the knowledge that they will be
macro expanded prior to being input to the front end.
Macros can be used for named constants, expanding short "procedures"
in line, etc.
.PP
Another useful facility provided by the preprocessor is the ability to
include compile-time libraries.
On large projects, it is common to have all the declarations and definitions
gathered together in a few files that are textually included in the programs
by instructing the preprocessor to read them in, thus fooling the front end
into thinking that they were part of the source program.
.PP
A third feature of the preprocessor is conditional compilation.
The input program can be split up into labeled sections.
By setting flags, some of the sections can be deleted by the preprocessor,
thus allowing a family of slightly different programs to be conveniently stored
on a single file.
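.PP
To make these three facilities concrete, consider the hypothetical fragment
below.
The directive syntax shown is ours, chosen for illustration only; it is not
necessarily the syntax the preprocessor actually accepts.
.sp
.nf
  #define TABSIZE 100        (macro: a named constant)
  #include "globals.h"       (compile-time library inclusion)
  #ifdef DEBUG               (labeled section, deleted unless DEBUG is set)
  ...
  #endif
.fi
.sp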
.NH 1
The Front Ends
.PP
A front end is a program that converts input in some source language to a
program in EM.
At present, front ends
exist or are in preparation for Pascal, C, and Plain, and are being considered
for Ada, ALGOL 68, FORTRAN 77, and Modula 2.
Each of the present front ends is independent of all the other ones,
although a general-purpose, table-driven front end is conceivable, provided
one can devise a way to express the semantics of the source language in the
driving tables.
The Pascal front end uses a top-down parsing algorithm (recursive descent),
whereas the C and Plain front ends are bottom-up.
.PP
All front ends, independent of the language being compiled,
produce a common intermediate code called EM, which is
the assembly language for a simple stack machine.
The EM machine is based on a memory architecture
containing a stack for local variables, a (static) data area for variables
declared in the outermost block and global to the whole program, and a heap
for dynamic data structures.
In some ways EM resembles P-code [6], but is more general, since it is
intended for a wider class of languages than just Pascal.
.PP
The EM instruction set has been described elsewhere
[9,10,11]
so we will only briefly summarize it here.
Instructions exist to:
.sp
  1. Load a variable or constant of some length onto the stack.
  2. Store the top item on the stack in memory.
  3. Add, subtract, multiply, divide, etc. the top two stack items.
  4. Examine the top one or two stack items and branch conditionally.
  5. Call procedures and return from them.
.sp
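.PP
As a brief illustration of the flavor of EM, the statement K~:=~I~+~7, with
K and I as 2-byte local variables, could be translated into the sequence
below (LOL loads a local variable, LOC loads a constant, ADI is integer
addition, and STL is store local; the same example reappears in later
sections).
.sp
.nf
  LOL I      push local variable I onto the stack
  LOC 7      push the constant 7
  ADI 2      add the top two items as 2-byte integers
  STL K      pop the sum and store it in local variable K
.fi
.sp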
.PP
Loads and stores come in several variations, corresponding to the most common
programming language semantics, for example, constants, simple variables,
fields of a record, elements of an array, and so on.
Distinctions are also made between variables local to the current block
(i.e., stack frame), those in the outermost block (static storage), and those
at intermediate lexicographic levels, which are accessed by following the
static chain at run time.
.PP
All arithmetic instructions have a type (integer, unsigned, real,
pointer, or set) and an
operand length, which may either be explicit or may be popped from the stack
at run time.
Monadic branch instructions pop an item from the stack and branch if it is
less than zero, less than or equal to zero, etc.
Dyadic branch instructions pop two items, compare them, and branch accordingly.
.PP
In addition to these basic EM instructions, there is a collection of special
purpose instructions (e.g., to increment a local variable), which are typically
produced from the simple ones by the peephole optimizer.
Although the complete EM instruction set contains nearly 150 instructions,
only about 60 of them are really primitive; the rest are simply abbreviations
for commonly occurring EM instruction sequences.
.PP
Of particular interest is the way object sizes are parametrized.
The front ends allow the user to indicate how many bytes an integer, real, etc.
should occupy.
Given this information, the front ends can allocate memory, determining
the placement of variables within the stack frame.
Sizes for primitive types are restricted to 8, 16, 32, 64, etc. bits.
The front ends are also parametrized by the target machine's word length
and address size so they can tell, for example, how many "load" instructions
to generate to move a 32-bit integer.
In the examples used henceforth,
we will assume a 16-bit word size and 16-bit integers.
.PP
Since only byte-addressable target machines are permitted,
it is nearly
always possible to implement any requested sizes on any target machine.
For example, the designer of the back end tables for the Z80 should provide
code for 8-, 16-, and 32-bit arithmetic.
In our view, the Pascal, C, or Plain programmer specifies what lengths
are needed,
without reference to the target machine,
and the back end provides it.
This approach greatly enhances portability.
While it is true that doing all arithmetic using 32-bit integers on the Z80
will not be terribly fast, we feel that if that is what the programmer needs,
it should be possible to implement it.
.PP
Like all assembly languages, EM has not only machine instructions, but also
pseudoinstructions.
These are used to indicate the start and end of each procedure, allocate
and initialize storage for data, and similar functions.
One particularly important pseudoinstruction is the one that is used to
transmit information to the back end for optimization purposes.
It can be used to suggest variables that are good candidates to assign to
registers, delimit the scope of loops, indicate that certain variables
contain a useful value (next operation is a load) or not (next operation is
a store), and various other things.
.NH 1
The Peephole Optimizer
.PP
The peephole optimizer reads in unoptimized EM programs and writes out
optimized ones.
Both the input and output are expressed in a highly compact code, rather than
in ASCII, to reduce the i/o time, which would otherwise dominate the CPU
time.
The program itself is table driven, and is, by and large, ignorant of the
semantics of EM.
The knowledge of EM is contained in a
language- and machine-independent table consisting of about 400
pattern-replacement pairs.
We will briefly describe the kinds of optimizations it performs below;
a more complete discussion can be found in [9].
.PP
Each line in the driving table describes one optimization, consisting of a
pattern part and a replacement part.
The pattern part is a series of one or more EM instructions and a boolean
expression.
The replacement part is a series of EM instructions with operands.
A typical optimization might be:
.sp
  LOL  LOC  ADI  STL  ($1 = $4) and ($2 = 1) and ($3 = 2) ==> INL $1
.sp
where the text prior to the ==> symbol is the pattern and the text after it is
the replacement.
LOL loads a local variable onto the stack, LOC loads a constant onto the stack,
ADI is integer addition, and STL is store local.
The pattern specifies that four consecutive EM instructions are present, with
the indicated opcodes, and that furthermore the operand of the first
instruction (denoted by $1) and the fourth instruction (denoted by $4) are the
same, the constant pushed by LOC is 1, and the size of the integers added by
ADI is 2 bytes.
(EM instructions have at most one operand, so it is not necessary to specify
the operand number.)
Under these conditions, the four instructions can be replaced by a single INL
(increment local) instruction whose operand is equal to that of LOL.
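.PP
The matching machinery itself can be quite simple.
The C fragment below is a minimal sketch of how a matcher might apply the
single rule shown above; the real optimizer is driven by a table of about
400 such pairs, and the data structures here are our own invention, not the
kit's actual ones.
.sp
.nf
/* One EM instruction: an opcode and at most one integer operand. */
enum opcode { LOL, LOC, ADI, STL, INL };
struct instr { enum opcode op; int arg; };

/* Apply "LOL LOC ADI STL ($1=$4)($2=1)($3=2) ==> INL $1" at w[i],
   given n instructions in the window; returns 1 on a rewrite. */
int try_inl(struct instr *w, int i, int n)
{
    if (i + 4 > n) return 0;
    if (w[i].op == LOL && w[i+1].op == LOC &&
        w[i+2].op == ADI && w[i+3].op == STL &&
        w[i].arg == w[i+3].arg &&             /* $1 = $4 */
        w[i+1].arg == 1 && w[i+2].arg == 2) { /* $2 = 1, $3 = 2 */
        w[i].op = INL;       /* INL $1 replaces all four */
        return 1;            /* caller deletes w[i+1..i+3] */
    }
    return 0;
}
.fi
.sp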
.PP
Although the optimizations cover a wide range, the main ones
can be roughly divided into the following categories.
\fIConstant folding\fR
is used to evaluate constant expressions, such as 2*3~+~7 at
compile time instead of run time.
\fIStrength reduction\fR
is used to replace one operation, such as multiply, by
another, such as shift.
\fIReordering of expressions\fR
helps in cases like -K/5, which can be better
evaluated as K/-5, because the former requires
a division and a negation, whereas the latter requires only a division.
\fINull instructions\fR
include resetting the stack pointer after a call with 0 parameters,
offsetting zero bytes to access the
first element of a record, or jumping to the next instruction.
\fISpecial instructions\fR
are those like INL, which deal with common special cases
such as adding one to a variable or comparing something to zero.
\fIGroup moves\fR
are useful because a sequence
of consecutive moves can often be replaced with EM code
that allows the back end to generate a loop instead of in line code.
\fIDead code elimination\fR
is a technique for removing unreachable statements, possibly made unreachable
by previous optimizations.
\fIBranch chain compression\fR
can be applied when a branch instruction jumps to another branch instruction.
The first branch can jump directly to the final destination instead of
indirectly.
.PP
The last two optimizations logically belong in the global optimizer but are
in the local optimizer for historical reasons (meaning that the local
optimizer has been the only optimizer for many years and the optimizations were
easy to do there).
.NH 1
The Global Optimizer
.PP
In contrast to the peephole optimizer, which examines the EM code a few lines
at a time through a small window, the global optimizer examines the
program's large scale structure.
Three distinct types of optimizations can be found here:
.sp
  1. Interprocedural optimizations.
  2. Intraprocedural optimizations.
  3. Basic block optimizations.
.sp
We will now look at each of these in turn.
.PP
Interprocedural optimizations are those spanning procedure boundaries.
The most important one is deciding to expand procedures in line,
especially short procedures that occur in loops and pass several parameters.
If it takes more time or memory to pass the parameters than to do the work,
the program can be improved by eliminating the procedure.
The inverse optimization -- discovering long common code sequences and
turning them into a procedure -- is also possible, but much more difficult.
Like much of the global optimizer's work, the decision to make or not make
a certain program transformation is a heuristic one, based on knowledge of
how the back end works, how most target machines are organized, etc.
.PP
The heart of the global optimizer is its analysis of individual
procedures.
To perform this analysis, the optimizer must locate the basic blocks,
instruction sequences which can be entered only at the top and exited
only at the bottom.
It then constructs a data flow graph, with the basic blocks as nodes and
jumps between blocks as arcs.
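.PP
A minimal C sketch of the representation such an analysis might use is
given below; the field names and the cycle test are illustrative
assumptions, not the optimizer's actual data structures.
.sp
.nf
struct instr;                      /* an EM instruction (opaque here) */

/* A node of the data flow graph: a basic block and its out-arcs. */
struct block {
    struct instr *first, *last;    /* instruction range of the block */
    struct block **succ;           /* arcs: jumps to successor blocks */
    int nsucc;
    int visited, onstack;          /* scratch for the cycle search */
};

/* Depth-first search for a cycle: an arc back to a block still on
   the DFS stack indicates a loop in the program. */
int has_loop(struct block *b)
{
    int i;
    b->visited = b->onstack = 1;
    for (i = 0; i < b->nsucc; i++) {
        struct block *s = b->succ[i];
        if (s->onstack || (!s->visited && has_loop(s)))
            return 1;
    }
    b->onstack = 0;
    return 0;
}
.fi
.sp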
.PP
From the data flow graph, many important properties of the program can be
discovered and exploited.
Chief among these is the presence of loops, indicated by cycles in the graph.
One important optimization is looking for code that can be moved outside the
loop, either prior to it or subsequent to it.
Such code motion saves execution time, although it does not save memory.
Unrolling loops is also possible and desirable in some cases.
.PP
Another area in which global analysis of loops is especially important is
in register allocation.
While it is true that EM does not have any registers to allocate,
the optimizer can easily collect information to allow the
back end to allocate registers wisely.
For example, the global optimizer can collect static frequency-of-use
and live/dead information about variables.
(A variable is dead at some point in the program if its current value is
not needed, i.e., the next reference to it overwrites it rather than
reading it; if the current value will eventually be used, the variable is
live.)
If two variables are never simultaneously live over some interval of code
(e.g., the body of a loop), they can be packed into a single variable,
which, if used often enough, may warrant being assigned to a register.
.PP
Many loops involve arrays: this leads to other optimizations.
If an array is accessed sequentially, with each iteration using the next
higher numbered element, code improvement is often possible.
Typically, a pointer to the bottom element of each array can be set up
prior to the loop.
Within the loop the element is accessed indirectly via the pointer, which is
also incremented by the element size on each iteration.
If the target machine has an autoincrement addressing mode and the pointer
is assigned to a register, an array access can often be done in a single
instruction.
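.PP
In C-like terms (the function and variable names are ours, for illustration
only), the transformation amounts to:
.sp
.nf
int sum_before(int a[], int n)
{
    int i, sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];        /* each iteration indexes the array */
    return sum;
}

int sum_after(int a[], int n)
{
    int i, sum = 0;
    int *p = &a[0];         /* pointer set up prior to the loop */
    for (i = 0; i < n; i++)
        sum += *p++;        /* indirect access; the pointer is bumped
                               by the element size each iteration */
    return sum;
}
.fi
.sp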
.PP
Other intraprocedural optimizations include removing tail recursion
(last statement is a recursive call to the procedure itself),
topologically sorting the basic blocks to minimize the number of branch
instructions, and common subexpression recognition.
.PP
The third general class of optimizations done by the global optimizer is
improving the structure of a basic block.
For the most part these involve transforming arithmetic or boolean
expressions into forms that are likely to result in better target code.
As a simple example, A~+~B*C can be converted to B*C~+~A.
The latter can often
be handled by loading B into a register, multiplying the register by C, and
then adding in A, whereas the former may involve first putting A into a
temporary, depending on the details of the code generation table.
Another example of this kind of basic block optimization is transforming
-B~+~A~<~0 into the equivalent, but simpler, A~<~B.
.NH 1
The Back End
.PP
The back end reads a stream of EM instructions and generates assembly code
for the target machine.
Although the algorithm itself is machine independent, for each target
machine a machine dependent driving table must be supplied.
The driving table effectively defines the mapping of EM code to target code.
.PP
It will be convenient to think of the EM instructions being read as a
stream of tokens.
For didactic purposes, we will concentrate on two kinds of tokens:
those that load something onto the stack, and those that perform some operation
on the top one or two values on the stack.
The back end maintains at compile time a simulated stack whose behavior
mirrors what the stack of a hardware EM machine would do at run time.
If the current input token is a load instruction, a new entry is pushed onto
the simulated stack.
.PP
Consider, as an example, the EM code produced for the statement K~:=~I~+~7.
If K and I are
2-byte local variables, it will normally be LOL I; LOC 7; ADI~2; STL K.
Initially the simulated stack is empty.
After the first token has been read and processed, the simulated stack will
contain a stack token of type MEM with attributes telling that it is a local,
giving its address, etc.
After the second token has been read and processed, the top two tokens on the
simulated stack will be CON (constant) on top and MEM directly underneath it.
.PP
At this point the back end reads the ADI~2 token and
looks in the driving table to find a line or lines that define the
action to be taken for ADI~2.
For a typical multiregister machine, instructions will exist to add constants
to registers, but not to memory.
Consequently, the driving table will not contain an entry for ADI~2 with stack
configuration CON, MEM.
.PP
The back end is now faced with the problem of how to get from its
current stack configuration, CON, MEM, which is not listed, to one that is
listed.
The table will normally contain rules (which we call "coercions")
for converting between CON, REG, MEM, and similar tokens.
Therefore the back end attempts to "coerce" the stack into a configuration
that
.I is
present in the table.
A typical coercion rule might tell how to convert a MEM into
a REG, namely by performing the actions of allocating a
register and emitting code to move the memory word to that register.
Having transformed the compile-time stack into a configuration allowed for
ADI~2, the rule can be carried out.
A typical rule
for ADI~2 might have stack configuration REG, MEM
and would emit code to add the MEM to the REG, leaving the stack
with a single REG token instead of the REG and MEM tokens present before the
ADI~2.
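.PP
A minimal C sketch of the simulated stack and one such coercion path
follows.
The token layout, the register allocator, and the commented-out emitted
code are all illustrative assumptions, not the back end's actual tables.
.sp
.nf
enum tkind { CON, MEM, REG };      /* kinds of compile-time stack tokens */
struct token { enum tkind kind; int val; };  /* constant, offset, or reg */

struct token stk[64];
int sp;                            /* simulated stack pointer */

int alloc_reg(void) { static int next = 0; return next++; }

/* Coercion rule: turn the MEM token at depth d into a REG token by
   allocating a register and emitting code (elided) to move the
   memory word into that register. */
void mem_to_reg(int d)
{
    int r = alloc_reg();
    /* emit: mov <stk[d].val>(fp), r<r> */
    stk[d].kind = REG;
    stk[d].val = r;
}

/* ADI 2 with stack configuration CON on MEM: one legal path is to
   coerce the MEM underneath into a REG, then emit an add-immediate. */
void adi2(void)
{
    if (stk[sp-1].kind == CON && stk[sp-2].kind == MEM) {
        mem_to_reg(sp-2);
        /* emit: add #<stk[sp-1].val>, r<stk[sp-2].val> */
        sp--;                      /* a single REG token remains */
    }
}
.fi
.sp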
.PP
In general, there will be more than one possible coercion path.
Assuming reasonable coercion rules for our example,
we might be able to convert
CON MEM into CON REG by loading the variable I into a register.
Alternatively, we could coerce CON to REG by loading the constant into a register.
The first coercion path does the add by first loading I into a register and
then adding 7 to it.
The second path first loads 7 into a register and then adds I to it.
On machines with a fast LOAD IMMEDIATE instruction for small constants
but no fast ADD IMMEDIATE, or vice
versa, one code sequence will be preferable to the other.
.PP
In fact, we actually have more choices than suggested above.
In both coercion paths a register must be allocated.
On many machines, not every register can be used in every operation, so the
choice may be important.
On some machines, for example, the operand of a multiply must be in an odd
register.
To summarize, from any state (i.e., token and stack configuration), a
variety of choices can be made, leading to a variety of different target
code sequences.
.PP
To decide which of the various code sequences to emit, the back end must have
some information about the time and memory cost of each one.
To provide this information, each rule in the driving table, including
coercions, specifies both the time and memory cost of the code emitted when
the rule is applied.
The back end can then simply try each of the legal possibilities (including all
the possible register allocations) to find the cheapest one.
.PP
This situation is similar to that found in a chess or other game-playing
program, in which from any state a finite number of moves can be made.
Just as in a chess program, the back end can look at all the "moves" that can
be made from each state reachable from the original state, and thus find the
sequence that gives the minimum cost to a depth of one.
More generally, the back end can evaluate all paths corresponding to accepting
the next
.I N
input tokens, find the cheapest one, and then make the first move along
that path, precisely the way a chess program would.
.PP
Since the back end is analogous to both a parser and a chess playing program,
some clarifying remarks may be helpful.
First, chess programs and the back end must do some look ahead, whereas the
parser for a well-designed grammar can usually manage with a single input token
because grammars are supposed to be unambiguous.
In contrast, many legal mappings
from a sequence of EM instructions to target code may exist.
Second, like a parser but unlike a chess program, the back end has perfect
information -- it does not have to contend with an unpredictable opponent's
moves.
Third, chess programs normally make a static evaluation of the board and
label the
.I nodes
of the tree with the resulting scores.
The back end, in contrast, associates costs with
.I arcs
(moves) rather than nodes (states).
However, the difference is not essential, since it could
also label each node with the cumulative cost from the root to that node.
.PP
As mentioned above, the cost field in the table contains
.I both
the time and memory costs for the code emitted.
It should be clear that the back end could use either one
or some linear combination of them as the scoring function for evaluating moves.
A user can instruct the compiler to optimize for time or for memory or
for, say, 0.3 x time + 0.7 x memory.
Thus the same compiler can provide a wide range of performance options to
the user.
The writer of the back end table can take advantage of this flexibility by
providing several code sequences with different tradeoffs for each EM
instruction (e.g., in line code vs. call to a run time routine).
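.PP
A minimal C sketch of such a scoring function is given below; the weights
and structure names are illustrative.
.sp
.nf
struct cost { int time, mem; };     /* per-rule costs from the table */

/* Score one candidate code sequence by a linear combination of its
   time and memory costs, e.g. wt = 0.3 and wm = 0.7. */
double score(struct cost c, double wt, double wm)
{
    return wt * c.time + wm * c.mem;
}

/* Try every legal possibility and keep the cheapest one. */
int cheapest(struct cost cand[], int n, double wt, double wm)
{
    int i, best = 0;
    for (i = 1; i < n; i++)
        if (score(cand[i], wt, wm) < score(cand[best], wt, wm))
            best = i;
    return best;
}
.fi
.sp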
.PP
In addition to the time-space tradeoffs, by specifying the depth of search
parameter,
.I N ,
the user can effectively also trade off compile time against object
code quality, for whatever code metric has been chosen.
In summary, by combining the properties of a parser and a game playing program,
it is possible to make a code generator that is table driven,
highly flexible, and able to produce good code from a
stack machine intermediate code.
.NH 1
The Target Machine Optimizer
.PP
In the model of Fig. 2, the peephole optimizer comes before the global
optimizer.
It may happen that the code produced by the global optimizer can also
be improved by another round of peephole optimization.
Conceivably, the system could have been designed to iterate peephole and
global optimizations until no more of either could be performed.
.PP
However, both of these optimizations are done on the machine independent
EM code.
Neither is able to take advantage of the peculiarities and idiosyncrasies with
which most target machines are well endowed.
It is the function of the final
optimizer to do any (peephole) optimizations that still remain.
.PP
The algorithm used here is the same as in the EM peephole optimizer.
In fact, if it were not for the differences between EM syntax, which is
very restricted, and target assembly language syntax,
which is less so, precisely the same program could be used for both.
Nevertheless, the same ideas apply concerning patterns and replacements, so
our discussion of this optimizer will be restricted to one example.
.PP
To see what the target optimizer might do, consider the
PDP-11 instruction sequence sub #2,r0; mov (r0),x.
First 2 is subtracted from register 0, then the word pointed to by it
is moved to x.
The PDP-11 happens to have an addressing mode to perform this sequence in
one instruction: mov -(r0),x.
Although it is conceivable that this instruction could be included in the
back end driving table for the PDP-11, it is awkward to do so because it
can occur in so many contexts.
It is much easier to catch things like this in a separate program.
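.PP
In the pattern-and-replacement style of the EM table entry shown earlier,
a corresponding entry might read as follows (the notation, in particular
the separator between the two pattern instructions, is purely illustrative):
.sp
  sub #2,r0 : mov (r0),x ==> mov -(r0),x
.sp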
.NH 1
The Universal Assembler/Linker
.PP
Although assembly languages for different machines may appear very different
at first glance, they have a surprisingly large intersection.
We have been able to construct an assembler/linker that is almost entirely
independent of the assembly language being processed.
To tailor the program to a specific assembly language, it is necessary to
supply a table giving the list of instructions, the bit patterns required for
each one, and the language syntax.
The machine independent part of the assembler/linker is then compiled with the
table to produce an assembler and linker for a particular target machine.
Experience has shown that writing the necessary table for a new machine can be
done in less than a week.
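.PP
A minimal C sketch of what one entry in such a table might look like
follows; the field layout and the sample bit patterns are our own
illustration, not the kit's actual table format.
.sp
.nf
/* One instruction in the machine description table. */
struct optab {
    char     *mnemonic;     /* e.g. "mov" */
    unsigned  opcode;       /* bits to emit for the operation */
    char     *operands;     /* syntax of the operand field(s) */
};

struct optab pdp11[] = {    /* hypothetical sample entries */
    { "mov", 0010000, "src,dst" },
    { "add", 0060000, "src,dst" },
    { "sub", 0160000, "src,dst" },
};
.fi
.sp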
.PP
To enforce a modicum of uniformity, we have chosen to use a common set of
pseudoinstructions for all target machines.
They are used to initialize memory, allocate uninitialized memory, determine the
current segment, and similar functions found in most assemblers.
.PP
The assembler is also a linker.
After assembling a program, it checks to see if there are any
unsatisfied external references.
If so, it begins reading the libraries to find the necessary routines, including
them in the object file as it finds them.
This approach requires libraries to be maintained in assembly language form,
but eliminates the need for inventing a language to express relocatable
object programs in a machine independent way.
It also simplifies the assembler, since producing absolute object code is
easier than producing relocatable object code.
Finally, although assembly language libraries may be somewhat larger than
relocatable object module libraries, the loss in speed due to having more
input may be more than compensated for by not having to pass an intermediate
file between the assembler and linker.
.NH 1
The Utility Package
.PP
The utility package is a collection of programs designed to aid the
implementers of new front ends or new back ends.
The most useful ones are the test programs.
For example, one test set, EMTEST, systematically checks out a back end by
executing an ever larger subset of the EM instructions.
It starts out by testing LOC, LOL and a few of the other essential instructions.
If these appear to work, it then tries out new instructions one at a time,
adding them to the set of instructions "known" to work as they pass the tests.
.PP
Each instruction is tested with a variety of operands chosen from values
where problems can be expected.
For example, on target machines which have 16-bit index registers but only
allow 8-bit displacements, a fundamentally different algorithm may be needed
for accessing
the first few bytes of local variables and those with offsets in the thousands.
The test programs have been carefully designed to thoroughly test all relevant
cases.
.PP
In addition to EMTEST, test programs in Pascal, C, and other languages are also
available.
A typical test is:
.sp
   i := 9; \fBif\fP i + 250 <> 259 \fBthen\fP error(16);
.sp
Like EMTEST, the other test programs systematically exercise all features of the
language being tested, and do so in a way that makes it possible to pinpoint
errors precisely.
While it has been said that testing can only demonstrate the presence of errors
and not their absence, our experience is that
the test programs have been invaluable in debugging new parts of the system
quickly.
.PP
Other utilities include programs to convert
the highly compact EM code produced by front ends to ASCII and vice versa,
programs to build various internal tables from human writable input formats,
a variety of libraries written in or compiled to EM to make them portable,
an EM assembler, and EM interpreters for various machines.
.PP
Interpreting the EM code instead of translating it to target machine language
is useful for several reasons.
First, the interpreters provide extensive run time diagnostics including
an option to list the original source program (in Pascal, C, etc.) with the
execution frequency or execution time for each source line printed in the
left margin.
Second, since an EM program is typically about one-third the size of a
compiled program, large programs can be executed on small machines.
Third, running the EM code directly makes it easier to pinpoint errors in
the EM output of front ends still being debugged.
.NH 1
Summary and Conclusions
.PP
The Amsterdam Compiler Kit is a tool kit for building
portable (cross) compilers and interpreters.
The main pieces of the kit are the front ends, which convert source programs
to EM code, optimizers, which improve the EM code, and back ends, which convert
the EM code to target assembly language.
The kit is highly modular, so writing one front end
(and its associated runtime routines)
is sufficient to implement
a new language on a dozen or more machines, and writing one back end table
and one universal assembler/linker table is all that is needed to bring up all
the previously implemented languages on a new machine.
In this manner, the contents, and hopefully the usefulness, of the tool kit
will increase over time.
.PP
We believe the principal lesson to be learned from our work is that the old
UNCOL idea is basically a sound way to produce compilers, provided suitable
restrictions are placed on the source languages and target machines.
We also believe that although compilers produced by this technology may not
equal the very best handcrafted compilers in terms of object code quality,
they are certainly competitive with many existing compilers.
Moreover, any slight loss in performance may be more than compensated for
by the large decrease in the cost of producing the compilers themselves.
As a consequence of our work and similar work by other researchers [1,3,4],
we expect integrated compiler building kits to become increasingly popular
in the near future.
.PP
The tool kit is now available for various computers running the
.UX
operating system.
For information, contact the authors.
.NH 1
References
.LP
.nr r 0 1
.in +4
.ti -4
\fB~\n+r.\fR Graham, S.L.
Table-Driven Code Generation.
.I "Computer~13" ,
8 (August 1980), 25-34.
.PP
A discussion of systematic ways to do code generation,
in particular, the idea of having a table with templates that match parts of
the parse tree and convert them into machine instructions.
.sp 2
.ti -4
\fB~\n+r.\fR Haddon, B.K., and Waite, W.M.
Experience with the Universal Intermediate Language Janus.
.I "Software Practice & Experience~8" ,
5 (Sept.-Oct. 1978), 601-616.
.PP
An intermediate language for use with ALGOL 68, Pascal, etc. is described.
The paper discusses some problems encountered and how they were dealt with.
.sp 2
.ti -4
\fB~\n+r.\fR Johnson, S.C.
A Portable Compiler: Theory and Practice.
.I "Ann. ACM Symp. Prin. Prog. Lang." ,
Jan. 1978.
.PP
A cogent discussion of the portable C compiler.
Particularly interesting are the author's thoughts on the value of
computer science theory.
.sp 2
.ti -4
\fB~\n+r.\fR Leverett, B.W., Cattell, R.G.G., Hobbs, S.O., Newcomer, J.M.,
Reiner, A.H., Schatz, B.R., and Wulf, W.A.
An Overview of the Production-Quality Compiler-Compiler Project.
.I "Computer~13" ,
8 (August 1980), 38-49.
.PP
PQCC is a system for building compilers similar in concept but differing in
details from the Amsterdam Compiler Kit.
The paper describes the intermediate representation used and the code generation
strategy.
.sp 2
.ti -4
\fB~\n+r.\fR Lowry, E.S., and Medlock, C.W.
Object Code Optimization.
.I "Commun.~ACM~12",
(Jan. 1969), 13-22.
.PP
A classic paper on global object code optimization.
It covers data flow analysis, common subexpressions, code motion, register
allocation and other techniques.
.sp 2
.ti -4
\fB~\n+r.\fR Nori, K.V., Ammann, U., Jensen, K., Nageli, H.
The Pascal P Compiler Implementation Notes.
Eidgen. Tech. Hochschule, Zurich, 1975.
.PP
A description of the original P-code machine, used to transport the Pascal-P
compiler to new computers.
.sp 2
.ti -4
\fB~\n+r.\fR Steel, T.B., Jr. UNCOL: the Myth and the Fact. In
.I "Ann. Rev. Auto. Prog."
Goodman, R. (ed.), vol. 2 (1960), 325-344.
.PP
An introduction to the UNCOL idea by its originator.
.sp 2
.ti -4
\fB~\n+r.\fR Steel, T.B., Jr.
A First Version of UNCOL.
.I "Proc. Western Joint Comp. Conf." ,
(1961), 371-377.
.PP
The first detailed proposal for an UNCOL.  By current standards it is a
primitive language, but it is interesting for its historical perspective.
.sp 2
.ti -4
\fB~\n+r.\fR Tanenbaum, A.S., van Staveren, H., and Stevenson, J.W.
Using Peephole Optimization on Intermediate Code.
.I "ACM Trans. Prog. Lang. and Sys. 3" ,
1 (Jan. 1982), 21-36.
.PP
A detailed description of a table-driven peephole optimizer.
The driving table provides a list of patterns to match as well as the
replacement text to use for each successful match.
.sp 2
.ti -4
\fB~\n+r.\fR Tanenbaum, A.S., Stevenson, J.W., Keizer, E.G., and van Staveren, H.
Description of an Experimental Machine Architecture for use with Block
Structured Languages.
Informatica Rapport 81, Vrije Universiteit, Amsterdam, 1983.
.PP
The defining document for EM.
.sp 2
.ti -4
\fB~\n+r.\fR Tanenbaum, A.S.
Implications of Structured Programming for Machine Architecture.
.I "Commun.~ACM~21" ,
3 (March 1978), 237-246.
.PP
The background and motivation for the design of EM.
This early version emphasized the idea of interpreting the intermediate
code (then called EM-1) rather than compiling it.