.\" $Header$
.RP
.ND
.nr LL 78m
.tr ~
.ds as *
.TL
A Practical Tool Kit for Making Portable Compilers
.AU
Andrew S. Tanenbaum
Hans van Staveren
E. G. Keizer
Johan W. Stevenson
.AI
Mathematics Dept.
Vrije Universiteit
Amsterdam, The Netherlands
.AB
The Amsterdam Compiler Kit is an integrated collection of programs designed to
simplify the task of producing portable (cross) compilers and interpreters.
For each language to be compiled, a program (called a front end)
must be written to
translate the source program into a common intermediate code.
This intermediate code can be optimized and then either directly interpreted
or translated to the assembly language of the desired target machine.
The paper describes the various pieces of the tool kit in some detail, as well
as discussing the overall strategy.
.sp
Keywords: Compiler, Interpreter, Portability, Translator
.sp
CR Categories: 4.12, 4.13, 4.22
.sp 12
Author's present addresses:
  A.S. Tanenbaum, H. van Staveren, E.G. Keizer: Mathematics
     Dept., Vrije Universiteit, Postbus 7161, 1007 MC Amsterdam,
     The Netherlands

  J.W. Stevenson: NV Philips, S&I, T&M, Building TQ V5, Eindhoven,
     The Netherlands
.AE
.NH 1
Introduction
.PP
As more and more organizations acquire many micro- and minicomputers,
the need for portable compilers is becoming more and more acute.
The present situation, in which each hardware vendor provides its own
compilers -- each with its own deficiencies and extensions, and none of them
compatible -- leaves much to be desired.
The ideal situation would be an integrated system containing a family
of (cross) compilers, each compiler accepting a standard source language and
producing code for a wide variety of target machines.
Furthermore, the compilers should be compatible, so programs written in
one language can call procedures written in another language.
Finally, the system should be designed so as to make adding new languages
and new machines easy.
Such an integrated system is being built at the Vrije Universiteit.
Its design and implementation are the subject of this article.
.PP
Our compiler building system, which is called the "Amsterdam Compiler Kit"
(ACK), can be thought of as a "tool kit."
It consists of a number of parts that can be combined to form compilers
(and interpreters) with various properties.
The tool kit is based on an idea (UNCOL) that was first suggested in 1960
[7], but which never really caught on then.
The problem which UNCOL attempts to solve is how to make a compiler for
each of
.I N
languages on
.I M
different machines without having to write
.I N
x
.I M
programs.
.PP
As shown in Fig. 1, the UNCOL approach is to write
.I N
"front ends," each
of which translates one source language to a common intermediate language,
UNCOL (UNiversal Computer Oriented Language), and
.I M
"back ends," each
of which translates programs in UNCOL to a specific machine language.
Under these conditions, only
.I N
+
.I M
programs must be written to provide all
.I N
languages on all
.I M
machines, instead of
.I N
x
.I M
programs.
.PP
Various researchers have attempted to design a suitable UNCOL
[2,8], but none of these have become popular.
It is our belief that previous attempts have failed because they have been
too ambitious, that is, they have tried to cover all languages
and all machines using a single UNCOL.
Our approach is more modest: we cater only to algebraic languages
and machines whose memory consists of 8-bit bytes, each with its own address.
Typical languages that could be handled include
Ada, ALGOL 60, ALGOL 68, BASIC, C, FORTRAN,
Modula, Pascal, PL/I, PL/M, PLAIN, and RATFOR,
whereas COBOL, LISP, and SNOBOL would be less efficient.
Examples of machines that could be included are the Intel 8080 and 8086,
Motorola 6800, 6809, and 68000, Zilog Z80 and Z8000, DEC PDP-11 and VAX,
and IBM 370 but not the Burroughs 6700, CDC Cyber, or Univac 1108 (because
they are not byte-oriented).
With these restrictions, we believe the old UNCOL idea can be used as the
basis of a practical compiler-building system.
.KF
.sp 15P
.ce 1
Fig. 1.  The UNCOL model.
.sp
.KE
.NH 1
An Overview of the Amsterdam Compiler Kit
.PP
The tool kit consists of eight components:
.sp
  1. The preprocessor.
  2. The front ends.
  3. The peephole optimizer.
  4. The global optimizer.
  5. The back end.
  6. The target machine optimizer.
  7. The universal assembler/linker.
  8. The utility package.
.sp
.PP
A fully optimizing compiler,
depicted in Fig. 2, has seven cascaded phases.
Conceptually, each component reads an input file and writes a
transformed output file to be used as input to the next component.
In practice, some components may use temporary files to allow multiple
passes over the input or internal intermediate files.
.KF
.sp 12P
.ce 1
Fig. 2.  Structure of the Amsterdam Compiler Kit.
.sp
.KE
.PP
In the following paragraphs we will briefly describe each component.
After this overview, we will look at all of them again in more detail.
A program to be compiled is first fed into the (language independent)
preprocessor, which provides a simple macro facility,
and similar textual facilities.
The preprocessor's output is a legal program in one of the programming
languages supported, whereas the input is a program possibly augmented
with macros, etc.
.PP
This output goes into the appropriate front end, whose job it is to
produce intermediate code.
This intermediate code (our UNCOL) is the machine language for a simple
stack machine called EM (Encoding Machine).
A typical front end might build a parse tree from the input, and then
use the parse tree to generate EM code, which is similar to reverse Polish.
In order to perform this work, the front end has to maintain tables of
declared variables, labels, etc., determine where to place the
data structures in memory, and so on.
.PP
The EM code generated by the front end is fed into the peephole optimizer,
which scans it with a window of a few instructions, replacing certain
inefficient code sequences by better ones.
Such a search is important because EM contains instructions to handle
numerous important special cases efficiently
(e.g., incrementing a variable by 1).
It is our strategy to relieve the front ends of the burden of hunting for
special cases because there are many front ends and only one peephole
optimizer.
By handling the special cases in the peephole optimizer,
the front ends become simpler, easier to write and easier to maintain.
.PP
Following the peephole optimizer is a global optimizer [5], which,
unlike the peephole optimizer, examines the program as a whole.
It builds a data flow graph to make possible a variety of
global optimizations,
among them moving invariant code out of loops, avoiding redundant
computations, performing live/dead analysis, and eliminating tail recursion.
Note that the output of the global optimizer is still EM code.
.PP
Next comes the back end, which differs from the front ends in a
fundamental way.
Each front end is a separate program, whereas the back end is a single
program that is driven by a machine dependent driving table.
The driving table for a specific machine tells how the EM code is mapped
onto the machine's assembly language.
Although a simple driving table might just macro expand each EM instruction
into a sequence of target machine instructions, a much more sophisticated
translation strategy is normally used, as described later.
For speed, the back end does not actually read in the driving table at run time.
Instead, the tables are compiled along with the back end in advance, resulting
in one binary program per machine.
.PP
The output of the back end is a program in the assembly language of some
particular machine.
The next component in the pipeline reads this program and performs peephole
optimization on it.
The optimizations performed here involve idiosyncrasies
of the target machine that cannot be performed in the machine-independent
EM-to-EM peephole optimizer.
Typically these optimizations take advantage of special instructions or special
addressing modes.
.PP
The optimized target machine assembly code then goes into the final
component in the pipeline, the universal assembler/linker.
This program assembles the input to object format, extracting routines from
libraries and including them as needed.
.PP
The final component of the tool kit is the utility package, which contains
various test programs, interpreters for EM code,
EM libraries, conversion programs, and other aids for the implementer and
user.
.NH 1
The Preprocessor
.PP
The function of the preprocessor is to extend all the programming languages
by adding certain generally useful facilities to them in a uniform way.
One of these is a simple macro system, in which the user can give names to
character strings.
The names can be used in the program, with the knowledge that they will be
macro expanded prior to being input to the front end.
Macros can be used for named constants, expanding short "procedures"
in line, etc.
.PP
Another useful facility provided by the preprocessor is the ability to
include compile-time libraries.
On large projects, it is common to have all the declarations and definitions
gathered together in a few files that are textually included in the programs
by instructing the preprocessor to read them in, thus fooling the front end
into thinking that they were part of the source program.
.PP
A third feature of the preprocessor is conditional compilation.
The input program can be split up into labeled sections.
By setting flags, some of the sections can be deleted by the preprocessor,
thus allowing a family of slightly different programs to be conveniently stored
on a single file.
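.PP
To make these three facilities concrete, consider the hypothetical fragment
below.
The directive syntax shown is ours, chosen for illustration only; it is not
necessarily the syntax the preprocessor actually accepts.
.sp
.nf
  #define TABSIZE 100        (macro: a named constant)
  #include "globals.h"       (compile-time library inclusion)
  #ifdef DEBUG               (labeled section, deleted unless DEBUG is set)
  ...
  #endif
.fi
.sp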
.NH 1
The Front Ends
.PP
A front end is a program that converts input in some source language to a
program in EM.
At present, front ends
exist or are in preparation for Pascal, C, and Plain, and are being considered
for Ada, ALGOL 68, FORTRAN 77, and Modula 2.
Each of the present front ends is independent of all the other ones,
although a general-purpose, table-driven front end is conceivable, provided
one can devise a way to express the semantics of the source language in the
driving tables.
The Pascal front end uses a top-down parsing algorithm (recursive descent),
whereas the C and Plain front ends are bottom-up.
.PP
All front ends, independent of the language being compiled,
produce a common intermediate code called EM, which is
the assembly language for a simple stack machine.
The EM machine is based on a memory architecture
containing a stack for local variables, a (static) data area for variables
declared in the outermost block and global to the whole program, and a heap
for dynamic data structures.
In some ways EM resembles P-code [6], but is more general, since it is
intended for a wider class of languages than just Pascal.
.PP
The EM instruction set has been described elsewhere
[9,10,11]
so we will only briefly summarize it here.
Instructions exist to:
.sp
  1. Load a variable or constant of some length onto the stack.
  2. Store the top item on the stack in memory.
  3. Add, subtract, multiply, divide, etc. the top two stack items.
  4. Examine the top one or two stack items and branch conditionally.
  5. Call procedures and return from them.
.sp
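.PP
As a brief illustration of the flavor of EM, the statement K~:=~I~+~7, with
K and I as 2-byte local variables, could be translated into the sequence
below (LOL loads a local variable, LOC loads a constant, ADI is integer
addition, and STL is store local; the same example reappears in later
sections).
.sp
.nf
  LOL I      push local variable I onto the stack
  LOC 7      push the constant 7
  ADI 2      add the top two items as 2-byte integers
  STL K      pop the sum and store it in local variable K
.fi
.sp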
.PP
Loads and stores come in several variations, corresponding to the most common
programming language semantics, for example, constants, simple variables,
fields of a record, elements of an array, and so on.
Distinctions are also made between variables local to the current block
(i.e., stack frame), those in the outermost block (static storage), and those
at intermediate lexicographic levels, which are accessed by following the
static chain at run time.
.PP
All arithmetic instructions have a type (integer, unsigned, real,
pointer, or set) and an
operand length, which may either be explicit or may be popped from the stack
at run time.
Monadic branch instructions pop an item from the stack and branch if it is
less than zero, less than or equal to zero, etc.
Dyadic branch instructions pop two items, compare them, and branch accordingly.
.PP
In addition to these basic EM instructions, there is a collection of special
purpose instructions (e.g., to increment a local variable), which are typically
produced from the simple ones by the peephole optimizer.
Although the complete EM instruction set contains nearly 150 instructions,
only about 60 of them are really primitive; the rest are simply abbreviations
for commonly occurring EM instruction sequences.
.PP
Of particular interest is the way object sizes are parametrized.
The front ends allow the user to indicate how many bytes an integer, real, etc.
should occupy.
Given this information, the front ends can allocate memory, determining
the placement of variables within the stack frame.
Sizes for primitive types are restricted to 8, 16, 32, 64, etc. bits.
The front ends are also parametrized by the target machine's word length
and address size so they can tell, for example, how many "load" instructions
to generate to move a 32-bit integer.
In the examples used henceforth,
we will assume a 16-bit word size and 16-bit integers.
.PP
Since only byte-addressable target machines are permitted,
it is nearly
always possible to implement any requested sizes on any target machine.
For example, the designer of the back end tables for the Z80 should provide
code for 8-, 16-, and 32-bit arithmetic.
In our view, the Pascal, C, or Plain programmer specifies what lengths
are needed,
without reference to the target machine,
and the back end provides it.
This approach greatly enhances portability.
While it is true that doing all arithmetic using 32-bit integers on the Z80
will not be terribly fast, we feel that if that is what the programmer needs,
it should be possible to implement it.
.PP
Like all assembly languages, EM has not only machine instructions, but also
pseudoinstructions.
These are used to indicate the start and end of each procedure, allocate
and initialize storage for data, and similar functions.
One particularly important pseudoinstruction is the one that is used to
transmit information to the back end for optimization purposes.
It can be used to suggest variables that are good candidates to assign to
registers, delimit the scope of loops, indicate that certain variables
contain a useful value (next operation is a load) or not (next operation is
a store), and various other things.
.NH 1
The Peephole Optimizer
.PP
The peephole optimizer reads in unoptimized EM programs and writes out
optimized ones.
Both the input and output are expressed in a highly compact code, rather than
in ASCII, to reduce the i/o time, which would otherwise dominate the CPU
time.
The program itself is table driven, and is, by and large, ignorant of the
semantics of EM.
The knowledge of EM is contained in a
language- and machine-independent table consisting of about 400
pattern-replacement pairs.
We will briefly describe the kinds of optimizations it performs below;
a more complete discussion can be found in [9].
.PP
Each line in the driving table describes one optimization, consisting of a
pattern part and a replacement part.
The pattern part is a series of one or more EM instructions and a boolean
expression.
The replacement part is a series of EM instructions with operands.
A typical optimization might be:
.sp
  LOL  LOC  ADI  STL  ($1 = $4) and ($2 = 1) and ($3 = 2) ==> INL $1
.sp
where the text prior to the ==> symbol is the pattern and the text after it is
the replacement.
LOL loads a local variable onto the stack, LOC loads a constant onto the stack,
ADI is integer addition, and STL is store local.
The pattern specifies that four consecutive EM instructions are present, with
the indicated opcodes, and that furthermore the operand of the first
instruction (denoted by $1) and the fourth instruction (denoted by $4) are the
same, the constant pushed by LOC is 1, and the size of the integers added by
ADI is 2 bytes.
(EM instructions have at most one operand, so it is not necessary to specify
the operand number.)
Under these conditions, the four instructions can be replaced by a single INL
(increment local) instruction whose operand is equal to that of LOL.
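.PP
The matching machinery itself can be quite simple.
The C fragment below is a minimal sketch of how a matcher might apply the
single rule shown above; the real optimizer is driven by a table of about
400 such pairs, and the data structures here are our own invention, not the
kit's actual ones.
.sp
.nf
/* One EM instruction: an opcode and at most one integer operand. */
enum opcode { LOL, LOC, ADI, STL, INL };
struct instr { enum opcode op; int arg; };

/* Apply "LOL LOC ADI STL ($1=$4)($2=1)($3=2) ==> INL $1" at w[i],
   given n instructions in the window; returns 1 on a rewrite. */
int try_inl(struct instr *w, int i, int n)
{
    if (i + 4 > n) return 0;
    if (w[i].op == LOL && w[i+1].op == LOC &&
        w[i+2].op == ADI && w[i+3].op == STL &&
        w[i].arg == w[i+3].arg &&             /* $1 = $4 */
        w[i+1].arg == 1 && w[i+2].arg == 2) { /* $2 = 1, $3 = 2 */
        w[i].op = INL;       /* INL $1 replaces all four */
        return 1;            /* caller deletes w[i+1..i+3] */
    }
    return 0;
}
.fi
.sp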
.PP
Although the optimizations cover a wide range, the main ones
can be roughly divided into the following categories.
\fIConstant folding\fR
is used to evaluate constant expressions, such as 2*3~+~7 at
compile time instead of run time.
\fIStrength reduction\fR
is used to replace one operation, such as multiply, by
another, such as shift.
\fIReordering of expressions\fR
helps in cases like -K/5, which can be better
evaluated as K/-5, because the former requires
a division and a negation, whereas the latter requires only a division.
\fINull instructions\fR
include resetting the stack pointer after a call with 0 parameters,
offsetting zero bytes to access the
first element of a record, or jumping to the next instruction.
\fISpecial instructions\fR
are those like INL, which deal with common special cases
such as adding one to a variable or comparing something to zero.
\fIGroup moves\fR
are useful because a sequence
of consecutive moves can often be replaced with EM code
that allows the back end to generate a loop instead of in line code.
\fIDead code elimination\fR
is a technique for removing unreachable statements, possibly made unreachable
by previous optimizations.
\fIBranch chain compression\fR
can be applied when a branch instruction jumps to another branch instruction.
The first branch can jump directly to the final destination instead of
indirectly.
.PP
The last two optimizations logically belong in the global optimizer but are
in the local optimizer for historical reasons (meaning that the local
optimizer has been the only optimizer for many years and the optimizations were
easy to do there).
.NH 1
The Global Optimizer
.PP
In contrast to the peephole optimizer, which examines the EM code a few lines
at a time through a small window, the global optimizer examines the
program's large scale structure.
Three distinct types of optimizations can be found here:
.sp
  1. Interprocedural optimizations.
  2. Intraprocedural optimizations.
  3. Basic block optimizations.
.sp
We will now look at each of these in turn.
.PP
Interprocedural optimizations are those spanning procedure boundaries.
The most important one is deciding to expand procedures in line,
especially short procedures that occur in loops and pass several parameters.
If it takes more time or memory to pass the parameters than to do the work,
the program can be improved by eliminating the procedure.
The inverse optimization -- discovering long common code sequences and
turning them into a procedure -- is also possible, but much more difficult.
Like much of the global optimizer's work, the decision to make or not make
a certain program transformation is a heuristic one, based on knowledge of
how the back end works, how most target machines are organized, etc.
.PP
The heart of the global optimizer is its analysis of individual
procedures.
To perform this analysis, the optimizer must locate the basic blocks,
instruction sequences which can be entered only at the top and exited
only at the bottom.
It then constructs a data flow graph, with the basic blocks as nodes and
jumps between blocks as arcs.
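.PP
A minimal C sketch of the representation such an analysis might use is
given below; the field names and the cycle test are illustrative
assumptions, not the optimizer's actual data structures.
.sp
.nf
struct instr;                      /* an EM instruction (opaque here) */

/* A node of the data flow graph: a basic block and its out-arcs. */
struct block {
    struct instr *first, *last;    /* instruction range of the block */
    struct block **succ;           /* arcs: jumps to successor blocks */
    int nsucc;
    int visited, onstack;          /* scratch for the cycle search */
};

/* Depth-first search for a cycle: an arc back to a block still on
   the DFS stack indicates a loop in the program. */
int has_loop(struct block *b)
{
    int i;
    b->visited = b->onstack = 1;
    for (i = 0; i < b->nsucc; i++) {
        struct block *s = b->succ[i];
        if (s->onstack || (!s->visited && has_loop(s)))
            return 1;
    }
    b->onstack = 0;
    return 0;
}
.fi
.sp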
.PP
From the data flow graph, many important properties of the program can be
discovered and exploited.
Chief among these is the presence of loops, indicated by cycles in the graph.
One important optimization is looking for code that can be moved outside the
loop, either prior to it or subsequent to it.
Such code motion saves execution time, although it does not save memory.
Unrolling loops is also possible and desirable in some cases.
.PP
Another area in which global analysis of loops is especially important is
in register allocation.
While it is true that EM does not have any registers to allocate,
the optimizer can easily collect information to allow the
back end to allocate registers wisely.
For example, the global optimizer can collect static frequency-of-use
and live/dead information about variables.
(A variable is dead at some point in the program if its current value is
not needed, i.e., the next reference to it overwrites it rather than
reading it; if the current value will eventually be used, the variable is
live.)
If two variables are never simultaneously live over some interval of code
(e.g., the body of a loop), they can be packed into a single variable,
which, if used often enough, may warrant being assigned to a register.
.PP
Many loops involve arrays: this leads to other optimizations.
If an array is accessed sequentially, with each iteration using the next
higher numbered element, code improvement is often possible.
Typically, a pointer to the bottom element of each array can be set up
prior to the loop.
Within the loop the element is accessed indirectly via the pointer, which is
also incremented by the element size on each iteration.
If the target machine has an autoincrement addressing mode and the pointer
is assigned to a register, an array access can often be done in a single
instruction.
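.PP
In C-like terms (the function and variable names are ours, for illustration
only), the transformation amounts to:
.sp
.nf
int sum_before(int a[], int n)
{
    int i, sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];        /* each iteration indexes the array */
    return sum;
}

int sum_after(int a[], int n)
{
    int i, sum = 0;
    int *p = &a[0];         /* pointer set up prior to the loop */
    for (i = 0; i < n; i++)
        sum += *p++;        /* indirect access; the pointer is bumped
                               by the element size each iteration */
    return sum;
}
.fi
.sp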
.PP
Other intraprocedural optimizations include removing tail recursion
(last statement is a recursive call to the procedure itself),
topologically sorting the basic blocks to minimize the number of branch
instructions, and common subexpression recognition.
.PP
The third general class of optimizations done by the global optimizer is
improving the structure of a basic block.
For the most part these involve transforming arithmetic or boolean
expressions into forms that are likely to result in better target code.
As a simple example, A~+~B*C can be converted to B*C~+~A.
The latter can often
be handled by loading B into a register, multiplying the register by C, and
then adding in A, whereas the former may involve first putting A into a
temporary, depending on the details of the code generation table.
Another example of this kind of basic block optimization is transforming
-B~+~A~<~0 into the equivalent, but simpler, A~<~B.
.NH 1
The Back End
.PP
The back end reads a stream of EM instructions and generates assembly code
for the target machine.
Although the algorithm itself is machine independent, for each target
machine a machine dependent driving table must be supplied.
The driving table effectively defines the mapping of EM code to target code.
.PP
It will be convenient to think of the EM instructions being read as a
stream of tokens.
For didactic purposes, we will concentrate on two kinds of tokens:
those that load something onto the stack, and those that perform some operation
on the top one or two values on the stack.
The back end maintains at compile time a simulated stack whose behavior
mirrors what the stack of a hardware EM machine would do at run time.
If the current input token is a load instruction, a new entry is pushed onto
the simulated stack.
.PP
Consider, as an example, the EM code produced for the statement K~:=~I~+~7.
If K and I are
2-byte local variables, it will normally be LOL I; LOC 7; ADI~2; STL K.
Initially the simulated stack is empty.
After the first token has been read and processed, the simulated stack will
contain a stack token of type MEM with attributes telling that it is a local,
giving its address, etc.
After the second token has been read and processed, the top two tokens on the
simulated stack will be CON (constant) on top and MEM directly underneath it.
.PP
At this point the back end reads the ADI~2 token and
looks in the driving table to find a line or lines that define the
action to be taken for ADI~2.
For a typical multiregister machine, instructions will exist to add constants
to registers, but not to memory.
Consequently, the driving table will not contain an entry for ADI~2 with stack
configuration CON, MEM.
.PP
The back end is now faced with the problem of how to get from its
current stack configuration, CON, MEM, which is not listed, to one that is
listed.
The table will normally contain rules (which we call "coercions")
for converting between CON, REG, MEM, and similar tokens.
Therefore the back end attempts to "coerce" the stack into a configuration
that
.I is
present in the table.
A typical coercion rule might tell how to convert a MEM into
a REG, namely by performing the actions of allocating a
register and emitting code to move the memory word to that register.
Having transformed the compile-time stack into a configuration allowed for
ADI~2, the rule can be carried out.
A typical rule
for ADI~2 might have stack configuration REG, MEM
and would emit code to add the MEM to the REG, leaving the stack
with a single REG token instead of the REG and MEM tokens present before the
ADI~2.
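.PP
A minimal C sketch of the simulated stack and one such coercion path
follows.
The token layout, the register allocator, and the commented-out emitted
code are all illustrative assumptions, not the back end's actual tables.
.sp
.nf
enum tkind { CON, MEM, REG };      /* kinds of compile-time stack tokens */
struct token { enum tkind kind; int val; };  /* constant, offset, or reg */

struct token stk[64];
int sp;                            /* simulated stack pointer */

int alloc_reg(void) { static int next = 0; return next++; }

/* Coercion rule: turn the MEM token at depth d into a REG token by
   allocating a register and emitting code (elided) to move the
   memory word into that register. */
void mem_to_reg(int d)
{
    int r = alloc_reg();
    /* emit: mov <stk[d].val>(fp), r<r> */
    stk[d].kind = REG;
    stk[d].val = r;
}

/* ADI 2 with stack configuration CON on MEM: one legal path is to
   coerce the MEM underneath into a REG, then emit an add-immediate. */
void adi2(void)
{
    if (stk[sp-1].kind == CON && stk[sp-2].kind == MEM) {
        mem_to_reg(sp-2);
        /* emit: add #<stk[sp-1].val>, r<stk[sp-2].val> */
        sp--;                      /* a single REG token remains */
    }
}
.fi
.sp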
.PP
In general, there will be more than one possible coercion path.
Assuming reasonable coercion rules for our example,
we might be able to convert
CON MEM into CON REG by loading the variable I into a register.
Alternatively, we could coerce CON to REG by loading the constant into a register.
The first coercion path does the add by first loading I into a register and
then adding 7 to it.
The second path first loads 7 into a register and then adds I to it.
On machines with a fast LOAD IMMEDIATE instruction for small constants
but no fast ADD IMMEDIATE, or vice
versa, one code sequence will be preferable to the other.
.PP
In fact, we actually have more choices than suggested above.
In both coercion paths a register must be allocated.
On many machines, not every register can be used in every operation, so the
choice may be important.
On some machines, for example, the operand of a multiply must be in an odd
register.
To summarize, from any state (i.e., token and stack configuration), a
variety of choices can be made, leading to a variety of different target
code sequences.
.PP
To decide which of the various code sequences to emit, the back end must have
some information about the time and memory cost of each one.
To provide this information, each rule in the driving table, including
coercions, specifies both the time and memory cost of the code emitted when
the rule is applied.
The back end can then simply try each of the legal possibilities (including all
the possible register allocations) to find the cheapest one.
.PP
This situation is similar to that found in a chess or other game-playing
program, in which from any state a finite number of moves can be made.
Just as in a chess program, the back end can look at all the "moves" that can
be made from each state reachable from the original state, and thus find the
sequence that gives the minimum cost to a depth of one.
More generally, the back end can evaluate all paths corresponding to accepting
the next
.I N
input tokens, find the cheapest one, and then make the first move along
that path, precisely the way a chess program would.
.PP
Since the back end is analogous to both a parser and a chess playing program,
some clarifying remarks may be helpful.
First, chess programs and the back end must do some look ahead, whereas the
parser for a well-designed grammar can usually manage with a single input token
because grammars are supposed to be unambiguous.
In contrast, many legal mappings
from a sequence of EM instructions to target code may exist.
Second, like a parser but unlike a chess program, the back end has perfect
information -- it does not have to contend with an unpredictable opponent's
moves.
Third, chess programs normally make a static evaluation of the board and
label the
.I nodes
of the tree with the resulting scores.
The back end, in contrast, associates costs with
.I arcs
(moves) rather than nodes (states).
However, the difference is not essential, since it could
also label each node with the cumulative cost from the root to that node.
.PP
As mentioned above, the cost field in the table contains
.I both
the time and memory costs for the code emitted.
It should be clear that the back end could use either one
or some linear combination of them as the scoring function for evaluating moves.
A user can instruct the compiler to optimize for time or for memory or
for, say, 0.3 x time + 0.7 x memory.
Thus the same compiler can provide a wide range of performance options to
the user.
The writer of the back end table can take advantage of this flexibility by
providing several code sequences with different tradeoffs for each EM
instruction (e.g., in line code vs. call to a run time routine).
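.PP
A minimal C sketch of such a scoring function is given below; the weights
and structure names are illustrative.
.sp
.nf
struct cost { int time, mem; };     /* per-rule costs from the table */

/* Score one candidate code sequence by a linear combination of its
   time and memory costs, e.g. wt = 0.3 and wm = 0.7. */
double score(struct cost c, double wt, double wm)
{
    return wt * c.time + wm * c.mem;
}

/* Try every legal possibility and keep the cheapest one. */
int cheapest(struct cost cand[], int n, double wt, double wm)
{
    int i, best = 0;
    for (i = 1; i < n; i++)
        if (score(cand[i], wt, wm) < score(cand[best], wt, wm))
            best = i;
    return best;
}
.fi
.sp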
.PP
In addition to the time-space tradeoffs, by specifying the depth of search
parameter,
.I N ,
the user can effectively also trade off compile time against object
code quality, for whatever code metric has been chosen.
In summary, by combining the properties of a parser and a game playing program,
it is possible to make a code generator that is table driven,
highly flexible, and able to produce good code from a
stack machine intermediate code.
.NH 1
The Target Machine Optimizer
.PP
In the model of Fig. 2, the peephole optimizer comes before the global
optimizer.
It may happen that the code produced by the global optimizer can also
be improved by another round of peephole optimization.
Conceivably, the system could have been designed to iterate peephole and
global optimizations until no more of either could be performed.
.PP
However, both of these optimizations are done on the machine independent
EM code.
Neither is able to take advantage of the peculiarities and idiosyncrasies with
which most target machines are well endowed.
It is the function of the final
optimizer to do any (peephole) optimizations that still remain.
.PP
The algorithm used here is the same as in the EM peephole optimizer.
In fact, if it were not for the differences between EM syntax, which is
very restricted, and target assembly language syntax,
which is less so, precisely the same program could be used for both.
Nevertheless, the same ideas apply concerning patterns and replacements, so
our discussion of this optimizer will be restricted to one example.
.PP
To see what the target optimizer might do, consider the
PDP-11 instruction sequence sub #2,r0; mov (r0),x.
First 2 is subtracted from register 0, then the word pointed to by it
is moved to x.
The PDP-11 happens to have an addressing mode to perform this sequence in
one instruction: mov -(r0),x.
Although it is conceivable that this instruction could be included in the
back end driving table for the PDP-11, it is awkward to do so because it
can occur in so many contexts.
It is much easier to catch things like this in a separate program.
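.PP
In the pattern-and-replacement style of the EM table entry shown earlier,
a corresponding entry might read as follows (the notation, in particular
the separator between the two pattern instructions, is purely illustrative):
.sp
  sub #2,r0 : mov (r0),x ==> mov -(r0),x
.sp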
.NH 1
The Universal Assembler/Linker
.PP
Although assembly languages for different machines may appear very different
at first glance, they have a surprisingly large intersection.
We have been able to construct an assembler/linker that is almost entirely
independent of the assembly language being processed.
To tailor the program to a specific assembly language, it is necessary to
supply a table giving the list of instructions, the bit patterns required for
each one, and the language syntax.
The machine independent part of the assembler/linker is then compiled with the
table to produce an assembler and linker for a particular target machine.
Experience has shown that writing the necessary table for a new machine can be
done in less than a week.
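.PP
A minimal C sketch of what one entry in such a table might look like
follows; the field layout and the sample bit patterns are our own
illustration, not the kit's actual table format.
.sp
.nf
/* One instruction in the machine description table. */
struct optab {
    char     *mnemonic;     /* e.g. "mov" */
    unsigned  opcode;       /* bits to emit for the operation */
    char     *operands;     /* syntax of the operand field(s) */
};

struct optab pdp11[] = {    /* hypothetical sample entries */
    { "mov", 0010000, "src,dst" },
    { "add", 0060000, "src,dst" },
    { "sub", 0160000, "src,dst" },
};
.fi
.sp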
.PP
To enforce a modicum of uniformity, we have chosen to use a common set of
pseudoinstructions for all target machines.
They are used to initialize memory, allocate uninitialized memory, determine the
current segment, and similar functions found in most assemblers.
.PP
The assembler is also a linker.
After assembling a program, it checks to see if there are any
unsatisfied external references.
If so, it begins reading the libraries to find the necessary routines, including
them in the object file as it finds them.
This approach requires libraries to be maintained in assembly language form,
but eliminates the need for inventing a language to express relocatable
object programs in a machine independent way.
It also simplifies the assembler, since producing absolute object code is
easier than producing relocatable object code.
Finally, although assembly language libraries may be somewhat larger than
relocatable object module libraries, the loss in speed due to having more
input may be more than compensated for by not having to pass an intermediate
file between the assembler and linker.
.NH 1
The Utility Package
.PP
The utility package is a collection of programs designed to aid the
implementers of new front ends or new back ends.
The most useful ones are the test programs.
For example, one test set, EMTEST, systematically checks out a back end by
executing an ever larger subset of the EM instructions.
It starts out by testing LOC, LOL and a few of the other essential instructions.
If these appear to work, it then tries out new instructions one at a time,
adding them to the set of instructions "known" to work as they pass the tests.
.PP
Each instruction is tested with a variety of operands chosen from values
where problems can be expected.
For example, on target machines which have 16-bit index registers but only
allow 8-bit displacements, a fundamentally different algorithm may be needed
for accessing
the first few bytes of local variables and those with offsets in the thousands.
The test programs have been carefully designed to thoroughly test all relevant
cases.
.PP
In addition to EMTEST, test programs in Pascal, C, and other languages are also
available.
A typical test is:
.sp
   i := 9; \fBif\fP i + 250 <> 259 \fBthen\fP error(16);
.sp
Like EMTEST, the other test programs systematically exercise all features of the
language being tested, and do so in a way that makes it possible to pinpoint
errors precisely.
While it has been said that testing can only demonstrate the presence of errors
and not their absence, our experience is that
the test programs have been invaluable in debugging new parts of the system
quickly.
.PP
Other utilities include programs to convert
the highly compact EM code produced by front ends to ASCII and vice versa,
programs to build various internal tables from human writable input formats,
a variety of libraries written in or compiled to EM to make them portable,
an EM assembler, and EM interpreters for various machines.
.PP
Interpreting the EM code instead of translating it to target machine language
is useful for several reasons.
First, the interpreters provide extensive run time diagnostics including
an option to list the original source program (in Pascal, C, etc.) with the
execution frequency or execution time for each source line printed in the
left margin.
Second, since an EM program is typically about one-third the size of a
compiled program, large programs can be executed on small machines.
Third, running the EM code directly makes it easier to pinpoint errors in
the EM output of front ends still being debugged.
.NH 1
Summary and Conclusions
.PP
The Amsterdam Compiler Kit is a tool kit for building
portable (cross) compilers and interpreters.
The main pieces of the kit are the front ends, which convert source programs
to EM code, optimizers, which improve the EM code, and back ends, which convert
the EM code to target assembly language.
The kit is highly modular, so writing one front end
(and its associated runtime routines)
is sufficient to implement
a new language on a dozen or more machines, and writing one back end table
and one universal assembler/linker table is all that is needed to bring up all
the previously implemented languages on a new machine.
In this manner, the contents, and hopefully the usefulness, of the tool kit
will increase over time.
.PP
We believe the principal lesson to be learned from our work is that the old
UNCOL idea is basically a sound way to produce compilers, provided suitable
restrictions are placed on the source languages and target machines.
We also believe that although compilers produced by this technology may not
equal the very best handcrafted compilers in terms of object code quality,
they are certainly competitive with many existing compilers.
Moreover, any slight loss in performance may be more than compensated for
by the large decrease in the cost of producing the compilers themselves.
As a consequence of our work and similar work by other researchers [1,3,4],
we expect integrated compiler building kits to become increasingly popular
in the near future.
.PP
The tool kit is now available for various computers running the
.UX
operating system.
For information, contact the authors.
.NH 1
References
.LP
.nr r 0 1
.in +4
.ti -4
\fB~\n+r.\fR Graham, S.L.
Table-Driven Code Generation.
.I "Computer~13" ,
8 (August 1980), 25-34.
.PP
A discussion of systematic ways to do code generation,
in particular, the idea of having a table with templates that match parts of
the parse tree and convert them into machine instructions.
.sp 2
.ti -4
\fB~\n+r.\fR Haddon, B.K., and Waite, W.M.
Experience with the Universal Intermediate Language Janus.
.I "Software Practice & Experience~8" ,
5 (Sept.-Oct. 1978), 601-616.
.PP
An intermediate language for use with ALGOL 68, Pascal, etc. is described.
The paper discusses some problems encountered and how they were dealt with.
.sp 2
.ti -4
\fB~\n+r.\fR Johnson, S.C.
A Portable Compiler: Theory and Practice.
.I "Ann. ACM Symp. Prin. Prog. Lang." ,
Jan. 1978.
.PP
A cogent discussion of the portable C compiler.
Particularly interesting are the author's thoughts on the value of
computer science theory.
.sp 2
.ti -4
\fB~\n+r.\fR Leverett, B.W., Cattell, R.G.G., Hobbs, S.O., Newcomer, J.M.,
Reiner, A.H., Schatz, B.R., and Wulf, W.A.
An Overview of the Production-Quality Compiler-Compiler Project.
.I "Computer~13" ,
8 (August 1980), 38-49.
.PP
PQCC is a system for building compilers similar in concept but differing in
details from the Amsterdam Compiler Kit.
The paper describes the intermediate representation used and the code generation
strategy.
.sp 2
.ti -4
\fB~\n+r.\fR Lowry, E.S., and Medlock, C.W.
Object Code Optimization.
.I "Commun.~ACM~12",
(Jan. 1969), 13-22.
.PP
A classic paper on global object code optimization.
It covers data flow analysis, common subexpressions, code motion, register
allocation and other techniques.
.sp 2
.ti -4
\fB~\n+r.\fR Nori, K.V., Ammann, U., Jensen, K., Nageli, H.
The Pascal P Compiler Implementation Notes.
Eidgen. Tech. Hochschule, Zurich, 1975.
.PP
A description of the original P-code machine, used to transport the Pascal-P
compiler to new computers.
.sp 2
.ti -4
\fB~\n+r.\fR Steel, T.B., Jr. UNCOL: the Myth and the Fact. In
.I "Ann. Rev. Auto. Prog."
Goodman, R. (ed.), vol. 2 (1960), 325-344.
.PP
An introduction to the UNCOL idea by its originator.
.sp 2
.ti -4
\fB~\n+r.\fR Steel, T.B., Jr.
A First Version of UNCOL.
.I "Proc. Western Joint Comp. Conf." ,
(1961), 371-377.
.PP
The first detailed proposal for an UNCOL.  By current standards it is a
primitive language, but it is interesting for its historical perspective.
.sp 2
.ti -4
\fB~\n+r.\fR Tanenbaum, A.S., van Staveren, H., and Stevenson, J.W.
Using Peephole Optimization on Intermediate Code.
.I "ACM Trans. Prog. Lang. and Sys. 3" ,
1 (Jan. 1982), 21-36.
.PP
A detailed description of a table-driven peephole optimizer.
The driving table provides a list of patterns to match as well as the
replacement text to use for each successful match.
.sp 2
.ti -4
\fB~\n+r.\fR Tanenbaum, A.S., Stevenson, J.W., Keizer, E.G., and van Staveren, H.
Description of an Experimental Machine Architecture for use with Block
Structured Languages.
Informatica Rapport 81, Vrije Universiteit, Amsterdam, 1983.
.PP
The defining document for EM.
.sp 2
.ti -4
\fB~\n+r.\fR Tanenbaum, A.S.
Implications of Structured Programming for Machine Architecture.
.I "Commun.~ACM~21" ,
3 (March 1978), 237-246.
.PP
The background and motivation for the design of EM.
This early version emphasized the idea of interpreting the intermediate
code (then called EM-1) rather than compiling it.