.RP
.ND
.nr LL 78m
.tr ~
.ds as *
.TL
A Practical Tool Kit for Making Portable Compilers
.AU
Andrew S. Tanenbaum
Hans van Staveren
E. G. Keizer
Johan W. Stevenson
.AI
Mathematics Dept.
Vrije Universiteit
Amsterdam, The Netherlands
.AB
The Amsterdam Compiler Kit is an integrated collection of programs designed to
simplify the task of producing portable (cross) compilers and interpreters.
For each language to be compiled, a program (called a front end)
must be written to
translate the source program into a common intermediate code.
This intermediate code can be optimized and then either directly interpreted
or translated to the assembly language of the desired target machine.
The paper describes the various pieces of the tool kit in some detail, as well
as discussing the overall strategy.
.sp
Keywords: Compiler, Interpreter, Portability, Translator
.sp
CR Categories: 4.12, 4.13, 4.22
.sp 12
Authors' present addresses:
A.S. Tanenbaum, H. van Staveren, E.G. Keizer: Mathematics
Dept., Vrije Universiteit, Postbus 7161, 1007 MC Amsterdam,
The Netherlands
.sp
J.W. Stevenson: NV Philips, S&I, T&M, Building TQ V5, Eindhoven,
The Netherlands
.AE
.NH 1
Introduction
.PP
As more and more organizations acquire many micro- and minicomputers,
the need for portable compilers is becoming more and more acute.
The present situation, in which each hardware vendor provides its own
compilers -- each with its own deficiencies and extensions, and none of them
compatible -- leaves much to be desired.
The ideal situation would be an integrated system containing a family
of (cross) compilers, each compiler accepting a standard source language and
producing code for a wide variety of target machines.
Furthermore, the compilers should be compatible, so programs written in
one language can call procedures written in another language.
Finally, the system should be designed so as to make adding new languages
and new machines easy.
Such an integrated system is being built at the Vrije Universiteit.
Its design and implementation are the subject of this article.
.PP
Our compiler building system, which is called the "Amsterdam Compiler Kit"
(ACK), can be thought of as a "tool kit."
It consists of a number of parts that can be combined to form compilers
(and interpreters) with various properties.
The tool kit is based on an idea (UNCOL) that was first suggested in 1960
[7], but which never really caught on then.
The problem which UNCOL attempts to solve is how to make a compiler for
each of
.I N
languages on
.I M
different machines without having to write
.I N
x
.I M
programs.
.PP
As shown in Fig. 1, the UNCOL approach is to write
.I N
"front ends," each
of which translates one source language to a common intermediate language,
UNCOL (UNiversal Computer Oriented Language), and
.I M
"back ends," each
of which translates programs in UNCOL to a specific machine language.
Under these conditions, only
.I N
+
.I M
programs must be written to provide all
.I N
languages on all
.I M
machines, instead of
.I N
x
.I M
programs.
.PP
Various researchers have attempted to design a suitable UNCOL
[2,8], but none of these have become popular.
It is our belief that previous attempts have failed because they have been
too ambitious, that is, they have tried to cover all languages
and all machines using a single UNCOL.
Our approach is more modest: we cater only to algebraic languages
and machines whose memory consists of 8-bit bytes, each with its own address.
Typical languages that could be handled include
Ada, ALGOL 60, ALGOL 68, BASIC, C, FORTRAN,
Modula, Pascal, PL/I, PL/M, PLAIN, and RATFOR,
whereas COBOL, LISP, and SNOBOL would be handled less efficiently.
Examples of machines that could be included are the Intel 8080 and 8086,
Motorola 6800, 6809, and 68000, Zilog Z80 and Z8000, DEC PDP-11 and VAX,
and IBM 370 but not the Burroughs 6700, CDC Cyber, or Univac 1108 (because
they are not byte-oriented).
With these restrictions, we believe the old UNCOL idea can be used as the
basis of a practical compiler-building system.
.KF
.sp 15P
.ce 1
Fig. 1. The UNCOL model.
.sp
.KE
.NH 1
An Overview of the Amsterdam Compiler Kit
.PP
The tool kit consists of eight components:
.sp
1. The preprocessor.
2. The front ends.
3. The peephole optimizer.
4. The global optimizer.
5. The back end.
6. The target machine optimizer.
7. The universal assembler/linker.
8. The utility package.
.sp
.PP
A fully optimizing compiler,
depicted in Fig. 2, has seven cascaded phases.
Conceptually, each component reads an input file and writes a
transformed output file to be used as input to the next component.
In practice, some components may use temporary files to allow multiple
passes over the input or internal intermediate files.
.KF
.sp 12P
.ce 1
Fig. 2. Structure of the Amsterdam Compiler Kit.
.sp
.KE
.PP
In the following paragraphs we will briefly describe each component.
After this overview, we will look at all of them again in more detail.
A program to be compiled is first fed into the (language independent)
preprocessor, which provides a simple macro facility
and similar textual facilities.
The preprocessor's output is a legal program in one of the programming
languages supported, whereas the input is a program possibly augmented
with macros, etc.
.PP
This output goes into the appropriate front end, whose job it is to
produce intermediate code.
This intermediate code (our UNCOL) is the machine language for a simple
stack machine called EM (Encoding Machine).
A typical front end might build a parse tree from the input, and then
use the parse tree to generate EM code, which is similar to reverse Polish.
In order to perform this work, the front end has to maintain tables of
declared variables, labels, etc., determine where to place the
data structures in memory, and so on.
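.PP
To make this concrete, the fragment below is a minimal sketch of such a
front end step: it walks a two-node parse tree for the assignment K := I + 7
in postorder and prints the corresponding reverse-Polish EM sequence.
The sketch is ours, not code taken from the kit; the node layout and the
use of variable names as operands are illustrative assumptions.
.sp
.nf
.eo
/*
 * A minimal sketch (ours, not the kit's code) of how a front end can
 * emit reverse-Polish EM code from a parse tree.  The tree below
 * represents the assignment K := I + 7; the expected output is the
 * sequence the paper itself uses later: LOL I; LOC 7; ADI 2; STL K.
 */
#include <stdio.h>

struct node {
    char op;                  /* '+' = add, 'v' = variable, 'c' = constant */
    const char *name;         /* variable name, if any                     */
    int value;                /* constant value, if any                    */
    struct node *left, *right;
};

static void gen(struct node *n)
{
    if (n->op == 'v') { printf("LOL %s\n", n->name); return; }
    if (n->op == 'c') { printf("LOC %d\n", n->value); return; }
    gen(n->left);             /* emit code for the operands first ...      */
    gen(n->right);
    printf("ADI 2\n");        /* ... then the operator (2-byte integers)   */
}

int main(void)
{
    struct node i   = { 'v', "I", 0, NULL, NULL };
    struct node c7  = { 'c', NULL, 7, NULL, NULL };
    struct node sum = { '+', NULL, 0, &i, &c7 };

    gen(&sum);                /* leaves the value of I + 7 on the stack    */
    printf("STL K\n");        /* the assignment stores it in K             */
    return 0;
}
.ec
.fi
.sp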
.PP
The EM code generated by the front end is fed into the peephole optimizer,
which scans it with a window of a few instructions, replacing certain
inefficient code sequences by better ones.
Such a search is important because EM contains instructions to handle
numerous important special cases efficiently
(e.g., incrementing a variable by 1).
It is our strategy to relieve the front ends of the burden of hunting for
special cases because there are many front ends and only one peephole
optimizer.
By handling the special cases in the peephole optimizer,
the front ends become simpler, easier to write and easier to maintain.
.PP
Following the peephole optimizer is a global optimizer [5], which,
unlike the peephole optimizer, examines the program as a whole.
It builds a data flow graph to make possible a variety of
global optimizations,
among them, moving invariant code out of loops, avoiding redundant
computations, live/dead analysis and eliminating tail recursion.
Note that the output of the global optimizer is still EM code.
.PP
Next comes the back end, which differs from the front ends in a
fundamental way.
Each front end is a separate program, whereas the back end is a single
program that is driven by a machine dependent driving table.
The driving table for a specific machine tells how the EM code is mapped
onto the machine's assembly language.
Although a simple driving table might just macro expand each EM instruction
into a sequence of target machine instructions, a much more sophisticated
translation strategy is normally used, as described later.
For speed, the back end does not actually read in the driving table at run time.
Instead, the tables are compiled along with the back end in advance, resulting
in one binary program per machine.
.PP
The output of the back end is a program in the assembly language of some
particular machine.
The next component in the pipeline reads this program and performs peephole
optimization on it.
The optimizations performed here involve idiosyncrasies
of the target machine that cannot be performed in the machine-independent
EM-to-EM peephole optimizer.
Typically these optimizations take advantage of special instructions or special
addressing modes.
.PP
The optimized target machine assembly code then goes into the final
component in the pipeline, the universal assembler/linker.
This program assembles the input to object format, extracting routines from
libraries and including them as needed.
.PP
The final component of the tool kit is the utility package, which contains
various test programs, interpreters for EM code,
EM libraries, conversion programs, and other aids for the implementer and
user.
.NH 1
The Preprocessor
.PP
The function of the preprocessor is to extend all the programming languages
by adding certain generally useful facilities to them in a uniform way.
One of these is a simple macro system, in which the user can give names to
character strings.
The names can be used in the program, with the knowledge that they will be
macro expanded prior to being input to the front end.
Macros can be used for named constants, expanding short "procedures"
in line, etc.
.PP
Another useful facility provided by the preprocessor is the ability to
include compile-time libraries.
On large projects, it is common to have all the declarations and definitions
gathered together in a few files that are textually included in the programs
by instructing the preprocessor to read them in, thus fooling the front end
into thinking that they were part of the source program.
.PP
A third feature of the preprocessor is conditional compilation.
The input program can be split up into labeled sections.
By setting flags, some of the sections can be deleted by the preprocessor,
thus allowing a family of slightly different programs to be conveniently stored
on a single file.
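.PP
The paper does not give the preprocessor's actual directive syntax, so the
fragment below uses the analogous C preprocessor facilities purely as an
illustration of the three features just described: a named constant, a
compile-time include, and a conditionally compiled section.
.sp
.nf
.eo
/*
 * Illustrative only: C preprocessor analogues of the macro, include,
 * and conditional-compilation facilities described in the text.
 * The kit's own preprocessor syntax is not shown in this paper.
 */
#include <stdio.h>            /* compile-time inclusion of a library file */

#define TABLESIZE 100         /* a named constant defined as a macro      */

int main(void)
{
#ifdef DEBUG                  /* this labeled section is kept or dropped  */
    printf("table size is %d\n", TABLESIZE);
#endif                        /* depending on a compile-time flag         */
    return 0;
}
.ec
.fi
.sp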
.NH 1
The Front Ends
.PP
A front end is a program that converts input in some source language to a
program in EM.
At present, front ends
exist or are in preparation for Pascal, C, and Plain, and are being considered
for Ada, ALGOL 68, FORTRAN 77, and Modula 2.
Each of the present front ends is independent of all the other ones,
although a general-purpose, table-driven front end is conceivable, provided
one can devise a way to express the semantics of the source language in the
driving tables.
The Pascal front end uses a top-down parsing algorithm (recursive descent),
whereas the C and Plain front ends are bottom-up.
.PP
All front ends, independent of the language being compiled,
produce a common intermediate code called EM, which is
the assembly language for a simple stack machine.
The EM machine is based on a memory architecture
containing a stack for local variables, a (static) data area for variables
declared in the outermost block and global to the whole program, and a heap
for dynamic data structures.
In some ways EM resembles P-code [6], but is more general, since it is
intended for a wider class of languages than just Pascal.
.PP
The EM instruction set has been described elsewhere
[9,10,11],
so we will only briefly summarize it here.
Instructions exist to:
.sp
1. Load a variable or constant of some length onto the stack.
2. Store the top item on the stack in memory.
3. Add, subtract, multiply, divide, etc. the top two stack items.
4. Examine the top one or two stack items and branch conditionally.
5. Call procedures and return from them.
.sp
.PP
Loads and stores come in several variations, corresponding to the most common
programming language semantics, for example, constants, simple variables,
fields of a record, elements of an array, and so on.
Distinctions are also made between variables local to the current block
(i.e., stack frame), those in the outermost block (static storage), and those
at intermediate lexicographic levels, which are accessed by following the
static chain at run time.
.PP
All arithmetic instructions have a type (integer, unsigned, real,
pointer, or set) and an
operand length, which may either be explicit or may be popped from the stack
at run time.
Monadic branch instructions pop an item from the stack and branch if it is
less than zero, less than or equal to zero, etc.
Dyadic branch instructions pop two items, compare them, and branch accordingly.
.PP
In addition to these basic EM instructions, there is a collection of special
purpose instructions (e.g., to increment a local variable), which are typically
produced from the simple ones by the peephole optimizer.
Although the complete EM instruction set contains nearly 150 instructions,
only about 60 of them are really primitive; the rest are simply abbreviations
for commonly occurring EM instruction sequences.
.PP
Of particular interest is the way object sizes are parametrized.
The front ends allow the user to indicate how many bytes an integer, real, etc.
should occupy.
Given this information, the front ends can allocate memory, determining
the placement of variables within the stack frame.
Sizes for primitive types are restricted to 8, 16, 32, 64, etc. bits.
The front ends are also parametrized by the target machine's word length
and address size so they can tell, for example, how many "load" instructions
to generate to move a 32-bit integer.
In the examples used henceforth,
we will assume a 16-bit word size and 16-bit integers.
.PP
Since only byte-addressable target machines are permitted,
it is nearly
always possible to implement any requested sizes on any target machine.
For example, the designer of the back end tables for the Z80 should provide
code for 8-, 16-, and 32-bit arithmetic.
In our view, the Pascal, C, or Plain programmer specifies what lengths
are needed,
without reference to the target machine,
and the back end provides them.
This approach greatly enhances portability.
While it is true that doing all arithmetic using 32-bit integers on the Z80
will not be terribly fast, we feel that if that is what the programmer needs,
it should be possible to implement it.
.PP
Like all assembly languages, EM has not only machine instructions, but also
pseudoinstructions.
These are used to indicate the start and end of each procedure, allocate
and initialize storage for data, and similar functions.
One particularly important pseudoinstruction is the one that is used to
transmit information to the back end for optimization purposes.
It can be used to suggest variables that are good candidates to assign to
registers, delimit the scope of loops, indicate that certain variables
contain a useful value (next operation is a load) or not (next operation is
a store), and various other things.
.NH 1
The Peephole Optimizer
.PP
The peephole optimizer reads in unoptimized EM programs and writes out
optimized ones.
Both the input and output are expressed in a highly compact code, rather than
in ASCII, to reduce the i/o time, which would otherwise dominate the CPU
time.
The program itself is table driven, and is, by and large, ignorant of the
semantics of EM.
The knowledge of EM is contained in a
language- and machine-independent table consisting of about 400
pattern-replacement pairs.
We will briefly describe the kinds of optimizations it performs below;
a more complete discussion can be found in [9].
.PP
Each line in the driving table describes one optimization, consisting of a
pattern part and a replacement part.
The pattern part is a series of one or more EM instructions and a boolean
expression.
The replacement part is a series of EM instructions with operands.
A typical optimization might be:
.sp
LOL LOC ADI STL ($1 = $4) and ($2 = 1) and ($3 = 2) ==> INL $1
.sp
where the text prior to the ==> symbol is the pattern and the text after it is
the replacement.
LOL loads a local variable onto the stack, LOC loads a constant onto the stack,
ADI is integer addition, and STL is store local.
The pattern specifies that four consecutive EM instructions are present, with
the indicated opcodes, and that furthermore the operand of the first
instruction (denoted by $1) and the fourth instruction (denoted by $4) are the
same, the constant pushed by LOC is 1, and the size of the integers added by
ADI is 2 bytes.
(EM instructions have at most one operand, so it is not necessary to specify
the operand number.)
Under these conditions, the four instructions can be replaced by a single INL
(increment local) instruction whose operand is equal to that of LOL.
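.PP
The sketch below shows how such a pattern-replacement pair might be applied;
it is a simplified illustration of the idea, not the optimizer's actual table
format or matching code, and the numeric local offsets are invented for the
example.
.sp
.nf
.eo
/*
 * Minimal sketch of the pattern-replacement idea: a four-instruction
 * window LOL x; LOC 1; ADI 2; STL x is rewritten as INL x.
 */
#include <stdio.h>
#include <string.h>

struct em { const char *op; int arg; };

/* Rewrite matching windows in place; return the new program length. */
static int peephole(struct em *p, int n)
{
    int in, out;
    for (in = 0, out = 0; in < n; ) {
        if (in + 3 < n &&
            strcmp(p[in].op, "LOL") == 0 &&
            strcmp(p[in+1].op, "LOC") == 0 && p[in+1].arg == 1 &&
            strcmp(p[in+2].op, "ADI") == 0 && p[in+2].arg == 2 &&
            strcmp(p[in+3].op, "STL") == 0 && p[in+3].arg == p[in].arg) {
            p[out].op = "INL";          /* $1 of LOL becomes INL's operand */
            p[out].arg = p[in].arg;
            out++;
            in += 4;
        } else {
            p[out++] = p[in++];
        }
    }
    return out;
}

int main(void)
{
    /* EM for "k := k + 1", with k at local offset -2 (illustrative). */
    struct em prog[] = { {"LOL",-2}, {"LOC",1}, {"ADI",2}, {"STL",-2} };
    int i, n = peephole(prog, 4);

    for (i = 0; i < n; i++)
        printf("%s %d\n", prog[i].op, prog[i].arg);
    return 0;
}
.ec
.fi
.sp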
.PP
Although the optimizations cover a wide range, the main ones
can be roughly divided into the following categories.
\fIConstant folding\fR
is used to evaluate constant expressions, such as 2*3~+~7, at
compile time instead of run time.
\fIStrength reduction\fR
is used to replace one operation, such as multiply, by
another, such as shift.
\fIReordering of expressions\fR
helps in cases like -K/5, which can be better
evaluated as K/-5, because the former requires
a division and a negation, whereas the latter requires only a division.
\fINull instructions\fR
include resetting the stack pointer after a call with 0 parameters,
offsetting zero bytes to access the
first element of a record, or jumping to the next instruction.
\fISpecial instructions\fR
are those like INL, which deal with common special cases
such as adding one to a variable or comparing something to zero.
\fIGroup moves\fR
are useful because a sequence
of consecutive moves can often be replaced with EM code
that allows the back end to generate a loop instead of in line code.
\fIDead code elimination\fR
is a technique for removing unreachable statements, possibly made unreachable
by previous optimizations.
\fIBranch chain compression\fR
can be applied when a branch instruction jumps to another branch instruction.
The first branch can jump directly to the final destination instead of
indirectly.
.PP
The last two optimizations logically belong in the global optimizer but are
in the local optimizer for historical reasons (meaning that the local
optimizer has been the only optimizer for many years and the optimizations were
easy to do there).
.NH 1
The Global Optimizer
.PP
In contrast to the peephole optimizer, which examines the EM code a few lines
at a time through a small window, the global optimizer examines the
program's large scale structure.
Three distinct types of optimizations can be found here:
.sp
1. Interprocedural optimizations.
2. Intraprocedural optimizations.
3. Basic block optimizations.
.sp
We will now look at each of these in turn.
.PP
Interprocedural optimizations are those spanning procedure boundaries.
The most important one is deciding to expand procedures in line,
especially short procedures that occur in loops and pass several parameters.
If it takes more time or memory to pass the parameters than to do the work,
the program can be improved by eliminating the procedure.
The inverse optimization -- discovering long common code sequences and
turning them into a procedure -- is also possible, but much more difficult.
Like much of the global optimizer's work, the decision to make or not make
a certain program transformation is a heuristic one, based on knowledge of
how the back end works, how most target machines are organized, etc.
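.PP
A toy version of such a heuristic is sketched below; the cost estimates and
thresholds are invented purely for illustration and are not the optimizer's
actual criteria.
.sp
.nf
.eo
/*
 * Illustrative only: decide whether to expand a procedure in line by
 * comparing an assumed cost of the call (overhead plus parameter
 * passing) against the cost of simply duplicating the body.
 */
#include <stdio.h>

static int call_cost(int nparams)
{
    return 4 + 2 * nparams;   /* assumed cost per call and per parameter */
}

static int should_inline(int nparams, int body_cost, int in_loop)
{
    int limit = call_cost(nparams);
    if (in_loop)              /* calls inside loops are the prime candidates */
        limit = 2 * limit;
    return body_cost <= limit;
}

int main(void)
{
    /* A short three-parameter procedure called from inside a loop. */
    printf("%s\n", should_inline(3, 8, 1) ? "expand in line" : "keep the call");
    return 0;
}
.ec
.fi
.sp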
.PP
The heart of the global optimizer is its analysis of individual
procedures.
To perform this analysis, the optimizer must locate the basic blocks,
instruction sequences which can be entered only at the top and exited
only at the bottom.
It then constructs a data flow graph, with the basic blocks as nodes and
jumps between blocks as arcs.
.PP
From the data flow graph, many important properties of the program can be
discovered and exploited.
Chief among these is the presence of loops, indicated by cycles in the graph.
One important optimization is looking for code that can be moved outside the
loop, either prior to it or subsequent to it.
Such code motion saves execution time, although it does not save memory.
Unrolling loops is also possible and desirable in some cases.
.PP
Another area in which global analysis of loops is especially important is
in register allocation.
While it is true that EM does not have any registers to allocate,
the optimizer can easily collect information to allow the
back end to allocate registers wisely.
For example, the global optimizer can collect static frequency-of-use
and live/dead information about variables.
(A variable is dead at some point in the program if its current value is
not needed, i.e., the next reference to it overwrites it rather than
reading it; if the current value will eventually be used, the variable is
live.)
If two variables are never simultaneously live over some interval of code
(e.g., the body of a loop), they can be packed into a single variable,
which, if used often enough, may warrant being assigned to a register.
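.PP
The test involved is sketched below in a deliberately simplified form, with
each live range approximated by a single interval of instruction numbers;
the optimizer's real liveness information is of course derived from the data
flow graph rather than assumed.
.sp
.nf
.eo
/*
 * Illustrative only: two variables can share one location (and possibly
 * one register) if their live ranges, here simple [first,last] intervals
 * of instruction numbers, never overlap.
 */
#include <stdio.h>

struct range { int first, last; };  /* first and last use of the variable */

static int overlaps(struct range a, struct range b)
{
    return a.first <= b.last && b.first <= a.last;
}

int main(void)
{
    struct range i = { 10, 40 };    /* loop counter, live inside the loop */
    struct range t = { 55, 60 };    /* temporary used only after the loop */

    if (!overlaps(i, t))
        printf("i and t can be packed into one variable\n");
    return 0;
}
.ec
.fi
.sp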
.PP
Many loops involve arrays: this leads to other optimizations.
If an array is accessed sequentially, with each iteration using the next
higher numbered element, code improvement is often possible.
Typically, a pointer to the bottom element of each array can be set up
prior to the loop.
Within the loop the element is accessed indirectly via the pointer, which is
also incremented by the element size on each iteration.
If the target machine has an autoincrement addressing mode and the pointer
is assigned to a register, an array access can often be done in a single
instruction.
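.PP
Expressed at the source level (purely as an illustration; the optimizer works
on EM code, not on C), the transformation replaces an indexed access inside
the loop by an indirect access through a pointer that is bumped by the
element size each time around:
.sp
.nf
.eo
/*
 * Illustrative only: the source-level effect of the loop optimization
 * described above.  Instead of indexing a[i] on every iteration, a
 * pointer is set up before the loop and advanced past each element.
 */
#include <stdio.h>

#define N 5

int main(void)
{
    int a[N] = { 1, 2, 3, 4, 5 };
    int sum = 0;
    int *p = &a[0];              /* pointer to the bottom element     */
    int i;

    for (i = 0; i < N; i++)
        sum += *p++;             /* indirect access plus autoincrement */

    printf("%d\n", sum);
    return 0;
}
.ec
.fi
.sp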
.PP
Other intraprocedural optimizations include removing tail recursion
(last statement is a recursive call to the procedure itself),
topologically sorting the basic blocks to minimize the number of branch
instructions, and common subexpression recognition.
.PP
The third general class of optimizations done by the global optimizer is
improving the structure of a basic block.
For the most part these involve transforming arithmetic or boolean
expressions into forms that are likely to result in better target code.
As a simple example, A~+~B*C can be converted to B*C~+~A.
The latter can often
be handled by loading B into a register, multiplying the register by C, and
then adding in A, whereas the former may involve first putting A into a
temporary, depending on the details of the code generation table.
Another example of this kind of basic block optimization is transforming
-B~+~A~<~0 into the equivalent, but simpler, A~<~B.
.NH 1
The Back End
.PP
The back end reads a stream of EM instructions and generates assembly code
for the target machine.
Although the algorithm itself is machine independent, for each target
machine a machine dependent driving table must be supplied.
The driving table effectively defines the mapping of EM code to target code.
.PP
It will be convenient to think of the EM instructions being read as a
stream of tokens.
For didactic purposes, we will concentrate on two kinds of tokens:
those that load something onto the stack, and those that perform some operation
on the top one or two values on the stack.
The back end maintains at compile time a simulated stack whose behavior
mirrors what the stack of a hardware EM machine would do at run time.
If the current input token is a load instruction, a new entry is pushed onto
the simulated stack.
.PP
Consider, as an example, the EM code produced for the statement K~:=~I~+~7.
If K and I are
2-byte local variables, it will normally be LOL I; LOC 7; ADI~2; STL K.
Initially the simulated stack is empty.
After the first token has been read and processed, the simulated stack will
contain a stack token of type MEM with attributes telling that it is a local,
giving its address, etc.
After the second token has been read and processed, the top two tokens on the
simulated stack will be CON (constant) on top and MEM directly underneath it.
.PP
At this point the back end reads the ADI~2 token and
looks in the driving table to find a line or lines that define the
action to be taken for ADI~2.
For a typical multiregister machine, instructions will exist to add constants
to registers, but not to memory.
Consequently, the driving table will not contain an entry for ADI~2 with stack
configuration CON, MEM.
.PP
The back end is now faced with the problem of how to get from its
current stack configuration, CON, MEM, which is not listed, to one that is
listed.
The table will normally contain rules (which we call "coercions")
for converting between CON, REG, MEM, and similar tokens.
Therefore the back end attempts to "coerce" the stack into a configuration
that
.I is
present in the table.
A typical coercion rule might tell how to convert a MEM into
a REG, namely by performing the actions of allocating a
register and emitting code to move the memory word to that register.
Having transformed the compile-time stack into a configuration allowed for
ADI~2, the rule can be carried out.
A typical rule
for ADI~2 might have stack configuration REG, MEM
and would emit code to add the MEM to the REG, leaving the stack
with a single REG token instead of the REG and MEM tokens present before the
ADI~2.
.PP
In general, there will be more than one possible coercion path.
Assuming reasonable coercion rules for our example,
we might be able to convert
CON MEM into CON REG by loading the variable I into a register.
Alternatively, we could coerce CON to REG by loading the constant into a register.
The first coercion path does the add by first loading I into a register and
then adding 7 to it.
The second path first loads 7 into a register and then adds I to it.
On machines with a fast LOAD IMMEDIATE instruction for small constants
but no fast ADD IMMEDIATE, or vice
versa, one code sequence will be preferable to the other.
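.PP
The sketch below (ours, with invented addresses, register names, and target
instructions) follows the second path for K := I + 7: it pushes the MEM and
CON tokens on a simulated stack, coerces the constant into a register because
no rule matches CON on top of MEM, and then applies the add and store rules.
.sp
.nf
.eo
/*
 * A minimal sketch (not the back end's real data structures) of the
 * simulated stack for K := I + 7, i.e. LOL I; LOC 7; ADI 2; STL K.
 * The addresses (100, 102), register numbering, and emitted target
 * instructions are all invented for this illustration.
 */
#include <stdio.h>

enum kind { CON, MEM, REG };             /* kinds of stack tokens          */

struct token { enum kind k; int val; };  /* constant, address, or register */

static struct token stack[16];
static int sp = 0;                       /* simulated stack pointer        */
static int nextreg = 0;                  /* trivial register allocator     */

static void push(enum kind k, int val)
{
    stack[sp].k = k;
    stack[sp].val = val;
    sp++;
}

/* Coercion: turn the token at depth d into a REG by emitting a move. */
static void coerce_to_reg(int d)
{
    int r = nextreg++;
    if (stack[d].k == CON)
        printf("mov #%d,r%d\n", stack[d].val, r);   /* load immediate   */
    else
        printf("mov %d,r%d\n", stack[d].val, r);    /* load memory word */
    stack[d].k = REG;
    stack[d].val = r;
}

int main(void)
{
    push(MEM, 100);                  /* LOL I (I assumed at address 100) */
    push(CON, 7);                    /* LOC 7                            */

    /* ADI 2: no rule matches CON on top of MEM, so coerce the constant
       into a register, then apply the rule "add the MEM to the REG".   */
    coerce_to_reg(sp - 1);
    printf("add %d,r%d\n", stack[sp - 2].val, stack[sp - 1].val);
    stack[sp - 2] = stack[sp - 1];   /* one REG token remains            */
    sp--;

    printf("mov r%d,%d\n", stack[sp - 1].val, 102);  /* STL K (K at 102) */
    sp--;
    return 0;
}
.ec
.fi
.sp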
.PP
In fact, we actually have more choices than suggested above.
In both coercion paths a register must be allocated.
On many machines, not every register can be used in every operation, so the
choice may be important.
On some machines, for example, the operand of a multiply must be in an odd
register.
To summarize, from any state (i.e., token and stack configuration), a
variety of choices can be made, leading to a variety of different target
code sequences.
.PP
To decide which of the various code sequences to emit, the back end must have
some information about the time and memory cost of each one.
To provide this information, each rule in the driving table, including
coercions, specifies both the time and memory cost of the code emitted when
the rule is applied.
The back end can then simply try each of the legal possibilities (including all
the possible register allocations) to find the cheapest one.
.PP
This situation is similar to that found in a chess or other game-playing
program, in which from any state a finite number of moves can be made.
Just as in a chess program, the back end can look at all the "moves" that can
be made from each state reachable from the original state, and thus find the
sequence that gives the minimum cost to a depth of one.
More generally, the back end can evaluate all paths corresponding to accepting
the next
.I N
input tokens, find the cheapest one, and then make the first move along
that path, precisely the way a chess program would.
.PP
Since the back end is analogous to both a parser and a chess playing program,
some clarifying remarks may be helpful.
First, chess programs and the back end must do some look ahead, whereas the
parser for a well-designed grammar can usually suffice with one input token
because grammars are supposed to be unambiguous.
In contrast, many legal mappings
from a sequence of EM instructions to target code may exist.
Second, like a parser but unlike a chess program, the back end has perfect
information -- it does not have to contend with an unpredictable opponent's
moves.
Third, chess programs normally make a static evaluation of the board and
label the
.I nodes
of the tree with the resulting scores.
The back end, in contrast, associates costs with
.I arcs
(moves) rather than nodes (states).
However, the difference is not essential, since it could
also label each node with the cumulative cost from the root to that node.
.PP
As mentioned above, the cost field in the table contains
.I both
the time and memory costs for the code emitted.
It should be clear that the back end could use either one
or some linear combination of them as the scoring function for evaluating moves.
A user can instruct the compiler to optimize for time or for memory or
for, say, 0.3 x time + 0.7 x memory.
Thus the same compiler can provide a wide range of performance options to
the user.
The writer of the back end table can take advantage of this flexibility by
providing several code sequences with different tradeoffs for each EM
instruction (e.g., in line code vs. call to a run time routine).
.PP
In addition to the time-space tradeoffs, by specifying the depth of search
parameter,
.I N ,
the user can effectively also trade off compile time vs. object
code quality, for whatever code metric has been chosen.
In summary, by combining the properties of a parser and a game playing program,
it is possible to make a code generator that is table driven and
highly flexible, and that has the ability to produce good code from a
stack machine intermediate code.
.NH 1
The Target Machine Optimizer
.PP
In the model of Fig. 2, the peephole optimizer comes before the global
optimizer.
It may happen that the code produced by the global optimizer can also
be improved by another round of peephole optimization.
Conceivably, the system could have been designed to iterate peephole and
global optimizations until no more of either could be performed.
.PP
However, both of these optimizations are done on the machine independent
EM code.
Neither is able to take advantage of the peculiarities and idiosyncrasies with
which most target machines are well endowed.
It is the function of the final
optimizer to do any (peephole) optimizations that still remain.
.PP
The algorithm used here is the same as in the EM peephole optimizer.
In fact, if it were not for the differences between EM syntax, which is
very restricted, and target assembly language syntax,
which is less so, precisely the same program could be used for both.
Nevertheless, the same ideas apply concerning patterns and replacements, so
our discussion of this optimizer will be restricted to one example.
.PP
To see what the target optimizer might do, consider the
PDP-11 instruction sequence sub #2,r0; mov (r0),x.
First 2 is subtracted from register 0, then the word pointed to by it
is moved to x.
The PDP-11 happens to have an addressing mode to perform this sequence in
one instruction: mov -(r0),x.
Although it is conceivable that this instruction could be included in the
back end driving table for the PDP-11, it is awkward to do so because it
can occur in so many contexts.
It is much easier to catch things like this in a separate program.
.NH 1
The Universal Assembler/Linker
.PP
Although assembly languages for different machines may appear very different
at first glance, they have a surprisingly large intersection.
We have been able to construct an assembler/linker that is almost entirely
independent of the assembly language being processed.
To tailor the program to a specific assembly language, it is necessary to
supply a table giving the list of instructions, the bit patterns required for
each one, and the language syntax.
The machine independent part of the assembler/linker is then compiled with the
table to produce an assembler and linker for a particular target machine.
Experience has shown that writing the necessary table for a new machine can be
done in less than a week.
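.PP
The flavor of such a table is sketched below; the entries are a few genuine
single-byte Z80 opcodes, but the structure, names, and lookup code are
invented for this illustration and are not the assembler's actual table
format.
.sp
.nf
.eo
/*
 * Illustrative only: the kind of machine-description entry the
 * assembler/linker is built from, a mnemonic plus the bit pattern
 * used to encode it.
 */
#include <stdio.h>
#include <string.h>

struct opcode { const char *mnemonic; unsigned pattern; };

/* A few single-byte Z80 instructions, purely as an example. */
static struct opcode table[] = {
    { "nop",  0x00 },
    { "halt", 0x76 },
    { "ret",  0xc9 },
};

static int assemble(const char *mnemonic)
{
    size_t i;
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (strcmp(table[i].mnemonic, mnemonic) == 0)
            return (int)table[i].pattern;
    return -1;                       /* unknown instruction */
}

int main(void)
{
    printf("ret assembles to 0x%02x\n", assemble("ret"));
    return 0;
}
.ec
.fi
.sp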
.PP
To enforce a modicum of uniformity, we have chosen to use a common set of
pseudoinstructions for all target machines.
They are used to initialize memory, allocate uninitialized memory, determine the
current segment, and similar functions found in most assemblers.
.PP
The assembler is also a linker.
After assembling a program, it checks to see if there are any
unsatisfied external references.
If so, it begins reading the libraries to find the necessary routines, including
them in the object file as it finds them.
This approach requires libraries to be maintained in assembly language form,
but eliminates the need for inventing a language to express relocatable
object programs in a machine independent way.
It also simplifies the assembler, since producing absolute object code is
easier than producing relocatable object code.
Finally, although assembly language libraries may be somewhat larger than
relocatable object module libraries, the loss in speed due to having more
input may be more than compensated for by not having to pass an intermediate
file between the assembler and linker.
.NH 1
The Utility Package
.PP
The utility package is a collection of programs designed to aid the
implementers of new front ends or new back ends.
The most useful ones are the test programs.
For example, one test set, EMTEST, systematically checks out a back end by
executing an ever larger subset of the EM instructions.
It starts out by testing LOC, LOL, and a few of the other essential instructions.
If these appear to work, it then tries out new instructions one at a time,
adding them to the set of instructions "known" to work as they pass the tests.
.PP
Each instruction is tested with a variety of operands chosen from values
where problems can be expected.
For example, on target machines which have 16-bit index registers but only
allow 8-bit displacements, a fundamentally different algorithm may be needed
for accessing
the first few bytes of local variables and those with offsets in the thousands.
The test programs have been carefully designed to thoroughly test all relevant
cases.
.PP
In addition to EMTEST, test programs in Pascal, C, and other languages are also
available.
A typical test is:
.sp
i := 9; \fBif\fP i + 250 <> 259 \fBthen\fP error(16);
.sp
Like EMTEST, the other test programs systematically exercise all features of the
language being tested, and do so in a way that makes it possible to pinpoint
errors precisely.
While it has been said that testing can only demonstrate the presence of errors
and not their absence, our experience is that
the test programs have been invaluable in debugging new parts of the system
quickly.
.PP
Other utilities include programs to convert
the highly compact EM code produced by front ends to ASCII and vice versa,
programs to build various internal tables from human writable input formats,
a variety of libraries written in or compiled to EM to make them portable,
an EM assembler, and EM interpreters for various machines.
.PP
Interpreting the EM code instead of translating it to target machine language
is useful for several reasons.
First, the interpreters provide extensive run time diagnostics, including
an option to list the original source program (in Pascal, C, etc.) with the
execution frequency or execution time for each source line printed in the
left margin.
Second, since an EM program is typically about one-third the size of a
compiled program, large programs can be executed on small machines.
Third, running the EM code directly makes it easier to pinpoint errors in
the EM output of front ends still being debugged.
.NH 1
Summary and Conclusions
.PP
The Amsterdam Compiler Kit is a tool kit for building
portable (cross) compilers and interpreters.
The main pieces of the kit are the front ends, which convert source programs
to EM code, optimizers, which improve the EM code, and back ends, which convert
the EM code to target assembly language.
The kit is highly modular, so writing one front end
(and its associated runtime routines)
is sufficient to implement
a new language on a dozen or more machines, and writing one back end table
and one universal assembler/linker table is all that is needed to bring up all
the previously implemented languages on a new machine.
In this manner, the contents, and hopefully the usefulness, of the tool kit
will increase with time.
.PP
We believe the principal lesson to be learned from our work is that the old
UNCOL idea is basically a sound way to produce compilers, provided suitable
restrictions are placed on the source languages and target machines.
We also believe that although compilers produced by this technology may not
be equal to the very best handcrafted compilers
in terms of object code quality, they are certainly
competitive with many existing compilers.
Furthermore, when one factors in the cost of producing the compiler,
the possible slight loss in performance may be more than compensated for by the
large decrease in production cost.
As a consequence of our work and similar work by other researchers [1,3,4],
we expect integrated compiler building kits to become increasingly popular
in the near future.
.PP
The tool kit is now available for various computers running the
.UX
operating system.
For information, contact the authors.
.NH 1
References
.LP
.nr r 0 1
.in +4
.ti -4
\fB~\n+r.\fR Graham, S.L.
Table-Driven Code Generation.
.I "Computer~13" ,
8 (August 1980), 25-34.
.PP
A discussion of systematic ways to do code generation,
in particular, the idea of having a table with templates that match parts of
the parse tree and convert them into machine instructions.
.sp 2
.ti -4
\fB~\n+r.\fR Haddon, B.K., and Waite, W.M.
Experience with the Universal Intermediate Language Janus.
.I "Software Practice & Experience~8" ,
5 (Sept.-Oct. 1978), 601-616.
.PP
An intermediate language for use with ALGOL 68, Pascal, etc. is described.
The paper discusses some problems encountered and how they were dealt with.
.sp 2
.ti -4
\fB~\n+r.\fR Johnson, S.C.
A Portable Compiler: Theory and Practice.
.I "Ann. ACM Symp. Prin. Prog. Lang." ,
Jan. 1978.
.PP
A cogent discussion of the portable C compiler.
Particularly interesting are the author's thoughts on the value of
computer science theory.
.sp 2
.ti -4
\fB~\n+r.\fR Leverett, B.W., Cattell, R.G.G., Hobbs, S.O., Newcomer, J.M.,
Reiner, A.H., Schatz, B.R., and Wulf, W.A.
An Overview of the Production-Quality Compiler-Compiler Project.
.I "Computer~13" ,
8 (August 1980), 38-49.
.PP
PQCC is a system for building compilers similar in concept but differing in
details from the Amsterdam Compiler Kit.
The paper describes the intermediate representation used and the code generation
strategy.
.sp 2
.ti -4
\fB~\n+r.\fR Lowry, E.S., and Medlock, C.W.
Object Code Optimization.
.I "Commun.~ACM~12" ,
(Jan. 1969), 13-22.
.PP
A classic paper on global object code optimization.
It covers data flow analysis, common subexpressions, code motion, register
allocation and other techniques.
.sp 2
.ti -4
\fB~\n+r.\fR Nori, K.V., Ammann, U., Jensen, K., Nageli, H.
The Pascal P Compiler Implementation Notes.
Eidgen. Tech. Hochschule, Zurich, 1975.
.PP
A description of the original P-code machine, used to transport the Pascal-P
compiler to new computers.
.sp 2
.ti -4
\fB~\n+r.\fR Steel, T.B., Jr.
UNCOL: The Myth and the Fact.
In
.I "Ann. Rev. Auto. Prog." ,
Goodman, R. (ed.), vol. 2, (1960), 325-344.
.PP
An introduction to the UNCOL idea by its originator.
.sp 2
.ti -4
\fB~\n+r.\fR Steel, T.B., Jr.
A First Version of UNCOL.
.I "Proc. Western Joint Comp. Conf." ,
(1961), 371-377.
.PP
The first detailed proposal for an UNCOL. By current standards it is a
primitive language, but it is interesting for its historical perspective.
.sp 2
.ti -4
\fB~\n+r.\fR Tanenbaum, A.S., van Staveren, H., and Stevenson, J.W.
Using Peephole Optimization on Intermediate Code.
.I "ACM Trans. Prog. Lang. and Sys. 3" ,
1 (Jan. 1982), 21-36.
.PP
A detailed description of a table-driven peephole optimizer.
The driving table provides a list of patterns to match as well as the
replacement text to use for each successful match.
.sp 2
.ti -4
\fB~\n+r.\fR Tanenbaum, A.S., Stevenson, J.W., Keizer, E.G., and van Staveren, H.
Description of an Experimental Machine Architecture for use with Block
Structured Languages.
Informatica Rapport 81, Vrije Universiteit, Amsterdam, 1983.
.PP
The defining document for EM.
.sp 2
.ti -4
\fB~\n+r.\fR Tanenbaum, A.S.
Implications of Structured Programming for Machine Architecture.
.I "Comm. ACM~21" ,
3 (March 1978), 237-246.
.PP
The background and motivation for the design of EM.
This early version emphasized the idea of interpreting the intermediate
code (then called EM-1) rather than compiling it.