.\" $Header$
.RP
.ND July 1984
.tr ~
.ds as *
.TL
A Practical Tool Kit for Making Portable Compilers
.AU
Andrew S. Tanenbaum
Hans van Staveren
E. G. Keizer
Johan W. Stevenson
.AI
Mathematics Dept.
Vrije Universiteit
Amsterdam, The Netherlands
.AB
The Amsterdam Compiler Kit is an integrated collection of programs designed to
simplify the task of producing portable (cross) compilers and interpreters.
For each language to be compiled, a program (called a front end)
must be written to
translate the source program into a common intermediate code.
This intermediate code can be optimized and then either directly interpreted
or translated to the assembly language of the desired target machine.
The paper describes the various pieces of the tool kit in some detail, as well
as discussing the overall strategy.
.sp
Keywords: Compiler, Interpreter, Portability, Translator
.sp
CR Categories: 4.12, 4.13, 4.22
.sp 12
Author's present addresses:
  A.S. Tanenbaum, H. van Staveren, E.G. Keizer: Mathematics
     Dept., Vrije Universiteit, Postbus 7161, 1007 MC Amsterdam,
     The Netherlands

  J.W. Stevenson: NV Philips, S&I, T&M, Building TQ V5, Eindhoven,
     The Netherlands
.AE
.NH 1
Introduction
.PP
As more and more organizations acquire many micro- and minicomputers,
the need for portable compilers is becoming more and more acute.
The present situation, in which each hardware vendor provides its own
compilers -- each with its own deficiencies and extensions, and none of them
compatible -- leaves much to be desired.
The ideal situation would be an integrated system containing a family
of (cross) compilers, each compiler accepting a standard source language and
producing code for a wide variety of target machines.
Furthermore, the compilers should be compatible, so programs written in
one language can call procedures written in another language.
Finally, the system should be designed so as to make adding new languages
and new machines easy.
Such an integrated system is being built at the Vrije Universiteit.
Its design and implementation are the subject of this article.
.PP
Our compiler building system, which is called the "Amsterdam Compiler Kit"
(ACK), can be thought of as a "tool kit."
It consists of a number of parts that can be combined to form compilers
(and interpreters) with various properties.
The tool kit is based on an idea (UNCOL) that was first suggested in 1960
[7], but which never really caught on then.
The problem which UNCOL attempts to solve is how to make a compiler for
each of
.I N
languages on
.I M
different machines without having to write
.I N
x
.I M
programs.
.PP
As shown in Fig. 1, the UNCOL approach is to write
.I N
"front ends," each
of which translates one source language to a common intermediate language,
UNCOL (UNiversal Computer Oriented Language), and
.I M
"back ends," each
of which translates programs in UNCOL to a specific machine language.
Under these conditions, only
.I N
+
.I M
programs must be written to provide all
.I N
languages on all
.I M
machines, instead of
.I N
x
.I M
programs.
.PP
Various researchers have attempted to design a suitable UNCOL
[2,8], but none of these have become popular.
It is our belief that previous attempts have failed because they have been
too ambitious, that is, they have tried to cover all languages
and all machines using a single UNCOL.
Our approach is more modest: we cater only to algebraic languages
and machines whose memory consists of 8-bit bytes, each with its own address.
Typical languages that could be handled include
Ada, ALGOL 60, ALGOL 68, BASIC, C, FORTRAN,
Modula, Pascal, PL/I, PL/M, PLAIN, and RATFOR,
whereas COBOL, LISP, and SNOBOL would be less efficient.
Examples of machines that could be included are the Intel 8080 and 8086,
Motorola 6800, 6809, and 68000, Zilog Z80 and Z8000, DEC PDP-11 and VAX,
and IBM 370 but not the Burroughs 6700, CDC Cyber, or Univac 1108 (because
they are not byte-oriented).
With these restrictions, we believe the old UNCOL idea can be used as the
basis of a practical compiler-building system.
.KF
.sp 15P
.ce 1
Fig. 1.  The UNCOL model.
.sp
.KE
.NH 1
An Overview of the Amsterdam Compiler Kit
.PP
The tool kit consists of eight components:
.sp
  1. The preprocessor.
  2. The front ends.
  3. The peephole optimizer.
  4. The global optimizer.
  5. The back end.
  6. The target machine optimizer.
  7. The universal assembler/linker.
  8. The utility package.
.sp
.PP
A fully optimizing compiler,
depicted in Fig. 2, has seven cascaded phases.
Conceptually, each component reads an input file and writes a
transformed output file to be used as input to the next component.
In practice, some components may use temporary files to allow multiple
passes over the input or internal intermediate files.
.KF
.sp 12P
.ce 1
Fig. 2.  Structure of the Amsterdam Compiler Kit.
.sp
.KE
.PP
In the following paragraphs we will briefly describe each component.
After this overview, we will look at all of them again in more detail.
A program to be compiled is first fed into the (language independent)
preprocessor, which provides a simple macro facility
and similar textual facilities.
The preprocessor's output is a legal program in one of the programming
languages supported, whereas the input is a program possibly augmented
with macros, etc.
.PP
This output goes into the appropriate front end, whose job it is to
produce intermediate code.
This intermediate code (our UNCOL) is the machine language for a simple
stack machine called EM (Encoding Machine).
A typical front end might build a parse tree from the input, and then
use the parse tree to generate EM code, which is similar to reverse Polish.
In order to perform this work, the front end has to maintain tables of
declared variables, labels, etc., determine where to place the
data structures in memory, and so on.
.PP
The EM code generated by the front end is fed into the peephole optimizer,
which scans it with a window of a few instructions, replacing certain
inefficient code sequences by better ones.
Such a search is important because EM contains instructions to handle
numerous important special cases efficiently
(e.g., incrementing a variable by 1).
It is our strategy to relieve the front ends of the burden of hunting for
special cases because there are many front ends and only one peephole
optimizer.
By handling the special cases in the peephole optimizer,
the front ends become simpler, easier to write, and easier to maintain.
.PP
Following the peephole optimizer is a global optimizer [5], which,
unlike the peephole optimizer, examines the program as a whole.
It builds a data flow graph to make possible a variety of
global optimizations,
among them moving invariant code out of loops, avoiding redundant
computations, live/dead analysis, and eliminating tail recursion.
Note that the output of the global optimizer is still EM code.
.PP
Next comes the back end, which differs from the front ends in a
fundamental way.
Each front end is a separate program, whereas the back end is a single
program that is driven by a machine dependent driving table.
The driving table for a specific machine tells how the EM code is mapped
onto the machine's assembly language.
Although a simple driving table might just macro expand each EM instruction
into a sequence of target machine instructions, a much more sophisticated
translation strategy is normally used, as described later.
For speed, the back end does not actually read in the driving table at run time.
Instead, the tables are compiled along with the back end in advance, resulting
in one binary program per machine.
.PP
The output of the back end is a program in the assembly language of some
particular machine.
The next component in the pipeline reads this program and performs peephole
optimization on it.
The optimizations performed here involve idiosyncrasies
of the target machine that cannot be performed in the machine-independent
EM-to-EM peephole optimizer.
Typically these optimizations take advantage of special instructions or special
addressing modes.
.PP
The optimized target machine assembly code then goes into the final
component in the pipeline, the universal assembler/linker.
This program assembles the input to object format, extracting routines from
libraries and including them as needed.
.PP
The final component of the tool kit is the utility package, which contains
various test programs, interpreters for EM code,
EM libraries, conversion programs, and other aids for the implementer and
user.
.NH 1
The Preprocessor
.PP
The function of the preprocessor is to extend all the programming languages
by adding certain generally useful facilities to them in a uniform way.
One of these is a simple macro system, in which the user can give names to
character strings.
The names can be used in the program, with the knowledge that they will be
macro expanded prior to being input to the front end.
Macros can be used for named constants, expanding short "procedures"
in line, etc.
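.PP
As an illustration, the kind of name-for-string replacement just described can be sketched in a few lines. This is an assumed, minimal scheme for illustration only, not the preprocessor's actual syntax or algorithm:

```python
# A minimal sketch of textual macro expansion: user-defined names are
# replaced by their strings before the front end ever sees the text.
# The whole-word matching rule and the fixed expansion bound are
# assumptions made for this sketch.
import re

def expand(text, macros):
    # Repeat so that macros whose bodies mention other macros are
    # also expanded; stop as soon as a pass changes nothing.
    for _ in range(10):                   # small fixed bound, assumed
        new = re.sub(r"\b(\w+)\b",
                     lambda m: macros.get(m.group(1), m.group(0)), text)
        if new == text:
            return new
        text = new
    return text
```

Used for a named constant, `expand("x := MAXSIZE + 1", {"MAXSIZE": "100"})` yields the text the front end actually compiles, with `MAXSIZE` replaced by `100`.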
.PP
Another useful facility provided by the preprocessor is the ability to
include compile-time libraries.
On large projects, it is common to have all the declarations and definitions
gathered together in a few files that are textually included in the programs
by instructing the preprocessor to read them in, thus fooling the front end
into thinking that they were part of the source program.
.PP
A third feature of the preprocessor is conditional compilation.
The input program can be split up into labeled sections.
By setting flags, some of the sections can be deleted by the preprocessor,
thus allowing a family of slightly different programs to be conveniently stored
in a single file.
.NH 1
The Front Ends
.PP
A front end is a program that converts input in some source language to a
program in EM.
At present, front ends
exist or are in preparation for Pascal, C, and Plain, and are being considered
for Ada, ALGOL 68, FORTRAN 77, and Modula 2.
Each of the present front ends is independent of all the other ones,
although a general-purpose, table-driven front end is conceivable, provided
one can devise a way to express the semantics of the source language in the
driving tables.
The Pascal front end uses a top-down parsing algorithm (recursive descent),
whereas the C and Plain front ends are bottom-up.
.PP
All front ends, independent of the language being compiled,
produce a common intermediate code called EM, which is
the assembly language for a simple stack machine.
The EM machine is based on a memory architecture
containing a stack for local variables, a (static) data area for variables
declared in the outermost block and global to the whole program, and a heap
for dynamic data structures.
In some ways EM resembles P-code [6], but is more general, since it is
intended for a wider class of languages than just Pascal.
.PP
The EM instruction set has been described elsewhere
[9,10,11],
so we will only briefly summarize it here.
Instructions exist to:
.sp
  1. Load a variable or constant of some length onto the stack.
  2. Store the top item on the stack in memory.
  3. Add, subtract, multiply, divide, etc. the top two stack items.
  4. Examine the top one or two stack items and branch conditionally.
  5. Call procedures and return from them.
.sp
.PP
Loads and stores come in several variations, corresponding to the most common
programming language semantics, for example, constants, simple variables,
fields of a record, elements of an array, and so on.
Distinctions are also made between variables local to the current block
(i.e., stack frame), those in the outermost block (static storage), and those
at intermediate lexicographic levels, which are accessed by following the
static chain at run time.
.PP
All arithmetic instructions have a type (integer, unsigned, real,
pointer, or set) and an
operand length, which may either be explicit or may be popped from the stack
at run time.
Monadic branch instructions pop an item from the stack and branch if it is
less than zero, less than or equal to zero, etc.
Dyadic branch instructions pop two items, compare them, and branch accordingly.
.PP
In addition to these basic EM instructions, there is a collection of special
purpose instructions (e.g., to increment a local variable), which are typically
produced from the simple ones by the peephole optimizer.
Although the complete EM instruction set contains nearly 150 instructions,
only about 60 of them are really primitive; the rest are simply abbreviations
for commonly occurring EM instruction sequences.
.PP
Of particular interest is the way object sizes are parametrized.
The front ends allow the user to indicate how many bytes an integer, real, etc.
should occupy.
Given this information, the front ends can allocate memory, determining
the placement of variables within the stack frame.
Sizes for primitive types are restricted to 8, 16, 32, 64, etc. bits.
The front ends are also parametrized by the target machine's word length
and address size so they can tell, for example, how many "load" instructions
to generate to move a 32-bit integer.
In the examples used henceforth,
we will assume a 16-bit word size and 16-bit integers.
.PP
Since only byte-addressable target machines are permitted,
it is nearly
always possible to implement any requested sizes on any target machine.
For example, the designer of the back end tables for the Z80 should provide
code for 8-, 16-, and 32-bit arithmetic.
In our view, the Pascal, C, or Plain programmer specifies what lengths
are needed,
without reference to the target machine,
and the back end provides them.
This approach greatly enhances portability.
While it is true that doing all arithmetic using 32-bit integers on the Z80
will not be terribly fast, we feel that if that is what the programmer needs,
it should be possible to implement it.
.PP
Like all assembly languages, EM has not only machine instructions but also
pseudoinstructions.
These are used to indicate the start and end of each procedure, allocate
and initialize storage for data, and perform similar functions.
One particularly important pseudoinstruction is the one that is used to
transmit information to the back end for optimization purposes.
It can be used to suggest variables that are good candidates to assign to
registers, delimit the scope of loops, indicate that certain variables
contain a useful value (next operation is a load) or not (next operation is
a store), and various other things.
.NH 1
The Peephole Optimizer
.PP
The peephole optimizer reads in unoptimized EM programs and writes out
optimized ones.
Both the input and output are expressed in a highly compact code, rather than
in ASCII, to reduce the i/o time, which would otherwise dominate the CPU
time.
The program itself is table driven, and is, by and large, ignorant of the
semantics of EM.
The knowledge of EM is contained in a
language- and machine-independent table consisting of about 400
pattern-replacement pairs.
We will briefly describe the kinds of optimizations it performs below;
a more complete discussion can be found in [9].
.PP
Each line in the driving table describes one optimization, consisting of a
pattern part and a replacement part.
The pattern part is a series of one or more EM instructions and a boolean
expression.
The replacement part is a series of EM instructions with operands.
A typical optimization might be:
.sp
  LOL  LOC  ADI  STL  ($1 = $4) and ($2 = 1) and ($3 = 2) ==> INL $1
.sp
where the text prior to the ==> symbol is the pattern and the text after it is
the replacement.
LOL loads a local variable onto the stack, LOC loads a constant onto the stack,
ADI is integer addition, and STL is store local.
The pattern specifies that four consecutive EM instructions are present, with
the indicated opcodes, and that furthermore the operand of the first
instruction (denoted by $1) and the fourth instruction (denoted by $4) are the
same, the constant pushed by LOC is 1, and the size of the integers added by
ADI is 2 bytes.
(EM instructions have at most one operand, so it is not necessary to specify
the operand number.)
Under these conditions, the four instructions can be replaced by a single INL
(increment local) instruction whose operand is equal to that of LOL.
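.PP
The matching machinery for such a rule can be sketched as follows. The rule encoded here is exactly the LOL/LOC/ADI/STL to INL example above; the representation of instructions as (opcode, operand) pairs and the hard-wired single rule are simplifications of the real 400-entry table:

```python
# Sketch of one peephole rule: the pattern part checks four opcodes and
# the boolean conditions ($1 = $4), ($2 = 1), ($3 = 2); the replacement
# part is the single instruction INL $1.
def match_inl(window):
    (op1, a1), (op2, a2), (op3, a3), (op4, a4) = window
    return (op1, op2, op3, op4) == ("LOL", "LOC", "ADI", "STL") \
        and a1 == a4 and a2 == 1 and a3 == 2

def peephole(code):
    out, i = [], 0
    while i < len(code):
        if i + 4 <= len(code) and match_inl(code[i:i + 4]):
            out.append(("INL", code[i][1]))   # emit the replacement
            i += 4                            # consume the pattern
        else:
            out.append(code[i])               # no match: copy through
            i += 1
    return out
```

So the sequence for `k := k + 1`, namely LOL k; LOC 1; ADI 2; STL k, collapses to the single instruction INL k.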
.PP
Although the optimizations cover a wide range, the main ones
can be roughly divided into the following categories.
\fIConstant folding\fR
is used to evaluate constant expressions, such as 2*3~+~7, at
compile time instead of run time.
\fIStrength reduction\fR
is used to replace one operation, such as multiply, by
another, such as shift.
\fIReordering of expressions\fR
helps in cases like -K/5, which can be better
evaluated as K/-5, because the former requires
a division and a negation, whereas the latter requires only a division.
\fINull instructions\fR
include resetting the stack pointer after a call with 0 parameters,
offsetting zero bytes to access the
first element of a record, or jumping to the next instruction.
\fISpecial instructions\fR
are those like INL, which deal with common special cases
such as adding one to a variable or comparing something to zero.
\fIGroup moves\fR
are useful because a sequence
of consecutive moves can often be replaced with EM code
that allows the back end to generate a loop instead of in line code.
\fIDead code elimination\fR
is a technique for removing unreachable statements, possibly made unreachable
by previous optimizations.
\fIBranch chain compression\fR
can be applied when a branch instruction jumps to another branch instruction.
The first branch can jump directly to the final destination instead of
indirectly.
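.PP
Branch chain compression amounts to following a chain of branches to its end. A minimal sketch, assuming a hypothetical label table mapping each label to the instruction at that label, and a BRA (unconditional branch) opcode:

```python
# Follow a chain of unconditional branches to its final destination.
# The seen-set guards against pathological cycles of branches that
# jump to one another; such a chain is left alone.
def compress(branch_target, labels):
    seen = set()
    while branch_target in labels and branch_target not in seen:
        op, operand = labels[branch_target]
        if op != "BRA":        # target is real code, not another branch
            break
        seen.add(branch_target)
        branch_target = operand
    return branch_target
```

A branch to L1, where L1 is itself a branch to L2 and L2 a branch to L3, is retargeted directly to L3.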
.PP
The last two optimizations logically belong in the global optimizer but are
in the local optimizer for historical reasons (meaning that the local
optimizer has been the only optimizer for many years and the optimizations were
easy to do there).
.NH 1
The Global Optimizer
.PP
In contrast to the peephole optimizer, which examines the EM code a few lines
at a time through a small window, the global optimizer examines the
program's large scale structure.
Three distinct types of optimizations can be found here:
.sp
  1. Interprocedural optimizations.
  2. Intraprocedural optimizations.
  3. Basic block optimizations.
.sp
We will now look at each of these in turn.
.PP
Interprocedural optimizations are those spanning procedure boundaries.
The most important one is deciding to expand procedures in line,
especially short procedures that occur in loops and pass several parameters.
If it takes more time or memory to pass the parameters than to do the work,
the program can be improved by eliminating the procedure.
The inverse optimization -- discovering long common code sequences and
turning them into a procedure -- is also possible, but much more difficult.
Like much of the global optimizer's work, the decision to make or not make
a certain program transformation is a heuristic one, based on knowledge of
how the back end works, how most target machines are organized, etc.
.PP
The heart of the global optimizer is its analysis of individual
procedures.
To perform this analysis, the optimizer must locate the basic blocks,
instruction sequences which can be entered only at the top and exited
only at the bottom.
It then constructs a data flow graph, with the basic blocks as nodes and
jumps between blocks as arcs.
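.PP
The standard construction of the blocks can be sketched as follows: a new block begins ("a leader") at the first instruction, at every branch target, and right after every branch. The opcode names and the separate list of branch-target indices are assumptions made for illustration:

```python
# Partition a straight-line EM-like program into basic blocks.
# code is a list of (opcode, operand) pairs; targets is the list of
# instruction indices that branches jump to (assumed precomputed).
def basic_blocks(code, targets):
    leaders = {0} | set(targets)
    # every instruction following a branch also starts a block
    leaders |= {i + 1 for i, (op, _) in enumerate(code)
                if op in ("BRA", "ZLE") and i + 1 < len(code)}
    starts = sorted(leaders)
    return [code[s:e] for s, e in zip(starts, starts[1:] + [len(code)])]
```

The arcs of the data flow graph then follow directly: each block's branch instruction yields an arc to the block starting at its target, and fall-through yields an arc to the next block.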
.PP
From the data flow graph, many important properties of the program can be
discovered and exploited.
Chief among these is the presence of loops, indicated by cycles in the graph.
One important optimization is looking for code that can be moved outside the
loop, either prior to it or subsequent to it.
Such code motion saves execution time, although it does not save memory.
Unrolling loops is also possible and desirable in some cases.
.PP
Another area in which global analysis of loops is especially important is
register allocation.
While it is true that EM does not have any registers to allocate,
the optimizer can easily collect information to allow the
back end to allocate registers wisely.
For example, the global optimizer can collect static frequency-of-use
and live/dead information about variables.
(A variable is dead at some point in the program if its current value is
not needed, i.e., the next reference to it overwrites it rather than
reading it; if the current value will eventually be used, the variable is
live.)
If two variables are never simultaneously live over some interval of code
(e.g., the body of a loop), they can be packed into a single variable,
which, if used often enough, may warrant being assigned to a register.
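.PP
The live/dead bookkeeping in the parenthesis above can be sketched as a backward scan over a single basic block: a store makes a variable dead at all earlier points back to the previous load, and a load makes it live. This is a deliberate simplification; the real analysis propagates liveness over the whole flow graph:

```python
# For each position in a block, compute which variables of interest are
# live just before that instruction, scanning backwards: STL kills a
# variable (next reference overwrites it), LOL makes it live (next
# reference reads it).
def live_before(block, of_interest):
    live, result = set(), []
    for op, var in reversed(block):
        if op == "STL":                 # next operation is a store: dead
            live.discard(var)
        elif op == "LOL":               # next operation is a load: live
            live.add(var)
        result.append(of_interest & live)
    result.reverse()
    return result
```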
.PP
Many loops involve arrays; this leads to other optimizations.
If an array is accessed sequentially, with each iteration using the next
higher numbered element, code improvement is often possible.
Typically, a pointer to the bottom element of each array can be set up
prior to the loop.
Within the loop the element is accessed indirectly via the pointer, which is
also incremented by the element size on each iteration.
If the target machine has an autoincrement addressing mode and the pointer
is assigned to a register, an array access can often be done in a single
instruction.
.PP
Other intraprocedural optimizations include removing tail recursion
(last statement is a recursive call to the procedure itself),
topologically sorting the basic blocks to minimize the number of branch
instructions, and common subexpression recognition.
.PP
The third general class of optimizations done by the global optimizer is
improving the structure of a basic block.
For the most part these involve transforming arithmetic or boolean
expressions into forms that are likely to result in better target code.
As a simple example, A~+~B*C can be converted to B*C~+~A.
The latter can often
be handled by loading B into a register, multiplying the register by C, and
then adding in A, whereas the former may involve first putting A into a
temporary, depending on the details of the code generation table.
Another example of this kind of basic block optimization is transforming
-B~+~A~<~0 into the equivalent, but simpler, A~<~B.
.NH 1
The Back End
.PP
The back end reads a stream of EM instructions and generates assembly code
for the target machine.
Although the algorithm itself is machine independent, for each target
machine a machine dependent driving table must be supplied.
The driving table effectively defines the mapping of EM code to target code.
.PP
It will be convenient to think of the EM instructions being read as a
stream of tokens.
For didactic purposes, we will concentrate on two kinds of tokens:
those that load something onto the stack, and those that perform some operation
on the top one or two values on the stack.
The back end maintains at compile time a simulated stack whose behavior
mirrors what the stack of a hardware EM machine would do at run time.
If the current input token is a load instruction, a new entry is pushed onto
the simulated stack.
.PP
Consider, as an example, the EM code produced for the statement K~:=~I~+~7.
If K and I are
2-byte local variables, it will normally be LOL I; LOC 7; ADI~2; STL K.
Initially the simulated stack is empty.
After the first token has been read and processed, the simulated stack will
contain a stack token of type MEM with attributes telling that it is a local,
giving its address, etc.
After the second token has been read and processed, the top two tokens on the
simulated stack will be CON (constant) on top and MEM directly underneath it.
.PP
At this point the back end reads the ADI~2 token and
looks in the driving table to find a line or lines that define the
action to be taken for ADI~2.
For a typical multiregister machine, instructions will exist to add constants
to registers, but not to memory.
Consequently, the driving table will not contain an entry for ADI~2 with stack
configuration CON, MEM.
.PP
The back end is now faced with the problem of how to get from its
current stack configuration, CON, MEM, which is not listed, to one that is
listed.
The table will normally contain rules (which we call "coercions")
for converting between CON, REG, MEM, and similar tokens.
Therefore the back end attempts to "coerce" the stack into a configuration
that
.I is
present in the table.
A typical coercion rule might tell how to convert a MEM into
a REG, namely by performing the actions of allocating a
register and emitting code to move the memory word to that register.
Having transformed the compile-time stack into a configuration allowed for
ADI~2, the rule can be carried out.
A typical rule
for ADI~2 might have stack configuration REG, MEM
and would emit code to add the MEM to the REG, leaving the stack
with a single REG token instead of the REG and MEM tokens present before the
ADI~2.
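.PP
The mechanics of this worked example can be sketched as follows. The target assembly syntax, the trivial register allocation, and the single coercion rule are all invented for illustration; the real driving tables are far richer and are compiled in, not interpreted:

```python
# Compile-time handling of ADI 2 on a simulated stack of (kind, value)
# tokens.  There is no table entry for CON, MEM, so the CON on top is
# first coerced into a REG; then the REG, MEM rule for ADI applies.
def gen_adi(stack, emit):
    top, below = stack.pop(), stack.pop()
    if top[0] == "CON":
        emit(f"mov r0,#{top[1]}")       # coercion: CON -> REG
        top = ("REG", "r0")             # (hypothetical target syntax)
    emit(f"add {top[1]},{below[1]}")    # the REG, MEM rule for ADI 2
    stack.append(top)                   # result: a single REG token
    return stack
```

For the CON, MEM configuration left by LOL I; LOC 7, this emits a load-immediate followed by an add, leaving one REG token, exactly as in the text. This sketch hard-wires one coercion path; the point of the next paragraphs is that several paths exist and must be compared.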
| .PP
 | |
| In general, there will be more than one possible coercion path.
 | |
| Assuming reasonable coercion rules for our example,
 | |
| we might be able to convert
 | |
| CON MEM into CON REG by loading the variable I into a register.
 | |
| Alternatively, we could coerce CON to REG by loading the constant into a register.
 | |
| The first coercion path does the add by first loading I into a register and
 | |
| then adding 7 to it.
 | |
| The second path first loads 7 into a register and then adds I to it.
 | |
| On machines with a fast LOAD IMMEDIATE instruction for small constants
 | |
| but no fast ADD IMMEDIATE, or vice
 | |
| versa, one code sequence will be preferable to the other.
 | |
| .PP
 | |
| In fact, we actually have more choices than suggested above.
 | |
| In both coercion paths a register must be allocated.
 | |
| On many machines, not every register can be used in every operation, so the
 | |
| choice may be important.
 | |
| On some machines, for example, the operand of a multiply must be in an odd
 | |
| register.
 | |
| To summarize, from any state (i.e., token and stack configuration), a
 | |
| variety of choices can be made, leading to a variety of different target
 | |
| code sequences.
 | |
| .PP
 | |
| To decide which of the various code sequences to emit, the back end must have
 | |
| some information about the time and memory cost of each one.
 | |
| To provide this information, each rule in the driving table, including
 | |
| coercions, specifies both the time and memory cost of the code emitted when
 | |
| the rule is applied.
 | |
| The back end can then simply try each of the legal possibilities (including all
 | |
| the possible register allocations) to find the cheapest one.
 | |
| .PP
 | |
| This situation is similar to that found in a chess or other game-playing
 | |
| program, in which from any state a finite number of moves can be made.
 | |
| Just as in a chess program, the back end can look at all the "moves" that can
 | |
| be made from each state reachable from the original state, and thus find the
 | |
| sequence that gives the minimum cost to a depth of one.
 | |
| More generally, the back end can evaluate all paths corresponding to accepting
 | |
| the next
 | |
| .I N
 | |
| input tokens, find the cheapest one, and then make the first move along
 | |
| that path, precisely the way a chess program would.
 | |
.PP
Since the back end is analogous to both a parser and a chess-playing program,
some clarifying remarks may be helpful.
First, chess programs and the back end must do some lookahead, whereas the
parser for a well-designed grammar can usually manage with one input token,
because grammars are supposed to be unambiguous.
In contrast, many legal mappings
from a sequence of EM instructions to target code may exist.
Second, like a parser but unlike a chess program, the back end has perfect
information -- it does not have to contend with an unpredictable opponent's
moves.
Third, chess programs normally make a static evaluation of the board and
label the
.I nodes
of the tree with the resulting scores.
The back end, in contrast, associates costs with
.I arcs
(moves) rather than nodes (states).
However, the difference is not essential, since it could
also label each node with the cumulative cost from the root to that node.
.PP
As mentioned above, the cost field in the table contains
.I both
the time and memory costs for the code emitted.
It should be clear that the back end could use either one,
or some linear combination of them, as the scoring function for evaluating moves.
A user can instruct the compiler to optimize for time, for memory, or
for, say, 0.3 x time + 0.7 x memory.
Thus the same compiler can provide a wide range of performance options to
the user.
The writer of the back end table can take advantage of this flexibility by
providing several code sequences with different tradeoffs for each EM
instruction (e.g., in-line code vs. a call to a run-time routine).
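.PP
The scoring function described above amounts to a weighted sum over the
(time, memory) cost pairs.
A sketch, with hypothetical costs and instruction sequences chosen only to
show the tradeoff:

```python
def score(cost, time_weight=0.3, memory_weight=0.7):
    """Score a (time, memory) cost pair by a user-chosen linear combination."""
    time, memory = cost
    return time_weight * time + memory_weight * memory

# Two hypothetical code sequences for the same EM instruction:
inline_code = ("add #c,r0", (4, 12))     # fast but long
library_call = ("jsr pc,radd", (9, 3))   # slow but short

# With the 0.3/0.7 weighting from the text, the shorter sequence wins.
best = min([inline_code, library_call], key=lambda seq: score(seq[1]))
```

Setting the weights to (1, 0) or (0, 1) makes the same table optimize purely
for time or purely for memory.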
.PP
In addition to the time-space tradeoffs, by specifying the depth-of-search
parameter,
.I N ,
the user can also trade off compile time against object
code quality, for whatever code metric has been chosen.
In summary, by combining the properties of a parser and a game-playing program,
it is possible to make a code generator that is table driven,
highly flexible, and able to produce good code from a
stack machine intermediate code.
.NH 1
The Target Machine Optimizer
.PP
In the model of Fig. 2, the peephole optimizer comes before the global
optimizer.
It may happen that the code produced by the global optimizer can also
be improved by another round of peephole optimization.
Conceivably, the system could have been designed to iterate peephole and
global optimizations until no more of either could be performed.
.PP
However, both of these optimizations are done on the machine independent
EM code.
Neither is able to take advantage of the peculiarities and idiosyncrasies with
which most target machines are well endowed.
It is the function of the final
optimizer to do any (peephole) optimizations that still remain.
.PP
The algorithm used here is the same as in the EM peephole optimizer.
In fact, if it were not for the differences between EM syntax, which is
very restricted, and target assembly language syntax,
which is less so, precisely the same program could be used for both.
Nevertheless, the same ideas apply concerning patterns and replacements, so
our discussion of this optimizer will be restricted to one example.
.PP
To see what the target optimizer might do, consider the
PDP-11 instruction sequence sub #2,r0; mov (r0),x.
First 2 is subtracted from register 0, then the word pointed to by it
is moved to x.
The PDP-11 happens to have an addressing mode to perform this sequence in
one instruction: mov -(r0),x.
Although it is conceivable that this instruction could be included in the
back end driving table for the PDP-11, it is awkward to do so because it
can occur in so many contexts.
It is much easier to catch things like this in a separate program.
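.PP
The PDP-11 example above can be expressed as a single pattern/replacement
pair.
The real optimizer is table driven; the regular-expression form below is only
an illustration of the idea, not the Kit's pattern notation:

```python
import re

def peephole(asm):
    """Rewrite "sub #2,rN / mov (rN),X" as the single "mov -(rN),X".

    The backreference \1 ensures both instructions use the same register,
    so unrelated instruction pairs are left untouched.
    """
    return re.sub(r"sub #2,(r[0-7])\nmov \(\1\),(\S+)",
                  r"mov -(\1),\2", asm)
```

A real table would hold many such pairs, one per target-machine idiom.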
.NH 1
The Universal Assembler/Linker
.PP
Although assembly languages for different machines may appear very different
at first glance, they have a surprisingly large intersection.
We have been able to construct an assembler/linker that is almost entirely
independent of the assembly language being processed.
To tailor the program to a specific assembly language, it is necessary to
supply a table giving the list of instructions, the bit patterns required for
each one, and the language syntax.
The machine independent part of the assembler/linker is then compiled with the
table to produce an assembler and linker for a particular target machine.
Experience has shown that writing the necessary table for a new machine can be
done in less than a week.
.PP
To enforce a modicum of uniformity, we have chosen to use a common set of
pseudoinstructions for all target machines.
They are used to initialize memory, allocate uninitialized memory, determine the
current segment, and similar functions found in most assemblers.
.PP
The assembler is also a linker.
After assembling a program, it checks to see if there are any
unsatisfied external references.
If so, it begins reading the libraries to find the necessary routines, including
them in the object file as it finds them.
This approach requires libraries to be maintained in assembly language form,
but eliminates the need for inventing a language to express relocatable
object programs in a machine independent way.
It also simplifies the assembler, since producing absolute object code is
easier than producing relocatable object code.
Finally, although assembly language libraries may be somewhat larger than
relocatable object module libraries, the loss in speed due to having more
input may be more than compensated for by not having to pass an intermediate
file between the assembler and linker.
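.PP
The division of labor described above -- a machine-independent core plus a
per-target table of instructions and bit patterns -- can be sketched in
miniature.
The two-entry PDP-11 table below is only a toy stand-in for the Kit's real
table format:

```python
# Hypothetical per-target table: mnemonic -> (base opcode, operand count).
TABLE = {
    "halt": (0o000000, 0),
    "clr":  (0o005000, 1),
}
REGS = {f"r{i}": i for i in range(8)}

def assemble(line):
    """Machine-independent core: look the mnemonic up and fill in operands."""
    op, *args = line.split()
    opcode, nargs = TABLE[op]
    if len(args) != nargs:
        raise ValueError(f"{op} takes {nargs} operand(s)")
    for a in args:
        opcode |= REGS[a]    # register number goes in the low three bits
    return opcode
```

Only TABLE (and, in the real system, the syntax description) changes from one
target machine to the next; assemble() stays the same.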
.NH 1
The Utility Package
.PP
The utility package is a collection of programs designed to aid the
implementers of new front ends or new back ends.
The most useful ones are the test programs.
For example, one test set, EMTEST, systematically checks out a back end by
executing an ever larger subset of the EM instructions.
It starts out by testing LOC, LOL, and a few of the other essential instructions.
If these appear to work, it then tries out new instructions one at a time,
adding them to the set of instructions "known" to work as they pass the tests.
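.PP
The incremental strategy just described can be sketched as follows.
The test oracle here is a hypothetical stand-in for actually running a test
program through the back end; the point is that each new instruction is
tested using only instructions already known to work, so a failure
implicates the newcomer:

```python
def emtest(instructions, run_test):
    """Admit instructions one at a time, testing each against the known set."""
    known = []
    failures = []
    for insn in instructions:      # essential ones (LOC, LOL, ...) come first
        if run_test(insn, known):
            known.append(insn)     # safe to use in later tests
        else:
            failures.append(insn)
    return known, failures
```

Because later tests build only on the known set, the first failing
instruction can be reported precisely.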
.PP
Each instruction is tested with a variety of operands chosen from values
where problems can be expected.
For example, on target machines that have 16-bit index registers but only
allow 8-bit displacements, a fundamentally different algorithm may be needed
for accessing
the first few bytes of local variables and those with offsets of thousands.
The test programs have been carefully designed to test all relevant
cases thoroughly.
.PP
In addition to EMTEST, test programs in Pascal, C, and other languages are also
available.
A typical test is:
.sp
   i := 9; \fBif\fP i + 250 <> 259 \fBthen\fP error(16);
.sp
Like EMTEST, the other test programs systematically exercise all features of the
language being tested, and do so in a way that makes it possible to pinpoint
errors precisely.
While it has been said that testing can only demonstrate the presence of errors
and not their absence, our experience is that
the test programs have been invaluable in debugging new parts of the system
quickly.
.PP
Other utilities include programs to convert
the highly compact EM code produced by front ends to ASCII and vice versa,
programs to build various internal tables from human-writable input formats,
a variety of libraries written in or compiled to EM to make them portable,
an EM assembler, and EM interpreters for various machines.
.PP
Interpreting the EM code instead of translating it to target machine language
is useful for several reasons.
First, the interpreters provide extensive run-time diagnostics, including
an option to list the original source program (in Pascal, C, etc.) with the
execution frequency or execution time for each source line printed in the
left margin.
Second, since an EM program is typically about one-third the size of a
compiled program, large programs can be executed on small machines.
Third, running the EM code directly makes it easier to pinpoint errors in
the EM output of front ends still being debugged.
.NH 1
Summary and Conclusions
.PP
The Amsterdam Compiler Kit is a tool kit for building
portable (cross) compilers and interpreters.
The main pieces of the kit are the front ends, which convert source programs
to EM code, the optimizers, which improve the EM code, and the back ends, which
convert the EM code to target assembly language.
The kit is highly modular, so writing one front end
(and its associated runtime routines)
is sufficient to implement
a new language on a dozen or more machines, and writing one back end table
and one universal assembler/linker table is all that is needed to bring up all
the previously implemented languages on a new machine.
In this manner, the contents, and hopefully the usefulness, of the tool kit
will increase over time.
.PP
We believe the principal lesson to be learned from our work is that the old
UNCOL idea is basically a sound way to produce compilers, provided suitable
restrictions are placed on the source languages and target machines.
We also believe that although compilers produced by this technology may not
equal the very best handcrafted compilers
in terms of object code quality, they are certainly
competitive with many existing compilers.
Moreover, when one factors in the cost of producing the compiler,
the possible slight loss in performance may be more than compensated for by the
large decrease in production cost.
As a consequence of our work and similar work by other researchers [1,3,4],
we expect integrated compiler-building kits to become increasingly popular
in the near future.
.PP
The tool kit is now available for various computers running the
.UX
operating system.
For information, contact the authors.
.NH 1
References
.LP
.nr r 0 1
.in +4
.ti -4
\fB~\n+r.\fR Graham, S.L.
Table-Driven Code Generation.
.I "Computer~13" ,
8 (August 1980), 25-34.
.PP
A discussion of systematic ways to do code generation,
in particular, the idea of having a table with templates that match parts of
the parse tree and convert them into machine instructions.
.sp 2
.ti -4
\fB~\n+r.\fR Haddon, B.K., and Waite, W.M.
Experience with the Universal Intermediate Language Janus.
.I "Software Practice & Experience~8" ,
5 (Sept.-Oct. 1978), 601-616.
.PP
An intermediate language for use with ALGOL 68, Pascal, etc. is described.
The paper discusses some problems encountered and how they were dealt with.
.sp 2
.ti -4
\fB~\n+r.\fR Johnson, S.C.
A Portable Compiler: Theory and Practice.
.I "Ann. ACM Symp. Prin. Prog. Lang." ,
Jan. 1978.
.PP
A cogent discussion of the portable C compiler.
Particularly interesting are the author's thoughts on the value of
computer science theory.
.sp 2
.ti -4
\fB~\n+r.\fR Leverett, B.W., Cattell, R.G.G., Hobbs, S.O., Newcomer, J.M.,
Reiner, A.H., Schatz, B.R., and Wulf, W.A.
An Overview of the Production-Quality Compiler-Compiler Project.
.I "Computer~13" ,
8 (August 1980), 38-49.
.PP
PQCC is a system for building compilers similar in concept but differing in
details from the Amsterdam Compiler Kit.
The paper describes the intermediate representation used and the code generation
strategy.
.sp 2
.ti -4
\fB~\n+r.\fR Lowry, E.S., and Medlock, C.W.
Object Code Optimization.
.I "Commun.~ACM~12" ,
(Jan. 1969), 13-22.
.PP
A classic paper on global object code optimization.
It covers data flow analysis, common subexpressions, code motion, register
allocation, and other techniques.
.sp 2
.ti -4
\fB~\n+r.\fR Nori, K.V., Ammann, U., Jensen, K., and Nageli, H.
The Pascal P Compiler Implementation Notes.
Eidgen. Tech. Hochschule, Zurich, 1975.
.PP
A description of the original P-code machine, used to transport the Pascal-P
compiler to new computers.
.sp 2
.ti -4
\fB~\n+r.\fR Steel, T.B., Jr.
UNCOL: the Myth and the Fact. in
.I "Ann. Rev. Auto. Prog." ,
Goodman, R. (ed.), vol. 2, (1960), 325-344.
.PP
An introduction to the UNCOL idea by its originator.
.sp 2
.ti -4
\fB~\n+r.\fR Steel, T.B., Jr.
A First Version of UNCOL.
.I "Proc. Western Joint Comp. Conf." ,
(1961), 371-377.
.PP
The first detailed proposal for an UNCOL.  By current standards it is a
primitive language, but it is interesting for its historical perspective.
.sp 2
.ti -4
\fB~\n+r.\fR Tanenbaum, A.S., van Staveren, H., and Stevenson, J.W.
Using Peephole Optimization on Intermediate Code.
.I "ACM Trans. Prog. Lang. and Sys.~3" ,
1 (Jan. 1982), 21-36.
.PP
A detailed description of a table-driven peephole optimizer.
The driving table provides a list of patterns to match as well as the
replacement text to use for each successful match.
.sp 2
.ti -4
\fB~\n+r.\fR Tanenbaum, A.S., Stevenson, J.W., Keizer, E.G., and van Staveren, H.
Description of an Experimental Machine Architecture for use with Block
Structured Languages.
Informatica Rapport 81, Vrije Universiteit, Amsterdam, 1983.
.PP
The defining document for EM.
.sp 2
.ti -4
\fB~\n+r.\fR Tanenbaum, A.S.
Implications of Structured Programming for Machine Architecture.
.I "Commun.~ACM~21" ,
3 (March 1978), 237-246.
.PP
The background and motivation for the design of EM.
This early version emphasized the idea of interpreting the intermediate
code (then called EM-1) rather than compiling it.