374 lines
		
	
	
	
		
			11 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
			
		
		
	
	
			374 lines
		
	
	
	
		
			11 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
.bp
 | 
						|
.NH 1
 | 
						|
Overview of the global optimizer
 | 
						|
.NH 2
 | 
						|
The ACK compilation process
 | 
						|
.PP
 | 
						|
The EM Global Optimizer is one of three optimizers that are
 | 
						|
part of the Amsterdam Compiler Kit (ACK).
 | 
						|
The phases of ACK are:
 | 
						|
.IP 1.
 | 
						|
A Front End translates a source program to EM
 | 
						|
.IP 2.
 | 
						|
The Peephole Optimizer
 | 
						|
.[
 | 
						|
tanenbaum staveren peephole toplass
 | 
						|
.]
 | 
						|
reads EM code and produces 'better' EM code.
 | 
						|
It performs a number of optimizations (mostly peephole
 | 
						|
optimizations)
 | 
						|
such as constant folding, strength reduction and unreachable code
 | 
						|
elimination.
 | 
						|
.IP 3.
 | 
						|
The Global Optimizer further improves the EM code.
 | 
						|
.IP 4.
 | 
						|
The Code Generator transforms EM to assembly code
 | 
						|
of the target computer.
 | 
						|
.IP 5.
 | 
						|
The Target Optimizer improves the assembly code.
 | 
						|
.IP 6.
 | 
						|
An Assembler/Loader generates an executable file.
 | 
						|
.LP
 | 
						|
For a more extensive overview of the ACK compilation process,
 | 
						|
we refer to.
 | 
						|
.[
 | 
						|
tanenbaum toolkit rapport
 | 
						|
.]
 | 
						|
.[
 | 
						|
tanenbaum toolkit cacm
 | 
						|
.]
 | 
						|
.PP
 | 
						|
The input of the Global Optimizer may consist of files and
 | 
						|
libraries.
 | 
						|
Every file or module in the library must contain EM code in
 | 
						|
Compact Assembly Language format.
 | 
						|
.[~[
 | 
						|
tanenbaum machine architecture
 | 
						|
.], section 11.2]
 | 
						|
The output consists of one such EM file.
 | 
						|
The input files and libraries together need not
 | 
						|
constitute an entire program,
 | 
						|
although as much of the program as possible should be supplied.
 | 
						|
The more information about the program the optimizer 
 | 
						|
gets, the better its output code will be.
 | 
						|
.PP
 | 
						|
The Global Optimizer is language- and machine-independent,
 | 
						|
i.e. it can be used for all languages and machines supported by ACK.
 | 
						|
Yet, it puts some unavoidable restrictions on the EM code
 | 
						|
produced by the Front End (see below).
 | 
						|
It must have some knowledge of the target machine.
 | 
						|
This knowledge is expressed in a machine description table
 | 
						|
which is passed as argument to the optimizer.
 | 
						|
This table does not contain very detailed information about the
 | 
						|
target (such as its instruction set and addressing modes).
 | 
						|
.NH 2
 | 
						|
The EM code
 | 
						|
.PP
 | 
						|
The definition of EM, the intermediate code of all ACK compilers,
 | 
						|
is given in a separate document.
 | 
						|
.[
 | 
						|
tanenbaum machine architecture
 | 
						|
.]
 | 
						|
We will only discuss some features of EM that are most relevant
 | 
						|
to the Global Optimizer.
 | 
						|
.PP
 | 
						|
EM is the assembly code of a virtual \fIstack machine\fR.
 | 
						|
All operations are performed on the top of the stack.
 | 
						|
For example, the statement "A := B + 3" may be expressed in EM as:
 | 
						|
.DS
 | 
						|
.TS
 | 
						|
l l.
 | 
						|
LOL -4	-- push local variable B
 | 
						|
LOC 3	-- push constant 3
 | 
						|
ADI 2	-- add two 2-byte items on top of
 | 
						|
	-- the stack and push the result
 | 
						|
STL -2	-- pop A
 | 
						|
.TE
 | 
						|
.DE
 | 
						|
So EM is essentially a \fIpostfix\fR code.
 | 
						|
.PP
 | 
						|
EM has a rich instruction set, containing several arithmetic
 | 
						|
and logical operators.
 | 
						|
It also contains special-case instructions (such as INCrement).
 | 
						|
.PP
 | 
						|
EM has \fIglobal\fR (\fIexternal\fR) variables, accessible
 | 
						|
by all procedures and \fIlocal\fR variables, accessible by a few
 | 
						|
(nested) procedures.
 | 
						|
The local variables of a lexically enclosing procedure may
 | 
						|
be accessed via a \fIstatic link\fR. 
 | 
						|
EM has instructions to follow the static chain.
 | 
						|
There are EM instruction to allow a procedure
 | 
						|
to access its local variables directly (such as LOL and STL above).
 | 
						|
Local variables are referenced via an offset in the stack frame
 | 
						|
of the procedure, rather than by their names (e.g. -2 and -4 above).
 | 
						|
The EM code does not contain the (source language) type
 | 
						|
of the variables.
 | 
						|
.PP
 | 
						|
All structured statements in the source program are expressed in
 | 
						|
low level jump instructions.
 | 
						|
Besides conditional and unconditional branch instructions, there are 
 | 
						|
two case instructions (CSA and CSB),
 | 
						|
to allow efficient translation of case statements.
 | 
						|
.NH 2
 | 
						|
Requirements on the EM input
 | 
						|
.PP
 | 
						|
As the optimizer should be useful for all languages,
 | 
						|
it clearly should not put severe restrictions on the EM code
 | 
						|
of the input.
 | 
						|
There is, however, one immovable requirement:
 | 
						|
it must be possible to determine the \fIflow of control\fR of the
 | 
						|
input program.
 | 
						|
As virtually all global optimizations are based on control flow information,
 | 
						|
the optimizer would be totally powerless without it.
 | 
						|
For this reason we restrict the usage of the case jump instructions (CSA/CSB)
 | 
						|
of EM.
 | 
						|
Such an instruction is always called with the address of a case descriptor
 | 
						|
on top the the stack.
 | 
						|
.[~[
 | 
						|
tanenbaum machine architecture
 | 
						|
.] section 7.4]
 | 
						|
This descriptor contains the labels of all possible
 | 
						|
destinations of the jump.
 | 
						|
We demand that all case descriptors are allocated in a global
 | 
						|
data fragment of type ROM, i.e. the case descriptors
 | 
						|
may not be modifyable.
 | 
						|
Furthermore, any case instruction should be immediately preceded by
 | 
						|
a LAE (Load Address External) instruction, that loads the
 | 
						|
address of the descriptor,
 | 
						|
so the descriptor can be uniquely identified.
 | 
						|
.PP
 | 
						|
The optimizer will work improperly if the user deceives the control flow.
 | 
						|
We will give two methods to do this.
 | 
						|
.PP
 | 
						|
In "C" the notorious library routines "setjmp" and "longjmp"
 | 
						|
.[
 | 
						|
unix programmer's manual McIlroy
 | 
						|
.]
 | 
						|
may be used to jump out of a procedure,
 | 
						|
but can also be used for a number of other stuffy purposes,
 | 
						|
for example, to create an extra entry point in a loop.
 | 
						|
.DS
 | 
						|
 while (condition) {
 | 
						|
	 ....
 | 
						|
	 setjmp(buf);
 | 
						|
	 ...
 | 
						|
 }
 | 
						|
 ...
 | 
						|
 longjmp(buf);
 | 
						|
.DE
 | 
						|
The invocation to longjmp actually is a jump to the place of
 | 
						|
the last call to setjmp with the same argument (buf).
 | 
						|
As the calls to setjmp and longjmp are indistinguishable from
 | 
						|
normal procedure calls, the optimizer will not see the danger.
 | 
						|
No need to say that several loop optimizations will behave
 | 
						|
unexpectedly when presented with such pathological input.
 | 
						|
.PP
 | 
						|
Another way to deceive the flow of control is
 | 
						|
by using exception handling routines.
 | 
						|
Ada*
 | 
						|
.FS
 | 
						|
* Ada is a registered trademark of the U.S. Government
 | 
						|
(Ada Joint Program Office).
 | 
						|
.FE
 | 
						|
has clearly recognized the dangers of exception handling,
 | 
						|
but other languages (such as PL/I) have not.
 | 
						|
.[
 | 
						|
ada rationale
 | 
						|
.]
 | 
						|
.PP
 | 
						|
The optimizer will be more effective if the EM input contains
 | 
						|
some extra information about the source program.
 | 
						|
Especially the \fIregister message\fR is very important.
 | 
						|
These messages indicate which local variables may never be
 | 
						|
accessed indirectly.
 | 
						|
Most optimizations benefit significantly by this information.
 | 
						|
.PP
 | 
						|
The Inline Substitution technique needs to know how many bytes
 | 
						|
of formal parameters every procedure accesses.
 | 
						|
Only calls to procedures for which the EM code contains this information
 | 
						|
will be substituted in line.
 | 
						|
.NH 2
 | 
						|
Structure of the optimizer
 | 
						|
.PP
 | 
						|
The Global Optimizer is organized as a number of \fIphases\fR,
 | 
						|
each one performing some task.
 | 
						|
The main structure is as follows:
 | 
						|
.IP IC 6
 | 
						|
the Intermediate Code construction phase transforms EM into the
 | 
						|
intermediate code (ic) of the optimizer
 | 
						|
.IP CF
 | 
						|
the Control Flow phase extends the ic with control flow
 | 
						|
information and interprocedural information
 | 
						|
.IP OPTs
 | 
						|
zero or more optimization phases, each one performing one or
 | 
						|
more related optimizations
 | 
						|
.IP CA
 | 
						|
the Compact Assembly phase generates Compact Assembly Language EM code
 | 
						|
out of ic.
 | 
						|
.LP
 | 
						|
.PP
 | 
						|
An important issue in the design of a global optimizer is the
 | 
						|
interaction between optimization techniques.
 | 
						|
It is often advantageous to combine several techniques in
 | 
						|
one algorithm that takes into account all interactions between them.
 | 
						|
Ideally, one single algorithm should be developed that does
 | 
						|
all optimizations simultaneously and deals with all possible interactions.
 | 
						|
In practice, such an algorithm is still far out of  reach.
 | 
						|
Instead some rather ad hoc (albeit important) combinations are chosen,
 | 
						|
such as Common Subexpression Elimination and Register Allocation.
 | 
						|
.[
 | 
						|
prabhala sethi common subexpressions
 | 
						|
.]
 | 
						|
.[
 | 
						|
sethi ullman optimal code
 | 
						|
.]
 | 
						|
.PP
 | 
						|
In the Em Global Optimizer there is one separate algorithm for
 | 
						|
every technique.
 | 
						|
Note that this does not mean that all techniques are independent
 | 
						|
of each other.
 | 
						|
.PP
 | 
						|
In principle, the optimization phases can be run in any order;
 | 
						|
a phase may even be run more than once.
 | 
						|
However, the following rules should be obeyed:
 | 
						|
.IP -
 | 
						|
the Live Variable analysis phase (LV) must be run prior to
 | 
						|
Register Allocation (RA), as RA uses information outputted by LV.
 | 
						|
.IP -
 | 
						|
RA should be the last phase; this is a consequence of the way
 | 
						|
the interface between RA and the Code Generator is defined.
 | 
						|
.LP
 | 
						|
The ordering of the phases has significant impact on
 | 
						|
the quality of the produced code.
 | 
						|
In
 | 
						|
.[
 | 
						|
wulf overview production quality carnegie-mellon
 | 
						|
.]
 | 
						|
two kinds of phase ordering problems are distinguished.
 | 
						|
If two techniques A and B both take away opportunities of each other,
 | 
						|
there is a "negative" ordering problem.
 | 
						|
If, on the other hand, both A and B introduce new optimization
 | 
						|
opportunities for each other, the problem is called "positive".
 | 
						|
In the Global Optimizer the following interactions must be
 | 
						|
taken into account:
 | 
						|
.IP -
 | 
						|
Inline Substitution (IL) may create new opportunities for most
 | 
						|
other techniques, so it should be run as early as possible
 | 
						|
.IP -
 | 
						|
Use Definition analysis (UD) may introduce opportunities for LV.
 | 
						|
.IP -
 | 
						|
Strength Reduction may create opportunities for UD
 | 
						|
.LP
 | 
						|
The optimizer has a default phase ordering, which can
 | 
						|
be changed by the user.
 | 
						|
.NH 2
 | 
						|
Structure of this document
 | 
						|
.PP
 | 
						|
The remaining chapters of this document each describe one
 | 
						|
phase of the optimizer.
 | 
						|
For every phase, we describe its task, its design,
 | 
						|
its implementation, and its source files.
 | 
						|
The latter two sections are intended to aid the
 | 
						|
maintenance of the optimizer and
 | 
						|
can be skipped by the initial reader.
 | 
						|
.NH 2
 | 
						|
References
 | 
						|
.PP
 | 
						|
There are very 
 | 
						|
few modern textbooks on optimization.
 | 
						|
Chapters 12, 13, and 14 of
 | 
						|
.[
 | 
						|
aho compiler design
 | 
						|
.]
 | 
						|
are a good introduction to the subject.
 | 
						|
Wulf et. al.
 | 
						|
.[
 | 
						|
wulf optimizing compiler
 | 
						|
.]
 | 
						|
describe one specific optimizing (Bliss) compiler.
 | 
						|
Anklam et. al.
 | 
						|
.[
 | 
						|
anklam vax-11
 | 
						|
.]
 | 
						|
discuss code generation and optimization in
 | 
						|
compilers for one specific machine (a Vax-11).
 | 
						|
Kirchgaesner et. al. 
 | 
						|
.[
 | 
						|
optimizing ada compiler
 | 
						|
.]
 | 
						|
present a brief description of many
 | 
						|
optimizations; the report also contains a lengthy (over 60 pages)
 | 
						|
bibliography.
 | 
						|
.PP
 | 
						|
The number of articles on optimization is quite impressive.
 | 
						|
The Lowry and Medlock paper on the Fortran H compiler
 | 
						|
.[
 | 
						|
object code optimization Lowry Medlock
 | 
						|
.]
 | 
						|
is a classical one.
 | 
						|
Other papers on global optimization are.
 | 
						|
.[
 | 
						|
faiman optimizing pascal
 | 
						|
.]
 | 
						|
.[
 | 
						|
perkins sites
 | 
						|
.]
 | 
						|
.[
 | 
						|
harrison general purpose optimizing
 | 
						|
.]
 | 
						|
.[
 | 
						|
morel partial redundancies
 | 
						|
.]
 | 
						|
.[
 | 
						|
Mintz global optimizer
 | 
						|
.]
 | 
						|
Freudenberger
 | 
						|
.[
 | 
						|
freudenberger setl optimizer
 | 
						|
.]
 | 
						|
describes an optimizer for a Very High Level Language (SETL).
 | 
						|
The Production-Quality Compiler-Compiler (PQCC) project uses
 | 
						|
very sophisticated compiler techniques, as described in.
 | 
						|
.[
 | 
						|
wulf overview ieee
 | 
						|
.]
 | 
						|
.[
 | 
						|
wulf overview carnegie-mellon
 | 
						|
.]
 | 
						|
.[
 | 
						|
wulf machine-relative
 | 
						|
.]
 | 
						|
.PP
 | 
						|
Several Ph.D. theses are dedicated to optimization.
 | 
						|
Davidson
 | 
						|
.[
 | 
						|
davidson simplifying
 | 
						|
.]
 | 
						|
outlines a machine-independent peephole optimizer that
 | 
						|
improves assembly code.
 | 
						|
Katkus
 | 
						|
.[
 | 
						|
katkus
 | 
						|
.]
 | 
						|
describes how efficient programs can be obtained at little cost by
 | 
						|
optimizing only a small part of a program.
 | 
						|
Photopoulos
 | 
						|
.[
 | 
						|
photopoulos mixed code
 | 
						|
.]
 | 
						|
discusses the idea of generating interpreted intermediate code as well
 | 
						|
as assembly code, to obtain programs that are both small and  fast.
 | 
						|
Shaffer
 | 
						|
.[
 | 
						|
shaffer automatic
 | 
						|
.]
 | 
						|
describes the theory of automatic subroutine generation.
 | 
						|
.]
 | 
						|
Leverett
 | 
						|
.[
 | 
						|
leverett register allocation compilers
 | 
						|
.]
 | 
						|
deals with register allocation in the PQCC compilers.
 | 
						|
.PP
 | 
						|
References to articles about specific optimization techniques
 | 
						|
will be given in later chapters.
 |