.bp
.NH 1
Overview of the global optimizer
.NH 2
The ACK compilation process
.PP
The EM Global Optimizer is one of three optimizers that are
part of the Amsterdam Compiler Kit (ACK).
The phases of ACK are:
.IP 1.
A Front End translates a source program to EM.
.IP 2.
The Peephole Optimizer
.[
tanenbaum staveren peephole toplass
.]
reads EM code and produces 'better' EM code.
It performs a number of optimizations (mostly peephole
optimizations)
such as constant folding, strength reduction and unreachable code
elimination.
.IP 3.
The Global Optimizer further improves the EM code.
.IP 4.
The Code Generator transforms EM to assembly code
of the target computer.
.IP 5.
The Target Optimizer improves the assembly code.
.IP 6.
An Assembler/Loader generates an executable file.
.LP
For a more extensive overview of the ACK compilation process,
we refer to.
.[
tanenbaum toolkit rapport
.]
.[
tanenbaum toolkit cacm
.]
.PP
The input of the Global Optimizer may consist of files and
libraries.
Every file, and every module in a library, must contain EM code in
Compact Assembly Language format.
.[~[
tanenbaum machine architecture
.], section 11.2]
The output consists of one such EM file.
The input files and libraries together need not
constitute an entire program,
although as much of the program as possible should be supplied.
The more information about the program the optimizer
gets, the better its output code will be.
.PP
The Global Optimizer is language- and machine-independent,
i.e. it can be used for all languages and machines supported by ACK.
Yet, it puts some unavoidable restrictions on the EM code
produced by the Front End (see below).
It must also have some knowledge of the target machine.
This knowledge is expressed in a machine description table
which is passed as an argument to the optimizer.
This table does not contain very detailed information about the
target (such as its instruction set and addressing modes).
.NH 2
The EM code
.PP
The definition of EM, the intermediate code of all ACK compilers,
is given in a separate document.
.[
tanenbaum machine architecture
.]
We discuss only those features of EM that are most relevant
to the Global Optimizer.
.PP
EM is the assembly code of a virtual \fIstack machine\fR.
All operations are performed on the top of the stack.
For example, the statement "A := B + 3" may be expressed in EM as:
.DS
.TS
l l.
LOL -4	-- push local variable B
LOC 3	-- push constant 3
ADI 2	-- add two 2-byte items on top of
	-- the stack and push the result
STL -2	-- pop the result into local variable A
.TE
.DE
So EM is essentially a \fIpostfix\fR code.
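.PP
The stack discipline of the example above can be traced with a small
interpreter.
The following Python sketch is illustrative only: it models just the four
instructions shown, represents the stack frame as a dictionary from offsets
to values, does not enforce operand sizes, and uses a made-up initial value
(10) for B.
.DS
```python
# Minimal sketch of an EM-like stack machine. Only the four instructions
# of the example are modelled; operand sizes are not enforced.
def run(program, frame):
    """Execute (opcode, operand) pairs; frame maps local offsets to values."""
    stack = []
    for op, arg in program:
        if op == "LOL":      # push the local variable at offset arg
            stack.append(frame[arg])
        elif op == "LOC":    # push the constant arg
            stack.append(arg)
        elif op == "ADI":    # pop two integers and push their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "STL":    # pop the top of the stack into the local at arg
            frame[arg] = stack.pop()
        else:
            raise ValueError("instruction not modelled: " + op)
    return frame

# A := B + 3, with B at offset -4 and A at offset -2 (B = 10 is an assumption).
program = [("LOL", -4), ("LOC", 3), ("ADI", 2), ("STL", -2)]
frame = run(program, {-4: 10, -2: 0})
print(frame[-2])  # prints 13
```
.DE
.PP
Each operator finds its operands on the stack, which is why the instruction
order matches the postfix form "B 3 +" of the expression.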
.PP
EM has a rich instruction set, containing several arithmetic
and logical operators.
It also contains special-case instructions (such as INCrement).
.PP
EM has \fIglobal\fR (\fIexternal\fR) variables, accessible
by all procedures, and \fIlocal\fR variables, accessible by a few
(nested) procedures.
The local variables of a lexically enclosing procedure may
be accessed via a \fIstatic link\fR.
EM has instructions to follow the static chain.
There are also EM instructions that allow a procedure
to access its own local variables directly (such as LOL and STL above).
Local variables are referenced via an offset in the stack frame
of the procedure, rather than by their names (e.g. -2 and -4 above).
The EM code does not contain the (source language) type
of the variables.
.PP
All structured statements in the source program are expressed in
low-level jump instructions.
Besides conditional and unconditional branch instructions, there are
two case instructions (CSA and CSB),
to allow efficient translation of case statements.
.NH 2
Requirements on the EM input
.PP
As the optimizer should be useful for all languages,
it clearly should not put severe restrictions on the EM code
of the input.
There is, however, one absolute requirement:
it must be possible to determine the \fIflow of control\fR of the
input program.
As virtually all global optimizations are based on control flow information,
the optimizer would be totally powerless without it.
For this reason we restrict the usage of the case jump instructions (CSA/CSB)
of EM.
Such an instruction is always executed with the address of a case descriptor
on top of the stack.
.[~[
tanenbaum machine architecture
.], section 7.4]
This descriptor contains the labels of all possible
destinations of the jump.
We demand that all case descriptors are allocated in a global
data fragment of type ROM, i.e. the case descriptors
may not be modifiable.
Furthermore, any case instruction should be immediately preceded by
an LAE (Load Address External) instruction that loads the
address of the descriptor,
so the descriptor can be uniquely identified.
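.PP
The two restrictions on case instructions make the possible destinations of
every case jump statically known.
The following Python sketch shows one way this could be exploited; it is
not the optimizer's actual code.
The instruction tuples, the dictionary standing in for the ROM fragments and
the name case_targets are all hypothetical, and the other fields of a real
case descriptor are ignored.
.DS
```python
# Hypothetical, simplified representation: a procedure is a list of
# (opcode, operand) pairs, and rom maps the label of a ROM data fragment
# to the instruction labels stored in that case descriptor.
def case_targets(instrs, rom):
    """Map the index of each CSA/CSB to the labels it may jump to."""
    targets = {}
    for i, (op, _) in enumerate(instrs):
        if op not in ("CSA", "CSB"):
            continue
        # The descriptor must be named by an LAE directly before the jump.
        prev_op, prev_arg = instrs[i - 1] if i > 0 else (None, None)
        if prev_op != "LAE" or prev_arg not in rom:
            raise ValueError("case instruction %d has no identifiable descriptor" % i)
        # The descriptor is read-only, so its labels cannot change at run time.
        targets[i] = list(rom[prev_arg])
    return targets

# The case index is pushed first, then the descriptor address, then the jump.
instrs = [("LOL", -2), ("LAE", "desc1"), ("CSA", 2), ("LOC", 0)]
rom = {"desc1": ["L1", "L2", "L3"]}
print(case_targets(instrs, rom))
```
.DE
.PP
If the descriptor were writable, or its address were computed elsewhere,
the set of successors of the jump could not be determined this way.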
.PP
The optimizer will work improperly if the user obscures the flow of control.
We give two ways in which this can happen.
.PP
In "C" the notorious library routines "setjmp" and "longjmp"
.[
unix programmer's manual McIlroy
.]
may be used to jump out of a procedure,
but can also be used for a number of other dubious purposes,
for example, to create an extra entry point in a loop.
.DS
while (condition) {
	....
	setjmp(buf);
	...
}
...
longjmp(buf);
.DE
The call to longjmp is actually a jump to the place of
the last call to setjmp with the same argument (buf).
As the calls to setjmp and longjmp are indistinguishable from
normal procedure calls, the optimizer will not see the danger.
Needless to say, several loop optimizations will behave
unexpectedly when presented with such pathological input.
.PP
Another way to obscure the flow of control is
the use of exception handling routines.
Ada*
.FS
* Ada is a registered trademark of the U.S. Government
(Ada Joint Program Office).
.FE
has clearly recognized the dangers of exception handling,
but other languages (such as PL/I) have not.
.[
ada rationale
.]
.PP
The optimizer will be more effective if the EM input contains
some extra information about the source program.
The \fIregister message\fR is especially important.
Register messages indicate which local variables may never be
accessed indirectly.
Most optimizations benefit significantly from this information.
.PP
The Inline Substitution technique needs to know how many bytes
of formal parameters every procedure accesses.
Only calls to procedures for which the EM code contains this information
will be substituted in line.
.NH 2
Structure of the optimizer
.PP
The Global Optimizer is organized as a number of \fIphases\fR,
each one performing some task.
The main structure is as follows:
.IP IC 6
the Intermediate Code construction phase transforms EM into the
intermediate code (ic) of the optimizer
.IP CF
the Control Flow phase extends the ic with control flow
information and interprocedural information
.IP OPTs
zero or more optimization phases, each one performing one or
more related optimizations
.IP CA
the Compact Assembly phase generates Compact Assembly Language EM code
out of ic.
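.PP
The phase structure listed above amounts to a fixed front (IC, CF), a
configurable middle and a fixed back (CA).
The Python sketch below only illustrates that shape.
The helper names make_phase and run_optimizer are hypothetical, each
stand-in phase merely records its own name, and IL, UD, LV and RA are the
abbreviations of optimization phases used elsewhere in this chapter.
.DS
```python
# Hypothetical stand-ins for the real phases: each maps one program
# representation to the next and here merely appends its own name.
def make_phase(name):
    def phase(code):
        return code + [name]
    return phase

def run_optimizer(em_input, opt_phases):
    """Run IC, CF, the chosen optimization phases in order, then CA."""
    ic = make_phase("IC")(em_input)      # EM -> intermediate code (ic)
    ic = make_phase("CF")(ic)            # extend ic with control flow information
    for name in opt_phases:              # zero or more optimization phases
        ic = make_phase(name)(ic)
    return make_phase("CA")(ic)          # ic -> Compact Assembly Language EM

trace = run_optimizer([], ["IL", "UD", "LV", "RA"])
print(" -> ".join(trace))  # prints IC -> CF -> IL -> UD -> LV -> RA -> CA
```
.DE
.PP
Because every optimization phase reads and writes the same intermediate
code, the list of phases can be empty, reordered, or contain repeats.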
.LP
.PP
An important issue in the design of a global optimizer is the
interaction between optimization techniques.
It is often advantageous to combine several techniques in
one algorithm that takes into account all interactions between them.
Ideally, one single algorithm should be developed that does
all optimizations simultaneously and deals with all possible interactions.
In practice, such an algorithm is still far out of reach.
Instead, some rather ad hoc (albeit important) combinations are chosen,
such as Common Subexpression Elimination and Register Allocation.
.[
prabhala sethi common subexpressions
.]
.[
sethi ullman optimal code
.]
.PP
In the EM Global Optimizer there is one separate algorithm for
every technique.
Note that this does not mean that all techniques are independent
of each other.
.PP
In principle, the optimization phases can be run in any order;
a phase may even be run more than once.
However, the following rules should be obeyed:
.IP -
the Live Variable analysis phase (LV) must be run prior to
Register Allocation (RA), as RA uses information produced by LV.
.IP -
RA should be the last phase; this is a consequence of the way
the interface between RA and the Code Generator is defined.
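.PP
The two ordering rules above are simple enough to check mechanically.
The following Python sketch assumes, purely for illustration, that a phase
ordering is given as a list of phase abbreviations; the function name
check_phase_order is hypothetical and is not part of the optimizer.
.DS
```python
def check_phase_order(phases):
    """Return the violated ordering rules (an empty list if the order is valid)."""
    problems = []
    if "RA" in phases:
        ra = phases.index("RA")          # position of the first RA
        if "LV" not in phases[:ra]:      # RA uses information produced by LV
            problems.append("LV must run before RA")
        if ra != len(phases) - 1:        # nothing may follow RA
            problems.append("RA must be the last phase")
    return problems

print(check_phase_order(["IL", "UD", "LV", "RA"]))   # prints []
print(check_phase_order(["RA", "LV"]))
```
.DE
.PP
Orderings without RA pass trivially, since both rules concern RA only.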
.LP
The ordering of the phases has a significant impact on
the quality of the produced code.
In
.[
wulf overview production quality carnegie-mellon
.]
two kinds of phase ordering problems are distinguished.
If two techniques A and B both take away opportunities from each other,
there is a "negative" ordering problem.
If, on the other hand, both A and B introduce new optimization
opportunities for each other, the problem is called "positive".
In the Global Optimizer the following interactions must be
taken into account:
.IP -
Inline Substitution (IL) may create new opportunities for most
other techniques, so it should be run as early as possible.
.IP -
Use Definition analysis (UD) may introduce opportunities for LV.
.IP -
Strength Reduction may create opportunities for UD.
.LP
The optimizer has a default phase ordering, which can
be changed by the user.
.NH 2
Structure of this document
.PP
The remaining chapters of this document each describe one
phase of the optimizer.
For every phase, we describe its task, its design,
its implementation, and its source files.
The last two of these are intended to aid the
maintenance of the optimizer and
can be skipped on a first reading.
.NH 2
References
.PP
There are very few modern textbooks on optimization.
Chapters 12, 13, and 14 of
.[
aho compiler design
.]
are a good introduction to the subject.
Wulf et al.
.[
wulf optimizing compiler
.]
describe one specific optimizing (Bliss) compiler.
Anklam et al.
.[
anklam vax-11
.]
discuss code generation and optimization in
compilers for one specific machine (a VAX-11).
Kirchgaesner et al.
.[
optimizing ada compiler
.]
present a brief description of many
optimizations; the report also contains a lengthy (over 60 pages)
bibliography.
.PP
The number of articles on optimization is quite impressive.
The Lowry and Medlock paper on the Fortran H compiler
.[
object code optimization Lowry Medlock
.]
is a classic.
Other papers on global optimization are.
.[
faiman optimizing pascal
.]
.[
perkins sites
.]
.[
harrison general purpose optimizing
.]
.[
morel partial redundancies
.]
.[
Mintz global optimizer
.]
Freudenberger
.[
freudenberger setl optimizer
.]
describes an optimizer for a Very High Level Language (SETL).
The Production-Quality Compiler-Compiler (PQCC) project uses
very sophisticated compiler techniques, as described in.
.[
wulf overview ieee
.]
.[
wulf overview carnegie-mellon
.]
.[
wulf machine-relative
.]
.PP
Several Ph.D. theses are dedicated to optimization.
Davidson
.[
davidson simplifying
.]
outlines a machine-independent peephole optimizer that
improves assembly code.
Katkus
.[
katkus
.]
describes how efficient programs can be obtained at little cost by
optimizing only a small part of a program.
Photopoulos
.[
photopoulos mixed code
.]
discusses the idea of generating interpreted intermediate code as well
as assembly code, to obtain programs that are both small and fast.
Shaffer
.[
shaffer automatic
.]
describes the theory of automatic subroutine generation.
Leverett
.[
leverett register allocation compilers
.]
deals with register allocation in the PQCC compilers.
.PP
References to articles about specific optimization techniques
will be given in later chapters.