| .bp
 | |
| .NH 1
 | |
| Overview of the global optimizer
 | |
| .NH 2
 | |
| The ACK compilation process
 | |
| .PP
 | |
| The EM Global Optimizer is one of three optimizers that are
 | |
| part of the Amsterdam Compiler Kit (ACK).
 | |
| The phases of ACK are:
 | |
| .IP 1.
 | |
| A Front End translates a source program to EM.
 | |
| .IP 2.
 | |
| The Peephole Optimizer
 | |
| .[
 | |
| tanenbaum staveren peephole toplass
 | |
| .]
 | |
| reads EM code and produces 'better' EM code.
 | |
| It performs a number of optimizations (mostly peephole
 | |
| optimizations)
 | |
| such as constant folding, strength reduction and unreachable code
 | |
| elimination.
 | |
| .IP 3.
 | |
| The Global Optimizer further improves the EM code.
 | |
| .IP 4.
 | |
| The Code Generator transforms EM to assembly code
 | |
| of the target computer.
 | |
| .IP 5.
 | |
| The Target Optimizer improves the assembly code.
 | |
| .IP 6.
 | |
| An Assembler/Loader generates an executable file.
 | |
| .LP
 | |
| For a more extensive overview of the ACK compilation process,
 | |
| we refer to.
 | |
| .[
 | |
| tanenbaum toolkit rapport
 | |
| .]
 | |
| .[
 | |
| tanenbaum toolkit cacm
 | |
| .]
 | |
| .PP
 | |
| The input of the Global Optimizer may consist of files and
 | |
| libraries.
 | |
| Every file or module in the library must contain EM code in
 | |
| Compact Assembly Language format.
 | |
| .[~[
 | |
| tanenbaum machine architecture
 | |
| .], section 11.2]
 | |
| The output consists of one such EM file.
 | |
| The input files and libraries together need not
 | |
| constitute an entire program,
 | |
| although as much of the program as possible should be supplied.
 | |
| The more information about the program the optimizer 
 | |
| gets, the better its output code will be.
 | |
| .PP
 | |
| The Global Optimizer is language- and machine-independent,
 | |
| i.e. it can be used for all languages and machines supported by ACK.
 | |
| Yet, it puts some unavoidable restrictions on the EM code
 | |
| produced by the Front End (see below).
 | |
| It also needs some knowledge of the target machine.
 | |
| This knowledge is expressed in a machine description table
 | |
| which is passed as argument to the optimizer.
 | |
| This table does not contain very detailed information about the
 | |
| target (such as its instruction set and addressing modes).
 | |
| .NH 2
 | |
| The EM code
 | |
| .PP
 | |
| The definition of EM, the intermediate code of all ACK compilers,
 | |
| is given in a separate document.
 | |
| .[
 | |
| tanenbaum machine architecture
 | |
| .]
 | |
| We will only discuss some features of EM that are most relevant
 | |
| to the Global Optimizer.
 | |
| .PP
 | |
| EM is the assembly code of a virtual \fIstack machine\fR.
 | |
| All operations are performed on the top of the stack.
 | |
| For example, the statement "A := B + 3" may be expressed in EM as:
 | |
| .DS
 | |
| LOL -4         -- push local variable B
 | |
| LOC 3          -- push constant 3
 | |
| ADI 2          -- add two 2-byte items on top of
 | |
| 	       -- the stack and push the result
 | |
| STL -2         -- pop A
 | |
| .DE
 | |
| So EM is essentially a \fIpostfix\fR code.
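.PP
To make the stack discipline concrete, the following C fragment is a minimal sketch of an interpreter for just the four instructions used above.
The opcode names come from the example; the instruction encoding, the 2-byte word size and the helper names (slot, run, assign_a) are assumptions made for illustration and are not part of ACK.

```c
#include <assert.h>

/* Hypothetical encoding of the four EM instructions from the example. */
enum opcode { LOL, LOC, ADI, STL };
struct instr { enum opcode op; int arg; };

#define WORD_SIZE 2     /* assumed word size, matching "ADI 2" */
#define STACK_MAX 16

/* Map a negative frame offset to an array index: -2 -> 0, -4 -> 1. */
static int slot(int offset)
{
    assert(offset < 0 && offset % WORD_SIZE == 0);
    return -offset / WORD_SIZE - 1;
}

/* Execute n instructions; every operation works on the top of the stack. */
static void run(const struct instr *code, int n, int *locals)
{
    int stack[STACK_MAX];
    int sp = 0;

    for (int pc = 0; pc < n; pc++) {
        int arg = code[pc].arg;
        switch (code[pc].op) {
        case LOL:               /* push local variable */
            assert(sp < STACK_MAX);
            stack[sp++] = locals[slot(arg)];
            break;
        case LOC:               /* push constant */
            assert(sp < STACK_MAX);
            stack[sp++] = arg;
            break;
        case ADI:               /* pop two items, push their sum;
                                   the operand size in arg is ignored here */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case STL:               /* pop into local variable */
            assert(sp >= 1);
            locals[slot(arg)] = stack[--sp];
            break;
        }
    }
}

/* Run "A := B + 3" for a given B and return the resulting A. */
static int assign_a(int b)
{
    int locals[2] = {0, 0};     /* A at offset -2, B at offset -4 */
    const struct instr prog[] = {
        {LOL, -4}, {LOC, 3}, {ADI, 2}, {STL, -2}
    };

    locals[slot(-4)] = b;
    run(prog, 4, locals);
    return locals[slot(-2)];
}
```

Calling assign_a(39) leaves 42 in the slot for A; the operands never carry names, only stack positions and frame offsets.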
 | |
| .PP
 | |
| EM has a rich instruction set, containing several arithmetic
 | |
| and logical operators.
 | |
| It also contains special-case instructions (such as INCrement).
 | |
| .PP
 | |
| EM has \fIglobal\fR (\fIexternal\fR) variables, accessible
 | |
| by all procedures, and \fIlocal\fR variables, accessible by a few
 | |
| (nested) procedures.
 | |
| The local variables of a lexically enclosing procedure may
 | |
| be accessed via a \fIstatic link\fR. 
 | |
| EM has instructions to follow the static chain.
 | |
| There are EM instructions that allow a procedure
 | |
| to access its local variables directly (such as LOL and STL above).
 | |
| Local variables are referenced via an offset in the stack frame
 | |
| of the procedure, rather than by their names (e.g. -2 and -4 above).
 | |
| The EM code does not contain the (source language) type
 | |
| of the variables.
 | |
| .PP
 | |
| All structured statements in the source program are expressed in
 | |
| low level jump instructions.
 | |
| Besides conditional and unconditional branch instructions, there are 
 | |
| two case instructions (CSA and CSB),
 | |
| to allow efficient translation of case statements.
 | |
| .NH 2
 | |
| Requirements on the EM input
 | |
| .PP
 | |
| As the optimizer should be useful for all languages,
 | |
| it clearly should not put severe restrictions on the EM code
 | |
| of the input.
 | |
| There is, however, one immovable requirement:
 | |
| it must be possible to determine the \fIflow of control\fR of the
 | |
| input program.
 | |
| As virtually all global optimizations are based on control flow information,
 | |
| the optimizer would be totally powerless without it.
 | |
| For this reason we restrict the usage of the case jump instructions (CSA/CSB)
 | |
| of EM.
 | |
| Such an instruction is always called with the address of a case descriptor
 | |
| on top of the stack.
 | |
| .[~[
 | |
| tanenbaum machine architecture
 | |
| .], section 7.4]
 | |
| This descriptor contains the labels of all possible
 | |
| destinations of the jump.
 | |
| We demand that all case descriptors are allocated in a global
 | |
| data fragment of type ROM, i.e. the case descriptors
 | |
| may not be modifiable.
 | |
| Furthermore, any case instruction should be immediately preceded by
 | |
| a LAE (Load Address External) instruction, that loads the
 | |
| address of the descriptor,
 | |
| so the descriptor can be uniquely identified.
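.PP
The two requirements can be read as a simple check that recovers the successors of a case jump.
The C fragment below is only a sketch under assumed data structures (struct instr, struct case_descr and the function names are hypothetical); it does not show the optimizer's actual intermediate code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of EM code; not the optimizer's real ic. */
enum opcode { OP_LAE, OP_CSA, OP_CSB, OP_OTHER };
enum fragment { FRAG_ROM, FRAG_OTHER };

struct case_descr {
    const char *name;       /* global label of the descriptor */
    enum fragment kind;     /* kind of data fragment holding it */
    const int *targets;     /* instruction labels listed in it */
    size_t ntargets;
};

struct instr {
    enum opcode op;
    const char *sym;        /* operand of LAE, NULL otherwise */
};

static const struct case_descr *
lookup(const struct case_descr *tab, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/* Return the descriptor that fixes the successors of the case jump at
   code[i], or NULL when the input violates one of the requirements. */
static const struct case_descr *
case_targets(const struct instr *code, size_t i,
             const struct case_descr *tab, size_t ntab)
{
    assert(code[i].op == OP_CSA || code[i].op == OP_CSB);

    /* The jump must be immediately preceded by LAE. */
    if (i == 0 || code[i - 1].op != OP_LAE || code[i - 1].sym == NULL)
        return NULL;

    /* The descriptor must be known and must live in a ROM fragment. */
    const struct case_descr *d = lookup(tab, ntab, code[i - 1].sym);
    if (d == NULL || d->kind != FRAG_ROM)
        return NULL;
    return d;
}
```

When case_targets returns NULL, the destinations of the jump cannot be determined statically, which is exactly the situation the requirements rule out.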
 | |
| .PP
 | |
| The optimizer will work improperly if the program conceals its actual flow of control.
 | |
| We give two ways in which this can happen.
 | |
| .PP
 | |
| In "C" the notorious library routines "setjmp" and "longjmp"
 | |
| .[
 | |
| unix programmer's manual McIlroy
 | |
| .]
 | |
| may be used to jump out of a procedure,
 | |
| but can also be used for a number of more dubious purposes,
 | |
| for example, to create an extra entry point in a loop.
 | |
| .DS
 | |
|  while (condition) {
 | |
| 	 ....
 | |
| 	 setjmp(buf);
 | |
| 	 ...
 | |
|  }
 | |
|  ...
 | |
|  longjmp(buf, 1);
 | |
| .DE
 | |
| The call to longjmp is actually a jump to the place of
 | |
| the last call to setjmp with the same argument (buf).
 | |
| As the calls to setjmp and longjmp are indistinguishable from
 | |
| normal procedure calls, the optimizer will not see the danger.
 | |
| Needless to say, several loop optimizations will behave
 | |
| unexpectedly when presented with such pathological input.
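.PP
As an illustration, the fragment can be completed into a small C function.
This is a sketch only: the function name, the loop bound and the counters are assumptions, and the locals are declared volatile so that their values are well defined after the jump.

```c
#include <setjmp.h>

static jmp_buf buf;

/* Count how often the tail of the loop body is executed.  The loop
   condition allows three iterations, but the longjmp below re-enters
   the body once more through the hidden entry point. */
static int loop_body_runs(void)
{
    volatile int runs = 0;
    volatile int i = 0;
    volatile int jumped = 0;

    while (i < 3) {
        i++;
        (void) setjmp(buf);     /* hidden extra entry point */
        runs++;
    }
    if (!jumped) {
        jumped = 1;
        longjmp(buf, 1);        /* jumps back into the loop body */
    }
    return runs;
}
```

The function returns 4 rather than 3.
Nothing in the text of the loop reveals that its body can be entered from outside, so an analysis that treats both calls as ordinary procedure calls sees a loop with a single entry.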
 | |
| .PP
 | |
| Another way to deceive the flow of control is
 | |
| by using exception handling routines.
 | |
| Ada*
 | |
| .FS
 | |
| * Ada is a registered trademark of the U.S. Government
 | |
| (Ada Joint Program Office).
 | |
| .FE
 | |
| has clearly recognized the dangers of exception handling,
 | |
| but other languages (such as PL/I) have not.
 | |
| .[
 | |
| ada rationale
 | |
| .]
 | |
| .PP
 | |
| The optimizer will be more effective if the EM input contains
 | |
| some extra information about the source program.
 | |
| The \fIregister message\fR is especially important.
 | |
| These messages indicate which local variables may never be
 | |
| accessed indirectly.
 | |
| Most optimizations benefit significantly from this information.
 | |
| .PP
 | |
| The Inline Substitution technique needs to know how many bytes
 | |
| of formal parameters every procedure accesses.
 | |
| Only calls to procedures for which the EM code contains this information
 | |
| will be substituted in line.
 | |
| .NH 2
 | |
| Structure of the optimizer
 | |
| .PP
 | |
| The Global Optimizer is organized as a number of \fIphases\fR,
 | |
| each one performing some task.
 | |
| The main structure is as follows:
 | |
| .IP IC 6
 | |
| the Intermediate Code construction phase transforms EM into the
 | |
| intermediate code (ic) of the optimizer
 | |
| .IP CF
 | |
| the Control Flow phase extends the ic with control flow
 | |
| information and interprocedural information
 | |
| .IP OPTs
 | |
| zero or more optimization phases, each one performing one or
 | |
| more related optimizations
 | |
| .IP CA
 | |
| the Compact Assembly phase generates Compact Assembly Language EM code
 | |
| out of ic.
 | |
| .LP
 | |
| .PP
 | |
| An important issue in the design of a global optimizer is the
 | |
| interaction between optimization techniques.
 | |
| It is often advantageous to combine several techniques in
 | |
| one algorithm that takes into account all interactions between them.
 | |
| Ideally, one single algorithm should be developed that does
 | |
| all optimizations simultaneously and deals with all possible interactions.
 | |
| In practice, such an algorithm is still far out of reach.
 | |
| Instead some rather ad hoc (albeit important) combinations are chosen,
 | |
| such as Common Subexpression Elimination and Register Allocation.
 | |
| .[
 | |
| prabhala sethi common subexpressions
 | |
| .]
 | |
| .[
 | |
| sethi ullman optimal code
 | |
| .]
 | |
| .PP
 | |
| In the EM Global Optimizer there is one separate algorithm for
 | |
| every technique.
 | |
| Note that this does not mean that all techniques are independent
 | |
| of each other.
 | |
| .PP
 | |
| In principle, the optimization phases can be run in any order;
 | |
| a phase may even be run more than once.
 | |
| However, the following rules should be obeyed:
 | |
| .IP -
 | |
| the Live Variable analysis phase (LV) must be run prior to
 | |
| Register Allocation (RA), as RA uses information produced by LV.
 | |
| .IP -
 | |
| RA should be the last phase; this is a consequence of the way
 | |
| the interface between RA and the Code Generator is defined.
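.PP
The two rules are easy to check mechanically.
The C function below is a sketch of such a check, assuming a phase ordering is given as an array of phase abbreviations; the function name and this representation are hypothetical and say nothing about how the optimizer itself is invoked.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the ordering obeys both rules, 0 otherwise:
   - LV must have been run before RA;
   - RA, if present, must be the last phase. */
static int valid_phase_order(const char *const *phases, size_t n)
{
    int lv_seen = 0;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(phases[i], "LV") == 0) {
            lv_seen = 1;
        } else if (strcmp(phases[i], "RA") == 0) {
            if (!lv_seen)
                return 0;       /* RA needs the output of LV */
            if (i != n - 1)
                return 0;       /* nothing may follow RA */
        }
    }
    return 1;
}
```

Any other phase may appear in any position and any number of times, which matches the statement that the phases can in principle be run in any order.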
 | |
| .LP
 | |
| The ordering of the phases has significant impact on
 | |
| the quality of the produced code.
 | |
| In
 | |
| .[
 | |
| wulf overview production quality carnegie-mellon
 | |
| .]
 | |
| two kinds of phase ordering problems are distinguished.
 | |
| If two techniques A and B both take away opportunities of each other,
 | |
| there is a "negative" ordering problem.
 | |
| If, on the other hand, both A and B introduce new optimization
 | |
| opportunities for each other, the problem is called "positive".
 | |
| In the Global Optimizer the following interactions must be
 | |
| taken into account:
 | |
| .IP -
 | |
| Inline Substitution (IL) may create new opportunities for most
 | |
| other techniques, so it should be run as early as possible.
 | |
| .IP -
 | |
| Use Definition analysis (UD) may introduce opportunities for LV.
 | |
| .IP -
 | |
| Strength Reduction may create opportunities for UD.
 | |
| .LP
 | |
| The optimizer has a default phase ordering, which can
 | |
| be changed by the user.
 | |
| .NH 2
 | |
| Structure of this document
 | |
| .PP
 | |
| The remaining chapters of this document each describe one
 | |
| phase of the optimizer.
 | |
| For every phase, we describe its task, its design,
 | |
| its implementation, and its source files.
 | |
| The latter two sections are intended to aid the
 | |
| maintenance of the optimizer and
 | |
| can be skipped on a first reading.
 | |
| .NH 2
 | |
| References
 | |
| .PP
 | |
| There are very 
 | |
| few modern textbooks on optimization.
 | |
| Chapters 12, 13, and 14 of
 | |
| .[
 | |
| aho compiler design
 | |
| .]
 | |
| are a good introduction to the subject.
 | |
| Wulf et al.
 | |
| .[
 | |
| wulf optimizing compiler
 | |
| .]
 | |
| describe one specific optimizing (Bliss) compiler.
 | |
| Anklam et al.
 | |
| .[
 | |
| anklam vax-11
 | |
| .]
 | |
| discuss code generation and optimization in
 | |
| compilers for one specific machine (a VAX-11).
 | |
| Kirchgaesner et al.
 | |
| .[
 | |
| optimizing ada compiler
 | |
| .]
 | |
| present a brief description of many
 | |
| optimizations; the report also contains a lengthy (over 60 pages)
 | |
| bibliography.
 | |
| .PP
 | |
| The number of articles on optimization is quite impressive.
 | |
| The Lowry and Medlock paper on the Fortran H compiler
 | |
| .[
 | |
| object code optimization Lowry Medlock
 | |
| .]
 | |
| is a classic.
 | |
| Other papers on global optimization are.
 | |
| .[
 | |
| faiman optimizing pascal
 | |
| .]
 | |
| .[
 | |
| perkins sites
 | |
| .]
 | |
| .[
 | |
| harrison general purpose optimizing
 | |
| .]
 | |
| .[
 | |
| morel partial redundancies
 | |
| .]
 | |
| .[
 | |
| Mintz global optimizer
 | |
| .]
 | |
| Freudenberger
 | |
| .[
 | |
| freudenberger setl optimizer
 | |
| .]
 | |
| describes an optimizer for a Very High Level Language (SETL).
 | |
| The Production-Quality Compiler-Compiler (PQCC) project uses
 | |
| very sophisticated compiler techniques, as described in.
 | |
| .[
 | |
| wulf overview ieee
 | |
| .]
 | |
| .[
 | |
| wulf overview carnegie-mellon
 | |
| .]
 | |
| .[
 | |
| wulf machine-relative
 | |
| .]
 | |
| .PP
 | |
| Several Ph.D. theses are dedicated to optimization.
 | |
| Davidson
 | |
| .[
 | |
| davidson simplifying
 | |
| .]
 | |
| outlines a machine-independent peephole optimizer that
 | |
| improves assembly code.
 | |
| Katkus
 | |
| .[
 | |
| katkus
 | |
| .]
 | |
| describes how efficient programs can be obtained at little cost by
 | |
| optimizing only a small part of a program.
 | |
| Photopoulos
 | |
| .[
 | |
| photopoulos mixed code
 | |
| .]
 | |
| discusses the idea of generating interpreted intermediate code as well
 | |
| as assembly code, to obtain programs that are both small and fast.
 | |
| Shaffer
 | |
| .[
 | |
| shaffer automatic
 | |
| .]
 | |
| describes the theory of automatic subroutine generation.
 | |
| Leverett
 | |
| .[
 | |
| leverett register allocation compilers
 | |
| .]
 | |
| deals with register allocation in the PQCC compilers.
 | |
| .PP
 | |
| References to articles about specific optimization techniques
 | |
| will be given in later chapters.
 |