.bp
.NH 1
Overview of the global optimizer
.NH 2
The ACK compilation process
.PP
The EM Global Optimizer is one of three optimizers that are
part of the Amsterdam Compiler Kit (ACK).
The phases of ACK are:
.IP 1.
A Front End translates a source program to EM
.IP 2.
The Peephole Optimizer
.[
tanenbaum staveren peephole toplass
.]
reads EM code and produces 'better' EM code.
It performs a number of optimizations (mostly peephole
optimizations) such as constant folding, strength reduction
and unreachable code elimination.
.IP 3.
The Global Optimizer further improves the EM code.
.IP 4.
The Code Generator transforms EM to assembly code of the
target computer.
.IP 5.
The Target Optimizer improves the assembly code.
.IP 6.
An Assembler/Loader generates an executable file.
.LP
For a more extensive overview of the ACK compilation process,
we refer to.
.[
tanenbaum toolkit rapport
.]
.[
tanenbaum toolkit cacm
.]
.PP
The input of the Global Optimizer may consist of files and
libraries.
Every file and every module in a library must contain EM code
in Compact Assembly Language format.
.[~[
tanenbaum machine architecture
.], section 11.2]
The output consists of one such EM file.
The input files and libraries together need not constitute an
entire program, although as much of the program as possible
should be supplied:
the more of the program the optimizer sees, the better its
output code will be.
.PP
The Global Optimizer is language- and machine-independent,
i.e. it can be used for all languages and machines supported
by ACK.
Nevertheless, it puts a few unavoidable restrictions on the
EM code produced by the Front End (see below), and it must
have some knowledge of the target machine.
This knowledge is expressed in a machine description table
that is passed as an argument to the optimizer.
This table does not contain very detailed information about
the target (such as its instruction set and addressing
modes).
.NH 2
The EM code
.PP
The definition of EM, the intermediate code of all ACK
compilers, is given in a separate document.
.[
tanenbaum machine architecture
.]
We will only discuss those features of EM that are most
relevant to the Global Optimizer.
.PP
EM is the assembly code of a virtual \fIstack machine\fR.
All operations are performed on the top of the stack.
For example, the statement "A := B + 3" may be expressed in
EM as:
.DS
LOL -4     -- push local variable B
LOC 3      -- push the constant 3
ADI 2      -- add the two 2-byte items on top of
           -- the stack and push the result
STL -2     -- pop the result into local variable A
.DE
So EM is essentially a \fIpostfix\fR code.
.PP
EM has a rich instruction set, containing several arithmetic
and logical operators.
It also contains special-case instructions (such as
INCrement).
.PP
EM has \fIglobal\fR (\fIexternal\fR) variables, accessible by
all procedures, and \fIlocal\fR variables, accessible only by
the procedure that declares them and by procedures nested
inside it.
The local variables of a lexically enclosing procedure may be
accessed via a \fIstatic link\fR; EM has instructions to
follow the static chain.
There are also EM instructions that allow a procedure to
access its own local variables directly (such as LOL and STL
above).
Local variables are referenced via an offset in the stack
frame of the procedure, rather than by their names (e.g. -2
and -4 above).
The EM code does not record the (source language) types of
the variables.
.PP
All structured statements in the source program are expressed
in low-level jump instructions.
Besides conditional and unconditional branch instructions,
there are two case instructions (CSA and CSB) that allow
efficient translation of case statements.
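.PP
For example, a loop such as "while i < 10 do i := i + 1"
might be translated along the following lines (a sketch only;
the exact instruction selection and the label notation are up
to the Front End):
.DS
           -- schematic translation; label syntax varies
1
LOL -2     -- push local variable i
LOC 10     -- push the constant 10
BGE *2     -- leave the loop if i >= 10
LOL -2     -- push i
INC        -- increment the item on top of the stack
STL -2     -- pop it back into i
BRA *1     -- jump back to the loop test
2
.DE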
.NH 2
Requirements on the EM input
.PP
As the optimizer should be useful for all languages, it
clearly should not put severe restrictions on the EM code it
accepts as input.
There is, however, one immovable requirement:
it must be possible to determine the \fIflow of control\fR of
the input program.
As virtually all global optimizations are based on control
flow information, the optimizer would be totally powerless
without it.
For this reason we restrict the usage of the case jump
instructions (CSA/CSB) of EM.
Such an instruction is always executed with the address of a
case descriptor on top of the stack.
.[~[
tanenbaum machine architecture
.] section 7.4]
This descriptor contains the labels of all possible
destinations of the jump.
We require that all case descriptors be allocated in a global
data fragment of type ROM, i.e. the case descriptors may not
be modifiable.
Furthermore, any case instruction should be immediately
preceded by a LAE (Load Address External) instruction that
loads the address of the descriptor, so that the descriptor
can be identified uniquely.
.PP
The optimizer will work improperly if the program conceals
its true flow of control.
We give two ways in which this can happen.
.PP
In "C" the notorious library routines "setjmp" and "longjmp"
.[
unix programmer's manual McIlroy
.]
may be used to jump out of a procedure, but can also be used
for a number of other questionable purposes, for example, to
create an extra entry point into a loop:
.DS
while (condition) {
	...
	setjmp(buf);
	...
}
\&...
longjmp(buf, 1);
.DE
A call to longjmp is in effect a jump to the site of the most
recent call to setjmp with the same argument (buf).
As calls to setjmp and longjmp are indistinguishable from
normal procedure calls, the optimizer will not see the
danger.
Needless to say, several loop optimizations will behave
unexpectedly when presented with such pathological input.
.PP
Another way to conceal the flow of control is the use of
exception handling routines.
Ada*
.FS
* Ada is a registered trademark of the U.S. Government (Ada
Joint Program Office).
.FE
has clearly recognized the dangers of exception handling, but
other languages (such as PL/I) have not.
.[
ada rationale
.]
.PP
The optimizer will be more effective if the EM input contains
some extra information about the source program.
The \fIregister messages\fR are especially important:
they indicate which local variables can never be accessed
indirectly.
Most optimizations benefit significantly from this
information.
.PP
The Inline Substitution technique needs to know how many
bytes of formal parameters every procedure accesses.
Only calls to procedures for which the EM code contains this
information will be substituted in line.
.NH 2
Structure of the optimizer
.PP
The Global Optimizer is organized as a number of
\fIphases\fR, each one performing some specific task.
The main structure is as follows (a sketch in code is given
after the list):
.IP IC 6
the Intermediate Code construction phase transforms EM into
the intermediate code (ic) of the optimizer
.IP CF
the Control Flow phase extends the ic with control flow
information and interprocedural information
.IP OPTs
zero or more optimization phases, each one performing one or
more related optimizations
.IP CA
the Compact Assembly phase generates Compact Assembly
Language EM code from the ic.
.LP
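.PP
To make this phase structure concrete, the sketch below
models the optimizer as a simple pipeline.
All identifiers in it are hypothetical stand-ins, not the
actual interfaces (the real phases are described in the
following chapters); the sketch merely shows that IC and CF
always come first, that the optimization phases run in a
configurable order, and that CA always comes last.
.DS
/* A minimal, self-contained sketch of the phase pipeline.
 * Every name is hypothetical; the phases are stubs that
 * only report what they would do. */
#include <stdio.h>

typedef struct { const char *name; } Ic; /* stand-in for the ic */

static Ic run_ic(const char *in)         /* IC: EM -> ic */
{
	Ic ic;
	ic.name = in;
	puts("IC: build ic from Compact Assembly EM");
	return ic;
}

static void run_cf(Ic *ic)               /* CF: annotate the ic */
{
	(void)ic;
	puts("CF: add control flow and interprocedural info");
}

static void run_opt(Ic *ic, const char *phase)  /* one OPT */
{
	(void)ic;
	fputs("OPT: ", stdout);
	puts(phase);
}

static void run_ca(Ic *ic, const char *out)     /* CA: ic -> EM */
{
	(void)ic;
	(void)out;
	puts("CA: generate Compact Assembly EM");
}

int main(void)
{
	/* one possible ordering: IL early, LV before RA, RA last */
	const char *phases[] = { "IL", "CS", "SR", "UD", "LV", "RA" };
	int i;
	Ic ic;

	ic = run_ic("input.em");   /* IC always runs first */
	run_cf(&ic);               /* CF always runs second */
	for (i = 0; i < (int)(sizeof phases / sizeof phases[0]); i++)
		run_opt(&ic, phases[i]); /* OPTs, order chosen by user */
	run_ca(&ic, "output.em");  /* CA always runs last */
	return 0;
}
.DE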
.PP
An important issue in the design of a global optimizer is the
interaction between optimization techniques.
It is often advantageous to combine several techniques in one
algorithm that takes into account all interactions between
them.
Ideally, one single algorithm should be developed that does
all optimizations simultaneously and deals with all possible
interactions.
In practice, such an algorithm is still far out of reach.
Instead, some rather ad hoc (albeit important) combinations
have been chosen, such as Common Subexpression Elimination
and Register Allocation.
.[
prabhala sethi common subexpressions
.]
.[
sethi ullman optimal code
.]
.PP
In the EM Global Optimizer there is one separate algorithm
for every technique.
Note that this does not mean that all techniques are
independent of each other.
.PP
In principle, the optimization phases can be run in any
order; a phase may even be run more than once.
However, the following rules should be obeyed:
.IP -
the Live Variable analysis phase (LV) must be run prior to
Register Allocation (RA), as RA uses information produced by
LV.
.IP -
RA should be the last phase; this is a consequence of the way
the interface between RA and the Code Generator is defined.
.LP
The ordering of the phases has significant impact on the
quality of the produced code.
In
.[
wulf overview production quality carnegie-mellon
.]
two kinds of phase ordering problems are distinguished.
If two techniques A and B both take away opportunities from
each other, there is a "negative" ordering problem.
If, on the other hand, both A and B introduce new
optimization opportunities for each other, the problem is
called "positive".
In the Global Optimizer the following interactions must be
taken into account:
.IP -
Inline Substitution (IL) may create new opportunities for
most other techniques, so it should be run as early as
possible
.IP -
Use Definition analysis (UD) may introduce opportunities for
LV.
.IP -
Strength Reduction (SR) may create opportunities for UD
.LP
The optimizer has a default phase ordering, which can be
changed by the user.
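.PP
To illustrate a "positive" interaction, consider how IL can
feed the other techniques.
In the hypothetical C fragment below, substituting the body
of the called procedure exposes an expression that was
previously hidden behind the call:
.DS
/* hypothetical example procedures */
int inc(int x)
{
	return x + 1;
}

int f(void)
{
	int a;

	a = inc(5);     /* before IL */
	/* after IL the statement reads "a = 5 + 1;", which
	 * exposes a constant expression to later phases */
	return a;
}
.DE
As long as the call is in place none of these later
improvements is possible, which is why IL should be run as
early as possible.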
.NH 2
Structure of this document
.PP
The remaining chapters of this document each describe one
phase of the optimizer.
For every phase, we describe its task, its design, its
implementation, and its source files.
The latter two sections are intended to aid the maintenance
of the optimizer and can be skipped on first reading.
.NH 2
References
.PP
There are very few modern textbooks on optimization.
Chapters 12, 13, and 14 of
.[
aho compiler design
.]
are a good introduction to the subject.
Wulf et al.
.[
wulf optimizing compiler
.]
describe one specific optimizing (Bliss) compiler.
Anklam et al.
.[
anklam vax-11
.]
discuss code generation and optimization in compilers for one
specific machine (a Vax-11).
Kirchgaesner et al.
.[
optimizing ada compiler
.]
present a brief description of many optimizations; the report
also contains a lengthy (over 60 pages) bibliography.
.PP
The number of articles on optimization is quite impressive.
The Lowry and Medlock paper on the Fortran H compiler
.[
object code optimization Lowry Medlock
.]
is a classic.
Other papers on global optimization are.
.[
faiman optimizing pascal
.]
.[
perkins sites
.]
.[
harrison general purpose optimizing
.]
.[
morel partial redundancies
.]
.[
Mintz global optimizer
.]
Freudenberger
.[
freudenberger setl optimizer
.]
describes an optimizer for a Very High Level Language (SETL).
The Production-Quality Compiler-Compiler (PQCC) project uses
very sophisticated compiler techniques, as described in.
.[
wulf overview ieee
.]
.[
wulf overview carnegie-mellon
.]
.[
wulf machine-relative
.]
.PP
Several Ph.D. theses are dedicated to optimization.
Davidson
.[
davidson simplifying
.]
outlines a machine-independent peephole optimizer that
improves assembly code.
Katkus
.[
katkus
.]
describes how efficient programs can be obtained at little
cost by optimizing only a small part of a program.
Photopoulos
.[
photopoulos mixed code
.]
discusses the idea of generating interpreted intermediate
code as well as assembly code, to obtain programs that are
both small and fast.
Shaffer
.[
shaffer automatic
.]
describes the theory of automatic subroutine generation.
Leverett
.[
leverett register allocation compilers
.]
deals with register allocation in the PQCC compilers.
.PP
References to articles about specific optimization techniques
will be given in later chapters.