.NH 2
Implementation
.PP
.NH 3
The value number method
.PP
To determine whether two expressions have the same result,
there must be some way to determine whether their operands have
the same values.
We use a system of \fIvalue numbers\fP
.[
kennedy data flow analysis
.]
in which each distinct value of whatever type,
created or used within the working window,
receives a unique identifying number, its value number.
Two items have the same value number if and only if,
based only upon information from the instructions in the window,
their values are provably identical.
For example, after processing the statement
.DS
a := 4;
.DE
the variable a and the constant 4 have the same value number.
.PP
The value number of the result of an expression depends only
on the kind of operator and the value number(s) of the operand(s).
The expressions need not be textually equal, as shown in Fig. 7.5.
.DS
a := c;         (1)
use(a * b);     (2)
d := b;         (3)
use(c * d);     (4)

Fig. 7.5 Different expressions with the same value number
.DE
At line (1) a receives the same value number as c.
At line (3) d receives the same value number as b.
At line (4) the expression "c * d" receives the same value number
as the expression "a * b" at line (2),
because the value numbers of their left and right operands are the same,
and the operator (*) is the same.
.PP
As another example of the value number method, consider Fig. 7.6.
.DS
use(a * b);     (1)
a := 123;       (2)
use(a * b);     (3)

Fig. 7.6 Identical expressions with different value numbers
.DE
Although the expressions "a * b" at lines (1) and (3) are textually equal,
a has different value numbers at lines (1) and (3),
because the assignment at line (2) gives it the value number
of the constant 123.
The two expressions will therefore not mistakenly be recognized as equivalent.
.NH 3
Entities
.PP
The Value Number Method distinguishes between operators and operands.
The value numbers of operands are stored in a table,
called the \fIsymbol table\fR.
The value number of a subexpression depends on the
(root) operator of the expression and on the value numbers
of its operands.
A table of "available expressions" is used to do this mapping.
.PP
CS recognizes the following kinds of EM operands, called \fIentities\fR:
.IP -
constant
.IP -
local variable
.IP -
external variable
.IP -
indirectly accessed entity
.IP -
offsetted entity
.IP -
address of local variable
.IP -
address of external variable
.IP -
address of offsetted entity
.IP -
address of local base
.IP -
address of argument base
.IP -
array element
.IP -
procedure identifier
.IP -
floating zero
.IP -
local base
.IP -
heap pointer
.IP -
ignore mask
.LP
Whenever a new entity is encountered in the working window,
it is entered in the symbol table and given a brand new value number.
Most entities have attributes (e.g. the offset in
the current stackframe for local variables),
which are also stored in the symbol table.
.PP
An entity is called static if its value cannot be changed
(e.g. a constant or an address).
.NH 3
Parsing expressions
.PP
Common subexpressions are recognized by simulating the behaviour
of the EM machine.
The EM code is parsed from left to right;
as EM is postfix code, this is a bottom up parse.
At any point the current state of the EM runtime stack is
reflected by a simulated "fake stack",
containing descriptions of the parsed operands and expressions.
A descriptor consists of:
.DS
(1) the value number of the operand or expression
(2) the size of the operand or expression
(3) a pointer to the first line of EM-code
    that constitutes the operand or expression
.DE
Note that operands may consist of several EM instructions.
Whenever an operator is encountered, the
descriptors of its operands are on top of the fake stack.
The operator and the value numbers of the operands
are used as indices in the table of available expressions,
to determine the value number of the expression.
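.PP
The interplay of symbol table, available expressions table and fake stack
can be illustrated with a minimal sketch in Python.
It is not the code used by CS:
the class ValueNumberer and the functions figure_7_5 and figure_7_6
are hypothetical names,
entities are plain strings,
and a descriptor is reduced to its value number
(the size and the pointer to the first line of EM-code are left out).

```python
import itertools


class ValueNumberer:
    """Toy value-number tables for straight-line postfix code (assumed model)."""

    def __init__(self):
        self._fresh = itertools.count(1)  # supplies value numbers not used before
        self.symbols = {}                 # symbol table: entity -> value number
        self.available = {}               # (operator, operand numbers) -> number
        self.stack = []                   # fake stack, holding value numbers only

    def load(self, entity):
        """Push an entity; it gets a fresh number when first encountered."""
        if entity not in self.symbols:
            self.symbols[entity] = next(self._fresh)
        self.stack.append(self.symbols[entity])

    def operate(self, operator, arity=2):
        """Pop the operands and push the value number of the result."""
        split = len(self.stack) - arity
        operands = tuple(self.stack[split:])
        del self.stack[split:]
        key = (operator, operands)
        if key not in self.available:
            self.available[key] = next(self._fresh)
        self.stack.append(self.available[key])
        return self.available[key]

    def store(self, entity):
        """Assignment: the entity takes the value number of the right hand side."""
        self.symbols[entity] = self.stack.pop()

    def drop(self):
        """Model an instruction that consumes the top of the stack."""
        self.stack.pop()


def figure_7_5():
    """Return the value numbers of "a * b" and "c * d" in Fig. 7.5."""
    vn = ValueNumberer()
    vn.load("c")
    vn.store("a")                # a := c
    vn.load("a")
    vn.load("b")
    first = vn.operate("*")      # a * b
    vn.drop()                    # use(...)
    vn.load("b")
    vn.store("d")                # d := b
    vn.load("c")
    vn.load("d")
    second = vn.operate("*")     # c * d
    return first, second


def figure_7_6():
    """Return the value numbers of both occurrences of "a * b" in Fig. 7.6."""
    vn = ValueNumberer()
    vn.load("a")
    vn.load("b")
    first = vn.operate("*")      # a * b
    vn.drop()                    # use(...)
    vn.load("123")               # the constant 123 is an entity too
    vn.store("a")                # a := 123
    vn.load("a")
    vn.load("b")
    second = vn.operate("*")     # a * b, with a new value number for a
    return first, second
```

.PP
Under these assumptions figure_7_5 returns two equal value numbers,
whereas figure_7_6 returns two different ones,
which matches the discussion of Fig. 7.5 and Fig. 7.6.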
.PP
During the parsing process,
we keep track of the first line of each expression;
we need this information when we decide to eliminate the expression.
.NH 3
Updating entities
.PP
An entity is assigned a value number when it is
used for the first time
in the working window.
If the entity is used as the left-hand side of an assignment,
it gets the value number of the right-hand side.
Sometimes the effects of an instruction on an entity cannot
be determined exactly;
the current value and value number of the entity may then become
inconsistent.
Hence the current value number must be forgotten.
This is achieved by giving the entity a new value number
that was not used before.
The entity is said to be \fIkilled\fR.
.PP
As information is lost when an entity is killed,
CS tries to save as many entities as possible.
In case of an indirect assignment through a pointer,
some analysis is done to see which variables cannot be altered.
For a procedure call, the interprocedural information contained
in the procedure table is used to restrict the set of entities that may
be changed by the call.
Local variables for which the front end generated
a register message can never be changed by an indirect assignment
or a procedure call.
.NH 3
Changing the EM text
.PP
When a new expression becomes available,
it is checked whether its result is saved in a local
that may go in a register.
The last line of the expression must be followed
by a STL or SDL instruction
(depending on the size of the result)
and a register message must be present for
this local.
If there is such a local,
it is recorded in the available expressions table.
Each time a new occurrence of this expression
is found,
the value number of the local is compared against
the value number of the result.
If they are different, the local cannot be used and is forgotten.
.PP
The available expressions are linked in a list.
New expressions are linked at the head of the list.
In this way expressions that are contained within other
expressions appear later in the list,
because EM-expressions are postfix.
The elimination process walks through the list,
starting at the head, to find the largest expressions first.
If an expression is eliminated,
any expression later on in the list, contained in the former expression,
is removed from the list,
as expressions can only be eliminated once.
.PP
A STL or SDL is emitted after the first occurrence of the expression,
unless there was an existing local variable that could hold the result.
.NH 3
Desirability analysis
.PP
Although the global optimizer works on EM code,
the goal is to improve the quality of the object code.
Therefore some machine-dependent information is needed
to decide whether it is desirable to
eliminate a given expression.
Because it is impossible for the CS phase to know
exactly what code will be generated,
some heuristics are used.
CS essentially looks for some special cases
that should not be eliminated.
These special cases can be turned on or off for a given machine,
as indicated in a machine descriptor file.
.PP
Some operators can sometimes be translated
into an addressing mode for the machine at hand.
Such an operator is only eliminated
if its operand is itself expensive,
i.e. it is not just a simple load.
The machine descriptor file contains a set of such operators.
.PP
Eliminating the loading of the Local Base or
the Argument Base by the LXL or LXA instruction, respectively,
is only beneficial
if the difference in lexical levels
exceeds a certain threshold.
The machine descriptor file contains this threshold.
.PP
Replacing a SAR or a LAR by an AAR followed by a LOI
may possibly increase the size of the object code.
We assume that this is only possible
when the size of the array element is greater than some limit.
.PP
There are back ends that can very efficiently translate
the index computing instruction sequence LOC SLI ADS.
If this is the case,
the SLI instruction between a LOC
and an ADS is not eliminated.
.PP
To handle unforeseen cases, the descriptor file may also contain
a set of operators that should never be eliminated.
.NH 3
The algorithm
.PP
After these preparatory explanations,
the algorithm itself is easy to understand.
For each instruction within the current window,
the following steps are performed in the given order:
.IP 1.
Check if this instruction defines an entity.
If so, the set of entities is updated accordingly.
.IP 2.
Kill all entities that might be affected by this instruction.
.IP 3.
Simulate the instruction on the fake stack.
If this instruction is an operator,
update the list of available expressions accordingly.
.PP
The result of this process is
a list of available expressions plus the information
needed to eliminate them.
Expressions that are desirable to eliminate are eliminated.
Next, the window is shifted and the process is repeated.
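.PP
The three steps can be sketched for one window as follows.
This is a simplified illustration in Python, not the code of CS.
The function process_window, the tuple encoding of instructions
("load", "store", "op", "drop" and "call")
and the parameter protected are assumptions made for this sketch;
protected stands in for entities that a call cannot change,
such as locals with a register message.
Every operator is taken to be binary,
and desirability analysis, the actual elimination and
the shifting of the window are left out.

```python
import itertools


def process_window(window, protected=frozenset()):
    """Return the keys of expressions that recur within one window."""
    fresh = itertools.count(1)  # supplies value numbers not used before
    symbols = {}                # entity name -> current value number
    available = {}              # (operator, left, right) -> result number
    repeated = []               # expressions found a second time
    stack = []                  # fake stack of value numbers

    for opcode, arg in window:
        # Step 1: enter a newly encountered entity in the symbol table.
        if opcode in ("load", "store") and arg not in symbols:
            symbols[arg] = next(fresh)

        # Step 2: kill every entity the instruction might change.
        # In this sketch only a call kills, and it kills all entities
        # that are not protected.
        if opcode == "call":
            for name in symbols:
                if name not in protected:
                    symbols[name] = next(fresh)

        # Step 3: simulate the instruction on the fake stack.
        if opcode == "load":
            stack.append(symbols[arg])
        elif opcode == "store":
            symbols[arg] = stack.pop()
        elif opcode == "drop":
            stack.pop()
        elif opcode == "op":
            right = stack.pop()
            left = stack.pop()
            key = (arg, left, right)
            if key in available:
                repeated.append(key)
            else:
                available[key] = next(fresh)
            stack.append(available[key])
    return repeated


# use(a * b) in the hypothetical encoding
EXPRESSION = [("load", "a"), ("load", "b"), ("op", "*"), ("drop", None)]

# The second occurrence is recognized when nothing intervenes.
recognized = process_window(EXPRESSION + EXPRESSION)

# A call to some procedure p kills a and b, so nothing is recognized.
after_call = process_window(EXPRESSION + [("call", "p")] + EXPRESSION)
```

.PP
Passing the names a and b in protected makes the second occurrence
recognizable again despite the call,
which mirrors the remark that locals with a register message
survive procedure calls.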