.NH 2
Implementation
.PP
.NH 3
The value number method
.PP
To determine whether two expressions have the same result,
there must be some way to determine whether their operands have
the same values.
We use a system of \fIvalue numbers\fP
.[
kennedy data flow analysis
.]
in which each distinct value of whatever type,
created or used within the working window,
receives a unique identifying number, its value number.
Two items have the same value number if and only if,
based only upon information from the instructions in the window,
their values are provably identical.
For example, after processing the statement
.DS
a := 4;
.DE
the variable a and the constant 4 have the same value number.
.PP
The value number of the result of an expression depends only
on the kind of operator and the value number(s) of the operand(s).
The expressions need not be textually equal, as shown in Fig. 7.5.
.DS
a := c;		(1)
use(a * b);	(2)
d := b;		(3)
use(c * d);	(4)

Fig. 7.5 Different expressions with the same value number
.DE
At line (1) a receives the same value number as c.
At line (3) d receives the same value number as b.
At line (4) the expression "c * d" receives the same value number
as the expression "a * b" at line (2),
because the value numbers of their left and right operands are the same,
and the operator (*) is the same.
.PP
As another example of the value number method, consider Fig. 7.6.
.DS
use(a * b);	(1)
a := 123;	(2)
use(a * b);	(3)

Fig. 7.6 Identical expressions with different value numbers
.DE
Although the expressions "a * b" at lines (1) and (3) are textually equal,
a has a different value number at line (3) than at line (1).
The two expressions will therefore not mistakenly be recognized as equivalent.
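.PP
Both figures can be reproduced with a minimal sketch of the method.
The class and method names below are hypothetical and are not taken from the CS sources;
the sketch only assumes that operands and (operator, operand value numbers) triples
are looked up in two separate tables.

```python
import itertools


class ValueNumbering:
    """Toy value-number tables for one window (all names are hypothetical)."""

    def __init__(self):
        self._fresh = itertools.count(1)
        self.operand_vn = {}   # operand (variable name or constant) -> value number
        self.expr_vn = {}      # (operator, vn_left, vn_right) -> value number

    def vn_of(self, operand):
        # An operand seen for the first time gets a brand new value number.
        if operand not in self.operand_vn:
            self.operand_vn[operand] = next(self._fresh)
        return self.operand_vn[operand]

    def assign(self, target, source):
        # After "target := source" both denote the same value.
        self.operand_vn[target] = self.vn_of(source)

    def expr(self, op, left, right):
        # The result depends only on the operator and the operand value numbers.
        key = (op, self.vn_of(left), self.vn_of(right))
        if key not in self.expr_vn:
            self.expr_vn[key] = next(self._fresh)
        return self.expr_vn[key]


# Fig. 7.5: textually different expressions, same value number.
t = ValueNumbering()
t.assign("a", "c")                 # (1)
first = t.expr("*", "a", "b")      # (2)
t.assign("d", "b")                 # (3)
second = t.expr("*", "c", "d")     # (4)
same_in_fig_7_5 = first == second

# Fig. 7.6: textually identical expressions, different value numbers.
u = ValueNumbering()
before = u.expr("*", "a", "b")     # (1)
u.assign("a", 123)                 # (2)
after = u.expr("*", "a", "b")      # (3)
same_in_fig_7_6 = before == after
```

In this sketch same_in_fig_7_5 is true and same_in_fig_7_6 is false,
matching the two figures.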
.NH 3
Entities
.PP
The value number method distinguishes between operators and operands.
The value numbers of operands are stored in a table,
called the \fIsymbol table\fR.
The value number of a subexpression depends on the
(root) operator of the expression and on the value numbers
of its operands.
A table of "available expressions" is used to do this mapping.
.PP
CS recognizes the following kinds of EM operands, called \fIentities\fR:
.IP
- constant
- local variable
- external variable
- indirectly accessed entity
- offsetted entity
- address of local variable
- address of external variable
- address of offsetted entity
- address of local base
- address of argument base
- array element
- procedure identifier
- floating zero
- local base
- heap pointer
- ignore mask
.LP
Whenever a new entity is encountered in the working window,
it is entered in the symbol table and given a brand new value number.
Most entities have attributes (e.g. the offset in
the current stackframe for local variables),
which are also stored in the symbol table.
.PP
An entity is called static if its value cannot be changed
(e.g. a constant or an address).
.NH 3
Parsing expressions
.PP
Common subexpressions are recognized by simulating the behaviour
of the EM machine.
The EM code is parsed from left to right;
as EM is postfix code, this is a bottom up parse.
At any point the current state of the EM runtime stack is
reflected by a simulated "fake stack",
containing descriptions of the parsed operands and expressions.
A descriptor consists of:
.DS
(1) the value number of the operand or expression
(2) the size of the operand or expression
(3) a pointer to the first line of EM-code
    that constitutes the operand or expression
.DE
Note that operands may consist of several EM instructions.
Whenever an operator is encountered, the
descriptors of its operands are on top of the fake stack.
The operator and the value numbers of the operands
are used as indices in the table of available expressions,
to determine the value number of the expression.
.PP
During the parsing process,
we keep track of the first line of each expression;
we need this information when we decide to eliminate the expression.
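.PP
The fake stack can be illustrated with a small sketch.
It does not use real EM instructions: LOAD and OP are hypothetical stand-ins,
only binary operators are handled, every item is assumed to be one word,
and a line index takes the place of the pointer to the first line.

```python
from collections import namedtuple
import itertools

# The three descriptor fields listed above.
Descriptor = namedtuple("Descriptor", "vn size first_line")


def simulate(code, word_size=2):
    """Parse simplified postfix code; return a descriptor for each expression.

    `code` is a list of (kind, arg) pairs, where kind is "LOAD" (push the
    operand `arg`) or "OP" (apply the binary operator `arg`). These are
    stand-ins, not EM instructions; the word size is an assumption too.
    """
    fresh = itertools.count(1)
    operand_vn = {}    # stand-in for the symbol table
    available = {}     # (operator, vn_left, vn_right) -> value number
    fake_stack = []
    found = []
    for line, (kind, arg) in enumerate(code):
        if kind == "LOAD":
            if arg not in operand_vn:
                operand_vn[arg] = next(fresh)
            fake_stack.append(Descriptor(operand_vn[arg], word_size, line))
        elif kind == "OP":
            # The operand descriptors are on top of the fake stack.
            right = fake_stack.pop()
            left = fake_stack.pop()
            key = (arg, left.vn, right.vn)
            if key not in available:
                available[key] = next(fresh)
            # The expression starts where its leftmost operand starts.
            result = Descriptor(available[key], word_size, left.first_line)
            fake_stack.append(result)
            found.append(result)
        else:
            raise ValueError("unknown instruction kind: %r" % (kind,))
    return found


# (a * b) + (a * b) in postfix form.
code = [("LOAD", "a"), ("LOAD", "b"), ("OP", "*"),
        ("LOAD", "a"), ("LOAD", "b"), ("OP", "*"),
        ("OP", "+")]
exprs = simulate(code)
```

Here the two occurrences of a * b end up with the same value number
but different first lines (0 and 3), and the sum starts at line 0.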
.NH 3
Updating entities
.PP
An entity is assigned a value number when it is
used for the first time
in the working window.
If the entity is used as the left-hand side of an assignment,
it gets the value number of the right-hand side.
Sometimes the effects of an instruction on an entity cannot
be determined exactly;
the current value and value number of the entity may become
inconsistent.
Hence the current value number must be forgotten.
This is achieved by giving the entity a new value number
that was not used before.
The entity is said to be \fIkilled\fR.
.PP
As information is lost when an entity is killed,
CS tries to save as many entities as possible.
In case of an indirect assignment through a pointer,
some analysis is done to see which variables cannot be altered.
For a procedure call, the interprocedural information contained
in the procedure table is used to restrict the set of entities that may
be changed by the call.
Local variables for which the front end generated
a register message can never be changed by an indirect assignment
or a procedure call.
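.PP
Killing can be sketched as follows.
The sketch is deliberately coarse: on an indirect assignment it kills every entity
that is neither static nor a register local, which ignores the finer analysis
mentioned above.
All class, method and attribute names are hypothetical.

```python
import itertools


class Entity:
    def __init__(self, name, vn, static=False, register=False):
        self.name = name
        self.vn = vn
        self.static = static        # e.g. a constant or an address
        self.register = register    # local with a register message


class Window:
    def __init__(self):
        self._fresh = itertools.count(1)
        self.entities = {}

    def enter(self, name, **attrs):
        # First use in the window: brand new value number.
        self.entities[name] = Entity(name, next(self._fresh), **attrs)
        return self.entities[name]

    def kill(self, name):
        # Forget the old value number by handing out one never used before.
        self.entities[name].vn = next(self._fresh)

    def indirect_store(self):
        # Coarse assumption: anything that is neither static nor a register
        # local may have been overwritten through the pointer.
        for e in self.entities.values():
            if not e.static and not e.register:
                self.kill(e.name)


w = Window()
w.enter("i", register=True)
w.enter("x")
w.enter("&x", static=True)
old = {n: e.vn for n, e in w.entities.items()}
w.indirect_store()
new = {n: e.vn for n, e in w.entities.items()}
```

After the indirect store only x has a new value number;
the register local i and the address of x keep theirs.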
.NH 3
Changing the EM text
.PP
When a new expression becomes available,
it is checked whether its result is saved in a local
that may go in a register.
The last line of the expression must be followed
by a STL or SDL instruction
(depending on the size of the result)
and a register message must be present for
this local.
If there is such a local,
it is recorded in the available expressions table.
Each time a new occurrence of this expression
is found,
the value number of the local is compared against
the value number of the result.
If they are different the local cannot be used and is forgotten.
.PP
The available expressions are linked in a list.
New expressions are linked at the head of the list.
In this way expressions that are contained within other
expressions appear later in the list,
because EM-expressions are postfix.
The elimination process walks through the list,
starting at the head, to find the largest expressions first.
If an expression is eliminated,
any expression later on in the list, contained in the former expression,
is removed from the list,
as expressions can only be eliminated once.
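.PP
The list discipline can be sketched as below.
The representation is an assumption: each entry records the first and last line
of the occurrence, so that containment is a simple range test,
and a predicate stands in for the decision to eliminate.

```python
from collections import namedtuple

# first/last: lines spanned by the occurrence (hypothetical fields).
Avail = namedtuple("Avail", "name first last")


def eliminate(avail_list, wanted):
    """Walk the list from the head; return the names eliminated, in order.

    `wanted` is a predicate standing in for the decision to eliminate.
    """
    pending = list(avail_list)
    eliminated = []
    while pending:
        expr = pending.pop(0)
        if not wanted(expr):
            continue
        eliminated.append(expr.name)
        # Drop every later entry that lies inside the eliminated expression.
        pending = [e for e in pending
                   if not (expr.first <= e.first and e.last <= expr.last)]
    return eliminated


# Postfix order of discovery: a*b (lines 0-2), then (a*b)+c (lines 0-4).
avail = []
for found in (Avail("a*b", 0, 2), Avail("(a*b)+c", 0, 4)):
    avail.insert(0, found)          # new expressions go at the head

order = [e.name for e in avail]
result = eliminate(avail, lambda e: True)
inner_only = eliminate(avail, lambda e: e.name == "a*b")
```

Because the enclosing expression is met first, eliminating it removes a*b from the list;
a*b is only eliminated on its own when the enclosing expression is rejected.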
.PP
A STL or SDL is emitted after the first occurrence of the expression,
unless there was an existing local variable that could hold the result.
.NH 3
Desirability analysis
.PP
Although the global optimizer works on EM code,
the goal is to improve the quality of the object code.
Therefore some machine-dependent information is needed
to decide whether it is desirable to
eliminate a given expression.
Because it is impossible for the CS phase to know
exactly what code will be generated,
some heuristics are used.
CS essentially looks for some special cases
that should not be eliminated.
These special cases can be turned on or off for a given machine,
as indicated in a machine descriptor file.
.PP
Some operators can sometimes be translated
into an addressing mode for the machine at hand.
Such an operator is only eliminated
if its operand is itself expensive,
i.e. it is not just a simple load.
The machine descriptor file contains a set of such operators.
.PP
Eliminating the loading of the Local Base or
the Argument Base by the LXL or LXA instruction, respectively,
is only beneficial if the difference in lexical levels
exceeds a certain threshold.
The machine descriptor file contains this threshold.
.PP
Replacing a SAR or a LAR by an AAR followed by a LOI
may increase the size of the object code.
We assume that this is only possible when the
size of the array element is greater than some limit.
.PP
There are back ends that can very efficiently translate
the index computing instruction sequence LOC SLI ADS.
If this is the case,
the SLI instruction between a LOC
and an ADS is not eliminated.
.PP
To handle unforeseen cases, the descriptor file may also contain
a set of operators that should never be eliminated.
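.PP
The special cases above can be collected in one predicate.
The sketch below is only one reading of the text:
the field names, the layout of the descriptor and all sample values are assumptions
and do not reflect the format of the real machine descriptor file.
In particular, we read the array case as "do not eliminate when the element size
exceeds the limit".

```python
from collections import namedtuple

# Hypothetical stand-in for the machine descriptor file; every field name
# and every sample value below is an assumption made for this sketch.
Machine = namedtuple("Machine", [
    "addr_mode_ops",      # operators that may become an addressing mode
    "level_threshold",    # lexical level difference that LXL/LXA must exceed
    "elem_size_limit",    # element size above which AAR + LOI may cost space
    "cheap_index_seq",    # True if LOC SLI ADS is translated efficiently
    "never_eliminate",    # operators that are never eliminated
])

Expr = namedtuple("Expr", [
    "op", "operand_is_simple_load", "level_diff", "elem_size", "in_loc_sli_ads",
])


def desirable(e, m):
    """Return False for the special cases that should not be eliminated."""
    if e.op in m.never_eliminate:
        return False
    if e.op in m.addr_mode_ops and e.operand_is_simple_load:
        return False
    if e.op in ("LXL", "LXA") and e.level_diff <= m.level_threshold:
        return False
    if e.op in ("SAR", "LAR") and e.elem_size > m.elem_size_limit:
        return False
    if e.op == "SLI" and e.in_loc_sli_ads and m.cheap_index_seq:
        return False
    return True


# Illustrative machine: the choice of ADS and the numbers are made up.
m = Machine(addr_mode_ops={"ADS"}, level_threshold=1, elem_size_limit=4,
            cheap_index_seq=True, never_eliminate=set())

near = desirable(Expr("LXL", False, 1, 0, False), m)
far = desirable(Expr("LXL", False, 3, 0, False), m)
cheap_operand = desirable(Expr("ADS", True, 0, 0, False), m)
costly_operand = desirable(Expr("ADS", False, 0, 0, False), m)
```

With this made-up machine, far and costly_operand are true,
while near and cheap_operand are false.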
.NH 3
The algorithm
.PP
After these preparatory explanations,
the algorithm itself is easy to understand.
For each instruction within the current window,
the following steps are performed in the given order:
.IP 1.
Check if this instruction defines an entity.
If so, the set of entities is updated accordingly.
.IP 2.
Kill all entities that might be affected by this instruction.
.IP 3.
Simulate the instruction on the fake-stack.
If this instruction is an operator,
update the list of available expressions accordingly.
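.PP
The three steps can be put together in a small driver.
As before, LOAD, STORE and OP are hypothetical stand-ins for EM instructions,
a STORE is the only instruction that kills anything,
and the fake stack holds bare value numbers instead of full descriptors.

```python
import itertools


def process_window(window):
    """Run the three steps on simplified postfix code.

    Instructions are (kind, arg) pairs with kind in LOAD, STORE, OP; these
    are stand-ins for EM instructions. Returns the available expressions
    as (value number, line of the operator) pairs.
    """
    fresh = itertools.count(1)
    symbol_table = {}   # entity -> value number
    available = {}      # (operator, vn_left, vn_right) -> value number
    fake_stack = []
    occurrences = []

    for line, (kind, arg) in enumerate(window):
        # Step 1: enter a newly encountered entity in the symbol table.
        if kind in ("LOAD", "STORE") and arg not in symbol_table:
            symbol_table[arg] = next(fresh)
        # Step 2: kill the entities this instruction may change.
        if kind == "STORE":
            symbol_table[arg] = next(fresh)
        # Step 3: simulate the instruction on the fake stack.
        if kind == "LOAD":
            fake_stack.append(symbol_table[arg])
        elif kind == "STORE":
            # The target gets the value number of the stored value.
            symbol_table[arg] = fake_stack.pop()
        elif kind == "OP":
            right = fake_stack.pop()
            left = fake_stack.pop()
            key = (arg, left, right)
            if key not in available:
                available[key] = next(fresh)
            fake_stack.append(available[key])
            occurrences.append((available[key], line))
    return occurrences


# Fig. 7.6 without the calls to use(); the constant is the operand "123".
changed = process_window([("LOAD", "a"), ("LOAD", "b"), ("OP", "*"),
                          ("LOAD", "123"), ("STORE", "a"),
                          ("LOAD", "a"), ("LOAD", "b"), ("OP", "*")])

# The same two products with no assignment in between.
unchanged = process_window([("LOAD", "a"), ("LOAD", "b"), ("OP", "*"),
                            ("LOAD", "a"), ("LOAD", "b"), ("OP", "*")])
```

In the first window the two products receive different value numbers;
in the second they receive the same one and would be candidates for elimination.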
.PP
The result of this process is
a list of available expressions plus the information
needed to eliminate them.
Expressions whose elimination is desirable are then eliminated.
Next, the window is shifted and the process is repeated.