.NH 2
Implementation
.PP
.NH 3
The value number method
.PP
To determine whether two expressions have the same result,
there must be some way to determine whether their operands have
the same values.
We use a system of \fIvalue numbers\fP
.[
kennedy data flow analysis
.]
in which each distinct value of any type,
created or used within the working window,
receives a unique identifying number, its value number.
Two items have the same value number if and only if,
based only upon information from the instructions in the window,
their values are provably identical.
For example, after processing the statement
.DS
a := 4;
.DE
the variable a and the constant 4 have the same value number.
.PP
The value number of the result of an expression depends only
on the kind of operator and the value number(s) of the operand(s).
The expressions need not be textually equal, as shown in Fig. 7.5.
.DS
.TS
l l.
a := c;	(1)
use(a * b);	(2)
d := b;	(3)
use(c * d);	(4)
.TE

Fig. 7.5 Different expressions with the same value number
.DE
At line (1) a receives the same value number as c.
At line (3) d receives the same value number as b.
At line (4) the expression "c * d" receives the same value number
as the expression "a * b" at line (2),
because the value numbers of their left and right operands are the same,
and the operator (*) is the same.
.PP
As another example of the value number method, consider Fig. 7.6.
.DS
.TS
l l.
use(a * b);	(1)
a := 123;	(2)
use(a * b);	(3)
.TE

Fig. 7.6 Identical expressions with different value numbers
.DE
Although the expressions "a * b" at lines (1) and (3) are textually equal,
a has a different value number at line (3) than at line (1),
because of the assignment at line (2).
The two expressions will therefore not mistakenly be recognized as equivalent.
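.PP
The two figures can be replayed with a small model of the method.
The following is a minimal sketch in Python, not the code used by CS;
the class ValueNumbering and its methods operand, assign and binary
are hypothetical names introduced only for this illustration.
.DS
```python
import itertools


class ValueNumbering:
    """Toy value numbering over one window (hypothetical helper)."""

    def __init__(self):
        self._fresh = itertools.count(1)
        self.operands = {}  # variable name or constant -> value number
        self.exprs = {}     # (operator, left vn, right vn) -> value number

    def operand(self, name):
        # First use in the window: hand out a brand new value number.
        if name not in self.operands:
            self.operands[name] = next(self._fresh)
        return self.operands[name]

    def assign(self, target, source):
        # The left-hand side takes the value number of the right-hand side.
        self.operands[target] = self.operand(source)

    def binary(self, op, left, right):
        # The result depends only on the operator and the operand numbers.
        key = (op, self.operand(left), self.operand(right))
        if key not in self.exprs:
            self.exprs[key] = next(self._fresh)
        return self.exprs[key]


def fig_7_5():
    vn = ValueNumbering()
    vn.assign("a", "c")                # (1)
    first = vn.binary("*", "a", "b")   # (2)
    vn.assign("d", "b")                # (3)
    second = vn.binary("*", "c", "d")  # (4)
    return first == second


def fig_7_6():
    vn = ValueNumbering()
    first = vn.binary("*", "a", "b")   # (1)
    vn.assign("a", 123)                # (2)
    second = vn.binary("*", "a", "b")  # (3)
    return first == second
```
.DE
.LP
In this sketch fig_7_5() returns True and fig_7_6() returns False,
matching the captions of the two figures.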
.NH 3
Entities
.PP
The value number method distinguishes between operators and operands.
The value numbers of operands are stored in a table,
called the \fIsymbol table\fR.
The value number of a subexpression depends on the
(root) operator of the expression and on the value numbers
of its operands.
A table of "available expressions" maps each such combination
to the value number of the result.
.PP
CS recognizes the following kinds of EM operands, called \fIentities\fR:
.DS
- constant
- local variable
- external variable
- indirectly accessed entity
- offsetted entity
- address of local variable
- address of external variable
- address of offsetted entity
- address of local base
- address of argument base
- array element
- procedure identifier
- floating zero
- local base
- heap pointer
- ignore mask
.DE
.LP
Whenever a new entity is encountered in the working window,
it is entered in the symbol table and given a brand new value number.
Most entities have attributes (e.g. the offset in
the current stack frame for local variables),
which are also stored in the symbol table.
.PP
An entity is called static if its value cannot be changed
(e.g. a constant or an address).
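.PP
As an illustration of how entities and their attributes might be keyed
in a symbol table, consider the following Python sketch.
The names Entity, SymbolTable, enter, is_static and STATIC_KINDS are
assumptions made for this example, and the choice of which kinds count
as static is only inferred from the remark above.
.DS
```python
from dataclasses import dataclass
import itertools

# Kinds treated as static here; an assumption drawn from the
# "constant or address" remark, not a list taken from CS itself.
STATIC_KINDS = {
    "constant",
    "address of local variable",
    "address of external variable",
}


@dataclass(frozen=True)
class Entity:
    kind: str
    attribute: object = None  # e.g. the stack-frame offset of a local


class SymbolTable:
    def __init__(self):
        self._fresh = itertools.count(1)
        self._numbers = {}

    def enter(self, entity):
        # A new entity gets a brand new value number; a known one keeps it.
        if entity not in self._numbers:
            self._numbers[entity] = next(self._fresh)
        return self._numbers[entity]


def is_static(entity):
    return entity.kind in STATIC_KINDS


table = SymbolTable()
local_x = Entity("local variable", -4)
four = Entity("constant", 4)
numbers = [table.enter(local_x), table.enter(four), table.enter(local_x)]
```
.DE
.LP
Because the kind and the attribute together identify an entity,
the second lookup of the same local yields the number it was given first.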
.NH 3
Parsing expressions
.PP
Common subexpressions are recognized by simulating the behaviour
of the EM machine.
The EM code is parsed from left to right;
as EM is postfix code, this is a bottom-up parse.
At any point the current state of the EM runtime stack is
reflected by a simulated "fake stack",
containing descriptors of the parsed operands and expressions.
A descriptor consists of:
.DS
(1) the value number of the operand or expression
(2) the size of the operand or expression
(3) a pointer to the first line of EM-code
    that constitutes the operand or expression
.DE
Note that operands may consist of several EM instructions.
Whenever an operator is encountered, the
descriptors of its operands are on top of the fake stack.
The operator and the value numbers of the operands
are used as indices in the table of available expressions,
to determine the value number of the expression.
.PP
During the parsing process,
we keep track of the first line of each expression;
we need this information when we decide to eliminate the expression.
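.PP
The fake stack can be pictured with the following Python sketch.
It is a simplification under stated assumptions: only the loads LOL and LOC
and the binary operators ADI and MLI are handled, every operand has
the same word size, and an instruction is a (mnemonic, argument) pair.
The names Descriptor and parse_window are hypothetical.
.DS
```python
import itertools
from collections import namedtuple

Descriptor = namedtuple("Descriptor", "value_number size first_line")


def parse_window(code, word_size=2):
    """Simulate simplified EM postfix code on a fake stack."""
    fresh = itertools.count(1)
    operands = {}   # (mnemonic, argument) -> value number
    available = {}  # (operator, left vn, right vn) -> value number
    stack = []      # the fake stack of descriptors
    results = []    # descriptors of all operator results, in order
    for line, (mnemonic, argument) in enumerate(code):
        if mnemonic in ("LOL", "LOC"):
            key = (mnemonic, argument)
            if key not in operands:
                operands[key] = next(fresh)
            stack.append(Descriptor(operands[key], word_size, line))
        elif mnemonic in ("ADI", "MLI"):
            right = stack.pop()
            left = stack.pop()
            key = (mnemonic, left.value_number, right.value_number)
            if key not in available:
                available[key] = next(fresh)
            # The expression starts where its left operand starts.
            result = Descriptor(available[key], argument, left.first_line)
            stack.append(result)
            results.append(result)
        else:
            raise ValueError("unsupported instruction: " + mnemonic)
    return results


# a * b + a * b, with a and b as locals at offsets -2 and -4
window = [("LOL", -2), ("LOL", -4), ("MLI", 2),
          ("LOL", -2), ("LOL", -4), ("MLI", 2), ("ADI", 2)]
products_and_sum = parse_window(window)
```
.DE
.LP
Both products receive the same value number, while their descriptors
record different first lines (0 and 3); the sum starts at line 0.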
.NH 3
Updating entities
.PP
An entity is assigned a value number when it is
used for the first time
in the working window.
If the entity is used as the left-hand side of an assignment,
it gets the value number of the right-hand side.
Sometimes the effects of an instruction on an entity cannot
be determined exactly;
the current value and value number of the entity may become
inconsistent.
Hence the current value number must be forgotten.
This is achieved by giving the entity a new value number
that was not used before.
The entity is said to be \fIkilled\fR.
.PP
As information is lost when an entity is killed,
CS tries to save as many entities as possible.
In case of an indirect assignment through a pointer,
some analysis is done to see which variables cannot be altered.
For a procedure call, the interprocedural information contained
in the procedure table is used to restrict the set of entities that may
be changed by the call.
Local variables for which the front end generated
a register message can never be changed by an indirect assignment
or a procedure call.
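.PP
Killing can be sketched as follows in Python.
The sketch is deliberately cruder than the analysis described above:
it assumes that a store through an unknown pointer may change every
variable except the locals that have a register message.
The class Window and its methods are hypothetical names.
.DS
```python
import itertools


class Window:
    """Tracks value numbers of variables only (hypothetical helper)."""

    def __init__(self, register_locals=()):
        self._fresh = itertools.count(1)
        self.numbers = {}
        self.register_locals = set(register_locals)

    def use(self, variable):
        if variable not in self.numbers:
            self.numbers[variable] = next(self._fresh)
        return self.numbers[variable]

    def kill(self, variable):
        # Forget the old number by issuing one that was never used before.
        self.numbers[variable] = next(self._fresh)

    def indirect_store(self):
        # Pessimistic assumption: anything without a register message
        # may have been overwritten.
        for variable in list(self.numbers):
            if variable not in self.register_locals:
                self.kill(variable)


def demo():
    window = Window(register_locals={"i"})
    before = (window.use("i"), window.use("x"))
    window.indirect_store()
    after = (window.use("i"), window.use("x"))
    return before, after
```
.DE
.LP
After the indirect store the register local i keeps its value number,
whereas x is killed and receives a fresh one.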
.NH 3
Changing the EM text
.PP
When a new expression becomes available,
it is checked whether its result is saved in a local
that may go in a register.
The last line of the expression must be followed
by an STL or SDL instruction
(depending on the size of the result)
and a register message must be present for
this local.
If there is such a local,
it is recorded in the available expressions table.
Each time a new occurrence of this expression
is found,
the value number of the local is compared against
the value number of the result.
If they are different the local cannot be used and is forgotten.
.PP
The available expressions are linked in a list.
New expressions are linked at the head of the list.
In this way expressions that are contained within other
expressions appear later in the list,
because EM-expressions are postfix.
The elimination process walks through the list,
starting at the head, to find the largest expressions first.
If an expression is eliminated,
any expression later on in the list, contained in the former expression,
is removed from the list,
as expressions can only be eliminated once.
.PP
An STL or SDL is emitted after the first occurrence of the expression,
unless there was an existing local variable that could hold the result.
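.PP
The walk over the list of available expressions can be sketched in Python
as shown here.
The representation is an assumption: each list entry is a tuple
(name, first line, last line), and "contained in" is modelled as
inclusion of line ranges.
The function name choose_eliminations and the predicate desirable are
hypothetical.
.DS
```python
def choose_eliminations(available, desirable):
    """Walk the list head first; return names of eliminated expressions."""
    remaining = list(available)
    eliminated = []
    while remaining:
        name, first, last = remaining.pop(0)
        if not desirable(name):
            continue
        eliminated.append(name)
        # Expressions lying inside the eliminated one can no longer
        # be eliminated themselves, so drop them from the list.
        remaining = [entry for entry in remaining
                     if not (first <= entry[1] and entry[2] <= last)]
    return eliminated


# Head of the list first: the most recently recognized expression.
available = [("x*y", 20, 22), ("a*b+c", 10, 14), ("a*b", 10, 12)]

everything = choose_eliminations(available, lambda name: True)
without_sum = choose_eliminations(available, lambda name: name != "a*b+c")
```
.DE
.LP
When every expression is desirable, a*b is dropped because it lies inside
a*b+c; when the sum is rejected, a*b is still eligible on its own.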
.NH 3
Desirability analysis
.PP
Although the global optimizer works on EM code,
the goal is to improve the quality of the object code.
Therefore some machine-dependent information is needed
to decide whether it is desirable to
eliminate a given expression.
Because it is impossible for the CS phase to know
exactly what code will be generated,
some heuristics are used.
CS essentially looks for some special cases
that should not be eliminated.
These special cases can be turned on or off for a given machine,
as indicated in a machine descriptor file.
.PP
Some operators can sometimes be translated
into an addressing mode for the machine at hand.
Such an operator is only eliminated
if its operand is itself expensive,
i.e. it is not just a simple load.
The machine descriptor file contains a set of such operators.
.PP
Eliminating the loading of the Local Base or
the Argument Base by the LXL or LXA instruction, respectively,
is only beneficial if the difference in lexical levels
exceeds a certain threshold.
The machine descriptor file contains this threshold.
.PP
Replacing a SAR or a LAR by an AAR followed by a LOI
may increase the size of the object code.
We assume that this is only possible when the
size of the array element is greater than some limit.
.PP
There are back ends that can very efficiently translate
the index computing instruction sequence LOC SLI ADS.
If this is the case,
the SLI instruction between a LOC
and an ADS is not eliminated.
.PP
To handle unforeseen cases, the descriptor file may also contain
a set of operators that should never be eliminated.
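.PP
Some of these heuristics can be written down as a predicate, as in the
following Python sketch.
The field names of MachineDescriptor, the function name desirable and
its parameters are assumptions made for this illustration;
they do not reflect the actual layout of the machine descriptor file.
Only the never-eliminate set, the addressing-mode set and the lexical
level threshold are modelled.
.DS
```python
from dataclasses import dataclass, field


@dataclass
class MachineDescriptor:  # hypothetical field names
    addressing_mode_ops: set = field(default_factory=set)
    never_eliminate: set = field(default_factory=set)
    lexical_threshold: int = 1


def desirable(op, md, operand_is_simple_load=False, level_difference=0):
    """Return True if eliminating the operator looks worthwhile."""
    if op in md.never_eliminate:
        return False
    # An addressing-mode operator pays off only with an expensive operand.
    if op in md.addressing_mode_ops and operand_is_simple_load:
        return False
    # LXL and LXA pay off only beyond the lexical level threshold.
    if op in ("LXL", "LXA") and level_difference <= md.lexical_threshold:
        return False
    return True


# Example settings, chosen for illustration only.
md = MachineDescriptor(addressing_mode_ops={"ADS"},
                       never_eliminate={"SLI"},
                       lexical_threshold=1)
```
.DE
.LP
With these example settings, an ADS applied to a simple load is kept in
place, and an LXL is eliminated only when the level difference is at least 2.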
.NH 3
The algorithm
.PP
After these preparatory explanations,
the algorithm itself is easy to understand.
For each instruction within the current window,
the following steps are performed in the given order:
.IP 1.
Check if this instruction defines an entity.
If so, the set of entities is updated accordingly.
.IP 2.
Kill all entities that might be affected by this instruction.
.IP 3.
Simulate the instruction on the fake stack.
If this instruction is an operator,
update the list of available expressions accordingly.
.PP
The result of this process is
a list of available expressions plus the information
needed to eliminate them.
Expressions that are desirable to eliminate are eliminated.
Next, the window is shifted and the process is repeated.
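.PP
The order of the three steps can be made concrete with a small Python
sketch over a toy instruction set.
The instruction kinds "load", "store", "op" and "call" are invented for
this example and are not EM instructions, and a call is assumed,
pessimistically, to kill every known variable.
The function name run_window is hypothetical.
.DS
```python
import itertools


def run_window(window):
    """Return the indices of operators whose result was already available."""
    fresh = itertools.count(1)
    entities = {}   # variable -> value number
    available = {}  # (operator, left vn, right vn) -> value number
    stack = []      # fake stack of value numbers
    repeats = []
    for index, (kind, arg) in enumerate(window):
        # Step 1: enter any entity this instruction refers to.
        if kind in ("load", "store") and arg not in entities:
            entities[arg] = next(fresh)
        # Step 2: kill entities the instruction might affect.
        if kind == "call":
            for name in list(entities):
                entities[name] = next(fresh)
        # Step 3: simulate the instruction on the fake stack.
        if kind == "load":
            stack.append(entities[arg])
        elif kind == "store":
            entities[arg] = stack.pop()
        elif kind == "op":
            right = stack.pop()
            left = stack.pop()
            key = (arg, left, right)
            if key in available:
                repeats.append(index)
            else:
                available[key] = next(fresh)
            stack.append(available[key])
    return repeats


plain = [("load", "a"), ("load", "b"), ("op", "*"), ("store", "x"),
         ("load", "a"), ("load", "b"), ("op", "*"), ("store", "y")]
with_call = plain[:4] + [("call", None)] + plain[4:]
```
.DE
.LP
For the window named plain, the second multiplication (index 6) is
reported as a repeat; with the intervening call, a and b are killed and
nothing is reported.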