.NH 2
Implementation
.PP
.NH 3
The value number method
.PP
To determine whether two expressions have the same result,
there must be some way to determine whether their operands have
the same values.
We use a system of \fIvalue numbers\fP
.[
kennedy data flow analysis
.]
in which each distinct value of whatever type,
created or used within the working window,
receives a unique identifying number, its value number.
Two items have the same value number if and only if,
based only upon information from the instructions in the window,
their values are provably identical.
For example, after processing the statement
.DS
a := 4;
.DE
the variable a and the constant 4 have the same value number.
.PP
The value number of the result of an expression depends only
on the kind of operator and the value number(s) of the operand(s).
The expressions need not be textually equal, as shown in Fig. 7.5.
.DS
.TS
l l.
a := c;	(1)
use(a * b);	(2)
d := b;	(3)
use(c * d);	(4)
.TE

Fig. 7.5 Different expressions with the same value number
.DE
At line (1) a receives the same value number as c.
At line (3) d receives the same value number as b.
At line (4) the expression "c * d" receives the same value number
as the expression "a * b" at line (2),
because the value numbers of their left and right operands are the same,
and the operator (*) is the same.
.PP
As another example of the value number method, consider Fig. 7.6.
.DS
.TS
l l.
use(a * b);	(1)
a := 123;	(2)
use(a * b);	(3)
.TE

Fig. 7.6 Identical expressions with different value numbers
.DE
Although the expressions "a * b" at lines (1) and (3) are textually equal,
a has different value numbers at lines (1) and (3),
so the two expressions are not mistakenly recognized as equivalent.
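.PP
The two figures can be replayed with a small sketch of the method.
The Python class below is illustrative only and is not taken from CS;
the names ValueNumbering, vn_of, assign and binary are assumptions,
and the sketch ignores killing, operand sizes and the window boundary.
.DS
```python
import itertools


class ValueNumbering:
    """Toy value-number table for straight-line code (illustrative only)."""

    def __init__(self):
        self._fresh = itertools.count(1)
        self.operand_vn = {}   # variable name or constant -> value number
        self.expr_vn = {}      # (operator, vn_left, vn_right) -> value number

    def vn_of(self, operand):
        # An operand seen for the first time gets a brand new value number.
        if operand not in self.operand_vn:
            self.operand_vn[operand] = next(self._fresh)
        return self.operand_vn[operand]

    def assign(self, target, source):
        # The left-hand side takes over the value number of the right-hand side.
        self.operand_vn[target] = self.vn_of(source)

    def binary(self, op, left, right):
        # The result depends only on the operator and the operand value numbers.
        key = (op, self.vn_of(left), self.vn_of(right))
        if key not in self.expr_vn:
            self.expr_vn[key] = next(self._fresh)
        return self.expr_vn[key]


# Fig. 7.5: textually different expressions
vn = ValueNumbering()
vn.assign("a", "c")
first = vn.binary("*", "a", "b")
vn.assign("d", "b")
second = vn.binary("*", "c", "d")
same_in_fig_7_5 = first == second

# Fig. 7.6: textually identical expressions
vn = ValueNumbering()
before = vn.binary("*", "a", "b")
vn.assign("a", 123)
after = vn.binary("*", "a", "b")
same_in_fig_7_6 = before == after
```
.DE
.LP
Under these assumptions the two products of Fig. 7.5 share a value number,
whereas the two products of Fig. 7.6 do not.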
.NH 3
Entities
.PP
The value number method distinguishes between operators and operands.
The value numbers of operands are stored in a table,
called the \fIsymbol table\fR.
The value number of a subexpression depends on the
(root) operator of the expression and on the value numbers
of its operands.
A table of "available expressions" maps this combination
to the value number of the subexpression.
.PP
CS recognizes the following kinds of EM operands, called \fIentities\fR:
.DS
- constant
- local variable
- external variable
- indirectly accessed entity
- offsetted entity
- address of local variable
- address of external variable
- address of offsetted entity
- address of local base
- address of argument base
- array element
- procedure identifier
- floating zero
- local base
- heap pointer
- ignore mask
.DE
.LP
Whenever a new entity is encountered in the working window,
it is entered in the symbol table and given a brand new value number.
Most entities have attributes (e.g. the offset in
the current stack frame for local variables),
which are also stored in the symbol table.
.PP
An entity is called \fIstatic\fR if its value cannot be changed
(e.g. a constant or an address).
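.PP
A symbol table of this kind might be sketched as follows.
This is a minimal Python illustration, not the data structure used by CS:
the class names, the spelling of the entity kinds and the choice of
which kinds count as static (only the constant and two of the address
kinds, following the examples in the text) are assumptions.
.DS
```python
from dataclasses import dataclass
import itertools

# Assumed subset of static kinds; the identifier spellings are made up.
STATIC_KINDS = {"constant", "address_of_local", "address_of_external"}


@dataclass(frozen=True)
class Entity:
    kind: str           # e.g. "constant", "local", "external"
    attribute: object   # e.g. the constant's value or a stack frame offset

    @property
    def is_static(self):
        return self.kind in STATIC_KINDS


class SymbolTable:
    def __init__(self):
        self._fresh = itertools.count(1)
        self._vn = {}   # Entity -> value number

    def lookup(self, entity):
        # A new entity is entered and given a brand new value number.
        if entity not in self._vn:
            self._vn[entity] = next(self._fresh)
        return self._vn[entity]


table = SymbolTable()
local_at_minus_4 = Entity("local", -4)
vn_first = table.lookup(local_at_minus_4)
vn_again = table.lookup(Entity("local", -4))   # same kind and attribute
vn_const = table.lookup(Entity("constant", 4))
```
.DE
.LP
Because an entity is identified by its kind and its attributes,
a second lookup of the same local yields the value number assigned
at the first one.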
.NH 3
Parsing expressions
.PP
Common subexpressions are recognized by simulating the behaviour
of the EM machine.
The EM code is parsed from left to right;
as EM is postfix code, this is a bottom-up parse.
At any point the current state of the EM runtime stack is
reflected by a simulated "fake stack",
containing descriptors of the parsed operands and expressions.
A descriptor consists of:
.DS
(1) the value number of the operand or expression
(2) the size of the operand or expression
(3) a pointer to the first line of EM code
    that constitutes the operand or expression
.DE
Note that operands may consist of several EM instructions.
Whenever an operator is encountered, the
descriptors of its operands are on top of the fake stack.
The operator and the value numbers of the operands
are used as indices in the table of available expressions,
to determine the value number of the expression.
.PP
During the parsing process,
we keep track of the first line of each expression;
we need this information when we decide to eliminate the expression.
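.PP
The fake stack can be illustrated with a short Python sketch.
It is a simplification under stated assumptions, not the CS parser:
only the EM instructions LOL, LOC, ADI and MLI are handled,
their arguments are reduced to a single number,
every operand has a fixed word size,
and stores and killing are left out.
The names Descriptor and simulate are hypothetical.
.DS
```python
from collections import namedtuple
import itertools

# The three fields of a descriptor as listed in the text.
Descriptor = namedtuple("Descriptor", "vn size first_line")


def simulate(code, word_size=2):
    fresh = itertools.count(1)
    operand_vn = {}   # (mnemonic, argument) -> value number
    available = {}    # (operator, vn_left, vn_right) -> value number
    fake_stack = []
    recognized = []   # (line, first_line, vn) for every operator seen

    for line, (mnemonic, arg) in enumerate(code):
        if mnemonic in ("LOL", "LOC"):
            # Operand: push a descriptor that starts at this line.
            key = (mnemonic, arg)
            if key not in operand_vn:
                operand_vn[key] = next(fresh)
            fake_stack.append(Descriptor(operand_vn[key], word_size, line))
        elif mnemonic in ("ADI", "MLI"):
            # Operator: its operand descriptors are on top of the fake stack.
            right = fake_stack.pop()
            left = fake_stack.pop()
            key = (mnemonic, left.vn, right.vn)
            if key not in available:
                available[key] = next(fresh)
            # The expression starts where its left operand starts.
            result = Descriptor(available[key], arg, left.first_line)
            fake_stack.append(result)
            recognized.append((line, result.first_line, result.vn))
        else:
            raise ValueError("instruction not covered by this sketch: " + mnemonic)
    return recognized


code = [("LOL", -2), ("LOL", -4), ("MLI", 2),   # a * b
        ("LOL", -2), ("LOL", -4), ("MLI", 2),   # a * b again
        ("ADI", 2)]                             # sum of the two products
recognized = simulate(code)
```
.DE
.LP
In this example both multiplications receive the same value number,
and each recognized expression carries the line at which it begins,
which is the information needed to remove it later.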
.NH 3
Updating entities
.PP
An entity is assigned a value number when it is
used for the first time
in the working window.
If the entity is used as the left-hand side of an assignment,
it gets the value number of the right-hand side.
Sometimes the effects of an instruction on an entity cannot
be determined exactly;
the current value and value number of the entity may become
inconsistent.
Hence the current value number must be forgotten.
This is achieved by giving the entity a new value number
that was not used before.
The entity is said to be \fIkilled\fR.
.PP
As information is lost when an entity is killed,
CS tries to save as many entities as possible.
In the case of an indirect assignment through a pointer,
some analysis is done to see which variables cannot be altered.
For a procedure call, the interprocedural information contained
in the procedure table is used to restrict the set of entities that may
be changed by the call.
Local variables for which the front end generated
a register message can never be changed by an indirect assignment
or a procedure call.
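.PP
Killing can be sketched in a few lines of Python.
The sketch is deliberately more conservative than the analysis described above:
on an indirect store it kills every variable except those marked as
register locals.
The class EntityTable and its methods are hypothetical names,
and only variables (no static entities) are tracked.
.DS
```python
import itertools


class EntityTable:
    def __init__(self, register_locals=()):
        self._fresh = itertools.count(1)
        self.vn = {}   # variable name -> current value number
        self.register_locals = set(register_locals)

    def use(self, entity):
        # First use in the window: assign a value number.
        if entity not in self.vn:
            self.vn[entity] = next(self._fresh)
        return self.vn[entity]

    def assign(self, target, source_vn):
        # The left-hand side gets the value number of the right-hand side.
        self.vn[target] = source_vn

    def kill(self, entity):
        # Forget the old value number by handing out an unused one.
        self.vn[entity] = next(self._fresh)

    def indirect_store(self):
        # Assumption: without further analysis, anything that is not a
        # register local may have been overwritten.
        for entity in list(self.vn):
            if entity not in self.register_locals:
                self.kill(entity)


table = EntityTable(register_locals={"i"})
vn_i = table.use("i")
vn_x = table.use("x")
table.assign("y", vn_x)
y_equals_x = table.use("y") == table.use("x")
table.indirect_store()
i_survived = table.use("i") == vn_i
x_survived = table.use("x") == vn_x
y_still_equals_x = table.use("y") == table.use("x")
```
.DE
.LP
After the store the register local keeps its value number,
while x and y each receive a fresh one,
so the earlier knowledge that y equals x is lost.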
.NH 3
Changing the EM text
.PP
When a new expression becomes available,
CS checks whether its result is saved in a local
that may go in a register.
For this, the last line of the expression must be followed
by an STL or SDL instruction
(depending on the size of the result)
and a register message must be present for
this local.
If there is such a local,
it is recorded in the available expressions table.
Each time a new occurrence of this expression
is found,
the value number of the local is compared against
the value number of the result.
If they are different the local cannot be used and is forgotten.
.PP
The available expressions are linked in a list.
New expressions are linked at the head of the list.
In this way expressions that are contained within other
expressions appear later in the list,
because EM expressions are postfix.
The elimination process walks through the list,
starting at the head, to find the largest expressions first.
If an expression is eliminated,
any expression later in the list that is contained in it
is removed from the list,
as an expression can only be eliminated once.
.PP
An STL or SDL is emitted after the first occurrence of the expression,
unless there was an existing local variable that could hold the result.
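.PP
The walk over the list of available expressions can be sketched as follows.
This Python fragment is an illustration under assumptions,
not the CS elimination routine:
an occurrence is reduced to a name and a range of line numbers,
containment is tested on those ranges,
and the predicate desirable stands in for whatever
decides whether an occurrence should be removed.
.DS
```python
from collections import namedtuple

Occurrence = namedtuple("Occurrence", "name first_line last_line")


def eliminate(found_in_order, desirable):
    # New expressions are linked at the head of the list, so the list
    # is the discovery order reversed: enclosing expressions come first.
    pending = list(reversed(found_in_order))
    eliminated = []
    while pending:
        outer = pending.pop(0)
        if not desirable(outer):
            continue
        eliminated.append(outer.name)
        # Drop every later entry that lies inside the eliminated one.
        pending = [inner for inner in pending
                   if not (outer.first_line <= inner.first_line
                           and inner.last_line <= outer.last_line)]
    return eliminated


# Postfix order: the inner product is completed before the sum around it.
found = [Occurrence("a*b", 10, 12), Occurrence("(a*b)+c", 10, 14)]
largest_first = eliminate(found, lambda occ: True)
inner_only = eliminate(found, lambda occ: occ.name == "a*b")
```
.DE
.LP
When every occurrence is acceptable, only the enclosing sum is eliminated
and the product inside it is dropped from the list;
when the sum is rejected, the product is still considered on its own.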
.NH 3
Desirability analysis
.PP
Although the global optimizer works on EM code,
the goal is to improve the quality of the object code.
Therefore some machine-dependent information is needed
to decide whether it is desirable to
eliminate a given expression.
Because it is impossible for the CS phase to know
exactly what code will be generated,
some heuristics are used.
CS essentially looks for some special cases
that should not be eliminated.
These special cases can be turned on or off for a given machine,
as indicated in a machine descriptor file.
.PP
Some operators can sometimes be translated
into an addressing mode for the machine at hand.
Such an operator is only eliminated
if its operand is itself expensive,
i.e. it is not just a simple load.
The machine descriptor file contains a set of such operators.
.PP
Eliminating the loading of the Local Base or
the Argument Base by the LXL or LXA instruction, respectively,
is only beneficial if the difference in lexical levels
exceeds a certain threshold.
The machine descriptor file contains this threshold.
.PP
Replacing a SAR or a LAR by an AAR followed by a LOI
may possibly increase the size of the object code.
We assume that this is only possible when the
size of the array element is greater than some limit.
.PP
There are back ends that can very efficiently translate
the index computing instruction sequence LOC SLI ADS.
If this is the case,
the SLI instruction between a LOC
and an ADS is not eliminated.
.PP
To handle unforeseen cases, the descriptor file may also contain
a set of operators that should never be eliminated.
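.PP
Most of these heuristics can be collected in a single predicate.
The Python sketch below is an assumption-laden illustration:
the field names of MachineDescriptor and Candidate are invented,
the example descriptor values are arbitrary,
the order in which the tests are applied is a guess,
and the array element rule is left out.
It does not reflect the actual format of the machine descriptor file.
.DS
```python
from dataclasses import dataclass, field


@dataclass
class MachineDescriptor:   # hypothetical field names
    addressing_mode_ops: set = field(default_factory=set)
    lexical_level_threshold: int = 1
    cheap_index_sequence: bool = False   # back end handles LOC SLI ADS well
    never_eliminate: set = field(default_factory=set)


@dataclass
class Candidate:           # hypothetical summary of one expression
    operator: str
    operand_is_simple_load: bool = False
    level_difference: int = 0            # only meaningful for LXL and LXA
    between_loc_and_ads: bool = False    # only meaningful for SLI


def desirable(expr, machine):
    if expr.operator in machine.never_eliminate:
        return False
    if expr.operator in machine.addressing_mode_ops:
        # Only worthwhile if the operand is more than a simple load.
        return not expr.operand_is_simple_load
    if expr.operator in ("LXL", "LXA"):
        return expr.level_difference > machine.lexical_level_threshold
    if expr.operator == "SLI" and expr.between_loc_and_ads:
        return not machine.cheap_index_sequence
    return True


# Arbitrary example settings, not those of any real back end.
machine = MachineDescriptor(addressing_mode_ops={"ADS"},
                            lexical_level_threshold=1,
                            cheap_index_sequence=True,
                            never_eliminate={"LOI"})
keep_simple_ads = desirable(Candidate("ADS", operand_is_simple_load=True), machine)
deep_lxl = desirable(Candidate("LXL", level_difference=3), machine)
```
.DE
.LP
Keeping the machine-dependent choices in one descriptor object mirrors the
idea that each special case can be switched on or off per machine.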
.NH 3
The algorithm
.PP
After these preparatory explanations,
the algorithm itself is easy to understand.
For each instruction within the current window,
the following steps are performed in the given order:
.IP 1.
Check if this instruction defines an entity.
If so, the set of entities is updated accordingly.
.IP 2.
Kill all entities that might be affected by this instruction.
.IP 3.
Simulate the instruction on the fake stack.
If this instruction is an operator,
update the list of available expressions accordingly.
.PP
The result of this process is
a list of available expressions plus the information
needed to eliminate them.
Expressions that are desirable to eliminate are eliminated.
Next, the window is shifted and the process is repeated.