Initial revision
parent
2d362c2274
commit
295380491f
136
doc/ego/cj/cj1
Normal file
|
@ -0,0 +1,136 @@
|
|||
.bp
|
||||
.NH 1
|
||||
Cross jumping
|
||||
.NH 2
|
||||
Introduction
|
||||
.PP
|
||||
The "Cross Jumping" optimization technique (CJ)
|
||||
.[
|
||||
wulf design optimizing compiler
|
||||
.]
|
||||
is basically a space optimization technique. It looks for pairs of
|
||||
basic blocks (B1,B2), for which:
|
||||
.DS
|
||||
SUCC(B1) = SUCC(B2) = {S}
|
||||
.DE
|
||||
(So B1 and B2 both have one and the same successor).
|
||||
If the last few non-branch instructions are the same for B1 and B2,
|
||||
one such sequence can be eliminated.
|
||||
.DS
|
||||
Pascal:
|
||||
|
||||
if cond then
|
||||
S1
|
||||
S3
|
||||
else
|
||||
S2
|
||||
S3
|
||||
|
||||
(pseudo) EM:
|
||||
|
||||
    TEST COND              TEST COND
    BNE *1                 BNE *1
    S1                     S1
    S3           --->      BRA *2
    BRA *2              1:
1:                         S2
    S2                  2:
    S3                     S3
2:
|
||||
|
||||
Fig. 9.1 An example of Cross Jumping
|
||||
.DE
|
||||
As the basic blocks have the same successor,
|
||||
at least one of them ends in an unconditional branch instruction (BRA).
|
||||
Hence no extra branch instruction is ever needed, just the target
|
||||
of an existing branch needs to be changed; neither the program size
|
||||
nor the execution time will ever increase.
|
||||
In general, the execution time will remain the same, unless
|
||||
further optimizations can be applied because of this optimization.
|
||||
.PP
|
||||
This optimization is particularly effective,
|
||||
because it cannot always be done by the programmer at the source level,
|
||||
as demonstrated by Fig. 9.2.
|
||||
.DS
|
||||
Pascal:
|
||||
|
||||
if cond then
|
||||
x := f(4)
|
||||
else
|
||||
x := g(5)
|
||||
|
||||
|
||||
EM:
|
||||
|
||||
    ...            ...
    LOC 4          LOC 5
    CAL F          CAL G
    ASP 2          ASP 2
    LFR 2          LFR 2
    STL X          STL X
|
||||
|
||||
Fig. 9.2 Effectiveness of Cross Jumping
|
||||
.DE
|
||||
At the source level there is no common tail,
|
||||
but at the EM level there is a common tail.
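.PP
As a minimal sketch of how such a common tail may be found
(an illustration in Python, not the code of the optimizer;
the name common_tail and the representation of a basic block
as a list of instruction strings are assumptions),
consider:
.DS
```python
def common_tail(b1, b2):
    # Return the longest sequence of instructions that ends both blocks.
    # A trailing unconditional branch (BRA) is skipped, because only
    # the non-branch instructions are compared.
    def strip_branch(block):
        if block and block[-1].startswith("BRA"):
            return block[:-1]
        return block

    t1, t2 = strip_branch(b1), strip_branch(b2)
    n = 0
    while n < len(t1) and n < len(t2) and t1[-1 - n] == t2[-1 - n]:
        n += 1
    return t1[len(t1) - n:]


# The two branches of Fig. 9.2 as lists of EM instructions.
# The BRA that ends the then-part is assumed; the figure does not show it.
then_part = ["LOC 4", "CAL F", "ASP 2", "LFR 2", "STL X", "BRA *2"]
else_part = ["LOC 5", "CAL G", "ASP 2", "LFR 2", "STL X"]
tail = common_tail(then_part, else_part)
```
.DE
For the two branches of Fig. 9.2 the tail found this way
consists of ASP 2, LFR 2 and STL X.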
|
||||
.NH 2
|
||||
Implementation
|
||||
.PP
|
||||
The implementation of cross jumping is rather straightforward.
|
||||
The technique is applied to one procedure at a time.
|
||||
The control flow graph of the procedure
|
||||
is scanned for pairs of basic blocks
|
||||
with the same (single) successor and with common tails.
|
||||
Note that there may be more than two such blocks (e.g. as the result
|
||||
of a case statement).
|
||||
This is dealt with by repeating the entire process until no
|
||||
further optimizations can be done for the current procedure.
|
||||
.sp
|
||||
If a suitable pair of basic blocks has been found, the control flow
|
||||
graph must be altered. One of the basic
|
||||
blocks must be split into two.
|
||||
The control flow graphs before and after the optimization are shown
|
||||
in Fig. 9.3 and Fig. 9.4.
|
||||
.DS
|
||||
|
||||
     --------                                --------
     |      |                                |      |
     | S1   |                                | S2   |
     | S3   |                                | S3   |
     |      |                                |      |
     --------                                --------
        |                                       |
        |------------------|--------------------|
                           |
                           v
|
||||
|
||||
Fig. 9.3 CFG before optimization
|
||||
.DE
|
||||
.DS
|
||||
|
||||
     --------                                --------
     |      |                                |      |
     | S1   |                                | S2   |
     |      |                                |      |
     --------                                --------
        |                                       |
        |--------------------<------------------|
        v
     --------
     |      |
     | S3   |
     |      |
     --------
        |
        v
|
||||
|
||||
Fig. 9.4 CFG after optimization
|
||||
.DE
|
||||
Some attributes of the three resulting blocks (such as immediate dominator)
|
||||
are updated.
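.PP
The change to the control flow graph can be sketched as follows.
This is a simplified illustration, not the actual implementation:
the function cross_jump, the dictionaries blocks and succ,
and the block names are all hypothetical,
and the common tail is moved to a fresh block
instead of splitting one of the two existing blocks.
The resulting graph has the shape of Fig. 9.4.
.DS
```python
def cross_jump(blocks, succ, b1, b2, new_name):
    # blocks maps a block name to its list of non-branch instructions;
    # succ maps a block name to the name of its single successor.
    if succ[b1] != succ[b2]:
        return False
    i1, i2 = blocks[b1], blocks[b2]
    n = 0
    while n < len(i1) and n < len(i2) and i1[-1 - n] == i2[-1 - n]:
        n += 1
    if n == 0:
        return False
    # Move the shared tail to a new block and redirect both blocks to it.
    blocks[new_name] = i1[len(i1) - n:]
    succ[new_name] = succ[b1]
    blocks[b1] = i1[:len(i1) - n]
    blocks[b2] = i2[:len(i2) - n]
    succ[b1] = succ[b2] = new_name
    return True


# The situation of Fig. 9.3; "S" stands for the common successor.
blocks = {"B1": ["S1", "S3"], "B2": ["S2", "S3"]}
succ = {"B1": "S", "B2": "S"}
changed = cross_jump(blocks, succ, "B1", "B2", "B3")
```
.DE
Afterwards B1 holds S1, B2 holds S2, and both continue in B3,
which holds S3 and has S as its successor.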
|
||||
.PP
|
||||
In some cases, cross jumping might split the computation of an expression
|
||||
into two, by inserting a branch somewhere in the middle.
|
||||
Most code generators will generate very poor assembly code when
|
||||
presented with such EM code.
|
||||
Therefore, cross jumping is not performed in these cases.
|
42
doc/ego/cs/cs1
Normal file
|
@ -0,0 +1,42 @@
|
|||
.bp
|
||||
.NH 1
|
||||
Common subexpression elimination
|
||||
.NH 2
|
||||
Introduction
|
||||
.PP
|
||||
The Common Subexpression Elimination optimization technique (CS)
|
||||
tries to eliminate multiple computations of EM expressions
|
||||
that yield the same result.
|
||||
It places the result of one such computation
|
||||
in a temporary variable,
|
||||
and replaces the other computations by a reference
|
||||
to this temporary variable.
|
||||
The primary goal of this technique is to decrease
|
||||
the execution time of the program,
|
||||
but in general it will save space too.
|
||||
.PP
|
||||
As an example of the application of Common Subexpression Elimination,
|
||||
consider the piece of program in Fig. 7.1(a).
|
||||
.DS
|
||||
x := a * b;        TMP := a * b;      x := a * b;
CODE;              x := TMP;          CODE;
y := c + a * b;    CODE;              y := c + x;
                   y := c + TMP;

    (a)                (b)                (c)
|
||||
|
||||
Fig. 7.1 Examples of Common Subexpression Elimination
|
||||
.DE
|
||||
If neither a nor b is changed in CODE,
|
||||
the instructions can be replaced by those of Fig. 7.1(b),
|
||||
which saves one multiplication,
|
||||
but costs an extra store instruction.
|
||||
If the value of x is not changed in CODE either,
|
||||
the instructions can be replaced by those of Fig. 7.1(c).
|
||||
In this case
|
||||
the extra store is not needed.
|
||||
.PP
|
||||
In the following sections we will describe
|
||||
which transformations are done
|
||||
by CS and how this phase
|
||||
was implemented.
|
83
doc/ego/cs/cs2
Normal file
|
@ -0,0 +1,83 @@
|
|||
.NH 2
|
||||
Specification of the Common Subexpression Elimination phase
|
||||
.PP
|
||||
In this section we will describe
|
||||
the window
|
||||
through which CS examines the code,
|
||||
the expressions recognized by CS,
|
||||
and finally the changes made to the code.
|
||||
.NH 3
|
||||
The working window
|
||||
.PP
|
||||
The CS algorithm is applied to the
|
||||
largest sequence of textually adjacent basic blocks
|
||||
B1,..,Bn, for which
|
||||
.DS
|
||||
PRED(Bj) = {Bj-1}, j = 2,..,n.
|
||||
.DE
|
||||
Intuitively, this window consists of straight line code,
|
||||
with only one entry point (at the beginning); it may
|
||||
contain jumps, which should all have their targets outside the window.
|
||||
This is illustrated in Fig. 7.2.
|
||||
.DS
|
||||
x := a * b; (1)
|
||||
if x < 10 then (2)
|
||||
y := a * b; (3)
|
||||
|
||||
Fig. 7.2 The working window of CS
|
||||
.DE
|
||||
Line (2) can only be executed after line (1).
|
||||
Likewise, line (3) can only be executed after
|
||||
line (2).
|
||||
Both a and b have the same values at line (1) and at line (3).
|
||||
.PP
|
||||
Larger windows were avoided.
|
||||
In Fig. 7.3, the value of a at line (4) may have been obtained
|
||||
at more than one point.
|
||||
.DS
|
||||
x := a * b; (1)
|
||||
if x < 10 then (2)
|
||||
a := 100; (3)
|
||||
y := a * b; (4)
|
||||
|
||||
Fig. 7.3 Several working windows
|
||||
.DE
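.PP
The division of a procedure into working windows can be sketched
as follows.
This is a minimal illustration under assumed names
(the function windows and the block names B1, B2, B3 are hypothetical):
a block extends the current window only if its sole predecessor
is the block textually just before it.
.DS
```python
def windows(order, pred):
    # order: block names in textual order.
    # pred:  maps a block name to the set of its predecessors.
    result = []
    for i, b in enumerate(order):
        if i > 0 and pred[b] == {order[i - 1]}:
            result[-1].append(b)      # PRED(Bj) = {Bj-1}: same window
        else:
            result.append([b])        # otherwise a new window starts
    return result


# Fig. 7.3: B1 holds lines (1) and (2), B2 holds line (3),
# B3 holds line (4) and is reached from both B1 and B2.
order = ["B1", "B2", "B3"]
pred = {"B1": set(), "B2": {"B1"}, "B3": {"B1", "B2"}}
result = windows(order, pred)
```
.DE
Here B1 and B2 form one window and B3 starts a new one,
so the expression at line (4) is never compared with the one at line (1).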
|
||||
.NH 3
|
||||
Recognized expressions
|
||||
.PP
|
||||
The computations eliminated by CS need not be normal expressions
|
||||
(like "a * b"),
|
||||
but can even consist of a single operand that is expensive to access,
|
||||
such as an array element or a record field.
|
||||
If an array element is used,
|
||||
its address is computed implicitly.
|
||||
CS is able to eliminate either the element itself or its
|
||||
address, whichever one is most profitable.
|
||||
A variable of a textually enclosing procedure may also be
|
||||
expensive to access, depending on the lexical level difference.
|
||||
.NH 3
|
||||
Transformations
|
||||
.PP
|
||||
CS creates a new temporary local variable (TMP)
|
||||
for every eliminated expression,
|
||||
unless it is able to use an existing local variable.
|
||||
It emits code to initialize this variable with the
|
||||
result of the expression.
|
||||
Most recurrences of the expression
|
||||
can simply be replaced by a reference to TMP.
|
||||
If the address of an array element is recognized as
|
||||
a common subexpression,
|
||||
references to the element itself are replaced by
|
||||
indirect references through TMP (see Fig. 7.4).
|
||||
.DS
|
||||
x := A[i]; TMP := &A[i];
|
||||
. . . --> x := *TMP;
|
||||
A[i] := y; . . .
|
||||
*TMP := y;
|
||||
|
||||
Fig. 7.4 Elimination of an array address computation
|
||||
.DE
|
||||
Here, '&' is the 'address of' operator,
|
||||
and unary '*' is the indirection operator.
|
||||
(Note that EM actually has different instructions to do
|
||||
a use-indirect or an assign-indirect.)
|
243
doc/ego/cs/cs3
Normal file
|
@ -0,0 +1,243 @@
|
|||
.NH 2
|
||||
Implementation
|
||||
.PP
|
||||
.NH 3
|
||||
The value number method
|
||||
.PP
|
||||
To determine whether two expressions have the same result,
|
||||
there must be some way to determine whether their operands have
|
||||
the same values.
|
||||
We use a system of \fIvalue numbers\fP
|
||||
.[
|
||||
kennedy data flow analysis
|
||||
.]
|
||||
in which each distinct value of whatever type,
|
||||
created or used within the working window,
|
||||
receives a unique identifying number, its value number.
|
||||
Two items have the same value number if and only if,
|
||||
based only upon information from the instructions in the window,
|
||||
their values are provably identical.
|
||||
For example, after processing the statement
|
||||
.DS
|
||||
a := 4;
|
||||
.DE
|
||||
the variable a and the constant 4 have the same value number.
|
||||
.PP
|
||||
The value number of the result of an expression depends only
|
||||
on the kind of operator and the value number(s) of the operand(s).
|
||||
The expressions need not be textually equal, as shown in Fig. 7.5.
|
||||
.DS
|
||||
a := c; (1)
|
||||
use(a * b); (2)
|
||||
d := b; (3)
|
||||
use(c * d); (4)
|
||||
|
||||
Fig. 7.5 Different expressions with the same value number
|
||||
.DE
|
||||
At line (1) a receives the same value number as c.
|
||||
At line (3) d receives the same value number as b.
|
||||
At line (4) the expression "c * d" receives the same value number
|
||||
as the expression "a * b" at line (2),
|
||||
because the value numbers of their left and right operands are the same,
|
||||
and the operator (*) is the same.
|
||||
.PP
|
||||
As another example of the value number method, consider Fig. 7.6.
|
||||
.DS
|
||||
use(a * b); (1)
|
||||
a := 123; (2)
|
||||
use(a * b); (3)
|
||||
|
||||
Fig. 7.6 Identical expressions with different value numbers
|
||||
.DE
|
||||
Although textually the expressions "a * b" in line 1 and line 3 are equal,
|
||||
a will have different value numbers at line 3 and line 1.
|
||||
The two expressions will not mistakenly be recognized as equivalent.
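.PP
The two examples can be replayed with a small sketch of the method.
It is an illustration only: the class ValueNumbers and its methods
are hypothetical names, and only assignments and binary operators
are modelled.
.DS
```python
import itertools


class ValueNumbers:
    def __init__(self):
        self.fresh = itertools.count(1)
        self.var = {}    # variable or constant -> value number
        self.expr = {}   # (operator, vn_left, vn_right) -> value number

    def of(self, name):
        # A first use gives the operand a brand new value number.
        if name not in self.var:
            self.var[name] = next(self.fresh)
        return self.var[name]

    def assign(self, target, source):
        # After "target := source" both carry the same value number.
        self.var[target] = self.of(source)

    def binary(self, op, left, right):
        key = (op, self.of(left), self.of(right))
        if key not in self.expr:
            self.expr[key] = next(self.fresh)
        return self.expr[key]


# Fig. 7.5: different text, same value number.
vn = ValueNumbers()
vn.assign("a", "c")
first = vn.binary("*", "a", "b")
vn.assign("d", "b")
second = vn.binary("*", "c", "d")
same = first == second

# Fig. 7.6: same text, different value numbers.
vn2 = ValueNumbers()
before = vn2.binary("*", "a", "b")
vn2.assign("a", "123")
after = vn2.binary("*", "a", "b")
different = before != after
```
.DE
Both same and different are true:
the key of an expression is built from value numbers, not from names.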
|
||||
.NH 3
|
||||
Entities
|
||||
.PP
|
||||
The Value Number Method distinguishes between operators and operands.
|
||||
The value numbers of operands are stored in a table,
|
||||
called the \fIsymbol table\fR.
|
||||
The value number of a subexpression depends on the
|
||||
(root) operator of the expression and on the value numbers
|
||||
of its operands.
|
||||
A table of "available expressions" is used to do this mapping.
|
||||
.PP
|
||||
CS recognizes the following kinds of EM operands, called \fIentities\fR:
|
||||
.IP
|
||||
- constant
|
||||
- local variable
|
||||
- external variable
|
||||
- indirectly accessed entity
|
||||
- offsetted entity
|
||||
- address of local variable
|
||||
- address of external variable
|
||||
- address of offsetted entity
|
||||
- address of local base
|
||||
- address of argument base
|
||||
- array element
|
||||
- procedure identifier
|
||||
- floating zero
|
||||
- local base
|
||||
- heap pointer
|
||||
- ignore mask
|
||||
.LP
|
||||
Whenever a new entity is encountered in the working window,
|
||||
it is entered in the symbol table and given a brand new value number.
|
||||
Most entities have attributes (e.g. the offset in
|
||||
the current stackframe for local variables),
|
||||
which are also stored in the symbol table.
|
||||
.PP
|
||||
An entity is called static if its value cannot be changed
|
||||
(e.g. a constant or an address).
|
||||
.NH 3
|
||||
Parsing expressions
|
||||
.PP
|
||||
Common subexpressions are recognized by simulating the behaviour
|
||||
of the EM machine.
|
||||
The EM code is parsed from left to right;
|
||||
as EM is postfix code, this is a bottom up parse.
|
||||
At any point the current state of the EM runtime stack is
|
||||
reflected by a simulated "fake stack",
|
||||
containing descriptions of the parsed operands and expressions.
|
||||
A descriptor consists of:
|
||||
.DS
|
||||
(1) the value number of the operand or expression
|
||||
(2) the size of the operand or expression
|
||||
(3) a pointer to the first line of EM-code
|
||||
that constitutes the operand or expression
|
||||
.DE
|
||||
Note that operands may consist of several EM instructions.
|
||||
Whenever an operator is encountered, the
|
||||
descriptors of its operands are on top of the fake stack.
|
||||
The operator and the value numbers of the operands
|
||||
are used as indices in the table of available expressions,
|
||||
to determine the value number of the expression.
|
||||
.PP
|
||||
During the parsing process,
|
||||
we keep track of the first line of each expression;
|
||||
we need this information when we decide to eliminate the expression.
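.PP
A minimal sketch of the fake stack is given below.
It is not the implementation itself:
the names Desc and parse are assumptions,
only LOL, LOC and the binary operators ADI and MLI are handled,
all items are assumed to be one word wide,
and the offsets of the locals are made up for the example.
.DS
```python
from collections import namedtuple

# Descriptor as listed above: value number, size, first line of EM code.
Desc = namedtuple("Desc", "vn size first")


def parse(lines, word_size=2):
    # lines: list of (mnemonic, argument) pairs in postfix order.
    # Returns the (first line, last line) pairs of recurring expressions.
    stack, operands, avail, found = [], {}, {}, []
    counter = 0
    for lineno, (mnem, arg) in enumerate(lines):
        if mnem in ("LOL", "LOC"):
            key = (mnem, arg)
            if key not in operands:
                counter += 1
                operands[key] = counter
            stack.append(Desc(operands[key], word_size, lineno))
        elif mnem in ("ADI", "MLI"):
            right = stack.pop()
            left = stack.pop()
            key = (mnem, left.vn, right.vn)
            if key in avail:
                found.append((left.first, lineno))
            else:
                counter += 1
                avail[key] = counter
            # The result starts where its first operand started.
            stack.append(Desc(avail[key], word_size, left.first))
    return found


# a * b followed by c + a * b, with a, b, c at offsets -2, -4, -6.
code = [("LOL", -2), ("LOL", -4), ("MLI", 2),
        ("LOL", -6), ("LOL", -2), ("LOL", -4), ("MLI", 2), ("ADI", 2)]
recurrences = parse(code)
```
.DE
The second multiplication is reported as a recurrence that spans
lines 4 to 6 (counting from 0).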
|
||||
.NH 3
|
||||
Updating entities
|
||||
.PP
|
||||
An entity is assigned a value number when it is
|
||||
used for the first time
|
||||
in the working window.
|
||||
If the entity is used as left hand side of an assignment,
|
||||
it gets the value number of the right hand side.
|
||||
Sometimes the effects of an instruction on an entity cannot
|
||||
be determined exactly;
|
||||
the current value and value number of the entity may become
|
||||
inconsistent.
|
||||
Hence the current value number must be forgotten.
|
||||
This is achieved by giving the entity a new value number
|
||||
that was not used before.
|
||||
The entity is said to be \fIkilled\fR.
|
||||
.PP
|
||||
As information is lost when an entity is killed,
|
||||
CS tries to save as many entities as possible.
|
||||
In case of an indirect assignment through a pointer,
|
||||
some analysis is done to see which variables cannot be altered.
|
||||
For a procedure call, the interprocedural information contained
|
||||
in the procedure table is used to restrict the set of entities that may
|
||||
be changed by the call.
|
||||
Local variables for which the front end generated
|
||||
a register message can never be changed by an indirect assignment
|
||||
or a procedure call.
|
||||
.NH 3
|
||||
Changing the EM text
|
||||
.PP
|
||||
When a new expression becomes available,
|
||||
it is checked whether its result is saved in a local
|
||||
that may go in a register.
|
||||
The last line of the expression must be followed
|
||||
by a STL or SDL instruction
|
||||
(depending on the size of the result)
|
||||
and a register message must be present for
|
||||
this local.
|
||||
If there is such a local,
|
||||
it is recorded in the available expressions table.
|
||||
Each time a new occurrence of this expression
|
||||
is found,
|
||||
the value number of the local is compared against
|
||||
the value number of the result.
|
||||
If they are different the local cannot be used and is forgotten.
|
||||
.PP
|
||||
The available expressions are linked in a list.
|
||||
New expressions are linked at the head of the list.
|
||||
In this way expressions that are contained within other
|
||||
expressions appear later in the list,
|
||||
because EM-expressions are postfix.
|
||||
The elimination process walks through the list,
|
||||
starting at the head, to find the largest expressions first.
|
||||
If an expression is eliminated,
|
||||
any expression later on in the list, contained in the former expression,
|
||||
is removed from the list,
|
||||
as expressions can only be eliminated once.
|
||||
.PP
|
||||
A STL or SDL is emitted after the first occurrence of the expression,
|
||||
unless there was an existing local variable that could hold the result.
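.PP
The walk through the list of available expressions can be sketched
as follows.
The sketch is simplified and its names are hypothetical:
every expression is reduced to a name plus the first and last line
of one occurrence,
and the predicate desirable stands for the desirability analysis.
.DS
```python
def choose(avail, desirable):
    # avail: list of (name, first_line, last_line), newest expression
    # first, so that larger expressions precede the ones they contain.
    chosen = []
    remaining = list(avail)
    while remaining:
        name, first, last = remaining.pop(0)
        if not desirable(name):
            continue
        chosen.append(name)
        # Drop every later entry that lies inside the chosen expression.
        remaining = [e for e in remaining
                     if not (first <= e[1] and e[2] <= last)]
    return chosen


# c + a * b occupies lines 3..7 and contains a * b at lines 4..6.
avail = [("c + a * b", 3, 7), ("a * b", 4, 6)]
picked = choose(avail, lambda name: True)
```
.DE
Only the larger expression is picked here;
had it been rejected, the contained a * b would still be a candidate.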
|
||||
.NH 3
|
||||
Desirability analysis
|
||||
.PP
|
||||
Although the global optimizer works on EM code,
|
||||
the goal is to improve the quality of the object code.
|
||||
Therefore some machine-dependent information is needed
|
||||
to decide whether it is desirable to
|
||||
eliminate a given expression.
|
||||
Because it is impossible for the CS phase to know
|
||||
exactly what code will be generated,
|
||||
some heuristics are used.
|
||||
CS essentially looks for some special cases
|
||||
that should not be eliminated.
|
||||
These special cases can be turned on or off for a given machine,
|
||||
as indicated in a machine descriptor file.
|
||||
.PP
|
||||
Some operators can sometimes be translated
|
||||
into an addressing mode for the machine at hand.
|
||||
Such an operator is only eliminated
|
||||
if its operand is itself expensive,
|
||||
i.e. it is not just a simple load.
|
||||
The machine descriptor file contains a set of such operators.
|
||||
.PP
|
||||
Eliminating the loading of the Local Base or
|
||||
the Argument Base by the LXL resp. LXA instruction
|
||||
is only beneficial if the difference in lexical levels
|
||||
exceeds a certain threshold.
|
||||
The machine descriptor file contains this threshold.
|
||||
.PP
|
||||
Replacing a SAR or a LAR by an AAR followed by a LOI
|
||||
may possibly increase the size of the object code.
|
||||
We assume that this is only possible when the
|
||||
size of the array element is greater than some limit.
|
||||
.PP
|
||||
There are back ends that can very efficiently translate
|
||||
the index computing instruction sequence LOC SLI ADS.
|
||||
If this is the case,
|
||||
the SLI instruction between a LOC
|
||||
and an ADS is not eliminated.
|
||||
.PP
|
||||
To handle unforeseen cases, the descriptor file may also contain
|
||||
a set of operators that should never be eliminated.
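.PP
The special cases above can be summarized in a sketch.
Everything in it is an assumption made for illustration:
the class MachineDescriptor, its field names and default values,
and the arguments of desirable do not reflect the actual format
of the machine descriptor file.
.DS
```python
from dataclasses import dataclass, field


@dataclass
class MachineDescriptor:
    # Hypothetical fields, one per special case described above.
    addr_mode_ops: set = field(default_factory=set)
    never_eliminate: set = field(default_factory=set)
    lex_threshold: int = 1
    max_elem_size: int = 4
    cheap_index_seq: bool = False


def desirable(md, op, operand_is_simple_load=False, levels=0,
              elem_size=0, sli_between_loc_and_ads=False):
    if op in md.never_eliminate:
        return False
    # An operator that may become an addressing mode is only worth
    # eliminating if its operand is itself expensive.
    if op in md.addr_mode_ops and operand_is_simple_load:
        return False
    # LXL and LXA pay off only beyond a lexical level threshold.
    if op in ("LXL", "LXA") and levels <= md.lex_threshold:
        return False
    # AAR followed by LOI may be larger than LAR for big elements.
    if op in ("LAR", "SAR") and elem_size > md.max_elem_size:
        return False
    if op == "SLI" and md.cheap_index_seq and sli_between_loc_and_ads:
        return False
    return True


md = MachineDescriptor(addr_mode_ops={"ADP"})
```
.DE
With this descriptor an ADP on a simple load is left alone,
while an ADP on an expensive operand may be eliminated.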
|
||||
.NH 3
|
||||
The algorithm
|
||||
.PP
|
||||
After these preparatory explanations,
|
||||
the algorithm itself is easy to understand.
|
||||
For each instruction within the current window,
|
||||
the following steps are performed in the given order:
|
||||
.IP 1.
|
||||
Check if this instruction defines an entity.
|
||||
If so, the set of entities is updated accordingly.
|
||||
.IP 2.
|
||||
Kill all entities that might be affected by this instruction.
|
||||
.IP 3.
|
||||
Simulate the instruction on the fake-stack.
|
||||
If this instruction is an operator,
|
||||
update the list of available expressions accordingly.
|
||||
.PP
|
||||
The result of this process is
|
||||
a list of available expressions plus the information
|
||||
needed to eliminate them.
|
||||
Expressions that are desirable to eliminate are eliminated.
|
||||
Next, the window is shifted and the process is repeated.
|
305
doc/ego/cs/cs4
Normal file
|
@ -0,0 +1,305 @@
|
|||
.NH 2
|
||||
Implementation
|
||||
.PP
|
||||
In this section we will discuss the implementation of the CS phase.
|
||||
We will first describe the basic actions that are undertaken
|
||||
by the algorithm, then the algorithm itself.
|
||||
.NH 3
|
||||
Partitioning the EM instructions
|
||||
.PP
|
||||
There are over 100 EM instructions.
|
||||
For our purpose we partition this huge set into groups of
|
||||
instructions which can be more or less conveniently handled together.
|
||||
.PP
|
||||
There are groups for all sorts of load instructions:
|
||||
simple loads, expensive loads, loads of an array element.
|
||||
A load is considered \fIexpensive\fP when more than one EM instruction
is involved in loading it.
|
||||
The load of a lexical entity is also considered expensive.
|
||||
For instance: LOF is expensive, LAL is not.
|
||||
LAR forms a group on its own,
|
||||
because it is not only an expensive load,
|
||||
but also implicitly includes the ternary operator AAR,
|
||||
which computes the address of the array element.
|
||||
.PP
|
||||
There are groups for all sorts of operators:
|
||||
unary, binary, and ternary.
|
||||
The groups of operators are further partitioned according to the size
|
||||
of their operand(s) and result.
|
||||
\" .PP
|
||||
\" The distinction between operators and expensive loads is not always clear.
|
||||
\" The ADP instruction for example,
|
||||
\" might seem a unary operator because it pops one item
|
||||
\" (a pointer) from the stack.
|
||||
\" However, two ADP-instructions which pop an item with the same value number
|
||||
\" need not have the same result,
|
||||
\" because the attributes (an offset, to be added to the pointer)
|
||||
\" can be different.
|
||||
\" Is it then a binary operator?
|
||||
\" That would give rise to the strange, and undesirable,
|
||||
\" situation that some binary operators pop two operands
|
||||
\" and others pop one.
|
||||
\" The conclusion is inevitable:
|
||||
\" we have been fooled by the name (ADd Pointer).
|
||||
\" The ADP-instruction is an expensive load.
|
||||
\" In this context LAF, meaning Load Address of oFfsetted,
|
||||
\" would have been a better name,
|
||||
\" corresponding to LOF, like LAL,
|
||||
\" Load Address of Local, corresponds to LOL.
|
||||
.PP
|
||||
There are groups for all sorts of stores:
|
||||
direct, indirect, array element.
|
||||
The SAR forms a group on its own for the same reason
|
||||
as appeared with LAR.
|
||||
.PP
|
||||
The effect of the remaining instructions is less clear.
|
||||
They do not help very much in parsing expressions or
|
||||
in constructing our pseudo symboltable.
|
||||
They are partitioned according to the following criteria:
|
||||
.RS
|
||||
.IP "-"
|
||||
They change the value of an entity without using the stack
|
||||
(e.g. ZRL, DEE).
|
||||
.IP "-"
|
||||
They are subroutine calls (CAI, CAL).
|
||||
.IP "-"
|
||||
They change the stack in some irreproducible way (e.g. ASP, LFR, DUP).
|
||||
.IP "-"
|
||||
They have no effect whatever on the stack or on the entities.
|
||||
This does not mean they can be deleted,
|
||||
but they can be ignored for the moment
|
||||
(e.g. MES, LIN, NOP).
|
||||
.IP "-"
|
||||
Their effect is too complicated to compute,
so we just assume worst case behaviour
(e.g. MON, STR, BLM).
Hopefully, they do not occur very often.
|
||||
.IP "-"
|
||||
They signal the end of the basic block (e.g. BLT, RET, TRP).
|
||||
.RE
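.PP
Such a partition amounts to a table lookup, as the sketch below shows.
The group names are hypothetical,
the members are only the examples mentioned in this section,
LOL and LOC are assumed to be simple loads,
and the conservative treatment of unlisted instructions
is a choice made for this sketch.
.DS
```python
# Hypothetical group names; the mnemonics are the examples given above.
GROUPS = {
    "simple load":    {"LAL", "LOL", "LOC"},
    "expensive load": {"LOF"},
    "array load":     {"LAR"},
    "array store":    {"SAR"},
    "changes entity": {"ZRL", "DEE"},
    "call":           {"CAI", "CAL"},
    "changes stack":  {"ASP", "LFR", "DUP"},
    "no effect":      {"MES", "LIN", "NOP"},
    "worst case":     {"MON", "STR", "BLM"},
    "end of block":   {"BLT", "RET", "TRP"},
}


def group_of(mnemonic):
    for name, members in GROUPS.items():
        if mnemonic in members:
            return name
    # An instruction that is not listed is treated conservatively.
    return "worst case"
```
.DE
For example, group_of("LAR") yields "array load"
and group_of("CAL") yields "call".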
|
||||
.NH 3
|
||||
Parsing expressions
|
||||
.PP
|
||||
To recognize expressions,
|
||||
we simulate the behaviour of the EM machine,
|
||||
by means of a fake-stack.
|
||||
When we scan the instructions in sequential order,
|
||||
we first encounter the instructions that load
|
||||
the operands on the stack,
|
||||
and then the instruction that indicates the operator,
|
||||
because EM expressions are postfix.
|
||||
When we find an instruction to load an operand,
|
||||
we load on the fake-stack a struct with the following information:
|
||||
.DS
|
||||
(1) the value number of the operand
|
||||
(2) the size of the operand
|
||||
(3) a pointer to the first line of EM-code
|
||||
that constitutes the operand
|
||||
.DE
|
||||
In most cases, (3) will point to the line
|
||||
that loaded the operand (e.g. LOL, LOC),
|
||||
i.e. there is only one line that refers to this operand,
|
||||
but sometimes some information must be popped
|
||||
to load the operand (e.g. LOI, LAR).
|
||||
This information must have been pushed before,
|
||||
so we also pop a pointer to the first line that pushed
|
||||
the information.
|
||||
This line is now the first line that defines the operand.
|
||||
.PP
|
||||
When we find the operator instruction,
|
||||
we pop its operand(s) from the fake-stack.
|
||||
The first line that defines the first operand is
|
||||
now the first line of the expression.
|
||||
We now have all information to determine
|
||||
whether the just parsed expression has occurred before.
|
||||
We also know the first and last line of the expression;
|
||||
we need this when we decide to eliminate it.
|
||||
Associated with each available expression is a set
whose elements each contain the first and last line of
a recurrence of this expression.
|
||||
.PP
|
||||
Not only will the operand(s) be popped from the fake-stack,
|
||||
but the following will be pushed:
|
||||
.DS
|
||||
(1) the value number of the result
|
||||
(2) the size of the result
|
||||
(3) a pointer to the first line of the expression
|
||||
.DE
|
||||
In this way an item on the fake-stack always contains
|
||||
the necessary information.
|
||||
As you see, EM expressions are parsed bottom up.
|
||||
.NH 3
|
||||
Updating entities
|
||||
.PP
|
||||
As said before,
|
||||
we build our private "symboltable",
|
||||
while scanning the EM-instructions.
|
||||
The behaviour of the EM-machine is not only reflected
|
||||
in the fake-stack,
|
||||
but also in the entities.
|
||||
When an entity is created,
|
||||
we do not yet know its value,
|
||||
so we assign a brand new value number to it.
|
||||
Each time a store-instruction is encountered,
|
||||
we change the value number of the target entity of this store
|
||||
to the value number of the token that was popped
|
||||
from the fake-stack.
|
||||
Because entities may overlap,
|
||||
we must also "forget" the value numbers of entities
|
||||
that might be affected by this store.
|
||||
Each such entity will be \fIkilled\fP,
|
||||
i.e. assigned a brand new value number.
|
||||
.PP
|
||||
Because we lose information when we forget
|
||||
the value number of an entity,
|
||||
we try to save as many entities as possible.
|
||||
When we store into an external,
|
||||
we don't have to kill locals and vice versa.
|
||||
Furthermore, we can see whether two locals or
|
||||
two externals overlap,
|
||||
because we know the offset from the local base,
|
||||
resp. the offset within the data block,
|
||||
and the size.
|
||||
The situation becomes more complicated when we have
|
||||
to consider indirection.
|
||||
The worst case is that we store through an unknown pointer.
|
||||
In that case we kill all entities except those locals
|
||||
for which a so-called \fIregister message\fP has been generated;
|
||||
this register message indicates that this local can never be
|
||||
accessed indirectly.
|
||||
If we know this pointer we can be more careful.
|
||||
If it points to a local then the entity that is accessed through
|
||||
this pointer can never overlap with an external.
|
||||
If it points to an external this entity can never overlap with a local.
|
||||
Furthermore, in the latter case,
|
||||
we can find the data block this entity belongs to.
|
||||
Since pointer arithmetic is only defined within a data block,
|
||||
this entity can never overlap with entities that are known to
|
||||
belong to another data block.
|
||||
.PP
|
||||
Not only after a store-instruction but also after a
|
||||
subroutine-call it may be necessary to kill entities;
|
||||
the subroutine may affect global variables or store
|
||||
through a pointer.
|
||||
If a subroutine is called that is not available as EM-text,
|
||||
we assume worst case behaviour,
|
||||
i.e. we kill all entities without register message.
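.PP
For direct stores the overlap test can be sketched as follows.
The functions overlap and must_kill and the dictionary keys
are hypothetical; indirect stores and procedure calls,
which need the more careful analysis described above,
are not covered.
.DS
```python
def overlap(off1, size1, off2, size2):
    # Two byte ranges [off, off + size) overlap if neither one
    # ends before the other begins.
    return off1 < off2 + size2 and off2 < off1 + size1


def must_kill(entity, store):
    # entity and store are dicts with the keys kind, block, offset, size.
    # kind is "local" or "external"; block names the data block of an
    # external and is None for a local.
    if entity["kind"] != store["kind"]:
        return False      # a local never overlaps an external
    if entity["block"] != store["block"]:
        return False      # different data blocks never overlap
    return overlap(entity["offset"], entity["size"],
                   store["offset"], store["size"])


# A one-word store at offset -2 of the local base (word size 2 assumed).
store = {"kind": "local", "block": None, "offset": -2, "size": 2}
word_at_m4 = {"kind": "local", "block": None, "offset": -4, "size": 2}
double_at_m4 = {"kind": "local", "block": None, "offset": -4, "size": 4}
```
.DE
The word at offset -4 survives the store,
but the double word at offset -4 covers offset -2 and must be killed.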
|
||||
.NH 3
|
||||
Additions and replacements
|
||||
.PP
|
||||
When a new expression becomes available,
|
||||
we check whether the result is saved in a local
|
||||
that may go in a register.
|
||||
The last line of the expression must be followed
|
||||
by a STL or SDL instruction,
|
||||
depending on the size of the result
|
||||
(resp. WS and 2*WS),
|
||||
and a register message must be present for
|
||||
this local.
|
||||
If we have found such a local,
|
||||
we store a pointer to it with the available expression.
|
||||
Each time a new occurrence of this expression
|
||||
is found,
|
||||
we compare the value number of the local against
|
||||
the value number of the result.
|
||||
When they are different we remove the pointer to it,
|
||||
because we cannot use it.
|
||||
.PP
|
||||
The available expressions are singly linked in a list.
|
||||
When a new expression becomes available,
|
||||
we link it at the head of the list.
|
||||
In this way expressions that are contained within other
|
||||
expressions appear later in the list,
|
||||
because EM-expressions are postfix.
|
||||
When we are going to eliminate expressions,
|
||||
we walk through the list,
|
||||
starting at the head, to find the largest expressions first.
|
||||
When we decide to eliminate an expression,
|
||||
we look at the expressions in the tail of the list,
|
||||
starting from where we are now,
|
||||
to delete expressions that are contained within
|
||||
the chosen one because
|
||||
we cannot eliminate an expression more than once.
|
||||
.PP
|
||||
When we are going to eliminate expressions,
|
||||
and we do not have a local that holds the result,
|
||||
we emit a STL or SDL after the line where the expression
|
||||
was first found.
|
||||
The other occurrences are simply removed,
|
||||
unless they contain instructions that affect more than
just the stack, e.g. messages, stores, calls.
|
||||
Before each instruction that needs the result on the stack,
|
||||
we emit a LOL or LDL.
|
||||
When the expression was an AAR,
|
||||
but the instruction was a LAR or a SAR,
|
||||
we append a LOI resp. a STI of the number of bytes
|
||||
in an array-element after each LOL/LDL.
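.PP
The rewriting step can be sketched for the simplest case.
The sketch assumes a one-word result (hence STL and LOL),
no existing local that already holds the result,
and occurrences without side effects;
the function eliminate and all offsets are hypothetical.
.DS
```python
def eliminate(lines, occurrences, tmp_offset):
    # lines: list of EM instruction strings.
    # occurrences: (first, last) index pairs in textual order;
    # the first pair is the occurrence whose result is saved.
    out = []
    (f0, l0), rest = occurrences[0], occurrences[1:]
    skip = {i for f, l in rest for i in range(f, l + 1)}
    starts = {f for f, l in rest}
    for i, line in enumerate(lines):
        if i in starts:
            out.append(f"LOL {tmp_offset}")   # reload the saved result
        if i in skip:
            continue                          # remove the recurrence
        out.append(line)
        if i == l0:
            out.append(f"STL {tmp_offset}")   # save the result
            out.append(f"LOL {tmp_offset}")   # and push it again
    return out


# x := a * b; y := c + a * b, as in Fig. 7.1(a) with CODE left out.
lines = ["LOL -2", "LOL -4", "MLI 2", "STL -6",
         "LOL -8", "LOL -2", "LOL -4", "MLI 2", "ADI 2", "STL -10"]
result = eliminate(lines, [(0, 2), (5, 7)], -12)
```
.DE
The second multiplication disappears and is replaced by a LOL -12,
which corresponds to Fig. 7.1(b).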
|
||||
.NH 3
|
||||
Desirability analysis
|
||||
.PP
|
||||
Although the global optimizer works on EM code,
|
||||
the goal is to improve the quality of the object code.
|
||||
Therefore we need some machine dependent information
|
||||
to decide whether it is desirable to
|
||||
eliminate a given expression.
|
||||
Because it is impossible for the CS phase to know
|
||||
exactly what code will be generated,
|
||||
we use some heuristics.
|
||||
In most cases it will save time when we eliminate an
|
||||
operator, so we just do it.
|
||||
We only look for some special cases.
|
||||
.PP
|
||||
Some operators can in some cases be translated
|
||||
into an addressing mode for the machine at hand.
|
||||
We only eliminate such an operator,
|
||||
when its operand is itself "expensive",
|
||||
i.e. not just a simple load.
|
||||
The user of the CS phase has to supply
|
||||
a set of such operators.
|
||||
.PP
|
||||
Eliminating the loading of the Local Base or
|
||||
the Argument Base by the LXL resp. LXA instruction
|
||||
is only beneficial when the number of lexical levels
|
||||
we have to go back exceeds a certain threshold.
|
||||
This threshold will be different when registers
|
||||
are saved by the back end.
|
||||
The user must supply this threshold.
|
||||
.PP
|
||||
Replacing a SAR or a LAR by an AAR followed by a LOI
|
||||
may possibly increase the size of the object code.
|
||||
We assume that this is only possible when the
|
||||
size of the array element is greater than some
|
||||
(user-supplied) limit.
|
||||
.PP
|
||||
There are back ends that can very efficiently translate
|
||||
the index computing instruction sequence LOC SLI ADS.
|
||||
If this is the case,
|
||||
we do not eliminate the SLI instruction between a LOC
|
||||
and an ADS.
|
||||
.PP
|
||||
To handle unforeseen cases, the user may also supply
|
||||
a set of operators that should never be eliminated.
|
||||
.NH 3
|
||||
The algorithm
|
||||
.PP
|
||||
After these preparatory explanations,
|
||||
we can be short about the algorithm itself.
|
||||
For each instruction within our window,
|
||||
the following steps are performed in the order given:
|
||||
.IP 1.
|
||||
We check if this instruction defines an entity.
|
||||
If this is the case the set of entities is updated accordingly.
|
||||
.IP 2.
|
||||
We kill all entities that might be affected by this instruction.
|
||||
.IP 3.
|
||||
The instruction is simulated on the fake-stack.
|
||||
Copy propagation is done.
|
||||
If this instruction is an operator,
|
||||
we update the list of available expressions accordingly.
|
||||
.PP
|
||||
When we have processed all instructions this way,
|
||||
we have built a list of available expressions plus the information we
|
||||
need to eliminate them.
|
||||
We eliminate those expressions that desirability analysis
marks as worth eliminating.
|
||||
Then we shift our window and continue.
|
46
doc/ego/cs/cs5
Normal file
|
@ -0,0 +1,46 @@
|
|||
.NH 2
|
||||
Source files of CS
|
||||
.PP
|
||||
The sources of CS are in the following files and packages:
|
||||
.IP cs.h 14
|
||||
declarations of global variables and data structures
|
||||
.IP cs.c
|
||||
the routine main;
|
||||
a driving routine to process
|
||||
the basic blocks in the right order
|
||||
.IP vnm
|
||||
implements a procedure that performs
|
||||
the value numbering on one basic block
|
||||
.IP eliminate
|
||||
implements a procedure that does the
|
||||
transformations, if desirable
|
||||
.IP avail
|
||||
implements a procedure that manipulates the list of available expressions
|
||||
.IP entity
|
||||
implements a procedure that manipulates the set of entities
|
||||
.IP getentity
|
||||
implements a procedure that extracts the
|
||||
pseudo symboltable information from EM-instructions;
|
||||
uses a small table
|
||||
.IP kill
|
||||
implements several routines that find the entities
|
||||
that might be changed by EM-instructions
|
||||
and kill them
|
||||
.IP partition
|
||||
implements several routines that partition the huge set
|
||||
of EM-instructions into more or less manageable,
|
||||
more or less logical chunks
|
||||
.IP profit
|
||||
implements a procedure that decides whether it
|
||||
is advantageous to eliminate an expression;
|
||||
also removes expressions with side-effects
|
||||
.IP stack
|
||||
implements the fake-stack and operations on it
|
||||
.IP alloc
|
||||
implements several allocation routines
|
||||
.IP aux
|
||||
implements several auxiliary routines
|
||||
.IP debug
|
||||
implements several routines to provide debugging
|
||||
and verbose output
|
||||
.LP
|