Initial revision

ceriel 1987-03-03 10:44:56 +00:00
parent 2d362c2274
commit 295380491f
6 changed files with 855 additions and 0 deletions

136
doc/ego/cj/cj1 Normal file

@ -0,0 +1,136 @@
.bp
.NH 1
Cross jumping
.NH 2
Introduction
.PP
The "Cross Jumping" optimization technique (CJ)
.[
wulf design optimizing compiler
.]
is basically a space optimization technique. It looks for pairs of
basic blocks (B1,B2), for which:
.DS
SUCC(B1) = SUCC(B2) = {S}
.DE
(So B1 and B2 both have one and the same successor).
If the last few non-branch instructions are the same for B1 and B2,
one such sequence can be eliminated.
.DS
Pascal:
      if cond then
         S1
         S3
      else
         S2
         S3
(pseudo) EM:
      TEST COND               TEST COND
      BNE *1                  BNE *1
      S1                      S1
      S3           --->       BRA *2
      BRA *2               1:
   1:                         S2
      S2                   2:
      S3                      S3
   2:
        Fig. 9.1 An example of Cross Jumping
.DE
As the basic blocks have the same successor,
at least one of them ends in an unconditional branch instruction (BRA).
Hence no extra branch instruction is ever needed, just the target
of an existing branch needs to be changed; neither the program size
nor the execution time will ever increase.
In general, the execution time will remain the same, unless
further optimizations can be applied because of this optimization.
.PP
This optimization is particularly effective,
because it cannot always be done by the programmer at the source level,
as demonstrated by Fig. 9.2.
.DS
Pascal:
      if cond then
         x := f(4)
      else
         x := g(5)
EM:
      ...                ...
      LOC 4              LOC 5
      CAL F              CAL G
      ASP 2              ASP 2
      LFR 2              LFR 2
      STL X              STL X
        Fig. 9.2 Effectiveness of Cross Jumping
.DE
At the source level there is no common tail,
but at the EM level there is a common tail.
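.PP
The search for a common tail can be sketched in C as follows.
This is only an illustration, not the code used by CJ:
the instruction record and the names same_instr, is_branch and
common_tail are assumptions made for this example.
The blocks are compared backwards, skipping a closing BRA.
.DS
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified EM instruction: a mnemonic and one operand. */
struct instr {
    const char *mnemonic;
    long operand;
};

static int same_instr(const struct instr *a, const struct instr *b)
{
    return strcmp(a->mnemonic, b->mnemonic) == 0 && a->operand == b->operand;
}

static int is_branch(const struct instr *i)
{
    return strcmp(i->mnemonic, "BRA") == 0;
}

/* Length of the common tail of two blocks, ignoring a final BRA. */
size_t common_tail(const struct instr *b1, size_t n1,
                   const struct instr *b2, size_t n2)
{
    size_t len = 0;

    assert(b1 != NULL && b2 != NULL);
    if (n1 > 0 && is_branch(&b1[n1 - 1]))
        n1--;
    if (n2 > 0 && is_branch(&b2[n2 - 1]))
        n2--;
    /* Walk backwards while the instructions are identical. */
    while (len < n1 && len < n2 &&
           same_instr(&b1[n1 - 1 - len], &b2[n2 - 1 - len]))
        len++;
    return len;
}
```
.DE
For the two blocks of Fig. 9.2 the common tail found this way
consists of the ASP, LFR and STL instructions.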
.NH 2
Implementation
.PP
The implementation of cross jumping is rather straightforward.
The technique is applied to one procedure at a time.
The control flow graph of the procedure
is scanned for pairs of basic blocks
with the same (single) successor and with common tails.
Note that there may be more than two such blocks (e.g. as the result
of a case statement).
This is dealt with by repeating the entire process until no
further optimizations can be done for the current procedure.
.sp
If a suitable pair of basic blocks has been found, the control flow
graph must be altered. One of the basic
blocks must be split into two.
The control flow graphs before and after the optimization are shown
in Fig. 9.3 and Fig. 9.4.
.DS
      --------                 --------
      |      |                 |      |
      |  S1  |                 |  S2  |
      |  S3  |                 |  S3  |
      |      |                 |      |
      --------                 --------
         |                        |
         |------------|-----------|
                      |
                      v
        Fig. 9.3 CFG before optimization
.DE
.DS
      --------                 --------
      |      |                 |      |
      |  S1  |                 |  S2  |
      |      |                 |      |
      --------                 --------
         |                        |
         |-----------<------------|
         v
      --------
      |      |
      |  S3  |
      |      |
      --------
         |
         v
        Fig. 9.4 CFG after optimization
.DE
Some attributes of the three resulting blocks (such as immediate dominator)
are updated.
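.PP
The change from Fig. 9.3 to Fig. 9.4 might be expressed as in the
following sketch.
The block record and the routine cross_jump are hypothetical and
greatly simplified: a block is only a range of instructions with a
single successor, and attributes such as the immediate dominator
are left out.
.DS
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical basic block: a range of instructions and one successor. */
struct block {
    size_t first;        /* index of the first instruction */
    size_t count;        /* number of instructions */
    struct block *succ;  /* the single successor */
};

/*
 * Move the common tail of `tail` instructions of b1 into the new block
 * `shared`, drop the copy in b2, and let both blocks continue in `shared`.
 */
void cross_jump(struct block *b1, struct block *b2,
                struct block *shared, size_t tail)
{
    assert(b1->succ == b2->succ);
    assert(tail <= b1->count && tail <= b2->count);

    shared->first = b1->first + b1->count - tail;
    shared->count = tail;
    shared->succ = b1->succ;

    b1->count -= tail;
    b2->count -= tail;
    b1->succ = shared;
    b2->succ = shared;
}
```
.DE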
.PP
In some cases, cross jumping might split the computation of an expression
into two, by inserting a branch somewhere in the middle.
Most code generators will generate very poor assembly code when
presented with such EM code.
Therefore, cross jumping is not performed in these cases.

42
doc/ego/cs/cs1 Normal file

@ -0,0 +1,42 @@
.bp
.NH 1
Common subexpression elimination
.NH 2
Introduction
.PP
The Common Subexpression Elimination optimization technique (CS)
tries to eliminate multiple computations of EM expressions
that yield the same result.
It places the result of one such computation
in a temporary variable,
and replaces the other computations by a reference
to this temporary variable.
The primary goal of this technique is to decrease
the execution time of the program,
but in general it will save space too.
.PP
As an example of the application of Common Subexpression Elimination,
consider the piece of program in Fig. 7.1(a).
.DS
x := a * b;          TMP := a * b;        x := a * b;
CODE;                x := TMP;            CODE
y := c + a * b;      CODE                 y := c + x;
                     y := c + TMP;
    (a)                  (b)                  (c)
     Fig. 7.1 Examples of Common Subexpression Elimination
.DE
If neither a nor b is changed in CODE,
the instructions can be replaced by those of Fig. 7.1(b),
which saves one multiplication,
but costs an extra store instruction.
If the value of x is not changed in CODE either,
the instructions can be replaced by those of Fig. 7.1(c).
In this case
the extra store is not needed.
.PP
In the following sections we will describe
which transformations are done
by CS and how this phase
was implemented.

83
doc/ego/cs/cs2 Normal file

@ -0,0 +1,83 @@
.NH 2
Specification of the Common Subexpression Elimination phase
.PP
In this section we will describe
the window
through which CS examines the code,
the expressions recognized by CS,
and finally the changes made to the code.
.NH 3
The working window
.PP
The CS algorithm is applied to the
largest sequence of textually adjacent basic blocks
B1,..,Bn, for which
.DS
PRED(Bj) = {Bj-1}, j = 2,..,n.
.DE
Intuitively, this window consists of straight line code,
with only one entry point (at the beginning); it may
contain jumps, which should all have their targets outside the window.
This is illustrated in Fig. 7.2.
.DS
x := a * b;            (1)
if x < 10 then         (2)
   y := a * b;         (3)
     Fig. 7.2 The working window of CS
.DE
Line (2) can only be executed after line (1).
Likewise, line (3) can only be executed after
line (2).
Both a and b have the same values at line (1) and at line (3).
.PP
Larger windows were avoided.
In Fig. 7.3, the value of a at line (4) may have been obtained
at more than one point.
.DS
x := a * b;            (1)
if x < 10 then         (2)
   a := 100;           (3)
y := a * b;            (4)
     Fig. 7.3 Several working windows
.DE
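.PP
The extent of a window follows directly from the condition
PRED(Bj) = {Bj-1}.
The sketch below assumes a hypothetical block record that stores
the number of predecessors and, if there is exactly one,
its index in the textual order of the blocks;
neither the record nor the routine window_end is taken from CS itself.
.DS
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block record holding only what the window test needs. */
struct bblock {
    size_t npred;  /* number of predecessors */
    size_t pred;   /* index of the predecessor; only valid if npred == 1 */
};

/* Index one past the last block of the window that starts at `start`. */
size_t window_end(const struct bblock *b, size_t nblocks, size_t start)
{
    size_t end;

    assert(start < nblocks);
    end = start + 1;
    /* Extend while the next block is entered only from its textual predecessor. */
    while (end < nblocks && b[end].npred == 1 && b[end].pred == end - 1)
        end++;
    return end;
}
```
.DE
In Fig. 7.3 the block holding line (4) has two predecessors,
so the window that starts at line (1) ends after line (3).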
.NH 3
Recognized expressions
.PP
The computations eliminated by CS need not be normal expressions
(like "a * b"),
but can even consist of a single operand that is expensive to access,
such as an array element or a record field.
If an array element is used,
its address is computed implicitly.
CS is able to eliminate either the element itself or its
address, whichever is more profitable.
A variable of a textually enclosing procedure may also be
expensive to access, depending on the lexical level difference.
.NH 3
Transformations
.PP
CS creates a new temporary local variable (TMP)
for every eliminated expression,
unless it is able to use an existing local variable.
It emits code to initialize this variable with the
result of the expression.
Most recurrences of the expression
can simply be replaced by a reference to TMP.
If the address of an array element is recognized as
a common subexpression,
references to the element itself are replaced by
indirect references through TMP (see Fig. 7.4).
.DS
   x := A[i];                TMP := &A[i];
   . . .            -->      x := *TMP;
   A[i] := y;                . . .
                             *TMP := y;
     Fig. 7.4 Elimination of an array address computation
.DE
Here, '&' is the 'address of' operator,
and unary '*' is the indirection operator.
(Note that EM actually has different instructions to do
a use-indirect or an assign-indirect.)

243
doc/ego/cs/cs3 Normal file

@ -0,0 +1,243 @@
.NH 2
Implementation
.PP
.NH 3
The value number method
.PP
To determine whether two expressions have the same result,
there must be some way to determine whether their operands have
the same values.
We use a system of \fIvalue numbers\fP
.[
kennedy data flow analysis
.]
in which each distinct value of whatever type,
created or used within the working window,
receives a unique identifying number, its value number.
Two items have the same value number if and only if,
based only upon information from the instructions in the window,
their values are provably identical.
For example, after processing the statement
.DS
a := 4;
.DE
the variable a and the constant 4 have the same value number.
.PP
The value number of the result of an expression depends only
on the kind of operator and the value number(s) of the operand(s).
The expressions need not be textually equal, as shown in Fig. 7.5.
.DS
a := c; (1)
use(a * b); (2)
d := b; (3)
use(c * d); (4)
Fig. 7.5 Different expressions with the same value number
.DE
At line (1) a receives the same value number as c.
At line (3) d receives the same value number as b.
At line (4) the expression "c * d" receives the same value number
as the expression "a * b" at line (2),
because the value numbers of their left and right operands are the same,
and the operator (*) is the same.
.PP
As another example of the value number method, consider Fig. 7.6.
.DS
use(a * b); (1)
a := 123; (2)
use(a * b); (3)
     Fig. 7.6 Identical expressions with different value numbers
.DE
Although the expressions "a * b" at line (1) and line (3) are textually equal,
a will have different value numbers at line (3) and line (1).
The two expressions will not mistakenly be recognized as equivalent.
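.PP
A minimal sketch of the value number method in C is given below.
It is an illustration under simplifying assumptions, not the code of CS:
variables are single letters, the table of expressions is searched
linearly, and all names (vn_state, vn_var, vn_assign, vn_fresh, vn_expr)
are invented for this example.
.DS
```c
#include <assert.h>

#define MAX_VARS  26   /* variables 'a' .. 'z' */
#define MAX_EXPRS 64

struct vn_state {
    int next_vn;            /* next unused value number */
    int var_vn[MAX_VARS];   /* 0 means "not seen yet" */
    struct { char op; int left, right, vn; } avail[MAX_EXPRS];
    int navail;
};

void vn_init(struct vn_state *s)
{
    int i;

    s->next_vn = 1;
    s->navail = 0;
    for (i = 0; i < MAX_VARS; i++)
        s->var_vn[i] = 0;
}

/* Value number of a variable; a first use gets a brand new number. */
int vn_var(struct vn_state *s, char v)
{
    int i = v - 'a';

    assert(i >= 0 && i < MAX_VARS);
    if (s->var_vn[i] == 0)
        s->var_vn[i] = s->next_vn++;
    return s->var_vn[i];
}

/* Assignment: dst receives the value number of the right hand side. */
void vn_assign(struct vn_state *s, char dst, int vn)
{
    int i = dst - 'a';

    assert(i >= 0 && i < MAX_VARS);
    s->var_vn[i] = vn;
}

/* A fresh number, for a value about which nothing is known. */
int vn_fresh(struct vn_state *s)
{
    return s->next_vn++;
}

/* Value number of "left op right"; an identical earlier entry is reused. */
int vn_expr(struct vn_state *s, char op, int left, int right)
{
    int i;

    for (i = 0; i < s->navail; i++)
        if (s->avail[i].op == op &&
            s->avail[i].left == left && s->avail[i].right == right)
            return s->avail[i].vn;
    assert(s->navail < MAX_EXPRS);
    s->avail[s->navail].op = op;
    s->avail[s->navail].left = left;
    s->avail[s->navail].right = right;
    s->avail[s->navail].vn = s->next_vn++;
    return s->avail[s->navail++].vn;
}
```
.DE
Replaying Fig. 7.5 with these routines gives "a * b" and "c * d" the
same value number; replaying Fig. 7.6, with vn_fresh standing in for the
constant 123, gives the two occurrences of "a * b" different numbers.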
.NH 3
Entities
.PP
The Value Number Method distinguishes between operators and operands.
The value numbers of operands are stored in a table,
called the \fIsymbol table\fR.
The value number of a subexpression depends on the
(root) operator of the expression and on the value numbers
of its operands.
A table of "available expressions" is used to do this mapping.
.PP
CS recognizes the following kinds of EM operands, called \fIentities\fR:
.IP
- constant
- local variable
- external variable
- indirectly accessed entity
- offsetted entity
- address of local variable
- address of external variable
- address of offsetted entity
- address of local base
- address of argument base
- array element
- procedure identifier
- floating zero
- local base
- heap pointer
- ignore mask
.LP
Whenever a new entity is encountered in the working window,
it is entered in the symbol table and given a brand new value number.
Most entities have attributes (e.g. the offset in
the current stackframe for local variables),
which are also stored in the symbol table.
.PP
An entity is called static if its value cannot be changed
(e.g. a constant or an address).
.NH 3
Parsing expressions
.PP
Common subexpressions are recognized by simulating the behaviour
of the EM machine.
The EM code is parsed from left to right;
as EM is postfix code, this is a bottom up parse.
At any point the current state of the EM runtime stack is
reflected by a simulated "fake stack",
containing descriptions of the parsed operands and expressions.
A descriptor consists of:
.DS
(1) the value number of the operand or expression
(2) the size of the operand or expression
(3) a pointer to the first line of EM-code
that constitutes the operand or expression
.DE
Note that operands may consist of several EM instructions.
Whenever an operator is encountered, the
descriptors of its operands are on top of the fake stack.
The operator and the value numbers of the operands
are used as indices in the table of available expressions,
to determine the value number of the expression.
.PP
During the parsing process,
we keep track of the first line of each expression;
we need this information when we decide to eliminate the expression.
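.PP
The fake stack could look like the following sketch.
The descriptor mirrors the three fields listed above;
the type and routine names are assumptions of this example,
and the caller is assumed to have looked up the value number of the
result in the table of available expressions.
.DS
```c
#include <assert.h>

#define STACK_MAX 32

/* One fake-stack entry. */
struct desc {
    int vn;          /* value number of the operand or expression */
    int size;        /* size of the operand or expression */
    int first_line;  /* first line of EM code that constitutes it */
};

struct fake_stack {
    struct desc item[STACK_MAX];
    int depth;
};

void fs_push(struct fake_stack *st, int vn, int size, int line)
{
    assert(st->depth < STACK_MAX);
    st->item[st->depth].vn = vn;
    st->item[st->depth].size = size;
    st->item[st->depth].first_line = line;
    st->depth++;
}

struct desc fs_pop(struct fake_stack *st)
{
    assert(st->depth > 0);
    return st->item[--st->depth];
}

/*
 * Simulate a binary operator: pop both operands and push the result.
 * The expression starts where its left operand started.
 */
struct desc fs_binop(struct fake_stack *st, int result_vn, int result_size)
{
    struct desc left;

    (void)fs_pop(st);   /* right operand */
    left = fs_pop(st);  /* left operand */
    fs_push(st, result_vn, result_size, left.first_line);
    return st->item[st->depth - 1];
}
```
.DE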
.NH 3
Updating entities
.PP
An entity is assigned a value number when it is
used for the first time
in the working window.
If the entity is used as left hand side of an assignment,
it gets the value number of the right hand side.
Sometimes the effects of an instruction on an entity cannot
be determined exactly;
the current value and value number of the entity may become
inconsistent.
Hence the current value number must be forgotten.
This is achieved by giving the entity a new value number
that was not used before.
The entity is said to be \fIkilled\fR.
.PP
As information is lost when an entity is killed,
CS tries to save as many entities as possible.
In case of an indirect assignment through a pointer,
some analysis is done to see which variables cannot be altered.
For a procedure call, the interprocedural information contained
in the procedure table is used to restrict the set of entities that may
be changed by the call.
Local variables for which the front end generated
a register message can never be changed by an indirect assignment
or a procedure call.
.NH 3
Changing the EM text
.PP
When a new expression becomes available,
it is checked whether its result is saved in a local
that may go in a register.
The last line of the expression must be followed
by a STL or SDL instruction
(depending on the size of the result)
and a register message must be present for
this local.
If there is such a local,
it is recorded in the available expressions table.
Each time a new occurrence of this expression
is found,
the value number of the local is compared against
the value number of the result.
If they are different the local cannot be used and is forgotten.
.PP
The available expressions are linked in a list.
New expressions are linked at the head of the list.
In this way expressions that are contained within other
expressions appear later in the list,
because EM-expressions are postfix.
The elimination process walks through the list,
starting at the head, to find the largest expressions first.
If an expression is eliminated,
any expression later on in the list, contained in the former expression,
is removed from the list,
as expressions can only be eliminated once.
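.PP
The list handling described above may be sketched as follows.
The node layout and the routine names are hypothetical;
in particular, an expression is represented here by a single range
of EM lines, which ignores the fact that an expression may have
several occurrences.
.DS
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node: an available expression and the lines it spans. */
struct avail {
    int first_line;
    int last_line;
    struct avail *next;
};

/* A newly recognized expression is linked at the head of the list. */
struct avail *avail_add(struct avail *head, struct avail *e)
{
    e->next = head;
    return e;
}

static int contains(const struct avail *outer, const struct avail *inner)
{
    return outer->first_line <= inner->first_line &&
           inner->last_line <= outer->last_line;
}

/*
 * After `chosen` has been selected for elimination, unlink every later
 * expression that lies inside it.  Returns the number of unlinked nodes.
 */
int drop_contained(struct avail *chosen)
{
    struct avail **link;
    int dropped = 0;

    assert(chosen != NULL);
    link = &chosen->next;
    while (*link != NULL) {
        if (contains(chosen, *link)) {
            *link = (*link)->next;
            dropped++;
        } else {
            link = &(*link)->next;
        }
    }
    return dropped;
}
```
.DE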
.PP
A STL or SDL is emitted after the first occurrence of the expression,
unless there was an existing local variable that could hold the result.
.NH 3
Desirability analysis
.PP
Although the global optimizer works on EM code,
the goal is to improve the quality of the object code.
Therefore some machine-dependent information is needed
to decide whether it is desirable to
eliminate a given expression.
Because it is impossible for the CS phase to know
exactly what code will be generated,
some heuristics are used.
CS essentially looks for some special cases
that should not be eliminated.
These special cases can be turned on or off for a given machine,
as indicated in a machine descriptor file.
.PP
Some operators can sometimes be translated
into an addressing mode for the machine at hand.
Such an operator is only eliminated
if its operand is itself expensive,
i.e. it is not just a simple load.
The machine descriptor file contains a set of such operators.
.PP
Eliminating the loading of the Local Base or
the Argument Base by the LXL resp. LXA instruction
is only beneficial if the difference in lexical levels
exceeds a certain threshold.
The machine descriptor file contains this threshold.
.PP
Replacing a SAR or a LAR by an AAR followed by a LOI
may possibly increase the size of the object code.
We assume that this is only possible when the
size of the array element is greater than some limit.
.PP
There are back ends that can very efficiently translate
the index computing instruction sequence LOC SLI ADS.
If this is the case,
the SLI instruction between a LOC
and an ADS is not eliminated.
.PP
To handle unforeseen cases, the descriptor file may also contain
a set of operators that should never be eliminated.
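.PP
The special cases above can be collected in one test, as in the sketch
below.
It is only one possible reading of these rules:
the layout of the machine descriptor, the candidate record and all
field names are invented for this example, and the test on the array
element size assumes that an AAR is left alone when the element is
larger than the limit.
.DS
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical machine descriptor; the field names are illustrative. */
struct machine_desc {
    const char *const *addr_mode_ops; /* may become an addressing mode */
    const char *const *never_ops;     /* never to be eliminated */
    int lex_threshold;                /* for LXL and LXA */
    int elem_size_limit;              /* for AAR */
    int fast_index;                   /* back end handles LOC SLI ADS well */
};

/* Hypothetical description of a candidate for elimination. */
struct candidate {
    const char *op;         /* root operator */
    int operand_expensive;  /* operand is more than a simple load */
    int level_diff;         /* LXL, LXA: difference in lexical levels */
    int elem_size;          /* AAR: size of the array element */
    int loc_sli_ads;        /* SLI: it sits between a LOC and an ADS */
};

/* Membership test on a NULL-terminated array of mnemonics. */
static int in_set(const char *const *set, const char *op)
{
    for (; *set != NULL; set++)
        if (strcmp(*set, op) == 0)
            return 1;
    return 0;
}

/* Nonzero if eliminating the candidate looks worthwhile. */
int desirable(const struct machine_desc *m, const struct candidate *c)
{
    assert(m != NULL && c != NULL && c->op != NULL);
    if (in_set(m->never_ops, c->op))
        return 0;
    if (in_set(m->addr_mode_ops, c->op) && !c->operand_expensive)
        return 0;
    if ((strcmp(c->op, "LXL") == 0 || strcmp(c->op, "LXA") == 0) &&
        c->level_diff <= m->lex_threshold)
        return 0;
    if (strcmp(c->op, "AAR") == 0 && c->elem_size > m->elem_size_limit)
        return 0;
    if (strcmp(c->op, "SLI") == 0 && m->fast_index && c->loc_sli_ads)
        return 0;
    return 1;
}
```
.DE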
.NH 3
The algorithm
.PP
After these preparatory explanations,
the algorithm itself is easy to understand.
For each instruction within the current window,
the following steps are performed in the given order:
.IP 1.
Check if this instruction defines an entity.
If so, the set of entities is updated accordingly.
.IP 2.
Kill all entities that might be affected by this instruction.
.IP 3.
Simulate the instruction on the fake-stack.
If this instruction is an operator,
update the list of available expressions accordingly.
.PP
The result of this process is
a list of available expressions plus the information
needed to eliminate them.
Expressions that are desirable to eliminate are eliminated.
Next, the window is shifted and the process is repeated.

305
doc/ego/cs/cs4 Normal file

@ -0,0 +1,305 @@
.NH 2
Implementation
.PP
In this section we will discuss the implementation of the CS phase.
We will first describe the basic actions that are undertaken
by the algorithm, then the algorithm itself.
.NH 3
Partitioning the EM instructions
.PP
There are over 100 EM instructions.
For our purpose we partition this huge set into groups of
instructions which can be more or less conveniently handled together.
.PP
There are groups for all sorts of load instructions:
simple loads, expensive loads, loads of an array element.
A load is considered \fIexpensive\fP when more than one EM instruction
is involved in loading it.
The load of a lexical entity is also considered expensive.
For instance: LOF is expensive, LAL is not.
LAR forms a group on its own,
because it is not only an expensive load,
but also implicitly includes the ternary operator AAR,
which computes the address of the array element.
.PP
There are groups for all sorts of operators:
unary, binary, and ternary.
The groups of operators are further partitioned according to the size
of their operand(s) and result.
\" .PP
\" The distinction between operators and expensive loads is not always clear.
\" The ADP instruction for example,
\" might seem a unary operator because it pops one item
\" (a pointer) from the stack.
\" However, two ADP-instructions which pop an item with the same value number
\" need not have the same result,
\" because the attributes (an offset, to be added to the pointer)
\" can be different.
\" Is it then a binary operator?
\" That would give rise to the strange, and undesirable,
\" situation that some binary operators pop two operands
\" and others pop one.
\" The conclusion is inevitable:
\" we have been fooled by the name (ADd Pointer).
\" The ADP-instruction is an expensive load.
\" In this context LAF, meaning Load Address of oFfsetted,
\" would have been a better name,
\" corresponding to LOF, like LAL,
\" Load Address of Local, corresponds to LOL.
.PP
There are groups for all sorts of stores:
direct, indirect, array element.
The SAR forms a group on its own for the same reason
as appeared with LAR.
.PP
The effect of the remaining instructions is less clear.
They do not help very much in parsing expressions or
in constructing our pseudo symboltable.
They are partitioned according to the following criteria:
.RS
.IP "-"
They change the value of an entity without using the stack
(e.g. ZRL, DEE).
.IP "-"
They are subroutine calls (CAI, CAL).
.IP "-"
They change the stack in some irreproducible way (e.g. ASP, LFR, DUP).
.IP "-"
They have no effect whatever on the stack or on the entities.
This does not mean they can be deleted,
but they can be ignored for the moment
(e.g. MES, LIN, NOP).
.IP "-"
Their effect is too complicated to compute,
so we just assume worst case behaviour.
Hopefully, they do not occur very often.
(e.g. MON, STR, BLM).
.IP "-"
They signal the end of the basic block (e.g. BLT, RET, TRP).
.RE
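.PP
A partition of this kind is conveniently kept in a table.
The sketch below covers only the instructions named as examples in
the list above;
the enumeration, the table and the routine classify are assumptions
of this example and say nothing about how the partition package
is actually organized.
.DS
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum group {
    G_CHANGE_ENTITY,  /* changes an entity without using the stack */
    G_CALL,           /* subroutine call */
    G_STACK_UNKNOWN,  /* changes the stack in an irreproducible way */
    G_IGNORE,         /* no effect on the stack or the entities */
    G_WORST_CASE,     /* effect too complicated to compute */
    G_END_BLOCK,      /* signals the end of the basic block */
    G_OTHER           /* loads, stores, operators: handled elsewhere */
};

static const struct {
    const char *mnemonic;
    enum group group;
} table[] = {
    {"ZRL", G_CHANGE_ENTITY}, {"DEE", G_CHANGE_ENTITY},
    {"CAI", G_CALL},          {"CAL", G_CALL},
    {"ASP", G_STACK_UNKNOWN}, {"LFR", G_STACK_UNKNOWN},
    {"DUP", G_STACK_UNKNOWN},
    {"MES", G_IGNORE},        {"LIN", G_IGNORE},
    {"NOP", G_IGNORE},
    {"MON", G_WORST_CASE},    {"STR", G_WORST_CASE},
    {"BLM", G_WORST_CASE},
    {"BLT", G_END_BLOCK},     {"RET", G_END_BLOCK},
    {"TRP", G_END_BLOCK}
};

/* Group of an instruction; anything not in the table is G_OTHER. */
enum group classify(const char *mnemonic)
{
    size_t i;

    assert(mnemonic != NULL);
    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].mnemonic, mnemonic) == 0)
            return table[i].group;
    return G_OTHER;
}
```
.DE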
.NH 3
Parsing expressions
.PP
To recognize expressions,
we simulate the behaviour of the EM machine,
by means of a fake-stack.
When we scan the instructions in sequential order,
we first encounter the instructions that load
the operands on the stack,
and then the instruction that indicates the operator,
because EM expressions are postfix.
When we find an instruction to load an operand,
we load on the fake-stack a struct with the following information:
.DS
(1) the value number of the operand
(2) the size of the operand
(3) a pointer to the first line of EM-code
that constitutes the operand
.DE
In most cases, (3) will point to the line
that loaded the operand (e.g. LOL, LOC),
i.e. there is only one line that refers to this operand,
but sometimes some information must be popped
to load the operand (e.g. LOI, LAR).
This information must have been pushed before,
so we also pop a pointer to the first line that pushed
the information.
This line is now the first line that defines the operand.
.PP
When we find the operator instruction,
we pop its operand(s) from the fake-stack.
The first line that defines the first operand is
now the first line of the expression.
We now have all information to determine
whether the just parsed expression has occurred before.
We also know the first and last line of the expression;
we need this when we decide to eliminate it.
Associated with each available expression is a set
whose elements each contain the first and last line of
a recurrence of this expression.
.PP
Not only will the operand(s) be popped from the fake-stack,
but the following will be pushed:
.DS
(1) the value number of the result
(2) the size of the result
(3) a pointer to the first line of the expression
.DE
In this way an item on the fake-stack always contains
the necessary information.
As you see, EM expressions are parsed bottom up.
.NH 3
Updating entities
.PP
As said before,
we build our private "symboltable",
while scanning the EM-instructions.
The behaviour of the EM-machine is not only reflected
in the fake-stack,
but also in the entities.
When an entity is created,
we do not yet know its value,
so we assign a brand new value number to it.
Each time a store-instruction is encountered,
we change the value number of the target entity of this store
to the value number of the token that was popped
from the fake-stack.
Because entities may overlap,
we must also "forget" the value numbers of entities
that might be affected by this store.
Each such entity will be \fIkilled\fP,
i.e. assigned a brand new value number.
.PP
Because we lose information when we forget
the value number of an entity,
we try to save as many entities as possible.
When we store into an external,
we don't have to kill locals and vice versa.
Furthermore, we can see whether two locals or
two externals overlap,
because we know the offset from the local base,
resp. the offset within the data block,
and the size.
The situation becomes more complicated when we have
to consider indirection.
The worst case is that we store through an unknown pointer.
In that case we kill all entities except those locals
for which a so-called \fIregister message\fP has been generated;
this register message indicates that this local can never be
accessed indirectly.
If we know this pointer we can be more careful.
If it points to a local then the entity that is accessed through
this pointer can never overlap with an external.
If it points to an external this entity can never overlap with a local.
Furthermore, in the latter case,
we can find the data block this entity belongs to.
Since pointer arithmetic is only defined within a data block,
this entity can never overlap with entities that are known to
belong to another data block.
.PP
Not only after a store-instruction but also after a
subroutine-call it may be necessary to kill entities;
the subroutine may affect global variables or store
through a pointer.
If a subroutine is called that is not available as EM-text,
we assume worst case behaviour,
i.e. we kill all entities without register message.
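.PP
Two of the cases above, a direct store into a local and a store
through an unknown pointer, are sketched below.
The entity record and the routine names are hypothetical,
and data blocks are left out altogether.
The first routine also kills the target of the store itself;
the caller is expected to give it the value number of the
stored value afterwards.
.DS
```c
#include <assert.h>
#include <stddef.h>

enum kind { LOCAL, EXTERNAL };

/* Hypothetical entity record. */
struct entity {
    enum kind kind;
    long offset;     /* from the local base, or within the data block */
    long size;
    int has_regmes;  /* locals only: a register message was generated */
    int vn;          /* current value number */
};

/* Do the ranges [off1, off1+size1) and [off2, off2+size2) overlap? */
static int overlap(long off1, long size1, long off2, long size2)
{
    return off1 < off2 + size2 && off2 < off1 + size1;
}

static void kill_entity(struct entity *e, int *next_vn)
{
    e->vn = (*next_vn)++;  /* a brand new value number */
}

/* Direct store into a local: only overlapping locals are killed. */
int kill_direct_local(struct entity *ents, size_t n,
                      long offset, long size, int *next_vn)
{
    size_t i;
    int killed = 0;

    assert(size > 0);
    for (i = 0; i < n; i++)
        if (ents[i].kind == LOCAL &&
            overlap(offset, size, ents[i].offset, ents[i].size)) {
            kill_entity(&ents[i], next_vn);
            killed++;
        }
    return killed;
}

/* Store through an unknown pointer: only register locals survive. */
int kill_unknown_store(struct entity *ents, size_t n, int *next_vn)
{
    size_t i;
    int killed = 0;

    for (i = 0; i < n; i++)
        if (!(ents[i].kind == LOCAL && ents[i].has_regmes)) {
            kill_entity(&ents[i], next_vn);
            killed++;
        }
    return killed;
}
```
.DE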
.NH 3
Additions and replacements
.PP
When a new expression becomes available,
we check whether the result is saved in a local
that may go in a register.
The last line of the expression must be followed
by a STL or SDL instruction,
depending on the size of the result
(resp. WS and 2*WS),
and a register message must be present for
this local.
If we have found such a local,
we store a pointer to it with the available expression.
Each time a new occurrence of this expression
is found,
we compare the value number of the local against
the value number of the result.
When they are different we remove the pointer to it,
because we cannot use it.
.PP
The available expressions are singly linked in a list.
When a new expression becomes available,
we link it at the head of the list.
In this way expressions that are contained within other
expressions appear later in the list,
because EM-expressions are postfix.
When we are going to eliminate expressions,
we walk through the list,
starting at the head, to find the largest expressions first.
When we decide to eliminate an expression,
we look at the expressions in the tail of the list,
starting from where we are now,
to delete expressions that are contained within
the chosen one because
we cannot eliminate an expression more than once.
.PP
When we are going to eliminate expressions,
and we do not have a local that holds the result,
we emit a STL or SDL after the line where the expression
was first found.
The other occurrences are simply removed,
unless they contain instructions that affect more than
just the stack, e.g. messages, stores, calls.
Before each instruction that needs the result on the stack,
we emit a LOL or LDL.
When the expression was an AAR,
but the instruction was a LAR or a SAR,
we append a LOI resp. a STI of the number of bytes
in an array-element after each LOL/LDL.
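.PP
The choice of the emitted instructions can be summarized in a few
small routines.
The routine names are invented for this sketch,
and the word size is passed as a parameter rather than
taken from the environment.
.DS
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Store that saves a result of `size` bytes in a local; NULL if none fits. */
const char *save_instr(int size, int ws)
{
    assert(ws > 0);
    if (size == ws)
        return "STL";
    if (size == 2 * ws)
        return "SDL";
    return NULL;
}

/* Load that brings the saved result back on the stack. */
const char *reload_instr(int size, int ws)
{
    assert(ws > 0);
    if (size == ws)
        return "LOL";
    if (size == 2 * ws)
        return "LDL";
    return NULL;
}

/*
 * If the eliminated expression was an AAR but the original instruction
 * was a LAR or a SAR, the reload is followed by a LOI resp. a STI.
 */
const char *follow_up(const char *original)
{
    assert(original != NULL);
    if (strcmp(original, "LAR") == 0)
        return "LOI";
    if (strcmp(original, "SAR") == 0)
        return "STI";
    return NULL;
}
```
.DE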
.NH 3
Desirability analysis
.PP
Although the global optimizer works on EM code,
the goal is to improve the quality of the object code.
Therefore we need some machine dependent information
to decide whether it is desirable to
eliminate a given expression.
Because it is impossible for the CS phase to know
exactly what code will be generated,
we use some heuristics.
In most cases it will save time when we eliminate an
operator, so we just do it.
We only look for some special cases.
.PP
Some operators can in some cases be translated
into an addressing mode for the machine at hand.
We only eliminate such an operator,
when its operand is itself "expensive",
i.e. not just a simple load.
The user of the CS phase has to supply
a set of such operators.
.PP
Eliminating the loading of the Local Base or
the Argument Base by the LXL resp. LXA instruction
is only beneficial when the number of lexical levels
we have to go back exceeds a certain threshold.
This threshold will be different when registers
are saved by the back end.
The user must supply this threshold.
.PP
Replacing a SAR or a LAR by an AAR followed by a LOI
may possibly increase the size of the object code.
We assume that this is only possible when the
size of the array element is greater than some
(user-supplied) limit.
.PP
There are back ends that can very efficiently translate
the index computing instruction sequence LOC SLI ADS.
If this is the case,
we do not eliminate the SLI instruction between a LOC
and an ADS.
.PP
To handle unforeseen cases, the user may also supply
a set of operators that should never be eliminated.
.NH 3
The algorithm
.PP
After these preparatory explanations,
we can be short about the algorithm itself.
For each instruction within our window,
the following steps are performed in the order given:
.IP 1.
We check if this instruction defines an entity.
If this is the case the set of entities is updated accordingly.
.IP 2.
We kill all entities that might be affected by this instruction.
.IP 3.
The instruction is simulated on the fake-stack.
Copy propagation is done.
If this instruction is an operator,
we update the list of available expressions accordingly.
.PP
When we have processed all instructions this way,
we have built a list of available expressions plus the information we
need to eliminate them.
We eliminate those expressions that desirability analysis
finds worthwhile.
Then we shift our window and continue.

46
doc/ego/cs/cs5 Normal file

@ -0,0 +1,46 @@
.NH 2
Source files of CS
.PP
The sources of CS are in the following files and packages:
.IP cs.h 14
declarations of global variables and data structures
.IP cs.c
the routine main;
a driving routine to process
the basic blocks in the right order
.IP vnm
implements a procedure that performs
the value numbering on one basic block
.IP eliminate
implements a procedure that does the
transformations, if desirable
.IP avail
implements a procedure that manipulates the list of available expressions
.IP entity
implements a procedure that manipulates the set of entities
.IP getentity
implements a procedure that extracts the
pseudo symboltable information from EM-instructions;
uses a small table
.IP kill
implements several routines that find the entities
that might be changed by EM-instructions
and kill them
.IP partition
implements several routines that partition the huge set
of EM-instructions into more or less manageable,
more or less logical chunks
.IP profit
implements a procedure that decides whether it
is advantageous to eliminate an expression;
also removes expressions with side-effects
.IP stack
implements the fake-stack and operations on it
.IP alloc
implements several allocation routines
.IP aux
implements several auxiliary routines
.IP debug
implements several routines to provide debugging
and verbose output
.LP