Initial revision
This commit is contained in:
parent
4d4c8b45fb
commit
004f017550
30 changed files with 3903 additions and 0 deletions
57
doc/ego/ic/ic1
Normal file
@ -0,0 +1,57 @@
.bp
.NH
The Intermediate Code and the IC phase
.PP
In this chapter the intermediate code of the EM global optimizer
will be defined.
The 'Intermediate Code construction' phase (IC),
which builds the initial intermediate code from
EM Compact Assembly Language,
will also be described.
.NH 2
Introduction
.PP
The EM global optimizer is a multi-pass program,
hence there is a need for an intermediate code.
Usually, programs in the Amsterdam Compiler Kit use the
Compact Assembly Language format
.[~[
keizer architecture
.], section 11.2]
for this purpose.
Although this code has some convenient features,
such as being compact,
it is quite unsuitable in our case,
for a number of reasons.
First, the code lacks global information
about whole procedures or whole basic blocks.
Second, it uses identifiers ('names') to bind
defining and applied occurrences of
procedures, data labels and instruction labels.
Although this is usual in high level programming
languages, it is awkward in an intermediate code
that must be read many times.
Each pass of the optimizer would have
to incorporate an identifier look-up mechanism
to associate a defining occurrence with each
applied occurrence of an identifier.
Finally, EM programs declare blocks of bytes
rather than variables. A 'hol 6' instruction may be used to
declare three 2-byte variables.
Clearly, the optimizer wants to deal with variables, and
not with rows of bytes.
.PP
To overcome these problems, we have developed a new
intermediate code.
This code does not merely consist of the EM instructions,
but also contains global information in the
form of tables and graphs.
Before describing the intermediate code we will
first digress to outline
the problems one generally encounters
when trying to store complex data structures such as
graphs outside the program, i.e. in a file.
We trust this will enhance the
comprehensibility of the
intermediate code definition and the design and implementation
of the IC phase.
146
doc/ego/ic/ic2
Normal file
@ -0,0 +1,146 @@
.NH 2
Representation of complex data structures in a sequential file
.PP
Most programmers are quite used to dealing with
complex data structures, such as
arrays, graphs and trees.
There are some particular problems that occur
when storing such a data structure
in a sequential file.
We call data that is kept in
main memory
.UL internal ,
as opposed to
.UL external
data
that is kept in a file outside the program.
.sp
We assume that a simple data structure of a
scalar type (integer, floating point number)
has some known external representation.
An
.UL array
having elements of a scalar type can easily be represented
externally, by successively
representing its elements.
The external representation may be preceded by a
number giving the length of the array.
Now, consider a linear, singly linked list,
the elements of which look like:
.DS
record
    data: scalar_type;
    next: pointer_type;
end;
.DE
It is important to note that the "next"
fields of the elements only have a meaning within
main memory.
The field contains the address of some location in
main memory.
If a list element is written to a file by
some program,
and read by another program,
the element will be allocated at a different
address in main memory.
Hence this address value is completely
useless outside the program.
.sp
One may represent the list by ignoring these "next" fields
and storing the data items in the order they are linked.
The "next" fields are represented \fIimplicitly\fR.
When the file is read again,
the same list can be reconstructed.
In order to know where the external representation of the
list ends,
it may be useful to put the length of
the list in front of it.
.sp
Note that arrays and linear lists have the
same external representation.
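.PP
For concreteness, here is a small C sketch of this scheme
(an illustration only: the type and function names are ours,
and <stdio.h> and <stdlib.h> are assumed):
the length is written in front, the data items follow in linked order,
and the "next" fields are rebuilt when the file is read back.
.DS
#include <stdio.h>
#include <stdlib.h>

struct elem { int data; struct elem *next; };

void wr_list(FILE *fp, struct elem *head)
{
    struct elem *e;
    int n = 0;

    for (e = head; e != NULL; e = e->next) n++;
    fwrite(&n, sizeof n, 1, fp);              /* length in front */
    for (e = head; e != NULL; e = e->next)
        fwrite(&e->data, sizeof e->data, 1, fp);  /* "next" is implicit */
}

struct elem *rd_list(FILE *fp)
{
    struct elem *head = NULL, **hook = &head;
    int n;

    fread(&n, sizeof n, 1, fp);
    while (n-- > 0) {
        struct elem *e = malloc(sizeof *e);
        fread(&e->data, sizeof e->data, 1, fp);
        e->next = NULL;
        *hook = e;                            /* append: linked order preserved */
        hook = &e->next;
    }
    return head;
}
.DE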
.PP
A doubly linked, linear list,
with elements of the type:
.DS
record
    data: scalar_type;
    next,
    previous: pointer_type;
end
.DE
can be represented in precisely the same way.
Both the "next" and the "previous" fields are represented
implicitly.
.PP
Next, consider a binary tree,
the nodes of which have type:
.DS
record
    data: scalar_type;
    left,
    right: pointer_type;
end
.DE
Such a tree can be represented sequentially,
by storing its nodes in some fixed order, e.g. prefix order.
A special null data item may be used to
denote a missing left or right son.
For example, let the scalar type be integer,
and let the null item be 0.
Then the tree of fig. 3.1(a)
can be represented as in fig. 3.1(b).
.DS
                 4

         9               12

     12      3       4        6

            8  1       5    1

     Fig. 3.1(a) A binary tree


  4 9 12 0 0 3 8 0 0 1 0 0 12 4 0 5 0 0 6 1 0 0 0

     Fig. 3.1(b) Its sequential representation
.DE
We are still able to represent the pointer fields ("left"
and "right") implicitly.
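.PP
A minimal C sketch of this prefix-order scheme is shown below
(again an illustration with names of our own choosing;
the null item 0 marks a missing son, as in fig. 3.1,
so the data items themselves are assumed to be nonzero).
.DS
#include <stdio.h>
#include <stdlib.h>

struct node { int data; struct node *left, *right; };

void wr_tree(FILE *fp, struct node *t)
{
    int null = 0;

    if (t == NULL) {
        fwrite(&null, sizeof null, 1, fp);    /* missing son */
        return;
    }
    fwrite(&t->data, sizeof t->data, 1, fp);  /* prefix order: node first, */
    wr_tree(fp, t->left);                     /* then left subtree,        */
    wr_tree(fp, t->right);                    /* then right subtree        */
}

struct node *rd_tree(FILE *fp)
{
    struct node *t;
    int d;

    fread(&d, sizeof d, 1, fp);
    if (d == 0)
        return NULL;                          /* the null item */
    t = malloc(sizeof *t);
    t->data = d;
    t->left = rd_tree(fp);                    /* same fixed order as wr_tree */
    t->right = rd_tree(fp);
    return t;
}
.DE
Applied to the tree of fig. 3.1(a), wr_tree produces exactly the
sequence of fig. 3.1(b).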
.PP
Finally, consider a general
.UL graph ,
where each node has a "data" field and
pointer fields,
with no restriction on where they may point.
Now we're at the end of our tale.
There is no way to represent the pointers implicitly,
as we did with lists and trees.
In order to represent them explicitly,
we use the following scheme.
Every node gets an extra field,
containing some unique number that identifies the node.
We call this number its
.UL id.
A pointer is represented externally as the id of the node
it points to.
When reading the file we use a table that maps
an id to the address of its node.
In general this table will not be completely filled in
until we have read the entire external representation of
the graph and allocated internal memory locations for
every node.
Hence we cannot reconstruct the graph in one scan.
That is, there may be a pointer from node A to node B,
where B is placed after A in the sequential file.
When we read node A we cannot yet map the id of B
to the address of node B,
as node B has not been allocated yet.
We can overcome this problem if the size
of every node is known in advance.
In this case we can allocate memory for a node
on first reference.
Otherwise, the mapping from id to pointer
cannot be done while reading the nodes.
The mapping can be done either in an extra scan
or at every reference to the node.
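.PP
The C sketch below illustrates the last alternative
(all names are ours and a single pointer field per node is assumed;
the node count and the range of the ids, needed to size the table,
are assumed to be stored in front of the nodes).
Nodes are allocated and entered into the id-to-address table in a
first scan; the pointer fields are filled in by an extra scan,
when every node has an address.
.DS
#include <stdio.h>
#include <stdlib.h>

struct gnode {
    int id;                 /* unique number identifying the node */
    int data;
    int link_id;            /* external form of the pointer: an id (0 = nil) */
    struct gnode *link;     /* internal form: an address */
};

struct gnode *rd_graph(FILE *fp, int nnodes, struct gnode **map)
{
    struct gnode *all = calloc(nnodes, sizeof *all);
    int i;

    for (i = 0; i < nnodes; i++) {          /* scan 1: allocate, fill table */
        fread(&all[i].id, sizeof(int), 1, fp);
        fread(&all[i].data, sizeof(int), 1, fp);
        fread(&all[i].link_id, sizeof(int), 1, fp);
        map[all[i].id] = &all[i];           /* id -> address of its node */
    }
    for (i = 0; i < nnodes; i++)            /* scan 2: map ids to pointers */
        all[i].link = all[i].link_id ? map[all[i].link_id] : NULL;
    return all;
}
.DE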
414
doc/ego/ic/ic3
Normal file
@ -0,0 +1,414 @@
.NH 2
Definition of the intermediate code
.PP
The intermediate code of the optimizer consists
of several components:
.IP -
the object table
.IP -
the procedure table
.IP -
the EM text
.IP -
the control flow graphs
.IP -
the loop tables
.LP
.PP
These components are described in
the next sections.
The syntactic structure of every component
is described by a set of context free syntax rules,
with the following conventions:
.DS
x            a non-terminal symbol
A            a terminal symbol (in capitals)
x: a b c;    a grammar rule
a | b        a or b
(a)+         1 or more occurrences of a
{a}          0 or more occurrences of a
.DE
.NH 3
The object table
.PP
EM programs declare blocks of bytes rather than (global) variables.
A typical program may declare 'HOL 7780'
to allocate space for 8 I/O buffers,
2 large arrays and 10 scalar variables.
The optimizer wants to deal with
.UL objects
like variables, buffers and arrays
and certainly not with huge numbers of bytes.
Therefore the intermediate code contains information
about which global objects are used.
This information can be obtained from an EM program
by just looking at the operands of instructions
such as LOE, LAE, LDE, STE, SDE, INE, DEE and ZRE.
.PP
The object table consists of a list of
.UL datablock
entries.
Each such entry represents a declaration like HOL, BSS,
CON or ROM.
There are five kinds of datablock entries.
The fifth kind,
UNKNOWN, denotes a declaration in a
separately compiled file that is not made
available to the optimizer.
Each datablock entry contains the type of the block,
its size, and a description of the objects that
belong to it.
If it is a ROM,
it also contains the list of values given
as arguments to the ROM pseudo,
provided that this list contains only integer numbers.
An object has an offset (within its datablock)
and a size.
The size need not always be determinable.
Both datablock and object contain a unique
identifying number
(see the previous section for their use).
.DS
.UL syntax
object_table:
    {datablock} ;
datablock:
    D_ID         -- unique identifying number
    PSEUDO       -- one of ROM, CON, BSS, HOL, UNKNOWN
    SIZE         -- # bytes declared
    FLAGS
    {value}      -- contents of rom
    {object} ;   -- objects of the datablock
object:
    O_ID         -- unique identifying number
    OFFSET       -- offset within the datablock
    SIZE ;       -- size of the object in bytes
value:
    argument ;
.DE
A data block has only one flag: "external", indicating
whether the data label is externally visible.
The syntax for "argument" will be given later on
(see em_text).
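.PP
Read into main memory, the object table might take the shape sketched
below in C. This is only an illustration: the field and type names are
ours, not those of the optimizer sources. A linked list is used because,
as described in the section on the IC phase, IC keeps both tables as
singly linked linear lists.
.DS
struct object {
    int     o_id;           /* unique identifying number (O_ID) */
    long    o_offset;       /* offset within the datablock */
    long    o_size;         /* size in bytes; may be unknown */
    struct object *o_next;
};

struct datablock {
    int     d_id;           /* unique identifying number (D_ID) */
    int     d_pseudo;       /* ROM, CON, BSS, HOL or UNKNOWN */
    long    d_size;         /* number of bytes declared */
    int     d_flags;        /* only flag: externally visible */
    long    *d_values;      /* contents of a rom, if all integers */
    struct object    *d_objects;
    struct datablock *d_next;
};
.DE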
.NH 3
The procedure table
.PP
The procedure table contains global information
about all procedures that are made available
to the optimizer
and that are needed by the EM program.
(Library units may not be needed, see section 3.5.)
The table has one entry for
every procedure.
.DS
.UL syntax
procedure_table:
    {procedure} ;
procedure:
    P_ID         -- unique identifying number
    #LABELS      -- number of instruction labels
    #LOCALS      -- number of bytes for locals
    #FORMALS     -- number of bytes for formals
    FLAGS        -- flag bits
    calling      -- procedures called by this one
    change       -- info about global variables changed
    use ;        -- info about global variables used
calling:
    {P_ID} ;     -- procedures called
change:
    ext          -- external variables changed
    FLAGS ;
use:
    FLAGS ;
ext:
    {O_ID} ;     -- a set of objects
.DE
.PP
The number of bytes of formal parameters accessed by
a procedure is determined by the front ends and
passed via a message (parameter message) to the optimizer.
If the front end is not able to determine this number
(e.g. the parameter may be an array of dynamic size or
the procedure may have a variable number of arguments),
the attribute contains the value 'UNKNOWN_SIZE'.
.sp 0
A procedure has the following flags:
.IP -
external: true if the procedure is externally visible
.IP -
bodyseen: true if its code is available as EM text
.IP -
calunknown: true if it calls a procedure whose bodyseen
flag is not set
.IP -
environ: true if it uses or changes a (non-global) variable in
a lexically enclosing procedure
.IP -
lpi: true if it is used as operand of an LPI instruction, so
it may be called indirectly
.LP
The change and use attributes both have one flag: "indirect",
indicating whether the procedure does a 'use indirect'
or a 'store indirect' (indirect means through a pointer).
.NH 3
The EM text
.PP
The EM text contains the EM instructions.
Every EM instruction has an operation code (opcode)
and 0 or 1 operands.
EM pseudo instructions can have more than
one operand.
The opcode is just a small (8 bit) integer.
.sp
There are several kinds of operands, which we will
refer to as
.UL types.
Many EM instructions can have more than one type of operand.
The types and their encodings in Compact Assembly Language
are discussed extensively in.
.[~[
keizer architecture
.], section 11.2]
Of special interest is the way numeric values
are represented.
Of prime importance is the machine independence of
the representation.
Ultimately, one could store every integer
just as a string of the characters '0' to '9'.
As doing arithmetic on strings is awkward,
Compact Assembly Language allows several alternatives.
The main idea is to look at the value of the integer.
Integers that fit in 16, 32 or 64 bits are
represented as a row of 2, 4 or 8 bytes respectively,
preceded by an indication of how many bytes are used.
Longer integers are represented as strings;
this is only allowed within pseudo instructions, however.
This concept works very well for target machines
with reasonable word sizes.
At present, most ACK software cannot be used for word sizes
larger than 32 bits,
although the handles for using larger word sizes are
present in the design of the EM code.
In the intermediate code we essentially use the
same ideas.
We allow three representations of integers:
.IP -
integers that fit in a short are represented as a short
.IP -
integers that fit in a long but not in a short are represented
as longs
.IP -
all remaining integers are represented as strings
(only allowed in pseudos).
.LP
The terms short and long are defined in
.[~[
ritchie reference manual programming language
.], section 4]
and depend only on the source machine
(i.e. the machine on which ACK runs),
not on the target machines.
For historical reasons a long will often be called an
.UL offset.
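.PP
The choice between these three representations can be pictured by the
small C function below (an illustrative sketch with names of our own;
SHRT_MIN, SHRT_MAX and strtol come from the C library of the source
machine, which is what the terms short and long refer to).
.DS
#include <limits.h>
#include <stdlib.h>
#include <errno.h>

enum int_repr { AS_SHORT, AS_LONG, AS_STRING };

enum int_repr repr_of(const char *digits)
{
    long v;

    errno = 0;
    v = strtol(digits, (char **)0, 10);
    if (errno != 0)
        return AS_STRING;   /* does not even fit in a long; pseudos only */
    if (v >= SHRT_MIN && v <= SHRT_MAX)
        return AS_SHORT;
    return AS_LONG;
}
.DE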
.PP
Operands can also be instruction labels,
objects or procedures.
Instruction labels are denoted by a
.UL label
.UL identifier,
which can be distinguished from a normal identifier.
.sp
The operand of a pseudo instruction can be a list of
.UL arguments.
Arguments can have the same type as operands, except
for the type short, which is not used for arguments.
Furthermore, an argument can be a string or
a string representation of a signed integer, unsigned integer
or floating point number.
If the number of arguments is not fully determined by
the pseudo instruction (e.g. a ROM pseudo can have any number
of arguments), then the list is terminated by a special
argument of type CEND.
.DS
.UL syntax
em_text:
    {line} ;
line:
    INSTR          -- opcode
    OPTYPE         -- operand type
    operand ;
operand:
    empty |        -- OPTYPE = NO
    SHORT |        -- OPTYPE = SHORT
    OFFSET |       -- OPTYPE = OFFSET
    LAB_ID |       -- OPTYPE = INSTRLAB
    O_ID |         -- OPTYPE = OBJECT
    P_ID |         -- OPTYPE = PROCEDURE
    {argument} ;   -- OPTYPE = LIST
argument:
    ARGTYPE
    arg ;
arg:
    empty |        -- ARGTYPE = CEND
    OFFSET |
    LAB_ID |
    O_ID |
    P_ID |
    string |       -- ARGTYPE = STRING
    const ;        -- ARGTYPE = ICON, UCON or FCON
string:
    LENGTH         -- number of characters
    {CHARACTER} ;
const:
    SIZE           -- number of bytes
    string ;       -- string representation of (un)signed
                   -- or floating point constant
.DE
.NH 3
The control flow graphs
.PP
Each procedure can be divided
into a number of basic blocks.
A basic block is a piece of code with
no jumps in, except at the beginning,
and no jumps out, except at the end.
.PP
Every basic block has a set of
.UL successors,
which are basic blocks that can follow it immediately in
the dynamic execution sequence.
The
.UL predecessors
are the basic blocks of which this one
is a successor.
The successor and predecessor attributes
of all basic blocks of a single procedure
are said to form the
.UL control
.UL flow
.UL graph
of that procedure.
.PP
Another important attribute is the
.UL immediate
.UL dominator.
A basic block B dominates a block C if
every path in the graph from the procedure entry block
to C goes through B.
The immediate dominator of C is the closest dominator
of C on any path from the entry block.
(Note that the dominator relation is transitive,
so the immediate dominator is well defined.)
.PP
A basic block also has an attribute containing
the identifiers of every
.UL loop
that the block belongs to (see next section for loops).
.DS
.UL syntax
control_flow_graph:
    {basic_block} ;
basic_block:
    B_ID       -- unique identifying number
    #INSTR     -- number of EM instructions
    succ
    pred
    idom       -- immediate dominator
    loops      -- set of loops
    FLAGS ;    -- flag bits
succ:
    {B_ID} ;
pred:
    {B_ID} ;
idom:
    B_ID ;
loops:
    {LP_ID} ;
.DE
The flag bits can have the values 'firm' and 'strong',
which are explained below.
.NH 3
The loop tables
.PP
Every procedure has an associated
.UL loop
.UL table
containing information about all the loops
in the procedure.
Loops can be detected by a close inspection of
the control flow graph.
The main idea is to look for two basic blocks,
B and C, for which the following holds:
.IP -
B is a successor of C
.IP -
B is a dominator of C
.LP
B is called the loop
.UL entry
and C is called the loop
.UL end.
Intuitively, C contains a jump backwards to
the beginning of the loop (B).
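.PP
A sketch of this detection in C is given below. It only illustrates the
idea and is not the actual optimizer source; the data structures and
names are ours, and the dominator test simply follows the chain of
immediate dominators described above.
.DS
/* Find loops: look for a back edge from a block C to a block B,
 * where B dominates C.  B is the loop entry, C the loop end.
 */
struct bblock {
    int b_id;
    struct bblock **b_succ;     /* successors, ended by a null pointer */
    struct bblock *b_idom;      /* immediate dominator (0 for the entry block) */
};

int dominates(struct bblock *b, struct bblock *c)
{
    while (c != 0) {            /* every dominator of c is on its idom chain */
        if (c == b)
            return 1;
        c = c->b_idom;
    }
    return 0;
}

void find_loops(struct bblock **block, int nblocks)
{
    int i, j;

    for (i = 0; i < nblocks; i++) {             /* candidate loop end C */
        for (j = 0; block[i]->b_succ[j] != 0; j++) {
            struct bblock *b = block[i]->b_succ[j];
            if (dominates(b, block[i])) {
                /* enter a loop with entry b and end block[i]
                 * into the loop table
                 */
            }
        }
    }
}
.DE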
.PP
A loop L1 is said to be
.UL nested
within loop L2 if all basic blocks of L1
are also part of L2.
It is important to note that loops could
originally be written as a well-structured for- or
while-loop or as a messy goto loop.
Hence loops may partly overlap without one
being nested inside the other.
The
.UL nesting
.UL level
of a loop is the number of loops in
which it is nested (so it is 0 for
an outermost loop).
The details of loop detection will be discussed later.
.PP
It is often desirable to know whether a
basic block gets executed during every iteration
of a loop.
This leads to the following definitions:
.IP -
A basic block B of a loop L is said to be a \fIfirm\fR block
of L if B is executed on all successive iterations of L,
with the only possible exception of the last iteration.
.IP -
A basic block B of a loop L is said to be a \fIstrong\fR block
of L if B is executed on all successive iterations of L.
.LP
Note that a strong block is also a firm block.
If a block is part of a conditional statement, it is neither
strong nor firm, as it may be skipped during some iterations
(see Fig. 3.2).
.DS
loop
    if cond1 then
        ...              -- this code will not
                         -- result in a firm or strong block
    end if;
    ...                  -- strong (always executed)
    exit when cond2;
    ...                  -- firm (not executed on
                         -- last iteration).
end loop;

    Fig. 3.2 Example of firm and strong block
.DE
.DS
.UL syntax
looptable:
    {loop} ;
loop:
    LP_ID      -- unique identifying number
    LEVEL      -- loop nesting level
    entry      -- loop entry block
    end ;
entry:
    B_ID ;
end:
    B_ID ;
.DE
80
doc/ego/ic/ic4
Normal file
@ -0,0 +1,80 @@
.NH 2
External representation of the intermediate code
.PP
The syntax of the intermediate code was given
in the previous section.
In this section we will make some remarks about
the representation of the code in sequential files.
.sp
We use sequential files in order to avoid
the bookkeeping of complex file indices.
As a consequence of this decision
we cannot store all components
of the intermediate code
in one file.
If a phase wishes to change some attribute
of a procedure,
or wants to add or delete entire procedures
(inline substitution may do the latter),
the procedure table will only be fully updated
after the entire EM text has been scanned.
Yet, the next phase undoubtedly wants
to read the procedure table before it
starts working on the EM text.
Hence there is an ordering problem, which
can easily be solved by putting the
procedure table in a separate file.
Similarly, the data block table is kept
in a file of its own.
.PP
The control flow graphs (CFGs) could be mixed
with the EM text.
Instead, we have chosen to put them
in a separate file too.
The control flow graph file should be regarded as a
file that imposes some structure on the EM-text file,
just as an overhead sheet containing a picture
of a flow chart may be put on top of an overhead sheet
containing statements.
The loop tables are also put in the CFG file.
A loop imposes an extra structure on the
CFGs and hence on the EM text.
So there are four files:
.IP -
the EM-text file
.IP -
the procedure table file
.IP -
the object table file
.IP -
the CFG and loop tables file
.LP
Every table is preceded by its length, in order to
tell where it ends.
The CFG file also contains the number of instructions of
every basic block,
indicating which part of the EM text belongs
to that block.
.DS
.UL syntax
intermediate_code:
    object_table_file
    proctable_file
    em_text_file
    cfg_file ;
object_table_file:
    LENGTH       -- number of objects
    object_table ;
proctable_file:
    LENGTH       -- number of procedures
    procedure_table ;
em_text_file:
    em_text ;
cfg_file:
    {per_proc} ; -- one for every procedure
per_proc:
    BLENGTH      -- number of basic blocks
    LLENGTH      -- number of loops
    control_flow_graph
    looptable ;
.DE
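.PP
As an illustration of how these length fields are used when the files
are read back, the C sketch below reads the part of the CFG file that
belongs to one procedure. The names are ours, and rd_block() and
rd_loop() are assumed helpers that read one basic block and one loop
table entry respectively.
.DS
#include <stdio.h>

struct bblock;
struct loop;
extern struct bblock *rd_block(FILE *fp);   /* assumed */
extern struct loop *rd_loop(FILE *fp);      /* assumed */

void rd_per_proc(FILE *fp)
{
    int blength, llength, i;

    fread(&blength, sizeof blength, 1, fp); /* BLENGTH: number of basic blocks */
    fread(&llength, sizeof llength, 1, fp); /* LLENGTH: number of loops */
    for (i = 0; i < blength; i++)
        (void) rd_block(fp);                /* a real reader would store these */
    for (i = 0; i < llength; i++)
        (void) rd_loop(fp);
}
.DE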
163
doc/ego/ic/ic5
Normal file
@ -0,0 +1,163 @@
.NH 2
The Intermediate Code construction phase
.PP
The first phase of the global optimizer,
called
.UL IC,
constructs a major part of the intermediate code.
To be specific, it produces:
.IP -
the EM text
.IP -
the object table
.IP -
part of the procedure table
.LP
The calling, change and use attributes of a procedure
and all its flags except the external and bodyseen flags
are computed by the next phase (the Control Flow phase).
.PP
As explained before,
the intermediate code does not contain
any names of variables or procedures.
The normal identifiers are replaced by identifying
numbers.
Yet, the output of the global optimizer must
contain normal identifiers, as this
output is in Compact Assembly Language format.
We certainly want all externally visible names
to be the same in the input as in the output,
because the optimized EM module may be a library unit,
used by other modules.
IC dumps the names of all procedures and data labels
on two files:
.IP -
the procedure dump file, containing tuples (P_ID, procedure name)
.IP -
the data dump file, containing tuples (D_ID, data label name)
.LP
The names of instruction labels are not dumped,
as they are not visible outside the procedure
in which they are defined.
.PP
The input to IC consists of one or more files.
Each file is either an EM module in Compact Assembly Language
format, or a Unix archive file (library) containing such modules.
IC only extracts those modules from a library that are
needed somehow, just as a linker does.
It is advisable to present as much code
of the EM program as possible to the optimizer,
although it is not required to present the whole program.
If a procedure is called somewhere in the EM text,
but its body (text) is not included in the input,
its bodyseen flag in the procedure table will still
be off.
Whenever such a procedure is called,
we assume the worst case for everything:
it will change and use all variables it has access to,
it will call every procedure, etc.
.sp
Similarly, if a data label is used
but not defined, the PSEUDO attribute in its data block
will be set to UNKNOWN.
.NH 3
Implementation
.PP
Part of the code for the EM Peephole Optimizer
.[
staveren peephole toplass
.]
has been used for IC.
In particular, the routines that read and unravel
Compact Assembly Language and the identifier
look-up mechanism have been reused.
New code was added to recognize objects,
build the object and procedure tables and
output the intermediate code.
.PP
IC uses singly linked linear lists for both the
procedure and the object table.
Hence there are no limits on the size of such
a table (except for the trivial fact that it must fit
in main memory).
Both tables are output after all EM code has
been processed.
IC reads the EM text of one entire procedure
at a time,
processes it and appends the modified code to
the EM text file.
EM code is represented internally as a doubly linked linear
list of EM instructions.
.PP
Objects are recognized by looking at the operands
of instructions that reference global data.
If we come across the instructions:
.DS
    LDE X+6      -- Load Double External
    LAE X+20     -- Load Address External
.DE
we conclude that the data block
preceded by the data label X contains an object
at offset 6 of size twice the word size,
and an object at offset 20 of unknown size.
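.PP
The sketch below shows how such operands can be turned into object
table entries. It is our own illustration, not the IC source;
enter_object() is an assumed helper that finds the data block of label
X and adds an object with the given offset and size to it, merging it
with an object already recorded at that offset.
.DS
enum opcode { LOE, STE, INE, DEE, ZRE, LDE, SDE, LAE };  /* stand-ins */

#define UNKNOWN_SIZE 0L

extern long wordsize;           /* the EM word size */
extern void enter_object(char *label, long offset, long size);

void note_global_ref(enum opcode op, char *label, long offset)
{
    switch (op) {
    case LOE: case STE: case INE: case DEE: case ZRE:
        enter_object(label, offset, wordsize);      /* one word */
        break;
    case LDE: case SDE:
        enter_object(label, offset, 2 * wordsize);  /* two words */
        break;
    case LAE:
        enter_object(label, offset, UNKNOWN_SIZE);  /* only the address is taken */
        break;
    }
}
.DE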
.sp
A data block entry of the object table is allocated
at the first reference to a data label.
If this reference is a defining occurrence
or an INA pseudo instruction,
the label is not externally visible
.[~[
keizer architecture
.], section 11.1.4.3]
In this case, the external flag of the data block
is turned off.
If the first reference is an applied occurrence
or an EXA pseudo instruction, the flag is set.
We record this information, because the
optimizer may change the order of defining and
applied occurrences.
The INA and EXA pseudos are removed from the EM text.
They may be regenerated by the last phase
of the optimizer.
.sp
Similar rules hold for the procedure table
and the INP and EXP pseudos.
.NH 3
Source files of IC
.PP
The source files of IC consist
of the files ic.c, ic.h and several packages.
.UL ic.h
contains type definitions, macros and
variable declarations that may be used by
ic.c and by every package.
.UL ic.c
contains the definitions of these variables,
the procedure
.UL main
and some high level I/O routines used by main.
.sp
Every package xxx consists of two files.
ic_xxx.h contains type definitions,
macros, variable declarations and
procedure declarations that may be used by
every .c file that includes this .h file.
The file ic_xxx.c provides the
definitions of these variables and
the implementation of the declared procedures.
IC uses the following packages:
.IP lookup: 18
procedures that look up procedure, data label
and instruction label names; procedures to dump
the procedure and data label names.
.IP lib:
one procedure that gets the next useful input module;
while scanning archives, it skips unnecessary modules.
.IP aux:
several auxiliary routines.
.IP io:
low-level I/O routines that unravel the Compact
Assembly Language.
.IP put:
routines that output the intermediate code.
.LP
112
doc/ego/il/il1
Normal file
@ -0,0 +1,112 @@
.bp
.NH 1
Inline substitution
.NH 2
Introduction
.PP
The Inline Substitution technique (IL)
tries to decrease the overhead associated
with procedure calls (invocations).
During a procedure call, several actions
must be undertaken to set up the right
environment for the called procedure.
.[
johnson calling sequence
.]
On return from the procedure, most of these
effects must be undone.
This entire process introduces significant
costs in execution time as well as
in object code size.
.PP
The inline substitution technique replaces
some of the calls by the modified body of
the called procedure, hence eliminating
the overhead.
Furthermore, as the calling and called procedure
are now integrated, they can be optimized
together, using other techniques of the optimizer.
This often leads to extra opportunities for
optimization
.[
ball predicting effects
.]
.[
carter code generation cacm
.]
.[
scheifler inline cacm
.]
.PP
An inline substitution of a call to a procedure P increases
the size of the program, unless P is very small or P is
called only once.
In the latter case, P can be eliminated.
In practice, procedures that are called only once occur
quite frequently, due to the
introduction of structured programming.
(Carter
.[
carter umi ann arbor
.]
states that almost 50% of the Pascal procedures
he analyzed were called just once.)
.PP
Scheifler
.[
scheifler inline cacm
.]
has a more general view of inline substitution.
In his model, the program under consideration is
allowed to grow by a certain amount,
i.e. code size is sacrificed to speed up the program.
The above two cases are just special cases of
his model, obtained by setting the size-change to
(approximately) zero.
He formulates the substitution problem as follows:
.IP
"Given a program, a subset of all invocations,
a maximum program size, and a maximum procedure size,
find a sequence of substitutions that minimizes
the expected execution time."
.LP
Scheifler shows that this problem is NP-complete
.[~[
aho hopcroft ullman analysis algorithms
.], chapter 10]
by a reduction from the Knapsack Problem.
Heuristics will have to be used to find a near-optimal
solution.
.PP
In the following sections we will extend
Scheifler's view and adapt it to the EM Global Optimizer.
We will first describe the transformations that have
to be applied to the EM text when a call is substituted
in line.
Next we will examine in which cases inline substitution
is not possible or desirable.
Heuristics will be developed for
choosing a good sequence of substitutions.
These heuristics make no demands on the user
(such as making profiles
.[
scheifler inline cacm
.]
or giving pragmats
.[~[
ichbiah ada military standard
.], section 6.3.2]),
although the model could easily be extended
to use such information.
Finally, we will discuss the implementation
of the IL phase of the optimizer.
.PP
We will often use the term inline expansion
as a synonym for inline substitution.
.sp 0
The inverse technique of procedure abstraction
(automatic subroutine generation)
.[
shaffer subroutine generation
.]
will not be discussed in this report.
93
doc/ego/il/il2
Normal file
@ -0,0 +1,93 @@
.NH 2
Parameters and local variables
.PP
In the EM calling sequence, the calling procedure
pushes its parameters on the stack
before doing the CAL.
The called routine first saves some
status information on the stack and then
allocates space for its own locals
(also on the stack).
Usually, one special purpose register,
the Local Base (LB) register,
is used to access both the locals and the
parameters.
If memory is highly segmented,
the stack frames of the caller and the callee
may be allocated in different fragments;
an extra Argument Base (AB) register is used
in this case to access the actual parameters.
See section 4.2 of
.[
keizer architecture
.]
for further details.
.PP
If a procedure call is expanded in line,
there are two problems:
.IP 1. 3
No stack frame will be allocated for the called procedure;
we must find another place to put its locals.
.IP 2.
The LB register cannot be used to access the actual
parameters;
as the CAL instruction is deleted, the LB will
still point to the local base of the \fIcalling\fR procedure.
.LP
The local variables of the called procedure will
be put in the stack frame of the calling procedure,
just after its own locals.
The size of the stack frame of the
calling procedure will be increased
during its entire lifetime.
Therefore our model will allow a
limit to be set on the number of bytes
for locals that the called procedure may have
(see next section).
.PP
There are several alternatives for accessing the parameters.
An actual parameter may be an arbitrary expression,
which we will refer to as
the \fIactual parameter expression\fR.
The value of this expression is stored
in a location on the stack (see above),
the \fIparameter location\fR.
.sp 0
The alternatives for accessing parameters are:
.IP -
save the value of the stack pointer at the point of the CAL
in a temporary variable X;
this variable can be used to simulate the AB register, i.e.
parameter locations are accessed via an offset to
the value of X.
.IP -
create a new temporary local variable T for
the parameter (in the stack frame of the caller);
every access to the parameter location must be changed
into an access to T.
.IP -
do not evaluate the actual parameter expression before the call;
instead, substitute this expression for every use of the
parameter location.
.LP
The first method may be expensive if X is not
put in a register.
We will not use this method.
The time required to evaluate and access the
parameters when the second method is used
will not differ much from the normal
calling sequence (i.e. a call that is not expanded in line).
It is not expensive, but there are no
extra savings either.
The third method is essentially the 'by name'
parameter mechanism of Algol60.
If the actual parameter is just a numeric constant,
it is advantageous to use this method.
Yet, there are several circumstances
under which it cannot or should not be used.
We will deal with this in the next section.
.sp 0
In general we will use the third method,
if it is possible and desirable.
Such parameters will be called \fIin line parameters\fR.
In all other cases we will use the second method.
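.PP
To illustrate the last two methods at the source level
(the procedure and variable names below are invented for this example;
the actual transformation is of course performed on the EM text),
consider a Pascal procedure
.DS
    procedure count(n, delta: integer);
    begin
        total := total + n * delta
    end;
.DE
and the call count(10, step + 1).
The constant 10 can be treated as an in line parameter:
every use of its parameter location is simply replaced by 10.
The expression step + 1 is handled by the second method:
it is evaluated once into a new temporary local t of the caller,
and every use of its parameter location becomes a use of t:
.DS
    count(10, step + 1);   --->   t := step + 1;
                                  total := total + 10 * t;
.DE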
164
doc/ego/il/il3
Normal file
@ -0,0 +1,164 @@
.NH 2
Feasibility and desirability analysis
.PP
Feasibility and desirability analysis
of in line substitution differ
somewhat from most other techniques.
Usually, much effort is needed to find
a feasible opportunity for optimization
(e.g. a redundant subexpression).
Desirability analysis then checks
if it is really advantageous to do
the optimization.
For IL, opportunities are easy to find.
Seeing whether an in line expansion is
desirable is not hard either.
Yet, the main problem is to find the most
desirable ones.
We will deal with this problem later and
will first attend to feasibility and
desirability analysis.
.PP
There are several reasons why a procedure invocation
cannot or should not be expanded in line.
.sp
A call to a procedure P cannot be expanded in line
in any of the following cases:
.IP 1. 3
The body of P is not available as EM text.
Clearly, there is no way to do the substitution.
.IP 2.
P, or any procedure called by P (transitively),
follows the chain of statically enclosing
procedures (via an LXL or LXA instruction)
or follows the chain of dynamically enclosing
procedures (via a DCH).
If the call were expanded in line,
one level would be removed from the chains,
leading to total chaos.
This chaos could be solved by patching up
every LXL, LXA or DCH in all procedures
that could be part of the chains,
but this is hard to implement.
.IP 3.
P, or any procedure called by P (transitively),
calls a procedure whose body is not
available as EM text.
The unknown procedure may use an LXL, LXA or DCH.
However, in several languages a separately
compiled procedure has no access to the
static or dynamic chain.
In this case
this point does not apply.
.IP 4.
P, or any procedure called by P (transitively),
uses the LPB instruction, which converts a
local base to an argument base;
as the locals and parameters are stored
in a non-standard way (differing from the
normal EM calling sequence) this instruction
would yield incorrect results.
.IP 5.
The total number of bytes of the parameters
of P is not known.
P may be a procedure with a variable number
of parameters or may have an array of dynamic size
as value parameter.
.LP
It is undesirable to expand a call to a procedure P in line
in any of the following cases:
.IP 1. 3
P is large, i.e. the number of EM instructions
of P exceeds some threshold.
The expanded code would be large too.
Furthermore, several programs in ACK,
including the global optimizer itself,
may run out of memory if they have to run
in a small address space and are presented with
very large procedures.
The threshold may be set to infinity,
in which case this point does not apply.
.IP 2.
P has many local variables.
All these variables would have to be allocated
in the stack frame of the calling procedure.
.PP
If a call may be expanded in line, we have to
decide how to access its parameters.
In the previous section we stated that we would
use in line parameters whenever possible and desirable.
There are several reasons why a parameter
cannot or should not be expanded in line.
.sp
No parameter of a procedure P can be expanded in line
in any of the following cases:
.IP 1. 3
P, or any procedure called by P (transitively),
does a store-indirect or a use-indirect (i.e. through
a pointer).
However, if the front end has generated messages
telling that certain parameters cannot be accessed
indirectly, those parameters may be expanded in line.
.IP 2.
P, or any procedure called by P (transitively),
calls a procedure whose body is not available as EM text.
The unknown procedure may do a store-indirect
or a use-indirect.
However, the same remark about front-end messages
as for case 1 holds here.
.IP 3.
The address of a parameter location is taken (via a LAL).
In the normal calling sequence, all parameters
are stored sequentially. If the address of one
parameter location is taken, the address of any
other parameter location can be computed from it.
Hence we must put every parameter in a temporary location;
furthermore, all these locations must be in
the same order as for the normal calling sequence.
.IP 4.
P has overlapping parameters; for example, it uses
the parameter at offset 10 both as a 2-byte and as a 4-byte
parameter.
Such code may be produced by the front ends if
the formal parameter is of some record type
with variants.
.PP
Sometimes a specific parameter must not be expanded in line.
.sp 0
An actual parameter expression cannot be expanded in line
in any of the following cases:
.IP 1. 3
P stores into the parameter location.
Even if the actual parameter expression is a simple
variable, it is incorrect to change the 'store into
formal' into a 'store into actual', because of
the parameter mechanism used.
In Pascal, the following expansion is incorrect:
.DS
    procedure p (x:integer);
    begin
        x := 20;
    end;
    ...
    a := 10;             a := 10;
    p(a);        --->    a := 20;
    write(a);            write(a);
.DE
.IP 2.
P changes any of the operands of the
actual parameter expression.
If the expression is expanded and evaluated
after the operand has been changed,
the wrong value will be used.
.IP 3.
The actual parameter expression has side effects.
It must be evaluated only once,
at the place of the call.
.LP
It is undesirable to expand an actual parameter in line
in the following case:
.IP 1. 3
The parameter is used more than once
(dynamically) and the actual parameter expression
is not just a simple variable or constant.
.LP
132
doc/ego/il/il4
Normal file
@ -0,0 +1,132 @@
.NH 2
Heuristic rules
.PP
Using the information described
in the previous section,
we can find all calls that can
be expanded in line, and for which
this expansion is desirable.
In general, we cannot expand all these calls,
so we have to choose the 'best' ones.
With every CAL instruction
that may be expanded, we associate
a \fIpay off\fR,
which expresses how desirable it is
to expand this specific CAL.
.sp
Let Tc denote the portion of EM text involved
in a specific call, i.e. the pushing of the actual
parameter expressions, the CAL itself,
the popping of the parameters and the
pushing of the result (if any, via an LFR).
Let Te denote the EM text that would be obtained
by expanding the call in line.
Let Pc be the original program and Pe the program
with Te substituted for Tc.
The pay off of the CAL depends on two factors:
.IP -
T = execution_time(Pe) - execution_time(Pc)
.IP -
S = code_size(Pe) - code_size(Pc)
.LP
The change in execution time (T) depends on:
.IP -
T1 = execution_time(Te) - execution_time(Tc)
.IP -
N = number of times Te or Tc gets executed.
.LP
We assume that T1 will be the same every
time the code gets executed.
This is a reasonable assumption.
(Note that we are talking about one CAL,
not about different calls to the same procedure.)
Hence
.DS
T = N * T1
.DE
T1 can be estimated by a careful analysis
of the transformations that are performed.
Below, we list everything that will be
different when a call is expanded in line:
.IP -
The CAL instruction is not executed.
This saves a subroutine jump.
.IP -
The instructions in the procedure prolog
are not executed.
These instructions, generated from the PRO pseudo,
save some machine registers
(including the old LB), set the new LB and allocate space
for the locals of the called routine.
The savings may be less if there are no
locals to allocate.
.IP -
In line parameters are not evaluated before the call
and are not pushed on the stack.
.IP -
All remaining parameters are stored in local variables,
instead of being pushed on the stack.
.IP -
If the number of parameters is nonzero,
the ASP instruction after the CAL is not executed.
.IP -
Every reference to an in line parameter is
substituted by the parameter expression.
.IP -
RET (return) instructions are replaced by
BRA (branch) instructions.
If the called procedure 'falls through'
(i.e. it has only one RET, at the end of its code),
even the BRA is not needed.
.IP -
The LFR instruction (fetch function result) is not executed.
.PP
Besides these changes, which are caused directly by IL,
other changes may occur as IL influences other optimization
techniques, such as Register Allocation and Constant Propagation.
Our heuristic rules do not take into account the quite
unpredictable effects on Register Allocation.
They do, however, favour calls that have numeric \fIconstants\fR
as parameters; especially the constant "0" as an inline
parameter gets high scores,
as further optimizations may often be possible.
.PP
It cannot be determined statically how often a CAL instruction gets
executed.
We will use \fIloop nesting\fR information here.
The nesting level of the loop in which
the CAL appears (if any) will be used as an
indication of the number of times it gets executed.
.PP
Based on all these facts,
the pay off of a call is computed as follows.
The model below was developed empirically.
Assume procedure P calls procedure Q.
The call takes place in basic block B.
.DS
ZP = # zero parameters
CP = # constant parameters - ZP
LN = Loop Nesting level (0 if outside any loop)
F  = \fIif\fR # formal parameters of Q > 0 \fIthen\fR 1 \fIelse\fR 0
FT = \fIif\fR Q falls through \fIthen\fR 1 \fIelse\fR 0
S  = size(Q) - 1 - # inline_parameters - F
L  = \fIif\fR # local variables of P > 0 \fIthen\fR 0 \fIelse\fR -1
A  = CP + 2 * ZP
N  = \fIif\fR LN=0 and P is never called from a loop \fIthen\fR 0 \fIelse\fR (LN+1)**2
FM = \fIif\fR B is a firm block \fIthen\fR 2 \fIelse\fR 1

pay_off = (100/S + FT + F + L + A) * N * FM
.DE
S stands for the size increase of the program,
which is slightly less than the size of Q.
The size of a procedure is taken to be its number
of (non-pseudo) EM instructions.
The terms "loop nesting level" and "firm" were defined
in the chapter on the Intermediate Code (section "loop tables").
If a call is not inside a loop and the calling procedure
is itself never called from a loop (transitively),
then the call will probably be executed at most once.
Such a call is never expanded in line (its pay off is zero).
If the calling procedure does not have local variables, a penalty (L)
is introduced, as it will most likely get local variables if the
call gets expanded.
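.PP
As an illustration of the formula, consider a hypothetical call
(the numbers are invented for this example,
and we take 100/S to be an integer division;
with real division the result would be about 87):
Q consists of 21 EM instructions, falls through and has two formal
parameters, both of which can be passed as in line parameters;
one actual parameter is the constant 0, the other is a nonzero constant.
P has local variables, the CAL appears at loop nesting level 1
and its basic block is firm.
.DS
ZP = 1,  CP = 2 - 1 = 1,  LN = 1,  F = 1,  FT = 1
S  = 21 - 1 - 2 - 1 = 17
L  = 0,  A = 1 + 2 * 1 = 3
N  = (1 + 1)**2 = 4,  FM = 2

pay_off = (100/17 + 1 + 1 + 0 + 3) * 4 * 2 = (5 + 5) * 8 = 80
.DE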
440
doc/ego/il/il5
Normal file
@ -0,0 +1,440 @@
.NH 2
|
||||||
|
Implementation
|
||||||
|
.PP
|
||||||
|
A major factor in the implementation
|
||||||
|
of Inline Substitution is the requirement
|
||||||
|
not to use an excessive amount of memory.
|
||||||
|
IL essentially analyzes the entire program;
|
||||||
|
it makes decisions based on which procedure calls
|
||||||
|
appear in the whole program.
|
||||||
|
Yet, because of the memory restriction, it is
|
||||||
|
not feasible to read the entire program
|
||||||
|
in main memory.
|
||||||
|
To solve this problem, the IL phase has been
|
||||||
|
split up into three subphases that are executed sequentially:
|
||||||
|
.IP 1.
|
||||||
|
analyze every procedure; see how it accesses its parameters;
|
||||||
|
simultaneously collect all calls
|
||||||
|
appearing in the whole program and put them
|
||||||
|
in a \fIcall-list\fR.
|
||||||
|
.IP 2.
|
||||||
|
use the call-list and decide which calls will be substituted
|
||||||
|
in line.
|
||||||
|
.IP 3.
|
||||||
|
take the decisions of subphase 2 and modify the
|
||||||
|
program accordingly.
|
||||||
|
.LP
|
||||||
|
Subphases 1 and 3 scan the input program; only
|
||||||
|
subphase 3 modifies it.
|
||||||
|
It is essential that the decisions can be made
|
||||||
|
in subphase 2
|
||||||
|
without using the input program,
|
||||||
|
provided that subphase 1 puts enough information
|
||||||
|
in the call-list.
|
||||||
|
Subphase 2 keeps the entire call-list in main memory
|
||||||
|
and repeatedly scans it, to
|
||||||
|
find the next best candidate for expansion.
|
||||||
|
.PP
|
||||||
|
We will specify the
|
||||||
|
data structures used by IL before
|
||||||
|
describing the subphases.
|
||||||
|
.NH 3
|
||||||
|
Data structures
|
||||||
|
.NH 4
|
||||||
|
The procedure table
|
||||||
|
.PP
|
||||||
|
In subphase 1 information is gathered about every procedure
|
||||||
|
and added to the procedure table.
|
||||||
|
This information is used by the heuristic rules.
|
||||||
|
A proctable entry for procedure p has
|
||||||
|
the following extra information:
|
||||||
|
.IP -
|
||||||
|
is it allowed to substitute an invocation of p in line?
|
||||||
|
.IP -
|
||||||
|
is it allowed to put any parameter of such a call in line?
|
||||||
|
.IP -
|
||||||
|
the size of p (number of EM instructions)
|
||||||
|
.IP -
|
||||||
|
does p 'fall through'?
|
||||||
|
.IP -
|
||||||
|
a description of the formal parameters that p accesses; this information
|
||||||
|
is obtained by looking at the code of p. For every parameter f,
|
||||||
|
we record:
|
||||||
|
.RS
|
||||||
|
.IP -
|
||||||
|
the offset of f
|
||||||
|
.IP -
|
||||||
|
the type of f (word, double word, pointer)
|
||||||
|
.IP -
|
||||||
|
may the corresponding actual parameter be put in line?
|
||||||
|
.IP -
|
||||||
|
is f ever accessed indirectly?
|
||||||
|
.IP -
|
||||||
|
is f used never, once, or more than once?
|
||||||
|
.RE
|
||||||
|
.IP -
|
||||||
|
the number of times p is called (see below)
|
||||||
|
.IP -
|
||||||
|
the file address of its call-count information (see below).
|
||||||
|
.LP
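.PP
The information listed above can be pictured as a C structure.
The declarations below are merely an illustration with invented names;
the real declarations in il.h are likely to differ.
.DS
/* Hypothetical sketch of the extra proctable information used by IL. */
struct formal {
        int  f_offset;          /* offset of the formal parameter       */
        int  f_type;            /* word, double word or pointer         */
        int  f_inline;          /* may the actual parameter be in line? */
        int  f_indirect;        /* is it ever accessed indirectly?      */
        int  f_usage;           /* used never, once or more than once   */
        struct formal *f_next;
};

struct il_proc {
        int  p_inline;          /* may a call to p be put in line?      */
        int  p_param_inline;    /* may any parameter be put in line?    */
        int  p_size;            /* number of (non-pseudo) EM instructions */
        int  p_falls_through;
        struct formal *p_formals;
        int  p_ncalls;          /* number of times p is called          */
        long p_ccaddr;          /* file address of call-count info      */
};
.DE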
|
||||||
|
.NH 4
|
||||||
|
Call-count information
|
||||||
|
.PP
|
||||||
|
As a result of Inline Substitution, some procedures may
|
||||||
|
become useless, because all their invocations have been
|
||||||
|
substituted in line.
|
||||||
|
One of the tasks of IL is to keep track of which
|
||||||
|
procedures are no longer called.
|
||||||
|
Note that IL is especially keen on procedures that are
|
||||||
|
called only once
|
||||||
|
(possibly as a result of expanding all other calls to it).
|
||||||
|
So we want to know how many times a procedure
|
||||||
|
is called \fIduring\fR Inline Substitution.
|
||||||
|
It is not good enough to compute this
|
||||||
|
information afterwards.
|
||||||
|
The task is rather complex, because
|
||||||
|
the number of times a procedure is called
|
||||||
|
varies during the entire process:
|
||||||
|
.IP 1.
|
||||||
|
If a call to p is substituted in line,
|
||||||
|
the number of calls to p gets decremented by 1.
|
||||||
|
.IP 2.
|
||||||
|
If a call to p is substituted in line,
|
||||||
|
and p contains n calls to q, then the number of calls to q
|
||||||
|
gets incremented by n.
|
||||||
|
.IP 3.
|
||||||
|
If a procedure p is removed (because it is no
|
||||||
|
longer called) and p contains n calls to q,
|
||||||
|
then the number of calls to q gets decremented by n.
|
||||||
|
.LP
|
||||||
|
(Note that p may be the same as q, if p is recursive).
|
||||||
|
.sp 0
|
||||||
|
So we actually want to have the following information:
|
||||||
|
.DS
|
||||||
|
NRCALL(p,q) = number of calls to q appearing in p,
|
||||||
|
|
||||||
|
for all procedures p and q that may be put in line.
|
||||||
|
.DE
|
||||||
|
This information, called \fIcall-count information\fR is
|
||||||
|
computed by the first subphase.
|
||||||
|
It is stored in a file.
|
||||||
|
It is represented as a number of lists, rather than as
|
||||||
|
a (very sparse) matrix.
|
||||||
|
Every procedure has a list of (proc,count) pairs,
|
||||||
|
telling which procedures it calls, and how many times.
|
||||||
|
The file address of its call-count list is stored
|
||||||
|
in its proctable entry.
|
||||||
|
Whenever this information is needed, it is fetched from
|
||||||
|
the file, using direct access.
|
||||||
|
The proctable entry also contains the number of times
|
||||||
|
a procedure is called, at any moment.
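.PP
A sketch of this list representation, again with made-up names, could be:
.DS
/* Hypothetical sketch: the call-count information of a procedure p is
 * a list of (procedure, count) pairs, read from the call-count file.
 */
struct count_elem {
        int  c_proc;            /* a procedure q called from p */
        int  c_count;           /* NRCALL(p,q)                 */
        struct count_elem *c_next;
};
.DE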
|
||||||
|
.NH 4
|
||||||
|
The call-list
|
||||||
|
.PP
|
||||||
|
The call-list is the major data structure used by IL.
|
||||||
|
Every item of the list describes one procedure call.
|
||||||
|
It contains the following attributes:
|
||||||
|
.IP -
|
||||||
|
the calling procedure (caller)
|
||||||
|
.IP -
|
||||||
|
the called procedure (callee)
|
||||||
|
.IP -
|
||||||
|
identification of the CAL instruction (sequence number)
|
||||||
|
.IP -
|
||||||
|
the loop nesting level; our heuristic rules appreciate
|
||||||
|
calls inside a loop (or even inside a loop nested inside
|
||||||
|
another loop, etc.) more than other calls
|
||||||
|
.IP -
|
||||||
|
the actual parameter expressions involved in the call;
|
||||||
|
for every actual, we record:
|
||||||
|
.RS
|
||||||
|
.IP -
|
||||||
|
the EM code of the expression
|
||||||
|
.IP -
|
||||||
|
the number of bytes of its result (size)
|
||||||
|
.IP -
|
||||||
|
an indication if the actual may be put in line
|
||||||
|
.RE
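.PP
The attributes above may be summarized in a C structure; the
declarations below are an illustrative sketch only, with invented names.
.DS
/* Hypothetical sketch of one call-list item. */
struct actual {
        char *a_code;           /* EM code of the expression      */
        int   a_size;           /* size of its result in bytes    */
        int   a_inline;         /* may it be put in line?         */
        struct actual *a_next;
};

struct call {
        int   cl_caller;        /* calling procedure              */
        int   cl_callee;        /* called procedure               */
        int   cl_id;            /* sequence number of the CAL     */
        int   cl_level;         /* loop nesting level             */
        struct actual *cl_act;  /* actual parameter expressions   */
        struct call *cl_cdr;    /* next call at the same level    */
        struct call *cl_car;    /* nested calls (see below)       */
};
.DE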
|
||||||
|
.LP
|
||||||
|
The structure of the call-list is rather complex.
|
||||||
|
Whenever a call is expanded in line, new calls
|
||||||
|
will suddenly appear in the program,
|
||||||
|
that were not contained in the original body
|
||||||
|
of the calling subroutine.
|
||||||
|
These calls are inherited from the called procedure.
|
||||||
|
We will refer to these invocations as \fInested calls\fR
|
||||||
|
(see Fig. 5.1).
|
||||||
|
.DS
|
||||||
|
procedure p is
begin                              .
    a();                           .
    b();                           .
end;

procedure r is                     procedure r is
begin                              begin
    x();                               x();
    p();    -- in line                 a();    -- nested call
    y();                               b();    -- nested call
end;                                   y();
                                   end;

        Fig. 5.1 Example of nested procedure calls
|
||||||
|
.DE
|
||||||
|
Nested calls may subsequently be put in line too
|
||||||
|
(probably resulting in a yet deeper nesting level, etc.).
|
||||||
|
So the call-list does not always reflect the source program,
|
||||||
|
but changes dynamically, as decisions are made.
|
||||||
|
If a call to p is expanded, all calls appearing in p
|
||||||
|
will be added to the call-list.
|
||||||
|
.sp 0
|
||||||
|
A convenient and elegant way to represent
|
||||||
|
the call-list is to use a LISP-like list.
|
||||||
|
.[
|
||||||
|
poel lisp trac
|
||||||
|
.]
|
||||||
|
Calls that appear at the same level
|
||||||
|
are linked in the CDR direction. If a call C
|
||||||
|
to a procedure p is expanded,
|
||||||
|
all calls appearing in p are put in a sub-list
|
||||||
|
of C, i.e. in its CAR.
|
||||||
|
In the example above, before the decision
|
||||||
|
to expand the call to p is made, the
|
||||||
|
call-list of procedure r looks like:
|
||||||
|
.DS
|
||||||
|
(call-to-x, call-to-p, call-to-y)
|
||||||
|
.DE
|
||||||
|
After the decision, it looks like:
|
||||||
|
.DS
|
||||||
|
(call-to-x, (call-to-p*, call-to-a, call-to-b), call-to-y)
|
||||||
|
.DE
|
||||||
|
The call to p is marked, because it has been
|
||||||
|
substituted.
|
||||||
|
Whenever IL wants to traverse the call-list of some procedure,
|
||||||
|
it uses the well-known LISP technique of
|
||||||
|
recursion in the CAR direction and
|
||||||
|
iteration in the CDR direction
|
||||||
|
(see page 1.19-2 of
|
||||||
|
.[
|
||||||
|
poel lisp trac
|
||||||
|
.]
|
||||||
|
).
|
||||||
|
All list traversals look like:
|
||||||
|
.DS
|
||||||
|
traverse(list)
{
        for (c = first(list); c != 0; c = CDR(c)) {
                if (c is marked) {
                        /* expanded call: traverse its nested calls */
                        traverse(CAR(c));
                } else {
                        /* process call c itself */
                        do something with c
                }
        }
}
|
||||||
|
.DE
|
||||||
|
The entire call-list consists of a number of LISP-like lists,
|
||||||
|
one for every procedure.
|
||||||
|
The proctable entry of a procedure contains a pointer
|
||||||
|
to the beginning of the list.
|
||||||
|
.NH 3
|
||||||
|
The first subphase: procedure analysis
|
||||||
|
.PP
|
||||||
|
The tasks of the first subphase are to determine
|
||||||
|
several attributes of every procedure
|
||||||
|
and to construct the basic call-list,
|
||||||
|
i.e. without nested calls.
|
||||||
|
The size of a procedure is determined
|
||||||
|
by simply counting its EM instructions.
|
||||||
|
Pseudo instructions are skipped.
|
||||||
|
A procedure does not 'fall through' if its CFG
|
||||||
|
contains a basic block
|
||||||
|
that is not the last block of the CFG and
|
||||||
|
that ends on a RET instruction.
|
||||||
|
The formal parameters of a procedure are determined
|
||||||
|
by inspection of
|
||||||
|
its code.
|
||||||
|
.PP
|
||||||
|
The call-list is constructed by looking at all CAL instructions
|
||||||
|
appearing in the program.
|
||||||
|
The call-list should only contain calls to procedures
|
||||||
|
that may be put in line.
|
||||||
|
This fact is only known if the procedure was
|
||||||
|
analyzed earlier.
|
||||||
|
If a call to a procedure p appears in the program
|
||||||
|
before the body of p,
|
||||||
|
the call will always be put in the call-list.
|
||||||
|
If p is later found to be unsuitable,
|
||||||
|
the call will be removed from the list by the
|
||||||
|
second subphase.
|
||||||
|
.PP
|
||||||
|
An important issue is the recognition
|
||||||
|
of the actual parameter expressions of the call.
|
||||||
|
The front ends produce messages telling how many
|
||||||
|
bytes of formal parameters every procedure accesses.
|
||||||
|
(If there is no such message for a procedure, it
|
||||||
|
cannot be put in line).
|
||||||
|
The actual parameters together must account for
|
||||||
|
the same number of bytes.
A recursive descent parser is used
|
||||||
|
to parse side-effect free EM expressions.
|
||||||
|
It uses a table and some
|
||||||
|
auxiliary routines to determine
|
||||||
|
how many bytes every EM instruction pops from the stack
|
||||||
|
and how many bytes it pushes onto the stack.
|
||||||
|
These numbers depend on the EM instruction, its argument,
|
||||||
|
and the wordsize and pointersize of the target machine.
|
||||||
|
Initially, the parser has to recognize the
|
||||||
|
number of bytes specified in the formals-message,
|
||||||
|
say N.
|
||||||
|
Assume the first instruction before the CAL pops S bytes
|
||||||
|
and pushes R bytes.
|
||||||
|
If R > N, too many bytes are recognized
|
||||||
|
and the parser fails.
|
||||||
|
Else, it calls itself recursively to recognize the
|
||||||
|
S bytes used as operand of the instruction.
|
||||||
|
If it succeeds in doing so, it continues with the next instruction,
|
||||||
|
i.e. the first instruction before the code recognized by
|
||||||
|
the recursive call, to recognize N-R more bytes.
|
||||||
|
The result is a number of EM instructions that collectively push N bytes.
|
||||||
|
If an instruction is encountered that has side-effects
|
||||||
|
(e.g. a store or a procedure call) or of which R and S cannot
|
||||||
|
be computed statically (e.g. a LOS), it fails.
|
||||||
|
.sp 0
|
||||||
|
Note that the parser traverses the code backwards.
|
||||||
|
As EM code is essentially postfix code, the parser works top down.
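.PP
A sketch of this parser is given below.
The instruction representation and the helper routines are assumptions
made for the purpose of illustration; the real routines in the IL
sources differ.
.DS
struct instr;                                  /* an EM instruction */
extern struct instr *prev(struct instr *);     /* preceding instruction */
extern int instr_pop(struct instr *);          /* bytes popped, or -1 */
extern int instr_push(struct instr *);         /* bytes pushed, or -1 */
extern int has_side_effects(struct instr *);

/* Recognize, ending at *ip, code that pushes exactly n bytes.
 * Return 1 on success, with *ip left at the first instruction of the
 * recognized code; return 0 on failure.
 */
int
recognize(struct instr **ip, int n)
{
        while (n > 0) {
                struct instr *i = *ip;
                int s, r;

                if (i == 0 || has_side_effects(i))
                        return 0;
                s = instr_pop(i);
                r = instr_push(i);
                if (s < 0 || r < 0 || r > n)
                        return 0;       /* unknown, or too many bytes */
                *ip = prev(i);
                if (!recognize(ip, s))  /* recognize the operands of i */
                        return 0;
                n -= r;
        }
        return 1;
}
.DE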
|
||||||
|
.PP
|
||||||
|
If the parser fails to recognize the parameters, the call will not
|
||||||
|
be substituted in line.
|
||||||
|
If the parameters can be determined, they still have to
|
||||||
|
match the formal parameters of the called procedure.
|
||||||
|
This check is performed by the second subphase; it cannot be
|
||||||
|
done here, because it is possible that the called
|
||||||
|
procedure has not been analyzed yet.
|
||||||
|
.PP
|
||||||
|
The entire call-list is written to a file,
|
||||||
|
to be processed by the second subphase.
|
||||||
|
.NH 3
|
||||||
|
The second subphase: making decisions
|
||||||
|
.PP
|
||||||
|
The task of the second subphase is quite easy
|
||||||
|
to understand.
|
||||||
|
It reads the call-list file,
|
||||||
|
builds an incore call-list and deletes every
|
||||||
|
call that may not be expanded in line (either because the called
|
||||||
|
procedure may not be put in line, or because the actual parameters
|
||||||
|
of the call do not match the formal parameters of the called procedure).
|
||||||
|
It assigns a \fIpay-off\fR to every call,
|
||||||
|
indicating how desirable it is to expand it.
|
||||||
|
.PP
|
||||||
|
The subphase repeatedly scans the call-list and takes
|
||||||
|
the call with the highest ratio.
|
||||||
|
The chosen one gets marked,
|
||||||
|
and the call-list is extended with the nested calls,
|
||||||
|
as described above.
|
||||||
|
These nested calls are also assigned a ratio,
|
||||||
|
and will be considered too during the next scans.
|
||||||
|
.sp 0
|
||||||
|
After every decision the number of times
|
||||||
|
every procedure is called is updated, using
|
||||||
|
the call-count information.
|
||||||
|
Meanwhile, the subphase keeps track of the amount of space left
|
||||||
|
available.
|
||||||
|
If all space is used, or if there are no more calls left to
|
||||||
|
be expanded, it exits this loop.
|
||||||
|
Finally, calls to procedures that are called only
|
||||||
|
once are also chosen.
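.PP
In outline, and with hypothetical helper names, the decision loop looks
as follows:
.DS
/* Illustrative outline of subphase 2; all helpers are assumed. */
while (space_left > 0) {
        struct call *best = best_ratio_call();  /* highest pay-off  */

        if (best == 0)
                break;                  /* no expandable calls left  */
        mark(best);                     /* decide to expand it       */
        add_nested_calls(best);         /* these get a ratio too     */
        update_call_counts(best);       /* using NRCALL information  */
        space_left -= size_increase(best);
}
expand_single_call_procs();             /* procedures called once    */
.DE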
|
||||||
|
.PP
|
||||||
|
The actual parameters of a call are only needed by
|
||||||
|
this subphase to assign a ratio to a call.
|
||||||
|
To save some space, these actuals are not kept in main memory.
|
||||||
|
They are removed after the call has been read and a ratio
|
||||||
|
has been assigned to it.
|
||||||
|
So this subphase works with \fIabstracts\fR of calls.
|
||||||
|
After all work has been done,
|
||||||
|
the actual parameters of the chosen calls are retrieved
|
||||||
|
from a file,
|
||||||
|
as they are needed by the transformation subphase.
|
||||||
|
.NH 3
|
||||||
|
The third subphase: doing transformations
|
||||||
|
.PP
|
||||||
|
The third subphase makes the actual modifications to
|
||||||
|
the EM text.
|
||||||
|
It is directed by the decisions made in the previous subphase,
|
||||||
|
as expressed via the call-list.
|
||||||
|
The call-list read by this subphase contains
|
||||||
|
only calls that were selected for expansion.
|
||||||
|
The list is ordered in the same way as the EM text,
|
||||||
|
i.e. if a call C1 appears before a call C2 in the call-list,
|
||||||
|
C1 also appears before C2 in the EM text.
|
||||||
|
So the EM text is traversed linearly,
|
||||||
|
the calls that have to be substituted are determined
|
||||||
|
and the modifications are made.
|
||||||
|
If a procedure is encountered that is no longer needed,
|
||||||
|
it is simply not written to the output EM file.
|
||||||
|
The substitution of a call takes place in distinct steps:
|
||||||
|
.IP "change the calling sequence" 7
|
||||||
|
.sp 0
|
||||||
|
The actual parameter expressions are changed.
|
||||||
|
Parameters that are put in line are removed.
|
||||||
|
All remaining ones must store their result in a
|
||||||
|
temporary local variable, rather than
|
||||||
|
push it on the stack.
|
||||||
|
The CAL instruction and any ASP (to pop actual parameters)
|
||||||
|
or LFR (to fetch the result of a function)
|
||||||
|
are deleted.
|
||||||
|
.IP "fetch the text of the called procedure"
|
||||||
|
.sp 0
|
||||||
|
Direct disk access is used to read the text of the
|
||||||
|
called procedure.
|
||||||
|
The file offset is obtained from the proctable entry.
|
||||||
|
.IP "allocate bytes for locals and temporaries"
|
||||||
|
.sp 0
|
||||||
|
The local variables of the called procedure will be put in the
|
||||||
|
stack frame of the calling procedure.
|
||||||
|
The same applies to any temporary variables
|
||||||
|
that hold the result of parameters
|
||||||
|
that were not put in line.
|
||||||
|
The proctable entry of the caller is updated.
|
||||||
|
.IP "put a label after the CAL"
|
||||||
|
.sp 0
|
||||||
|
If the called procedure contains a RET (return) instruction
|
||||||
|
somewhere in the middle of its text (i.e. it does
|
||||||
|
not fall through), the RET must be changed into
|
||||||
|
a BRA (branch) to this label, to jump over the
remainder of the text.
|
||||||
|
This label is not needed if the called
|
||||||
|
procedure falls through.
|
||||||
|
.IP "copy the text of the called procedure and modify it"
|
||||||
|
.sp 0
|
||||||
|
References to local variables of the called routine
|
||||||
|
and to parameters that are not put in line
|
||||||
|
are changed to refer to the
|
||||||
|
new locals of the caller.
|
||||||
|
References to in line parameters are replaced
|
||||||
|
by the actual parameter expression.
|
||||||
|
Returns (RETs) are either deleted or
|
||||||
|
replaced by a BRA.
|
||||||
|
Messages containing information about local
|
||||||
|
variables or parameters are changed.
|
||||||
|
Global data declarations and the PRO and END pseudos
|
||||||
|
are removed.
|
||||||
|
Instruction labels and references to them are
|
||||||
|
changed to make sure they do not have the
|
||||||
|
same identifying number as
|
||||||
|
labels in the calling procedure.
|
||||||
|
.IP "insert the modified text"
|
||||||
|
.sp 0
|
||||||
|
The pseudos of the called procedure are put after the pseudos
|
||||||
|
of the calling procedure.
|
||||||
|
The real text of the callee is put at
|
||||||
|
the place where the CAL was.
|
||||||
|
.IP "take care of nested substitutions"
|
||||||
|
.sp 0
|
||||||
|
The expanded procedure may contain calls that
|
||||||
|
have to be expanded too (nested calls).
|
||||||
|
If the descriptor of this call contains actual
|
||||||
|
parameter expressions,
|
||||||
|
the code of the expressions has to be changed
|
||||||
|
the same way as the code of the callee was changed.
|
||||||
|
Next, the entire process of finding CALs and doing
|
||||||
|
the substitutions is repeated recursively.
|
||||||
|
.LP
|
27
doc/ego/il/il6
Normal file
|
@ -0,0 +1,27 @@
|
||||||
|
.NH 2
|
||||||
|
Source files of IL
|
||||||
|
.PP
|
||||||
|
The sources of IL are in the following files
|
||||||
|
and packages (the prefixes 1_, 2_ and 3_ refer to the three subphases):
|
||||||
|
.IP il.h: 14
|
||||||
|
declarations of global variables and
|
||||||
|
data structures
|
||||||
|
.IP il.c:
|
||||||
|
the routine main; the driving routines of the three subphases
|
||||||
|
.IP 1_anal:
|
||||||
|
contains a subroutine that analyzes a procedure
|
||||||
|
.IP 1_cal:
|
||||||
|
contains a subroutine that analyzes a call
|
||||||
|
.IP 1_aux:
|
||||||
|
implements auxiliary procedures used by subphase 1
|
||||||
|
.IP 2_aux:
|
||||||
|
implements auxiliary procedures used by subphase 2
|
||||||
|
.IP 3_subst:
|
||||||
|
the driving routine for doing the substitution
|
||||||
|
.IP 3_change:
|
||||||
|
lower level routines that do certain modifications
|
||||||
|
.IP 3_aux:
|
||||||
|
implements auxiliary procedures used by subphase 3
|
||||||
|
.IP aux:
|
||||||
|
implements auxiliary procedures used by several subphases.
|
||||||
|
.LP
|
7
doc/ego/intro/head
Normal file
|
@ -0,0 +1,7 @@
|
||||||
|
.ND
|
||||||
|
.ll 80m
|
||||||
|
.nr LL 80m
|
||||||
|
.nr tl 78m
|
||||||
|
.tr ~
|
||||||
|
.ds >. .
|
||||||
|
.ds [. " \[
|
79
doc/ego/intro/intro1
Normal file
|
@ -0,0 +1,79 @@
|
||||||
|
.TL
|
||||||
|
The design and implementation of
|
||||||
|
the EM Global Optimizer
|
||||||
|
.AU
|
||||||
|
H.E. Bal
|
||||||
|
.AI
|
||||||
|
Vrije Universiteit
|
||||||
|
Wiskundig Seminarium, Amsterdam
|
||||||
|
.AB
|
||||||
|
The EM Global Optimizer is part of the Amsterdam Compiler Kit,
|
||||||
|
a toolkit for making retargetable compilers.
|
||||||
|
It optimizes the intermediate code common to all compilers of
|
||||||
|
the toolkit (EM),
|
||||||
|
so it can be used for all programming languages and
|
||||||
|
all processors supported by the kit.
|
||||||
|
.PP
|
||||||
|
The optimizer is based on well-understood concepts like
|
||||||
|
control flow analysis and data flow analysis.
|
||||||
|
It performs the following optimizations:
|
||||||
|
Inline Substitution, Strength Reduction, Common Subexpression Elimination,
|
||||||
|
Stack Pollution, Cross Jumping, Branch Optimization, Copy Propagation,
|
||||||
|
Constant Propagation, Dead Code Elimination and Register Allocation.
|
||||||
|
.PP
|
||||||
|
This report describes the design of the optimizer and several
|
||||||
|
of its implementation issues.
|
||||||
|
.AE
|
||||||
|
.bp
|
||||||
|
.NH 1
|
||||||
|
Introduction
|
||||||
|
.PP
|
||||||
|
.FS
|
||||||
|
This work was supported by the
|
||||||
|
Stichting Technische Wetenschappen (STW)
|
||||||
|
under grant VWI00.0001.
|
||||||
|
.FE
|
||||||
|
The EM Global Optimizer is part of a software toolkit
|
||||||
|
for making production-quality retargetable compilers.
|
||||||
|
This toolkit,
|
||||||
|
called the Amsterdam Compiler Kit
|
||||||
|
.[
|
||||||
|
tanenbaum toolkit rapport
|
||||||
|
.]
|
||||||
|
.[
|
||||||
|
tanenbaum toolkit cacm
|
||||||
|
.]
|
||||||
|
runs under the Unix*
|
||||||
|
.FS
|
||||||
|
*Unix is a Trademark of Bell Laboratories
|
||||||
|
.FE
|
||||||
|
operating system.
|
||||||
|
.sp 0
|
||||||
|
The main design philosophy of the toolkit is to use
|
||||||
|
a language- and machine-independent
|
||||||
|
intermediate code, called EM.
|
||||||
|
.[
|
||||||
|
keizer architecture
|
||||||
|
.]
|
||||||
|
The basic compilation process can be split up into
|
||||||
|
two parts.
|
||||||
|
A language-specific front end translates the source program into EM.
|
||||||
|
A machine-specific back end transforms EM to assembly code
|
||||||
|
of the target machine.
|
||||||
|
.PP
|
||||||
|
The global optimizer is an optional phase of the
|
||||||
|
compilation process, and can be used to obtain
|
||||||
|
machine code of a higher quality.
|
||||||
|
The optimizer transforms EM-code to better EM-code,
|
||||||
|
so it comes between the front end and the back end.
|
||||||
|
It can be used with any combination of languages
|
||||||
|
and machines, as far as they are supported by
|
||||||
|
the compiler kit.
|
||||||
|
.PP
|
||||||
|
This report describes the design of the
|
||||||
|
global optimizer and several of its
|
||||||
|
implementation issues.
|
||||||
|
Measurements can be found in.
|
||||||
|
.[
|
||||||
|
bal tanenbaum global
|
||||||
|
.]
|
3
doc/ego/intro/tail
Normal file
|
@ -0,0 +1,3 @@
|
||||||
|
.[
|
||||||
|
$LIST$
|
||||||
|
.]
|
95
doc/ego/lv/lv1
Normal file
|
@ -0,0 +1,95 @@
|
||||||
|
.bp
|
||||||
|
.NH 1
|
||||||
|
Live-Variable analysis
|
||||||
|
.NH 2
|
||||||
|
Introduction
|
||||||
|
.PP
|
||||||
|
The "Live-Variable analysis" optimization technique (LV)
|
||||||
|
performs some code improvements and computes information that may be
|
||||||
|
used by subsequent optimizations.
|
||||||
|
The main task of this phase is the
|
||||||
|
computation of \fIlive-variable information\fR.
|
||||||
|
.[~[
|
||||||
|
aho compiler design
|
||||||
|
.] section 14.4]
|
||||||
|
A variable A is said to be \fIdead\fR at some point p of the
|
||||||
|
program text, if on no path in the control flow graph
|
||||||
|
from p to a RET (return), A can be used before being changed;
|
||||||
|
else A is said to be \fIlive\fR.
|
||||||
|
.PP
|
||||||
|
A statement of the form
|
||||||
|
.DS
|
||||||
|
VARIABLE := EXPRESSION
|
||||||
|
.DE
|
||||||
|
is said to be dead if the left hand side variable is dead just after
|
||||||
|
the statement and the right hand side expression has no
|
||||||
|
side effects (i.e. it doesn't change any variable).
|
||||||
|
Such a statement can be eliminated entirely.
|
||||||
|
Dead code will seldom be present in the original program,
|
||||||
|
but it may be the result of earlier optimizations,
|
||||||
|
such as copy propagation.
|
||||||
|
.PP
|
||||||
|
Live-variable information is passed to other phases via
|
||||||
|
messages in the EM code.
|
||||||
|
Live/dead messages are generated at points in the EM text where
|
||||||
|
variables become dead or live.
|
||||||
|
This information is especially useful for the Register
|
||||||
|
Allocation phase.
|
||||||
|
.NH 2
|
||||||
|
Implementation
|
||||||
|
.PP
|
||||||
|
The implementation uses algorithm 14.6 of.
|
||||||
|
.[
|
||||||
|
aho compiler design
|
||||||
|
.]
|
||||||
|
First two sets DEF and USE are computed for every basic block b:
|
||||||
|
.IP DEF(b) 9
|
||||||
|
the set of all variables that are assigned a value in b before
|
||||||
|
being used
|
||||||
|
.IP USE(b) 9
|
||||||
|
the set of all variables that may be used in b before being changed.
|
||||||
|
.LP
|
||||||
|
(So a variable that may, but need not, be used via a procedure call or
through a pointer is included in USE; one that may, but need not, be
changed that way is excluded from DEF.)
|
||||||
|
The next step is to compute the sets IN and OUT :
|
||||||
|
.IP IN[b] 9
|
||||||
|
the set of all variables that are live at the beginning of b
|
||||||
|
.IP OUT[b] 9
|
||||||
|
the set of all variables that are live at the end of b
|
||||||
|
.LP
|
||||||
|
IN and OUT can be computed for all blocks simultaneously by solving the
|
||||||
|
data flow equations:
|
||||||
|
.DS
|
||||||
|
(1) IN[b]  = OUT[b] - DEF[b] + USE[b]
(2) OUT[b] = IN[s1] + ... + IN[sn] ;
        where SUCC[b] = {s1, ... , sn}
|
||||||
|
.DE
|
||||||
|
The equations are solved by a similar algorithm as for
|
||||||
|
the Use Definition equations (see previous chapter).
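.PP
For illustration, a minimal version of such a solver is sketched below
in C, assuming that the set of local variables fits in one machine word;
the real implementation uses longer bit vectors and different
declarations.
.DS
typedef unsigned long bitset;   /* one bit per local variable */

struct bblock {
        bitset  b_def, b_use, b_in, b_out;
        struct bblock **b_succ; /* successors in the CFG        */
        int     b_nsucc;
        struct bblock  *b_next; /* next block of the procedure  */
};

void
solve_lv(struct bblock *blocks)
{
        struct bblock *b;
        int i, changed;

        do {
                changed = 0;
                for (b = blocks; b != 0; b = b->b_next) {
                        bitset out = 0, in;

                        for (i = 0; i < b->b_nsucc; i++)
                                out |= b->b_succ[i]->b_in;   /* eq. (2) */
                        in = (out & ~b->b_def) | b->b_use;   /* eq. (1) */
                        if (in != b->b_in || out != b->b_out)
                                changed = 1;
                        b->b_in = in;
                        b->b_out = out;
                }
        } while (changed);
}
.DE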
|
||||||
|
.PP
|
||||||
|
Finally, each basic block is visited in turn to remove its dead code
|
||||||
|
and to emit the live/dead messages.
|
||||||
|
Every basic block b is traversed from its last
|
||||||
|
instruction backwards to the beginning of b.
|
||||||
|
Initially, all variables that are dead at the end
|
||||||
|
of b are marked dead. All others are marked live.
|
||||||
|
If we come across an assignment to a variable X that
|
||||||
|
was marked live, a live-message is put after the
|
||||||
|
assignment and X is marked dead;
|
||||||
|
if X was marked dead, the assignment may be removed, provided that
|
||||||
|
the right hand side expression contains no side effects.
|
||||||
|
If we come across a use of a variable X that
|
||||||
|
was marked dead, a dead-message is put after the
|
||||||
|
use and X is marked live.
|
||||||
|
So at any point, the mark of X tells whether X is
|
||||||
|
live or dead immediately before that point.
|
||||||
|
A message is also generated at the start of a basic block
|
||||||
|
for every variable that was live at the end of the (textually)
|
||||||
|
previous block, but dead at the entry of this block, or vice versa.
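.PP
The backward scan over one basic block can be sketched as follows;
the instruction-level helpers are assumptions made for this sketch,
not the actual LV routines.
.DS
/* Illustrative sketch: backward walk over one basic block. */
void
scan_block(struct bblock *b)
{
        struct instr *i;
        int x;

        init_marks_from_out_set(b);  /* dead at end of b => marked dead */
        for (i = last_instr(b); i != 0; i = prev(i)) {
                if (is_assignment(i, &x)) {
                        if (is_live(x)) {
                                put_live_msg_after(i, x);
                                mark_dead(x);
                        } else if (side_effect_free(i)) {
                                remove_assignment(i);    /* dead code */
                        }
                } else if (is_use(i, &x) && !is_live(x)) {
                        put_dead_msg_after(i, x);
                        mark_live(x);
                }
        }
}
.DE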
|
||||||
|
.PP
|
||||||
|
Only local variables are considered.
|
||||||
|
This significantly reduces the memory needed by this phase,
|
||||||
|
eases the implementation and is hardly less efficient than
|
||||||
|
considering all variables.
|
||||||
|
(Note that it is very hard to prove that an assignment to
|
||||||
|
a global variable is dead).
|
371
doc/ego/ov/ov1
Normal file
|
@ -0,0 +1,371 @@
|
||||||
|
.bp
|
||||||
|
.NH 1
|
||||||
|
Overview of the global optimizer
|
||||||
|
.NH 2
|
||||||
|
The ACK compilation process
|
||||||
|
.PP
|
||||||
|
The EM Global Optimizer is one of three optimizers that are
|
||||||
|
part of the Amsterdam Compiler Kit (ACK).
|
||||||
|
The phases of ACK are:
|
||||||
|
.IP 1.
|
||||||
|
A Front End translates a source program to EM
|
||||||
|
.IP 2.
|
||||||
|
The Peephole Optimizer
|
||||||
|
.[
|
||||||
|
tanenbaum staveren peephole toplass
|
||||||
|
.]
|
||||||
|
reads EM code and produces 'better' EM code.
|
||||||
|
It performs a number of optimizations (mostly peephole
|
||||||
|
optimizations)
|
||||||
|
such as constant folding, strength reduction and unreachable code
|
||||||
|
elimination.
|
||||||
|
.IP 3.
|
||||||
|
The Global Optimizer further improves the EM code.
|
||||||
|
.IP 4.
|
||||||
|
The Code Generator transforms EM to assembly code
|
||||||
|
of the target computer.
|
||||||
|
.IP 5.
|
||||||
|
The Target Optimizer improves the assembly code.
|
||||||
|
.IP 6.
|
||||||
|
An Assembler/Loader generates an executable file.
|
||||||
|
.LP
|
||||||
|
For a more extensive overview of the ACK compilation process,
|
||||||
|
we refer to.
|
||||||
|
.[
|
||||||
|
tanenbaum toolkit rapport
|
||||||
|
.]
|
||||||
|
.[
|
||||||
|
tanenbaum toolkit cacm
|
||||||
|
.]
|
||||||
|
.PP
|
||||||
|
The input of the Global Optimizer may consist of files and
|
||||||
|
libraries.
|
||||||
|
Every file or module in the library must contain EM code in
|
||||||
|
Compact Assembly Language format.
|
||||||
|
.[~[
|
||||||
|
tanenbaum machine architecture
|
||||||
|
.], section 11.2]
|
||||||
|
The output consists of one such EM file.
|
||||||
|
The input files and libraries together need not
|
||||||
|
constitute an entire program,
|
||||||
|
although as much of the program as possible should be supplied.
|
||||||
|
The more information about the program the optimizer
|
||||||
|
gets, the better its output code will be.
|
||||||
|
.PP
|
||||||
|
The Global Optimizer is language- and machine-independent,
|
||||||
|
i.e. it can be used for all languages and machines supported by ACK.
|
||||||
|
Yet, it puts some unavoidable restrictions on the EM code
|
||||||
|
produced by the Front End (see below).
|
||||||
|
It must have some knowledge of the target machine.
|
||||||
|
This knowledge is expressed in a machine description table
|
||||||
|
which is passed as argument to the optimizer.
|
||||||
|
This table does not contain very detailed information about the
|
||||||
|
target (such as its instruction set and addressing modes).
|
||||||
|
.NH 2
|
||||||
|
The EM code
|
||||||
|
.PP
|
||||||
|
The definition of EM, the intermediate code of all ACK compilers,
|
||||||
|
is given in a separate document.
|
||||||
|
.[
|
||||||
|
tanenbaum machine architecture
|
||||||
|
.]
|
||||||
|
We will only discuss some features of EM that are most relevant
|
||||||
|
to the Global Optimizer.
|
||||||
|
.PP
|
||||||
|
EM is the assembly code of a virtual \fIstack machine\fR.
|
||||||
|
All operations are performed on the top of the stack.
|
||||||
|
For example, the statement "A := B + 3" may be expressed in EM as:
|
||||||
|
.DS
|
||||||
|
LOL -4          -- push local variable B
LOC 3           -- push constant 3
ADI 2           -- add two 2-byte items on top of
                -- the stack and push the result
STL -2          -- pop the result into A
|
||||||
|
.DE
|
||||||
|
So EM is essentially a \fIpostfix\fR code.
|
||||||
|
.PP
|
||||||
|
EM has a rich instruction set, containing several arithmetic
|
||||||
|
and logical operators.
|
||||||
|
It also contains special-case instructions (such as INCrement).
|
||||||
|
.PP
|
||||||
|
EM has \fIglobal\fR (\fIexternal\fR) variables, accessible
|
||||||
|
by all procedures and \fIlocal\fR variables, accessible by a few
|
||||||
|
(nested) procedures.
|
||||||
|
The local variables of a lexically enclosing procedure may
|
||||||
|
be accessed via a \fIstatic link\fR.
|
||||||
|
EM has instructions to follow the static chain.
|
||||||
|
There are EM instructions to allow a procedure
|
||||||
|
to access its local variables directly (such as LOL and STL above).
|
||||||
|
Local variables are referenced via an offset in the stack frame
|
||||||
|
of the procedure, rather than by their names (e.g. -2 and -4 above).
|
||||||
|
The EM code does not contain the (source language) type
|
||||||
|
of the variables.
|
||||||
|
.PP
|
||||||
|
All structured statements in the source program are expressed in
|
||||||
|
low level jump instructions.
|
||||||
|
Besides conditional and unconditional branch instructions, there are
|
||||||
|
two case instructions (CSA and CSB),
|
||||||
|
to allow efficient translation of case statements.
|
||||||
|
.NH 2
|
||||||
|
Requirements on the EM input
|
||||||
|
.PP
|
||||||
|
As the optimizer should be useful for all languages,
|
||||||
|
it clearly should not put severe restrictions on the EM code
|
||||||
|
of the input.
|
||||||
|
There is, however, one immovable requirement:
|
||||||
|
it must be possible to determine the \fIflow of control\fR of the
|
||||||
|
input program.
|
||||||
|
As virtually all global optimizations are based on control flow information,
|
||||||
|
the optimizer would be totally powerless without it.
|
||||||
|
For this reason we restrict the usage of the case jump instructions (CSA/CSB)
|
||||||
|
of EM.
|
||||||
|
Such an instruction is always called with the address of a case descriptor
|
||||||
|
on top of the stack.
|
||||||
|
.[~[
|
||||||
|
tanenbaum machine architecture
|
||||||
|
.] section 7.4]
|
||||||
|
This descriptor contains the labels of all possible
|
||||||
|
destinations of the jump.
|
||||||
|
We demand that all case descriptors are allocated in a global
|
||||||
|
data fragment of type ROM, i.e. the case descriptors
|
||||||
|
may not be modifiable.
|
||||||
|
Furthermore, any case instruction should be immediately preceded by
|
||||||
|
a LAE (Load Address External) instruction, that loads the
|
||||||
|
address of the descriptor,
|
||||||
|
so the descriptor can be uniquely identified.
|
||||||
|
.PP
|
||||||
|
The optimizer will work improperly if the user deceives the control flow.
|
||||||
|
We will give two methods to do this.
|
||||||
|
.PP
|
||||||
|
In "C" the notorious library routines "setjmp" and "longjmp"
|
||||||
|
.[
|
||||||
|
unix programmer's manual
|
||||||
|
.]
|
||||||
|
may be used to jump out of a procedure,
|
||||||
|
but can also be used for a number of other dubious purposes,
|
||||||
|
for example, to create an extra entry point in a loop.
|
||||||
|
.DS
|
||||||
|
while (condition) {
        ....
        setjmp(buf);
        ...
}
...
longjmp(buf, 1);
|
||||||
|
.DE
|
||||||
|
The invocation to longjmp actually is a jump to the place of
|
||||||
|
the last call to setjmp with the same argument (buf).
|
||||||
|
As the calls to setjmp and longjmp are indistinguishable from
|
||||||
|
normal procedure calls, the optimizer will not see the danger.
|
||||||
|
Needless to say, several loop optimizations will behave
|
||||||
|
unexpectedly when presented with such pathological input.
|
||||||
|
.PP
|
||||||
|
Another way to deceive the flow of control is
|
||||||
|
by using exception handling routines.
|
||||||
|
Ada*
|
||||||
|
.FS
|
||||||
|
* Ada is a registered trademark of the U.S. Government
|
||||||
|
(Ada Joint Program Office).
|
||||||
|
.FE
|
||||||
|
has clearly recognized the dangers of exception handling,
|
||||||
|
but other languages (such as PL/I) have not.
|
||||||
|
.[
|
||||||
|
ada rationale
|
||||||
|
.]
|
||||||
|
.PP
|
||||||
|
The optimizer will be more effective if the EM input contains
|
||||||
|
some extra information about the source program.
|
||||||
|
Especially the \fIregister message\fR is very important.
|
||||||
|
These messages indicate which local variables may never be
|
||||||
|
accessed indirectly.
|
||||||
|
Most optimizations benefit significantly by this information.
|
||||||
|
.PP
|
||||||
|
The Inline Substitution technique needs to know how many bytes
|
||||||
|
of formal parameters every procedure accesses.
|
||||||
|
Only calls to procedures for which the EM code contains this information
|
||||||
|
will be substituted in line.
|
||||||
|
.NH 2
|
||||||
|
Structure of the optimizer
|
||||||
|
.PP
|
||||||
|
The Global Optimizer is organized as a number of \fIphases\fR,
|
||||||
|
each one performing some task.
|
||||||
|
The main structure is as follows:
|
||||||
|
.IP IC 6
|
||||||
|
the Intermediate Code construction phase transforms EM into the
|
||||||
|
intermediate code (ic) of the optimizer
|
||||||
|
.IP CF
|
||||||
|
the Control Flow phase extends the ic with control flow
|
||||||
|
information and interprocedural information
|
||||||
|
.IP OPTs
|
||||||
|
zero or more optimization phases, each one performing one or
|
||||||
|
more related optimizations
|
||||||
|
.IP CA
|
||||||
|
the Compact Assembly phase generates Compact Assembly Language EM code
|
||||||
|
out of ic.
|
||||||
|
.LP
|
||||||
|
.PP
|
||||||
|
An important issue in the design of a global optimizer is the
|
||||||
|
interaction between optimization techniques.
|
||||||
|
It is often advantageous to combine several techniques in
|
||||||
|
one algorithm that takes into account all interactions between them.
|
||||||
|
Ideally, one single algorithm should be developed that does
|
||||||
|
all optimizations simultaneously and deals with all possible interactions.
|
||||||
|
In practice, such an algorithm is still far out of reach.
|
||||||
|
Instead some rather ad hoc (albeit important) combinations are chosen,
|
||||||
|
such as Common Subexpression Elimination and Register Allocation.
|
||||||
|
.[
|
||||||
|
prabhala sethi common subexpressions
|
||||||
|
.]
|
||||||
|
.[
|
||||||
|
sethi ullman optimal code
|
||||||
|
.]
|
||||||
|
.PP
|
||||||
|
In the EM Global Optimizer there is one separate algorithm for
|
||||||
|
every technique.
|
||||||
|
Note that this does not mean that all techniques are independent
|
||||||
|
of each other.
|
||||||
|
.PP
|
||||||
|
In principle, the optimization phases can be run in any order;
|
||||||
|
a phase may even be run more than once.
|
||||||
|
However, the following rules should be obeyed:
|
||||||
|
.IP -
|
||||||
|
the Live Variable analysis phase (LV) must be run prior to
|
||||||
|
Register Allocation (RA), as RA uses information output by LV.
|
||||||
|
.IP -
|
||||||
|
RA should be the last phase; this is a consequence of the way
|
||||||
|
the interface between RA and the Code Generator is defined.
|
||||||
|
.LP
|
||||||
|
The ordering of the phases has significant impact on
|
||||||
|
the quality of the produced code.
|
||||||
|
In
|
||||||
|
.[
|
||||||
|
wulf overview production quality carnegie-mellon
|
||||||
|
.]
|
||||||
|
two kinds of phase ordering problems are distinguished.
|
||||||
|
If two techniques A and B both take away opportunities of each other,
|
||||||
|
there is a "negative" ordering problem.
|
||||||
|
If, on the other hand, both A and B introduce new optimization
|
||||||
|
opportunities for each other, the problem is called "positive".
|
||||||
|
In the Global Optimizer the following interactions must be
|
||||||
|
taken into account:
|
||||||
|
.IP -
|
||||||
|
Inline Substitution (IL) may create new opportunities for most
|
||||||
|
other techniques, so it should be run as early as possible
|
||||||
|
.IP -
|
||||||
|
Use Definition analysis (UD) may introduce opportunities for LV.
|
||||||
|
.IP -
|
||||||
|
Strength Reduction may create opportunities for UD
|
||||||
|
.LP
|
||||||
|
The optimizer has a default phase ordering, which can
|
||||||
|
be changed by the user.
|
||||||
|
.NH 2
|
||||||
|
Structure of this document
|
||||||
|
.PP
|
||||||
|
The remaining chapters of this document each describe one
|
||||||
|
phase of the optimizer.
|
||||||
|
For every phase, we describe its task, its design,
|
||||||
|
its implementation, and its source files.
|
||||||
|
The latter two sections are intended to aid the
|
||||||
|
maintenance of the optimizer and
|
||||||
|
can be skipped by the initial reader.
|
||||||
|
.NH 2
|
||||||
|
References
|
||||||
|
.PP
|
||||||
|
There are very
|
||||||
|
few modern textbooks on optimization.
|
||||||
|
Chapters 12, 13, and 14 of
|
||||||
|
.[
|
||||||
|
aho compiler design
|
||||||
|
.]
|
||||||
|
are a good introduction to the subject.
|
||||||
|
Wulf et al.
|
||||||
|
.[
|
||||||
|
wulf optimizing compiler
|
||||||
|
.]
|
||||||
|
describe one specific optimizing (Bliss) compiler.
|
||||||
|
Anklam et al.
|
||||||
|
.[
|
||||||
|
anklam vax-11
|
||||||
|
.]
|
||||||
|
discuss code generation and optimization in
|
||||||
|
compilers for one specific machine (a Vax-11).
|
||||||
|
Kirchgaesner et al.
|
||||||
|
.[
|
||||||
|
optimizing ada compiler
|
||||||
|
.]
|
||||||
|
present a brief description of many
|
||||||
|
optimizations; the report also contains a lengthy (over 60 pages)
|
||||||
|
bibliography.
|
||||||
|
.PP
|
||||||
|
The number of articles on optimization is quite impressive.
|
||||||
|
The Lowrey and Medlock paper on the Fortran H compiler
|
||||||
|
.[
|
||||||
|
object code optimization
|
||||||
|
.]
|
||||||
|
is a classical one.
|
||||||
|
Other papers on global optimization are.
|
||||||
|
.[
|
||||||
|
faiman optimizing pascal
|
||||||
|
.]
|
||||||
|
.[
|
||||||
|
perkins sites
|
||||||
|
.]
|
||||||
|
.[
|
||||||
|
harrison general purpose optimizing
|
||||||
|
.]
|
||||||
|
.[
|
||||||
|
morel partial redundancies
|
||||||
|
.]
|
||||||
|
.[
|
||||||
|
Mintz global optimizer
|
||||||
|
.]
|
||||||
|
Freudenberger
|
||||||
|
.[
|
||||||
|
freudenberger setl optimizer
|
||||||
|
.]
|
||||||
|
describes an optimizer for a Very High Level Language (SETL).
|
||||||
|
The Production-Quality Compiler-Compiler (PQCC) project uses
|
||||||
|
very sophisticated compiler techniques, as described in.
|
||||||
|
.[
|
||||||
|
wulf overview ieee
|
||||||
|
.]
|
||||||
|
.[
|
||||||
|
wulf overview carnegie-mellon
|
||||||
|
.]
|
||||||
|
.[
|
||||||
|
wulf machine-relative
|
||||||
|
.]
|
||||||
|
.PP
|
||||||
|
Several Ph.D. theses are dedicated to optimization.
|
||||||
|
Davidson
|
||||||
|
.[
|
||||||
|
davidson simplifying
|
||||||
|
.]
|
||||||
|
outlines a machine-independent peephole optimizer that
|
||||||
|
improves assembly code.
|
||||||
|
Katkus
|
||||||
|
.[
|
||||||
|
katkus
|
||||||
|
.]
|
||||||
|
describes how efficient programs can be obtained at little cost by
|
||||||
|
optimizing only a small part of a program.
|
||||||
|
Photopoulos
|
||||||
|
.[
|
||||||
|
photopoulos mixed code
|
||||||
|
.]
|
||||||
|
discusses the idea of generating interpreted intermediate code as well
|
||||||
|
as assembly code, to obtain programs that are both small and fast.
|
||||||
|
Shaffer
|
||||||
|
.[
|
||||||
|
shaffer automatic
|
||||||
|
.]
|
||||||
|
describes the theory of automatic subroutine generation.
|
||||||
|
|
||||||
|
Leverett
|
||||||
|
.[
|
||||||
|
leverett register allocation compilers
|
||||||
|
.]
|
||||||
|
deals with register allocation in the PQCC compilers.
|
||||||
|
.PP
|
||||||
|
References to articles about specific optimization techniques
|
||||||
|
will be given in later chapters.
|
33
doc/ego/ra/ra1
Normal file
|
@ -0,0 +1,33 @@
|
||||||
|
.bp
|
||||||
|
.NH 1
|
||||||
|
Register Allocation
|
||||||
|
.NH 2
|
||||||
|
Introduction
|
||||||
|
.PP
|
||||||
|
The efficient usage of the general purpose registers
|
||||||
|
of the target machine plays a key role in any optimizing compiler.
|
||||||
|
This subject, often referred to as \fIRegister Allocation\fR,
|
||||||
|
has great impact on both the code generator and the
|
||||||
|
optimizing part of such a compiler.
|
||||||
|
The code generator needs registers for at least the evaluation of
|
||||||
|
arithmetic expressions;
|
||||||
|
the optimizer uses the registers to decrease the access costs
|
||||||
|
of frequently used entities (such as variables).
|
||||||
|
The design of an optimizing compiler must pay great
|
||||||
|
attention to the cooperation of optimization, register allocation
|
||||||
|
and code generation.
|
||||||
|
.PP
|
||||||
|
Register allocation has received much attention in literature (see
|
||||||
|
.[
|
||||||
|
leverett register allocation compilers
|
||||||
|
.]
|
||||||
|
.[
|
||||||
|
chaitin register coloring
|
||||||
|
.]
|
||||||
|
.[
|
||||||
|
freiburghouse usage counts
|
||||||
|
.]
|
||||||
|
and
|
||||||
|
.[~[
|
||||||
|
sites register
|
||||||
|
.]]).
|
139
doc/ego/ra/ra2
Normal file
|
@ -0,0 +1,139 @@
|
||||||
|
.NH 2
|
||||||
|
Usage of registers in ACK compilers
|
||||||
|
.PP
|
||||||
|
We will first describe the major design decisions
|
||||||
|
of the Amsterdam Compiler Kit,
|
||||||
|
as far as they concern register allocation.
|
||||||
|
Subsequently we will outline
|
||||||
|
the role of the Global Optimizer in the register
|
||||||
|
allocation process and the interface
|
||||||
|
between the code generator and the optimizer.
|
||||||
|
.NH 3
|
||||||
|
Usage of registers without the intervention of the Global Optimizer
|
||||||
|
.PP
|
||||||
|
Registers are used for two purposes:
|
||||||
|
.IP 1.
|
||||||
|
for the evaluation of arithmetic expressions
|
||||||
|
.IP 2.
|
||||||
|
to hold local variables, for the duration of the procedure they
|
||||||
|
are local to.
|
||||||
|
.LP
|
||||||
|
It is essential to note that no translation part of the compilers,
|
||||||
|
except for the code generator, knows anything at all
|
||||||
|
about the register set of the target computer.
|
||||||
|
Hence all decisions about registers are ultimately made by
|
||||||
|
the code generator.
|
||||||
|
Earlier phases of a compiler can only \fIadvise\fR the code generator.
|
||||||
|
.PP
|
||||||
|
The code generator splits the register set into two:
|
||||||
|
a fixed part for the evaluation of expressions (called \fIscratch\fR
|
||||||
|
registers) and a fixed part to store local variables.
|
||||||
|
This partitioning, which depends only on the target computer, significantly
|
||||||
|
reduces the complexity of register allocation, at the penalty
|
||||||
|
of some loss of code quality.
|
||||||
|
.PP
|
||||||
|
The code generator has some (machine-dependent) knowledge of the access costs
|
||||||
|
of memory locations and registers and of the costs of saving and
|
||||||
|
restoring registers. (Registers are always saved by the \fIcalled\fR
|
||||||
|
procedure).
|
||||||
|
This knowledge is expressed in a set of procedures for each target machine.
|
||||||
|
The code generator also knows how many registers there are and of
|
||||||
|
which type they are.
|
||||||
|
A register can be of type \fIpointer\fR, \fIfloating point\fR
|
||||||
|
or \fIgeneral\fR.
|
||||||
|
.PP
|
||||||
|
The front ends of the compilers determine which local variables may
|
||||||
|
be put in a register;
|
||||||
|
such a variable may never be accessed indirectly (i.e. through a pointer).
|
||||||
|
The front end also determines the types and sizes of these variables.
|
||||||
|
The type can be any of the register types or the type \fIloop variable\fR,
|
||||||
|
which denotes a general-typed variable that is used as loop variable
|
||||||
|
in a for-statement.
|
||||||
|
All this information is collected in a \fIregister message\fR in
|
||||||
|
the EM code.
|
||||||
|
Such a message is a pseudo EM instruction.
|
||||||
|
This message also contains a \fIscore\fR field,
|
||||||
|
indicating how desirable it is to put this variable in a register.
|
||||||
|
A front end may assign a high score to a variable if it
|
||||||
|
was declared as a register variable (which is only possible in
|
||||||
|
some languages, such as "C").
|
||||||
|
Any compiler phase before the code generator may change this score field,
|
||||||
|
if it has reason to do so.
|
||||||
|
The code generator bases its decisions on the information contained
|
||||||
|
in the register message, most notably on the score.
|
||||||
|
.PP
|
||||||
|
If the global optimizer is not used,
|
||||||
|
the score fields are set by the Peephole Optimizer.
|
||||||
|
This optimizer simply counts the number of occurrences
|
||||||
|
of every local (register) variable and adds this count
|
||||||
|
to the score provided by the front end.
|
||||||
|
In this way a simple, yet quite effective
|
||||||
|
register allocation scheme is achieved.
|
||||||
|
.NH 3
|
||||||
|
The role of the Global Optimizer
|
||||||
|
.PP
|
||||||
|
The Global Optimizer essentially tries to improve the scheme
|
||||||
|
outlined above.
|
||||||
|
It uses the following principles for this purpose:
|
||||||
|
.IP -
|
||||||
|
Entities are not always assigned a register for the duration
|
||||||
|
of an entire procedure; smaller regions of the program text
|
||||||
|
may be considered too.
|
||||||
|
.IP -
|
||||||
|
several variables may be put in the same register simultaneously,
|
||||||
|
provided at most one of them is live at any point.
|
||||||
|
.IP -
|
||||||
|
besides local variables, other entities (such as constants and addresses of
|
||||||
|
variables and procedures) may be put in a register.
|
||||||
|
.IP -
|
||||||
|
more accurate cost estimates are used.
|
||||||
|
.LP
|
||||||
|
To perform its task, the optimizer must have some
|
||||||
|
knowledge of the target machine.
|
||||||
|
.NH 3
|
||||||
|
The interface between the register allocator and the code generator
|
||||||
|
.PP
|
||||||
|
The RA phase of the optimizer must somehow be able to express its
|
||||||
|
decisions.
|
||||||
|
Such decisions may look like: 'put constant 1283 in a register from
|
||||||
|
line 12 to line 40'.
|
||||||
|
To be precise, RA must be able to tell the code generator to:
|
||||||
|
.IP -
|
||||||
|
initialize a register with some value
|
||||||
|
.IP -
|
||||||
|
update an entity from a register
|
||||||
|
.IP -
|
||||||
|
replace all occurrences of an entity in a certain region
|
||||||
|
of text by a reference to the register.
|
||||||
|
.LP
|
||||||
|
At least three problems occur here: the code generator is only used to
|
||||||
|
put local variables in registers,
|
||||||
|
it only assigns a register to a variable for the duration of an entire
|
||||||
|
procedure, and it is not designed to let some earlier compiler phase
make all the decisions.
|
||||||
|
.PP
|
||||||
|
All problems are solved by one mechanism, that involves no changes
|
||||||
|
to the code generator.
|
||||||
|
With every (non-scratch) register R that will be used in
|
||||||
|
a procedure P, we associate a new variable T, local to P.
|
||||||
|
The size of T is the same as the size of R.
|
||||||
|
A register message is generated for T with an exceptionally high score.
|
||||||
|
The scores of all original register messages are set to zero.
|
||||||
|
Consequently, the code generator will always assign precisely those new
|
||||||
|
variables to a register.
|
||||||
|
If the optimizer wants to put some entity, say the constant 1283, in
|
||||||
|
a register, it emits the code "T := 1283" and replaces all occurrences
|
||||||
|
of '1283' by T.
|
||||||
|
Similarly, it can put the address of a procedure in T and replace all
|
||||||
|
calls to that procedure by indirect calls.
|
||||||
|
Furthermore, it can put several different entities in T (and thus in R)
|
||||||
|
during the lifetime of P.
|
||||||
|
.PP
|
||||||
|
In principle, the code generated by the optimizer in this way would
|
||||||
|
always be valid EM code, even if the optimizer would be presented
|
||||||
|
a totally wrong description of the target computer register set.
|
||||||
|
In practice, it would be a waste of data as well as text space to
|
||||||
|
allocate memory for these new variables, as they will always be assigned
|
||||||
|
a register (in the correct order of events).
|
||||||
|
Hence, no memory locations are allocated for them.
|
||||||
|
For this reason they are called pseudo local variables.
|
383
doc/ego/ra/ra3
Normal file
|
@ -0,0 +1,383 @@
|
||||||
|
.NH 2
|
||||||
|
The register allocation phase
|
||||||
|
.PP
|
||||||
|
.NH 3
|
||||||
|
Overview
|
||||||
|
.PP
|
||||||
|
The RA phase deals with one procedure at a time.
|
||||||
|
For every procedure, it first determines which entities
|
||||||
|
may be put in a register. Such an entity
|
||||||
|
is called an \fIitem\fR.
|
||||||
|
For every item it decides during which parts of the procedure it
|
||||||
|
might be assigned a register.
|
||||||
|
Such a region is called a \fItimespan\fR.
|
||||||
|
For any item, several (possibly overlapping) timespans may
|
||||||
|
be considered.
|
||||||
|
A pair (item,timespan) is called an \fIallocation\fR.
|
||||||
|
If the items of two allocations are both live at some
|
||||||
|
point of time in the intersections of their timespans,
|
||||||
|
these allocations are said to be \fIrivals\fR of each other,
|
||||||
|
as they cannot be assigned the same register.
|
||||||
|
The rivals-set of every allocation is computed.
|
||||||
|
Next, the gains of assigning a register to an allocation are estimated,
|
||||||
|
for every allocation.
|
||||||
|
With all this information, decisions are made which allocations
|
||||||
|
to store in which registers (\fIpacking\fR).
|
||||||
|
Finally, the EM text is transformed to reflect these decisions.
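.PP
The allocation list built up during these subphases may be pictured as
follows; the declarations are only a sketch with invented names.
.DS
struct item; struct timespan; struct alloc_list;  /* details omitted */

/* Hypothetical sketch of the central RA data structure. */
struct allocation {
        struct item     *al_item;     /* local variable, address or constant */
        struct timespan *al_span;     /* part of the procedure text          */
        struct alloc_list *al_rivals; /* allocations busy at the same time;
                                       * they cannot share a register        */
        int              al_profit;   /* estimated gain of using a register  */
        int              al_reg;      /* register chosen by the packing      */
};
.DE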
|
||||||
|
.NH 3
|
||||||
|
The item recognition subphase
|
||||||
|
.PP
|
||||||
|
RA tries to put the following entities in a register:
|
||||||
|
.IP -
|
||||||
|
a local variable for which a register message was found
|
||||||
|
.IP -
|
||||||
|
the address of a local variable for which no
|
||||||
|
register message was found
|
||||||
|
.IP -
|
||||||
|
the address of a global variable
|
||||||
|
.IP -
|
||||||
|
the address of a procedure
|
||||||
|
.IP -
|
||||||
|
a numeric constant.
|
||||||
|
.LP
|
||||||
|
Only the \fIaddress\fR of a global variable
|
||||||
|
may be put in a register, not the variable itself.
|
||||||
|
This approach avoids the very complex problems that would be
|
||||||
|
caused by procedure calls and indirect pointer references (see
|
||||||
|
.[~[
|
||||||
|
aho design compiler
|
||||||
|
.] sections 14.7 and 14.8]
|
||||||
|
and
|
||||||
|
.[~[
|
||||||
|
spillman side-effects
|
||||||
|
.]]).
|
||||||
|
Still, on most machines accessing a global variable using indirect
|
||||||
|
addressing through a register is much cheaper than
|
||||||
|
accessing it via its address.
|
||||||
|
Similarly, if the address of a procedure is put in a register, the
|
||||||
|
procedure can be called via an indirect call.
|
||||||
|
.PP
|
||||||
|
With every item we associate a register type.
|
||||||
|
This type is
|
||||||
|
.DS
|
||||||
|
for local variables: the type contained in the register message
|
||||||
|
for addresses of variables and procedures: the pointer type
|
||||||
|
for constants: the general type
|
||||||
|
.DE
|
||||||
|
An entity other than a local variable is not taken to be an item
|
||||||
|
if it is used only once within the current procedure.
|
||||||
|
.PP
|
||||||
|
An item is said to be \fIlive\fR at some point of the program text
|
||||||
|
if its value may be used before it is changed.
|
||||||
|
As addresses and constants are never changed, all items but local
|
||||||
|
variables are always live.
|
||||||
|
The region of text during which a local variable is live is
|
||||||
|
determined via the live/dead messages generated by the
|
||||||
|
Live Variable analysis phase of the Global Optimizer.
|
||||||
|
.NH 3
|
||||||
|
The allocation determination subphase
|
||||||
|
.PP
|
||||||
|
If a procedure has more items than registers,
|
||||||
|
it may be advantageous to put an item in a register
|
||||||
|
only during those parts of the procedure where it is most
|
||||||
|
heavily used.
|
||||||
|
Such a part will be called a timespan.
|
||||||
|
With every item we may associate a set of timespans.
|
||||||
|
If two timespans of an item overlap,
|
||||||
|
at most one of them may be granted a register,
|
||||||
|
as there is no use in putting the same item in two
|
||||||
|
registers simultaneously.
|
||||||
|
If two timespans of an item are distinct,
|
||||||
|
both may be chosen;
|
||||||
|
the item will possibly be put in two
|
||||||
|
different registers during different parts of the procedure.
|
||||||
|
The timespan may also consist
|
||||||
|
of the whole procedure.
|
||||||
|
.PP
|
||||||
|
A list of (item,timespan) pairs (allocations)
|
||||||
|
is built, which will be the input to the decision-making
|
||||||
|
subphase of RA (packing subphase).
|
||||||
|
This allocation list is the main data structure of RA.
|
||||||
|
The description of the remainder of RA will be in terms
|
||||||
|
of allocations rather than items.
|
||||||
|
The phrase "to assign a register to an allocation" means "to assign
|
||||||
|
a register to the item of the allocation for the duration of
|
||||||
|
the timespan of the allocation".
|
||||||
|
Subsequent subphases will add more information
|
||||||
|
to this list.
|
||||||
|
.PP
|
||||||
|
Several factors must be taken into account when a
|
||||||
|
timespan for an item is constructed:
|
||||||
|
.IP 1.
|
||||||
|
At any \fIentry point\fR of the timespan where the
|
||||||
|
item is live,
|
||||||
|
the register must be initialized with the item
|
||||||
|
.IP 2.
|
||||||
|
At any exit point of the timespan where the item is live,
|
||||||
|
the item must be updated.
|
||||||
|
.LP
|
||||||
|
In order to decrease these costs, we will only consider timespans with
|
||||||
|
one entry point
|
||||||
|
and no live exit points.
|
||||||
|
.NH 3
|
||||||
|
The rivals computation subphase
|
||||||
|
.PP
|
||||||
|
As stated before, several different items may be put in the
|
||||||
|
same register, provided they are not live simultaneously.
|
||||||
|
For every allocation we determine the intersection
|
||||||
|
of its timespan and the lifetime of its item (i.e. the part of the
|
||||||
|
procedure during which the item is live).
|
||||||
|
The allocation is said to be busy during this intersection.
|
||||||
|
If two allocations are ever busy simultaneously they are
|
||||||
|
said to be rivals of each other.
|
||||||
|
The rivals information is added to the allocation list.
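.PP
A minimal sketch of this subphase is given below.
It assumes that the busy time of every allocation has already been
flattened into a bitvector over program points;
the names and the fixed-size arrays are assumptions of the sketch,
not the interface of the alloclist package.
.DS
#define NPOINTS 1024	/* max. number of program points (assumption) */

typedef unsigned char bitset[NPOINTS / 8];

struct alloc {
	bitset	al_busy;		/* points where the allocation is busy */
	struct alloc *al_rival[32];	/* its rivals (at most 32 assumed) */
	int	al_nrival;
	struct alloc *al_next;
};

static int overlap(bitset x, bitset y)
{
	int i;

	for (i = 0; i < NPOINTS / 8; i++)
		if (x[i] & y[i]) return 1;
	return 0;
}

void compute_rivals(struct alloc *list)
{
	struct alloc *a, *b;

	for (a = list; a != 0; a = a->al_next)
		for (b = a->al_next; b != 0; b = b->al_next)
			if (overlap(a->al_busy, b->al_busy)) {
				a->al_rival[a->al_nrival++] = b;
				b->al_rival[b->al_nrival++] = a;
			}
}
.DE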
|
||||||
|
.NH 3
|
||||||
|
The profits computation subphase
|
||||||
|
.PP
|
||||||
|
To make good decisions, the packing subphase needs to
|
||||||
|
know which allocations can be assigned the same register
|
||||||
|
(rivals information) and how much is gained by
|
||||||
|
granting an allocation a register.
|
||||||
|
.PP
|
||||||
|
Besides the gains of using a register instead of an
|
||||||
|
item,
|
||||||
|
two kinds of overhead costs must be
|
||||||
|
taken into account:
|
||||||
|
.IP -
|
||||||
|
the register must be initialized with the item
|
||||||
|
.IP -
|
||||||
|
the register must be saved at procedure entry
|
||||||
|
and restored at procedure exit.
|
||||||
|
.LP
|
||||||
|
The latter costs should not be due to a single
|
||||||
|
allocation, as several allocations can be assigned the same register.
|
||||||
|
These costs are dealt with after packing has been done.
|
||||||
|
They do not influence the decisions of the packing algorithm,
|
||||||
|
they may only undo them.
|
||||||
|
.PP
|
||||||
|
The actual profits consist of improvements
|
||||||
|
of execution time and code size.
|
||||||
|
As the former is far more difficult to estimate, we will
|
||||||
|
discuss code size improvements first.
|
||||||
|
.PP
|
||||||
|
The gains of putting a certain item in a register
|
||||||
|
depend on how the item is used.
|
||||||
|
Suppose the item is
|
||||||
|
a pointer variable.
|
||||||
|
On machines that do not have a
|
||||||
|
double-indirect addressing mode,
|
||||||
|
two instructions are needed to dereference the variable
|
||||||
|
if it is not in a register, but only one if it is put in a register.
|
||||||
|
If the variable is not dereferenced, but simply copied, one instruction
|
||||||
|
may be sufficient in both cases.
|
||||||
|
So the gains of putting a pointer variable in a register are higher
|
||||||
|
if the variable is dereferenced often.
|
||||||
|
.PP
|
||||||
|
To make accurate estimates, detailed knowledge of
|
||||||
|
the target machine and of the code generator
|
||||||
|
would be needed.
|
||||||
|
Therefore, a simplification has been made that substantially limits
|
||||||
|
the amount of target machine information that is needed.
|
||||||
|
The estimation of the number of bytes saved does
|
||||||
|
not take into account how an item is used.
|
||||||
|
Rather, an average number is used.
|
||||||
|
So these gains are computed as follows:
|
||||||
|
.DS
|
||||||
|
#bytes_saved = #occurrences * gains_per_occurrence
|
||||||
|
.DE
|
||||||
|
The number of occurrences is derived from
|
||||||
|
the EM code.
|
||||||
|
Note that this is not exact either,
|
||||||
|
as there is no one-to-one correspondence between occurrences in
|
||||||
|
the EM code and in the assembler code.
|
||||||
|
.PP
|
||||||
|
The gains of one occurrence depend on:
|
||||||
|
.IP 1.
|
||||||
|
the type of the item
|
||||||
|
.IP 2.
|
||||||
|
the size of the item
|
||||||
|
.IP 3.
|
||||||
|
the type of the register
|
||||||
|
.LP
|
||||||
|
and for local variables and addresses of local variables:
|
||||||
|
.IP 4.
|
||||||
|
the type of the local variable
|
||||||
|
.IP 5.
|
||||||
|
the offset of the variable in the stackframe
|
||||||
|
.LP
|
||||||
|
For every allocation we try two types of registers: the register type
|
||||||
|
of the item and the general register type.
|
||||||
|
Only the type with the highest profits will subsequently be used.
|
||||||
|
This type is added to the allocation information.
|
||||||
|
.PP
|
||||||
|
To compute the gains, RA uses a machine-dependent table
|
||||||
|
that is read from a machine descriptor file.
|
||||||
|
By means of this table the number of bytes saved can be computed
|
||||||
|
as a function of the five properties.
|
||||||
|
.PP
|
||||||
|
The costs of initializing a register with an item
|
||||||
|
are determined in a similar way.
|
||||||
|
The cost of one initialization is also
|
||||||
|
obtained from the descriptor file.
|
||||||
|
Note that there can be at most one initialization for any
|
||||||
|
allocation.
|
||||||
|
.PP
|
||||||
|
To summarize, the number of bytes a certain allocation would
|
||||||
|
save is computed as follows:
|
||||||
|
.DS
|
||||||
|
net_bytes_saved = bytes_saved - init_cost
|
||||||
|
bytes_saved = #occurrences * gains_per_occ
|
||||||
|
init_cost = #initializations * costs_per_init
|
||||||
|
.DE
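.PP
In C this computation amounts to the small function sketched below;
gains_per_occ and costs_per_init stand for the numbers obtained from
the machine descriptor file and are parameters of the sketch only.
.DS
/* Illustrative computation of the code size profits of one allocation. */
int net_bytes_saved(int occurrences, int initializations,
		    int gains_per_occ, int costs_per_init)
{
	int bytes_saved = occurrences * gains_per_occ;
	int init_cost   = initializations * costs_per_init;

	return bytes_saved - init_cost;
}
.DE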
|
||||||
|
.PP
|
||||||
|
It is inherently more difficult to estimate the execution
|
||||||
|
time saved by putting an item in a register,
|
||||||
|
because it is impossible to predict how
|
||||||
|
many times an item will be used dynamically.
|
||||||
|
If an occurrence is part of a loop,
|
||||||
|
it may be executed many times.
|
||||||
|
If it is part of a conditional statement,
|
||||||
|
it may never be executed at all.
|
||||||
|
In the latter case, the speed of the program may even get
|
||||||
|
worse if an initialization is needed.
|
||||||
|
As a clear example, consider the piece of "C" code in Fig. 13.1.
|
||||||
|
.DS
|
||||||
|
switch(expr) {
|
||||||
|
case 1: p(); break;
|
||||||
|
case 2: p(); p(); break;
|
||||||
|
case 3: p(); break;
|
||||||
|
default: break;
|
||||||
|
}
|
||||||
|
|
||||||
|
Fig. 13.1 A "C" switch statement
|
||||||
|
.DE
|
||||||
|
Lots of bytes may be saved by putting the address of procedure p
|
||||||
|
in a register, as p is called four times (statically).
|
||||||
|
Dynamically, p will be called zero, one or two times,
|
||||||
|
depending on the value of the expression.
|
||||||
|
.PP
|
||||||
|
The optimizer uses the following strategy for optimizing
|
||||||
|
execution time:
|
||||||
|
.IP 1.
|
||||||
|
try to put items in registers during \fIloops\fR first
|
||||||
|
.IP 2.
|
||||||
|
always keep the initializing code outside the loop
|
||||||
|
.IP 3.
|
||||||
|
if an item is not used in a loop, do not put it in a register if
|
||||||
|
the initialization costs may be higher than the gains
|
||||||
|
.LP
|
||||||
|
The latter condition can be checked by determining the
|
||||||
|
minimal number of usages (dynamically) of the item during the procedure,
|
||||||
|
via a shortest path algorithm.
|
||||||
|
In the example above, this minimal number is zero, so the address of
|
||||||
|
p is not put in a register.
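.PP
The sketch below illustrates one way to make such a guess.
The weight attached to one loop nesting level is an assumed constant,
and min_usage stands for the minimal dynamic number of usages as
computed by the shortest path algorithm.
.DS
#define LOOP_WEIGHT 8	/* assumed number of iterations of a loop */

long dynamic_occurrences(int nesting_level[], int n, long min_usage)
{
	long total = 0;
	int i, lev;

	for (i = 0; i < n; i++) {
		long w = 1;

		for (lev = 0; lev < nesting_level[i]; lev++)
			w *= LOOP_WEIGHT;	/* weigh occurrences in loops */
		total += w;
	}
	if (total == n)		/* no occurrence is inside a loop */
		return min_usage;
	return total;
}
.DE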
|
||||||
|
.PP
|
||||||
|
The cost of one occurrence is estimated as described above for the
|
||||||
|
code size.
|
||||||
|
The number of dynamic occurrences is guessed by looking at the
|
||||||
|
loop nesting level of every occurrence.
|
||||||
|
If the item is never used in a loop,
|
||||||
|
the minimal number of occurrences is used.
|
||||||
|
From these facts, the execution time improvement is assessed
|
||||||
|
for every allocation.
|
||||||
|
.NH 3
|
||||||
|
The packing subphase
|
||||||
|
.PP
|
||||||
|
The packing subphase takes as input the allocation
|
||||||
|
list and outputs a
|
||||||
|
description of which allocations should be put
|
||||||
|
in which registers.
|
||||||
|
So it is essentially the decision making part of RA.
|
||||||
|
.PP
|
||||||
|
The packing system tries to assign a register to allocations one
|
||||||
|
at a time, in some yet to be defined order.
|
||||||
|
For every allocation A, it first checks if there is a register
|
||||||
|
(of the right type)
|
||||||
|
that is already assigned to one or more allocations,
|
||||||
|
none of which are rivals of A.
|
||||||
|
In this case A is assigned the same register.
|
||||||
|
Else, A is assigned a new register, if one exists.
|
||||||
|
A table containing the number of free registers for every type
|
||||||
|
is maintained.
|
||||||
|
It is initialized with the number of non-scratch registers of
|
||||||
|
the target computer and updated whenever a
|
||||||
|
new register is handed out.
|
||||||
|
The packing algorithm stops when no more allocations can
|
||||||
|
or need be assigned a register.
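.PP
The core of the packing loop could be sketched as follows.
Register types and the removal of packed allocations are left out,
the allocations are assumed to be numbered and ordered already, and
the rivalry test is taken for granted;
this is a sketch, not the actual code of the pack package.
.DS
#define NREG	8	/* number of free non-scratch registers (assumption) */
#define MAXSHARE 64	/* max. allocations sharing one register (assumption) */

extern int is_rival(int a, int b);	/* 1 if allocations a and b are rivals */

void pack(int nalloc, int assigned_reg[])
{
	int user[NREG][MAXSHARE], nuser[NREG];
	int a, r, i, ok;

	for (r = 0; r < NREG; r++) nuser[r] = 0;
	for (a = 0; a < nalloc; a++) {
		assigned_reg[a] = -1;
		for (r = 0; r < NREG && assigned_reg[a] < 0; r++) {
			ok = 1;		/* may a share register r? */
			for (i = 0; i < nuser[r]; i++)
				if (is_rival(a, user[r][i])) {
					ok = 0;
					break;
				}
			if (ok) {	/* r is new or holds no rivals of a */
				user[r][nuser[r]++] = a;
				assigned_reg[a] = r;
			}
		}
	}
}
.DE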
|
||||||
|
.PP
|
||||||
|
After an allocation A has been packed,
|
||||||
|
all allocations with non-disjoint timespans (including
|
||||||
|
A itself) are removed from the allocation list.
|
||||||
|
.PP
|
||||||
|
In case the number of items exceeds the number of registers, it
|
||||||
|
is important to choose the most profitable allocations.
|
||||||
|
Due to the possibility of having several allocations
|
||||||
|
occupying the same register,
|
||||||
|
this problem is quite complex.
|
||||||
|
Our packing algorithm uses simple heuristic rules
|
||||||
|
and avoids any combinatorial search.
|
||||||
|
It has distinct rules for the different cost measures.
|
||||||
|
.PP
|
||||||
|
If object code size is the most important factor,
|
||||||
|
the algorithm is greedy and chooses allocations in
|
||||||
|
decreasing order of their profits attribute.
|
||||||
|
It does not take into account the fact that
|
||||||
|
other allocations may be passed over because of
|
||||||
|
this decision.
|
||||||
|
.PP
|
||||||
|
If execution time is the prime concern, the algorithm
|
||||||
|
first considers allocations whose timespans consist of loops.
|
||||||
|
After all these have been packed, it considers the remaining
|
||||||
|
allocations.
|
||||||
|
Within the two subclasses, it considers allocations
|
||||||
|
with the highest profits first.
|
||||||
|
When assigning a register to an allocation with a loop
|
||||||
|
as timespan, the algorithm checks if the item has
|
||||||
|
already been put in a register during another loop.
|
||||||
|
If so, it tries to use the same register for the
|
||||||
|
new allocation.
|
||||||
|
After all packing has been done,
|
||||||
|
it checks if the item has always been assigned the same
|
||||||
|
register (although not necessarily during all loops).
|
||||||
|
If so, it tries to put the item in that register during
|
||||||
|
the entire procedure. This is possible
|
||||||
|
if the allocation (item,whole_procedure) is not a rival
|
||||||
|
of any allocation with a different item that has been
|
||||||
|
assigned to the same register.
|
||||||
|
Note that this approach is essentially 'bottom up',
|
||||||
|
as registers are first assigned over small regions
|
||||||
|
of text which are later collapsed into larger regions.
|
||||||
|
The advantage of this approach is the fact that
|
||||||
|
the decisions for one loop can be made independently
|
||||||
|
of all other loops.
|
||||||
|
.PP
|
||||||
|
After the entire packing process has been completed,
|
||||||
|
we compute for each register how much is gained in using
|
||||||
|
this register, by simply adding the net profits
|
||||||
|
of all allocations assigned to it.
|
||||||
|
This total yield should outweigh the costs of
|
||||||
|
saving/restoring the register at procedure entry/exit.
|
||||||
|
As most modern processors (e.g. 68000, Vax) have special
|
||||||
|
instructions to save/restore several registers,
|
||||||
|
the differential costs of saving one extra register are by
|
||||||
|
no means constant.
|
||||||
|
The costs are read from the machine descriptor file and
|
||||||
|
compared to the total yields of the registers.
|
||||||
|
As a consequence of this analysis, some allocations
|
||||||
|
may have their registers taken away.
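.PP
A sketch of this final check is given below.
save_cost stands for the cumulative costs of saving and restoring a
given number of registers as read from the descriptor file, and the
registers are assumed to be sorted on decreasing total yield.
.DS
extern long save_cost(int nregs);	/* costs of saving/restoring nregs */

int profitable_registers(long yield[], int nregs)
{
	int n;

	for (n = nregs; n > 0; n--) {
		/* differential costs of the n-th register */
		if (yield[n - 1] >= save_cost(n) - save_cost(n - 1))
			break;		/* this register still pays off */
	}
	return n;	/* allocations in the remaining registers lose them */
}
.DE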
|
||||||
|
.NH 3
|
||||||
|
The transformation subphase
|
||||||
|
.PP
|
||||||
|
The final subphase of RA transforms the EM text according to the
|
||||||
|
decisions made by the packing system.
|
||||||
|
It traverses the text of the currently optimized procedure and
|
||||||
|
changes all occurrences of items at points where
|
||||||
|
they are assigned a register.
|
||||||
|
It also clears the score field of the register messages for
|
||||||
|
normal local variables and emits register messages with a very
|
||||||
|
high score for the pseudo locals.
|
||||||
|
At points where registers have to be initialized with items,
|
||||||
|
it generates EM code to do so.
|
||||||
|
Finally it tries to decrease the size of the stackframe
|
||||||
|
of the procedure by looking at which local variables need not
|
||||||
|
be given memory locations.
|
28
doc/ego/ra/ra4
Normal file
|
@ -0,0 +1,28 @@
|
||||||
|
.NH 2
|
||||||
|
Source files of RA
|
||||||
|
.PP
|
||||||
|
The sources of RA are in the following files and packages:
|
||||||
|
.IP ra.h: 14
|
||||||
|
declarations of global variables and data structures
|
||||||
|
.IP ra.c:
|
||||||
|
the routine main; initialization of target machine-dependent tables
|
||||||
|
.IP items:
|
||||||
|
a routine to build the list of items of one procedure;
|
||||||
|
routines to manipulate items
|
||||||
|
.IP lifetime:
|
||||||
|
contains a subroutine that determines when items are live/dead
|
||||||
|
.IP alloclist:
|
||||||
|
contains subroutines that build the initial allocations list
|
||||||
|
and that compute the rivals sets.
|
||||||
|
.IP profits:
|
||||||
|
contains a subroutine that computes the profits of the allocations
|
||||||
|
and a routine that determines the costs of saving/restoring registers
|
||||||
|
.IP pack:
|
||||||
|
contains the packing subphase
|
||||||
|
.IP xform:
|
||||||
|
contains the transformation subphase
|
||||||
|
.IP interval:
|
||||||
|
contains routines to manipulate intervals of time
|
||||||
|
.IP aux:
|
||||||
|
contains auxiliary routines
|
||||||
|
.LP
|
171
doc/ego/sp/sp1
Normal file
|
@ -0,0 +1,171 @@
|
||||||
|
.bp
|
||||||
|
.NH 1
|
||||||
|
Stack pollution
|
||||||
|
.NH 2
|
||||||
|
Introduction
|
||||||
|
.PP
|
||||||
|
The "Stack Pollution" optimization technique (SP) decreases the costs
|
||||||
|
(time as well as space) of procedure calls.
|
||||||
|
In the EM calling sequence, the actual parameters are popped from
|
||||||
|
the stack by the \fIcalling\fR procedure.
|
||||||
|
The ASP (Adjust Stack Pointer) instruction is used for this purpose.
|
||||||
|
A call in EM is shown in Fig. 8.1
|
||||||
|
.DS
|
||||||
|
Pascal: EM:
|
||||||
|
|
||||||
|
f(a,2) LOC 2
|
||||||
|
LOE A
|
||||||
|
CAL F
|
||||||
|
ASP 4 -- pop 4 bytes
|
||||||
|
|
||||||
|
Fig. 8.1 An example procedure call in Pascal and EM
|
||||||
|
.DE
|
||||||
|
As procedure calls occur often in most programs,
|
||||||
|
the ASP is one of the most frequently used EM instructions.
|
||||||
|
.PP
|
||||||
|
The main intention of removing the actual parameters after a procedure call
|
||||||
|
is to prevent the stack from growing rapidly.
|
||||||
|
Yet, in some cases, it is possible to \fIdelay\fR or even \fIavoid\fR the
|
||||||
|
removal of the parameters without letting the stack grow
|
||||||
|
significantly.
|
||||||
|
In this way, considerable savings in code size and execution time may
|
||||||
|
be achieved, at the cost of a slightly increased stack size.
|
||||||
|
.PP
|
||||||
|
A stack adjustment may be delayed if there is some other stack adjustment
|
||||||
|
later on in the same basic block.
|
||||||
|
The two ASPs can be combined into one.
|
||||||
|
.DS
|
||||||
|
Pascal: EM: optimized EM:
|
||||||
|
|
||||||
|
f(a,2) LOC 2 LOC 2
|
||||||
|
g(3,b,c) LOE A LOE A
|
||||||
|
CAL F CAL F
|
||||||
|
ASP 4 LOE C
|
||||||
|
LOE C LOE B
|
||||||
|
LOE B LOC 3
|
||||||
|
LOC 3 CAL G
|
||||||
|
CAL G ASP 10
|
||||||
|
ASP 6
|
||||||
|
|
||||||
|
Fig. 8.2 An example of local Stack Pollution
|
||||||
|
.DE
|
||||||
|
The stack size will be increased only temporarily.
|
||||||
|
If the basic block contains another ASP, the ASP 10 may subsequently be
|
||||||
|
combined with that next ASP, and so on.
|
||||||
|
.PP
|
||||||
|
For some back ends, a stack adjustment also takes place
|
||||||
|
at the point of a procedure return.
|
||||||
|
There is no need to specify the number of bytes to be popped at a
|
||||||
|
return.
|
||||||
|
This provides an opportunity to remove ASPs more globally.
|
||||||
|
If all ASPs outside any loop are removed, the increase of the
|
||||||
|
stack size will still only be small, as no such ASP is executed more
|
||||||
|
than once without an intervening return from the procedure it is part of.
|
||||||
|
.PP
|
||||||
|
This second approach is not generally applicable to all target machines,
|
||||||
|
as some back ends require the stack to be cleaned up at the point of
|
||||||
|
a procedure return.
|
||||||
|
.NH 2
|
||||||
|
Implementation
|
||||||
|
.PP
|
||||||
|
There is one main problem the implementation has to solve.
|
||||||
|
In EM, the stack is not only used for passing parameters,
|
||||||
|
but also for evaluating expressions.
|
||||||
|
Hence, ASP instructions can only be combined or removed
|
||||||
|
if certain conditions are satisfied.
|
||||||
|
.PP
|
||||||
|
Two consecutive ASPs of one basic block can only be combined
|
||||||
|
(as described above) if:
|
||||||
|
.IP 1.
|
||||||
|
At no point in the text between the two ASPs is an item popped from
|
||||||
|
the stack that was pushed onto it before the first ASP.
|
||||||
|
.IP 2.
|
||||||
|
The number of bytes popped from the stack by the second ASP must equal
|
||||||
|
the number of bytes pushed since the first ASP.
|
||||||
|
.LP
|
||||||
|
Condition 1. is not satisfied in Fig. 8.3.
|
||||||
|
.DS
|
||||||
|
Pascal: EM:
|
||||||
|
|
||||||
|
5 + f(10) + g(30) LOC 5
|
||||||
|
LOC 10
|
||||||
|
CAL F
|
||||||
|
ASP 2 -- cannot be removed
|
||||||
|
LFR 2 -- push function result
|
||||||
|
ADI 2
|
||||||
|
LOC 30
|
||||||
|
CAL G
|
||||||
|
ASP 2
|
||||||
|
LFR 2
|
||||||
|
ADI 2
|
||||||
|
Fig. 8.3 An illegal transformation
|
||||||
|
.DE
|
||||||
|
If the first ASP were removed (delayed), the first ADI would add
|
||||||
|
10 and f(10), instead of 5 and f(10).
|
||||||
|
.sp
|
||||||
|
Condition 2. is not satisfied in Fig. 8.4.
|
||||||
|
.DS
|
||||||
|
Pascal: EM:
|
||||||
|
|
||||||
|
f(10) + 5 * g(30) LOC 10
|
||||||
|
CAL F
|
||||||
|
ASP 2
|
||||||
|
LFR 2
|
||||||
|
LOC 5
|
||||||
|
LOC 30
|
||||||
|
CAL G
|
||||||
|
ASP 2
|
||||||
|
LFR 2
|
||||||
|
MLI 2 -- 5 * g(30)
|
||||||
|
ADI 2
|
||||||
|
|
||||||
|
Fig. 8.4 A second illegal transformation
|
||||||
|
.DE
|
||||||
|
If the two ASPs were combined into one 'ASP 4', the constant 5 would
|
||||||
|
have been popped, rather than the parameter 10 (so '10 + f(10)*g(30)'
|
||||||
|
would have been computed).
|
||||||
|
.PP
|
||||||
|
The second approach to deleting ASPs (i.e. let the procedure return
|
||||||
|
do the stack clean-up)
|
||||||
|
is only applied to the last ASP of every basic block.
|
||||||
|
Any preceding ASPs are dealt with by the first approach.
|
||||||
|
The last ASP of a basic block B will only be removed if:
|
||||||
|
.IP -
|
||||||
|
on no path in the control flow graph from B to any block containing a
|
||||||
|
RET (return) there is a basic block that, at some point of its text, pops
|
||||||
|
items from the stack that it has not itself pushed earlier.
|
||||||
|
.LP
|
||||||
|
Clearly, if this condition is satisfied, no harm can be done; no
|
||||||
|
other basic block will ever access items that were pushed
|
||||||
|
on the stack before the ASP.
|
||||||
|
.PP
|
||||||
|
The number of bytes pushed onto or popped from the stack can be
|
||||||
|
easily encoded in a so-called "pop-push table".
|
||||||
|
The numbers in general depend on the target machine word- and pointer
|
||||||
|
size and on the argument given to the instruction.
|
||||||
|
For example, an ADS instruction is described by:
|
||||||
|
.DS
|
||||||
|
-a-p+p
|
||||||
|
.DE
|
||||||
|
which means: an 'ADS n' first pops an n-byte value (n being the argument),
|
||||||
|
next pops a pointer-size value and finally pushes a pointer-size value.
|
||||||
|
For some infrequently used EM instructions the pop-push numbers
|
||||||
|
cannot be computed statically.
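.PP
Such a description can be evaluated as sketched below.
The meaning of the letters and the fact that only the net stack effect
is computed are simplifying assumptions of this sketch,
not the actual format of the pop-push table.
.DS
/* Net number of bytes pushed by one instruction (negative if popped).
 * 'a' is the argument, 'w' the word size, 'p' the pointer size;
 * a '-' pops the next operand, a '+' pushes it.
 */
int pop_push(const char *pattern, int arg, int wordsize, int ptrsize)
{
	int net = 0, sign = 1;

	while (*pattern) {
		switch (*pattern++) {
		case '+': sign = 1;  break;
		case '-': sign = -1; break;
		case 'a': net += sign * arg;	  break;
		case 'w': net += sign * wordsize; break;
		case 'p': net += sign * ptrsize;  break;
		}
	}
	return net;
}
.DE
For the 'ADS n' description above this yields -a - p + p = -n,
i.e. the net effect of an ADS is to shrink the stack by n bytes.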
|
||||||
|
.PP
|
||||||
|
The stack pollution algorithm first performs a depth first search over
|
||||||
|
the control flow graph and marks all blocks that do not satisfy
|
||||||
|
the global condition.
|
||||||
|
Next it visits all basic blocks in turn.
|
||||||
|
For every pair of adjacent ASPs, it checks conditions 1. and 2. and
|
||||||
|
combines the ASPs if they are satisfied.
|
||||||
|
The new ASP may be used as first ASP in the next pair.
|
||||||
|
If a condition fails, it simply continues with the next ASP.
|
||||||
|
Finally, the last ASP is removed if:
|
||||||
|
.IP -
|
||||||
|
nothing has been popped from the stack after the last ASP that was
|
||||||
|
pushed before it
|
||||||
|
.IP -
|
||||||
|
the block was not marked by the depth first search
|
||||||
|
.IP -
|
||||||
|
the block is not in a loop
|
||||||
|
.LP
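.PP
The per-block part of this algorithm could be sketched as follows;
the helper functions stand for lookups in the pop-push table and for
the actual text manipulation, and are assumptions of the sketch.
.DS
extern int is_asp(int i), asp_count(int i);	/* ASP test and its argument */
extern int push_bytes(int i), pop_bytes(int i);	/* from the pop-push table */
extern void combine(int first, int second);	/* merge two ASPs into one */

void pollute_block(int ninstr)
{
	int i, first = -1;	/* ASP waiting to be delayed */
	int balance = 0;	/* bytes pushed since 'first' */
	int failed = 0;		/* condition 1. violated? */

	for (i = 0; i < ninstr; i++) {
		if (is_asp(i)) {
			if (first >= 0 && !failed && asp_count(i) == balance)
				combine(first, i);	/* condition 2. holds */
			first = i;	/* the (new) ASP starts the next pair */
			balance = 0;
			failed = 0;
			continue;
		}
		balance += push_bytes(i) - pop_bytes(i);
		if (balance < 0)	/* popped below the level of 'first' */
			failed = 1;
	}
}
.DE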
|
44
doc/ego/sr/sr1
Normal file
|
@ -0,0 +1,44 @@
|
||||||
|
.bp
|
||||||
|
.NH 1
|
||||||
|
Strength reduction
|
||||||
|
.NH 2
|
||||||
|
Introduction
|
||||||
|
.PP
|
||||||
|
The Strength Reduction optimization technique (SR)
|
||||||
|
tries to replace expensive operators
|
||||||
|
by cheaper ones,
|
||||||
|
in order to decrease the execution time
|
||||||
|
of the program.
|
||||||
|
A classical example is replacing a 'multiplication by 2'
|
||||||
|
by an addition or a shift instruction.
|
||||||
|
These kinds of local transformations are already
|
||||||
|
done by the EM Peephole Optimizer.
|
||||||
|
Strength reduction can also be applied
|
||||||
|
more generally to operators used in a loop.
|
||||||
|
.DS
|
||||||
|
i := 1; i := 1;
|
||||||
|
while i < 100 loop --> TMP := i * 118;
|
||||||
|
put(i * 118); while i < 100 loop
|
||||||
|
i := i + 1; put(TMP);
|
||||||
|
end loop; i := i + 1;
|
||||||
|
TMP := TMP + 118;
|
||||||
|
end loop;
|
||||||
|
|
||||||
|
Fig. 6.1 An example of Strength Reduction
|
||||||
|
.DE
|
||||||
|
In Fig. 6.1, a multiplication inside a loop is
|
||||||
|
replaced by an addition inside the loop and a multiplication
|
||||||
|
outside the loop.
|
||||||
|
Clearly, this is a global optimization; it cannot
|
||||||
|
be done by a peephole optimizer.
|
||||||
|
.PP
|
||||||
|
In some cases a related technique, \fItest replacement\fR,
|
||||||
|
can be used to eliminate the
|
||||||
|
loop variable i.
|
||||||
|
This technique will not be discussed in this report.
|
||||||
|
.sp 0
|
||||||
|
In the example above, the resulting code
|
||||||
|
can be further optimized by using
|
||||||
|
constant propagation.
|
||||||
|
Obviously, this is not the task of the
|
||||||
|
Strength Reduction phase.
|
217
doc/ego/sr/sr2
Normal file
|
@ -0,0 +1,217 @@
|
||||||
|
.NH 2
|
||||||
|
The model of strength reduction
|
||||||
|
.PP
|
||||||
|
In this section we will describe
|
||||||
|
the transformations performed by
|
||||||
|
Strength Reduction (SR).
|
||||||
|
Before doing so, we will introduce the
|
||||||
|
central notion of an induction variable.
|
||||||
|
.NH 3
|
||||||
|
Induction variables
|
||||||
|
.PP
|
||||||
|
SR looks for variables whose
|
||||||
|
values form an arithmetic progression
|
||||||
|
at the beginning of a loop.
|
||||||
|
These variables are called induction variables.
|
||||||
|
The most frequently occurring example of such
|
||||||
|
a variable is a loop-variable in a high-order
|
||||||
|
programming language.
|
||||||
|
Several quite sophisticated models of strength
|
||||||
|
reduction can be found in the literature.
|
||||||
|
.[
|
||||||
|
cocke reduction strength cacm
|
||||||
|
.]
|
||||||
|
.[
|
||||||
|
allen cocke kennedy reduction strength
|
||||||
|
.]
|
||||||
|
.[
|
||||||
|
lowry medlock cacm
|
||||||
|
.]
|
||||||
|
.[
|
||||||
|
aho compiler design
|
||||||
|
.]
|
||||||
|
In these models the notion of an induction variable
|
||||||
|
is far more general than the intuitive notion
|
||||||
|
of a loop-variable.
|
||||||
|
The definition of an induction variable we present here
|
||||||
|
is more restricted,
|
||||||
|
yielding a simpler model and simpler transformations.
|
||||||
|
We think the principal source for strength reduction lies in
|
||||||
|
expressions using a loop-variable,
|
||||||
|
i.e. a variable that is incremented or decremented
|
||||||
|
by the same amount after every loop iteration,
|
||||||
|
and that cannot be changed in any other way.
|
||||||
|
.PP
|
||||||
|
Of course, the EM code does not contain high level constructs
|
||||||
|
such as for-statements.
|
||||||
|
We will define an induction variable in terms
|
||||||
|
of the Intermediate Code of the optimizer.
|
||||||
|
Note that the notions of a loop in the
|
||||||
|
EM text and of a firm basic block
|
||||||
|
were defined in section 3.3.5.
|
||||||
|
.sp
|
||||||
|
.UL definition
|
||||||
|
.sp 0
|
||||||
|
An induction variable i of a loop L is a local variable
|
||||||
|
that is never accessed indirectly,
|
||||||
|
whose size is the word size of the target machine, and
|
||||||
|
that is assigned exactly once within L,
|
||||||
|
the assignment:
|
||||||
|
.IP -
|
||||||
|
being of the form i := i + c or i := c + i,
|
||||||
|
where c is a constant
|
||||||
|
called the \fIstep value\fR of i.
|
||||||
|
.IP -
|
||||||
|
occurring in a firm block of L.
|
||||||
|
.LP
|
||||||
|
(Note that the first restriction on the assignment
|
||||||
|
is not described in terms of the Intermediate Code;
|
||||||
|
we will give such a description later; the current
|
||||||
|
definition is easier to understand, however.)
|
||||||
|
.NH 3
|
||||||
|
Recognized expressions
|
||||||
|
.PP
|
||||||
|
SR recognizes certain expressions using
|
||||||
|
an induction variable and replaces
|
||||||
|
them by cheaper ones.
|
||||||
|
Two kinds of expensive operations are recognized:
|
||||||
|
multiplication and array address computations.
|
||||||
|
The expressions that are simplified must
|
||||||
|
use an induction variable
|
||||||
|
as an operand of
|
||||||
|
a multiplication or as index in an array expression.
|
||||||
|
.PP
|
||||||
|
Often a linear function of an induction variable is used,
|
||||||
|
rather than the variable itself.
|
||||||
|
In these cases optimization is still possible.
|
||||||
|
We call such expressions \fIiv-expressions\fR.
|
||||||
|
.sp
|
||||||
|
.UL definition:
|
||||||
|
.sp 0
|
||||||
|
An iv-expression of an induction variable i of a loop L is
|
||||||
|
an expression that:
|
||||||
|
.IP -
|
||||||
|
uses only the operators + and - (unary as well as binary)
|
||||||
|
.IP -
|
||||||
|
uses i as operand exactly once
|
||||||
|
.IP -
|
||||||
|
uses (besides i) only constants or variables that are
|
||||||
|
never changed in L as operands.
|
||||||
|
.LP
|
||||||
|
.PP
|
||||||
|
The expressions recognized by SR are of the following forms:
|
||||||
|
.IP (1)
|
||||||
|
iv_expression * constant
|
||||||
|
.IP (2)
|
||||||
|
constant * iv_expression
|
||||||
|
.IP (3)
|
||||||
|
A[iv-expression] := (assign to array element)
|
||||||
|
.IP (4)
|
||||||
|
A[iv-expression] (use array element)
|
||||||
|
.IP (5)
|
||||||
|
& A[iv-expression] (take address of array element)
|
||||||
|
.LP
|
||||||
|
(Note that EM has different instructions to use an array element,
|
||||||
|
store into one, or take the address of one, resp. LAR, SAR, and AAR).
|
||||||
|
.sp 0
|
||||||
|
The size of the elements of A must
|
||||||
|
be known statically.
|
||||||
|
In cases (3) and (4) this size
|
||||||
|
must equal the word size of the
|
||||||
|
target machine.
|
||||||
|
.NH 3
|
||||||
|
Transformations
|
||||||
|
.PP
|
||||||
|
With every recognized expression we associate
|
||||||
|
a new temporary local variable TMP,
|
||||||
|
allocated in the stack frame of the
|
||||||
|
procedure containing the expression.
|
||||||
|
At any program point within the loop, TMP will
|
||||||
|
contain the following value:
|
||||||
|
.IP multiplication: 18
|
||||||
|
the current value of iv-expression * constant
|
||||||
|
.IP arrays:
|
||||||
|
the current value of &A[iv-expression].
|
||||||
|
.LP
|
||||||
|
In the second case, TMP essentially is a pointer variable,
|
||||||
|
pointing to the element of A that is currently in use.
|
||||||
|
.sp 0
|
||||||
|
If the same expression occurs several times in the loop,
|
||||||
|
the same temporary local is used each time.
|
||||||
|
.PP
|
||||||
|
Three transformations are applied to the EM text:
|
||||||
|
.IP (1)
|
||||||
|
TMP is initialized with the right value.
|
||||||
|
This initialization takes place just
|
||||||
|
before the loop.
|
||||||
|
.IP (2)
|
||||||
|
The recognized expression is simplified.
|
||||||
|
.IP (3)
|
||||||
|
TMP is incremented; this takes place just
|
||||||
|
after the induction variable is incremented.
|
||||||
|
.LP
|
||||||
|
For multiplication, the initial value of TMP
|
||||||
|
is the value of the recognized expression at
|
||||||
|
the program point immediately before the loop.
|
||||||
|
For arrays, TMP is initialized with the address
|
||||||
|
of the first array element that is accessed.
|
||||||
|
So the initialization code is:
|
||||||
|
.DS
|
||||||
|
TMP := iv-expression * constant; or
|
||||||
|
TMP := &A[iv-expression]
|
||||||
|
.DE
|
||||||
|
At the point immediately before the loop,
|
||||||
|
the induction variable will already have been
|
||||||
|
initialized,
|
||||||
|
so the value used in the code above will be the
|
||||||
|
value it has during the first iteration.
|
||||||
|
.PP
|
||||||
|
For multiplication, the recognized expression can simply be
|
||||||
|
replaced by TMP.
|
||||||
|
For array optimizations, the replacement
|
||||||
|
depends on the form:
|
||||||
|
.DS
|
||||||
|
\fIform\fR \fIreplacement\fR
|
||||||
|
(3) A[iv-expr] := *TMP := (assign indirect)
|
||||||
|
(4) A[iv-expr] *TMP (use indirect)
|
||||||
|
(5) &A[iv-expr] TMP
|
||||||
|
.DE
|
||||||
|
The '*' denotes the indirect operator. (Note that
|
||||||
|
EM has different instructions to do
|
||||||
|
an assign-indirect and a use-indirect).
|
||||||
|
As the size of the array elements is restricted
|
||||||
|
to be the word size in case (3) and (4),
|
||||||
|
only one EM instruction needs to
|
||||||
|
be generated in all cases.
|
||||||
|
.PP
|
||||||
|
The amount by which TMP is incremented is:
|
||||||
|
.IP multiplication: 18
|
||||||
|
step value * constant
|
||||||
|
.IP arrays:
|
||||||
|
step value * element size
|
||||||
|
.LP
|
||||||
|
Note that the step value (see definition of induction variable above),
|
||||||
|
the constant, and the element size (see previous section) can all
|
||||||
|
be determined statically.
|
||||||
|
If the sign of the induction variable in the
|
||||||
|
iv-expression is negative, the amount
|
||||||
|
must be negated.
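.PP
The increment amount can thus be computed as sketched below;
for the example of Fig. 6.2 below this yields -(-3 * 5) = +15.
.DS
/* Illustrative computation of the increment of TMP. */
int increment_amount(int is_array, int step, int constant, int elemsize,
		     int negative_sign)
{
	int amount = is_array ? step * elemsize : step * constant;

	return negative_sign ? -amount : amount;
}
.DE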
|
||||||
|
.PP
|
||||||
|
The transformations are demonstrated by an example.
|
||||||
|
.DS
|
||||||
|
i := 100; i := 100;
|
||||||
|
while i > 1 loop TMP := (6-i) * 5;
|
||||||
|
X := (6-i) * 5 + 2; while i > 1 loop
|
||||||
|
Y := (6-i) * 5 - 8; --> X := TMP + 2;
|
||||||
|
i := i - 3; Y := TMP - 8;
|
||||||
|
end loop; i := i - 3;
|
||||||
|
TMP := TMP + 15;
|
||||||
|
end loop;
|
||||||
|
|
||||||
|
Fig. 6.2 Example of complex Strength Reduction transformations
|
||||||
|
.DE
|
||||||
|
The expression '(6-i)*5' is recognized twice. The constant
|
||||||
|
is 5.
|
||||||
|
The step value is -3.
|
||||||
|
The sign of i in the recognized expression is '-'.
|
||||||
|
So the increment value of TMP is -(-3*5) = +15.
|
232
doc/ego/sr/sr3
Normal file
|
@ -0,0 +1,232 @@
|
||||||
|
.NH 2
|
||||||
|
Implementation
|
||||||
|
.PP
|
||||||
|
Like most phases, SR deals with one procedure
|
||||||
|
at a time.
|
||||||
|
Within a procedure, SR works on one loop at a time.
|
||||||
|
Loops are processed in textual order.
|
||||||
|
If loops are nested inside each other,
|
||||||
|
SR starts with the outermost loop and proceeds in the
|
||||||
|
inwards direction.
|
||||||
|
This order is chosen,
|
||||||
|
because it enables the optimization
|
||||||
|
of multi-dimensional array address computations,
|
||||||
|
if the elements are accessed in the usual way
|
||||||
|
(i.e. row after row, rather than column after column).
|
||||||
|
For every loop, SR first detects all induction variables
|
||||||
|
and then tries to recognize
|
||||||
|
expressions that can be optimized.
|
||||||
|
.NH 3
|
||||||
|
Finding induction variables
|
||||||
|
.PP
|
||||||
|
The process of finding induction variables
|
||||||
|
can conveniently be split up
|
||||||
|
into two parts.
|
||||||
|
First, the EM text of the loop is scanned to find
|
||||||
|
all \fIcandidate\fR induction variables,
|
||||||
|
which are word-sized local variables
|
||||||
|
that are assigned precisely once
|
||||||
|
in the loop, within a firm block.
|
||||||
|
Second, for every candidate, the single assignment
|
||||||
|
is inspected, to see if it has the form
|
||||||
|
required by the definition of an induction variable.
|
||||||
|
.PP
|
||||||
|
Candidates are found by scanning the EM code of the loop.
|
||||||
|
During this scan, two sets are maintained.
|
||||||
|
The set "cand" contains all variables that were
|
||||||
|
assigned exactly once so far, within a firm block.
|
||||||
|
The set "dismiss" contains all variables that
|
||||||
|
should not be made a candidate.
|
||||||
|
Initially, both sets are empty.
|
||||||
|
If a variable is assigned to, it is put
|
||||||
|
in the cand set, if three conditions are met:
|
||||||
|
.IP 1.
|
||||||
|
the variable was not in cand or dismiss already
|
||||||
|
.IP 2.
|
||||||
|
the assignment takes place in a firm block
|
||||||
|
.IP 3.
|
||||||
|
the assignment is not a ZRL instruction (assignment by zero)
|
||||||
|
or a SDL instruction (store double local).
|
||||||
|
.LP
|
||||||
|
If any condition fails, the variable is dismissed from cand
|
||||||
|
(if it was there already) and put in dismiss
|
||||||
|
(if it was not there already).
|
||||||
|
.sp 0
|
||||||
|
All variables for which no register message was generated (i.e. those
|
||||||
|
variables that may be accessed indirectly) are assumed
|
||||||
|
to be changed in the loop.
|
||||||
|
.sp 0
|
||||||
|
All variables that remain in cand are candidate induction variables.
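.PP
The sketch below illustrates this scan.
The helper functions and the representation of the two sets as arrays
indexed by (small, non-negative) offsets are assumptions of the sketch,
not the interface of the cand package.
.DS
extern int stored_local(int i);	/* offset of the stored local, or -1 */
extern int in_firm_block(int i), is_zrl_or_sdl(int i);

#define MAXVAR 256	/* offsets are assumed to be smaller than this */

void find_candidates(int ninstr, int cand[], int dismiss[])
{
	int i, v;

	for (v = 0; v < MAXVAR; v++) cand[v] = dismiss[v] = 0;
	for (i = 0; i < ninstr; i++) {
		v = stored_local(i);
		if (v < 0) continue;	/* not an assignment to a local */
		if (!cand[v] && !dismiss[v]
		    && in_firm_block(i) && !is_zrl_or_sdl(i)) {
			cand[v] = 1;	/* first, acceptable assignment */
		} else {
			cand[v] = 0;	/* extra or unacceptable assignment */
			dismiss[v] = 1;
		}
	}
	/* the variables with cand[v] still set are the candidates */
}
.DE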
|
||||||
|
.PP
|
||||||
|
From the set of candidates, the induction variables can
|
||||||
|
be determined, by inspecting the single assignment.
|
||||||
|
The assignment must match one of the EM patterns below.
|
||||||
|
('x' is the candidate. 'ws' is the word size of the target machine.
|
||||||
|
'n' is any number.)
|
||||||
|
.DS
|
||||||
|
\fIpattern\fR \fIstep size\fR
|
||||||
|
INL x | +1
|
||||||
|
DEL x | -1
|
||||||
|
LOL x ; (INC | DEC) ; STL x | +1 | -1
|
||||||
|
LOL x ; LOC n ; (ADI ws | SBI ws) ; STL x | +n | -n
|
||||||
|
LOC n ; LOL x ; ADI ws ; STL x | +n
|
||||||
|
.DE
|
||||||
|
From the patterns the step size of the induction variable
|
||||||
|
can also be determined.
|
||||||
|
These step sizes are displayed on the right hand side.
|
||||||
|
.sp
|
||||||
|
For every induction variable we maintain the following information:
|
||||||
|
.IP -
|
||||||
|
the offset of the variable in the stackframe of its procedure
|
||||||
|
.IP -
|
||||||
|
a pointer to the EM text of the assignment statement
|
||||||
|
.IP -
|
||||||
|
the step value
|
||||||
|
.LP
|
||||||
|
.NH 3
|
||||||
|
Optimizing expressions
|
||||||
|
.PP
|
||||||
|
If any induction variables of the loop were found,
|
||||||
|
the EM text of the loop is scanned again,
|
||||||
|
to detect expressions that can be optimized.
|
||||||
|
SR scans for multiplication and array instructions.
|
||||||
|
Whenever it finds such an instruction, it analyses the
|
||||||
|
code in front of it.
|
||||||
|
If an expression is to be optimized, it must
|
||||||
|
be generated by the following syntax rules.
|
||||||
|
.DS
|
||||||
|
optimizable_expr:
|
||||||
|
iv_expr const mult |
|
||||||
|
const iv_expr mult |
|
||||||
|
address iv_expr address array_instr;
|
||||||
|
mult:
|
||||||
|
MLI ws |
|
||||||
|
MLU ws ;
|
||||||
|
array_instr:
|
||||||
|
LAR ws |
|
||||||
|
SAR ws |
|
||||||
|
AAR ws ;
|
||||||
|
const:
|
||||||
|
LOC n ;
|
||||||
|
.DE
|
||||||
|
An 'address' is an EM instruction that loads an
|
||||||
|
address on the stack.
|
||||||
|
An instruction like LOL may be an 'address', if
|
||||||
|
the size of an address (pointer size, =ps) is
|
||||||
|
the same as the word size.
|
||||||
|
If the pointer size is twice the word size,
|
||||||
|
instructions like LDL are an 'address'.
|
||||||
|
(The addresses in the third grammar rule
|
||||||
|
denote resp. the array address and the
|
||||||
|
array descriptor address).
|
||||||
|
.DS
|
||||||
|
address:
|
||||||
|
LAE |
|
||||||
|
LAL |
|
||||||
|
LOL if ps=ws |
|
||||||
|
LOE ,, |
|
||||||
|
LIL ,, |
|
||||||
|
LDL if ps=2*ws |
|
||||||
|
LDE ,, ;
|
||||||
|
.DE
|
||||||
|
The notion of an iv-expression was introduced earlier.
|
||||||
|
.DS
|
||||||
|
iv_expr:
|
||||||
|
iv_expr unary_op |
|
||||||
|
iv_expr iv_expr binary_op |
|
||||||
|
loopconst |
|
||||||
|
iv ;
|
||||||
|
unary_op:
|
||||||
|
NGI ws |
|
||||||
|
INC |
|
||||||
|
DEC ;
|
||||||
|
binary_op:
|
||||||
|
ADI ws |
|
||||||
|
ADU ws |
|
||||||
|
SBI ws |
|
||||||
|
SBU ws ;
|
||||||
|
loopconst:
|
||||||
|
const |
|
||||||
|
LOL x if x is not changed in loop ;
|
||||||
|
iv:
|
||||||
|
LOL x if x is an induction variable ;
|
||||||
|
.DE
|
||||||
|
An iv_expression must satisfy one additional constraint:
|
||||||
|
it must use exactly one operand that is an induction
|
||||||
|
variable.
|
||||||
|
A simple, hand-written, top-down parser is used
|
||||||
|
to recognize an iv-expression.
|
||||||
|
It scans the EM code from right to left
|
||||||
|
(recall that EM is essentially postfix).
|
||||||
|
It uses semantic attributes (inherited as well as
|
||||||
|
derived) to check the additional constraint.
|
||||||
|
.PP
|
||||||
|
All information assembled during the recognition
|
||||||
|
process is put in a 'code_info' structure.
|
||||||
|
This structure contains the following information:
|
||||||
|
.IP -
|
||||||
|
the optimizable code itself
|
||||||
|
.IP -
|
||||||
|
the loop and basic block the code is part of
|
||||||
|
.IP -
|
||||||
|
the induction variable
|
||||||
|
.IP -
|
||||||
|
the iv-expression
|
||||||
|
.IP -
|
||||||
|
the sign of the induction variable in the
|
||||||
|
iv-expression
|
||||||
|
.IP -
|
||||||
|
the offset and size of the temporary local variable
|
||||||
|
.IP -
|
||||||
|
the expensive operator (MLI, LAR etc.)
|
||||||
|
.IP -
|
||||||
|
the instruction that loads the constant
|
||||||
|
(for multiplication) or the array descriptor
|
||||||
|
(for arrays).
|
||||||
|
.LP
|
||||||
|
The entire transformation process is driven
|
||||||
|
by this information.
|
||||||
|
As the EM text is represented internally
|
||||||
|
as a list, this process consists
|
||||||
|
mainly of straightforward list manipulations.
|
||||||
|
.sp 0
|
||||||
|
The initialization code must be put
|
||||||
|
immediately before the loop entry.
|
||||||
|
For this purpose a \fIheader block\fR is
|
||||||
|
created that has the loop entry block as
|
||||||
|
its only successor and that dominates the
|
||||||
|
entry block.
|
||||||
|
The CFG and all relations (SUCC,PRED, IDOM, LOOPS etc.)
|
||||||
|
are updated.
|
||||||
|
.sp 0
|
||||||
|
An EM instruction that will
|
||||||
|
replace the optimizable code
|
||||||
|
is created and put at the place of the old code.
|
||||||
|
The list representing the old optimizable code
|
||||||
|
is used to create a list for the initializing code,
|
||||||
|
as they are similar.
|
||||||
|
Only two modifications are required:
|
||||||
|
.IP -
|
||||||
|
if the expensive operator is a LAR or SAR,
|
||||||
|
it must be replaced by an AAR, as the initial value
|
||||||
|
of TMP is the \fIaddress\fR of the first
|
||||||
|
array element that is accessed.
|
||||||
|
.IP -
|
||||||
|
code must be appended to store the result of the
|
||||||
|
expression in TMP.
|
||||||
|
.LP
|
||||||
|
Finally, code to increment TMP is created and put after
|
||||||
|
the code of the single assignment to the
|
||||||
|
induction variable.
|
||||||
|
The generated code uses either an integer addition
|
||||||
|
(ADI) or an integer-to-pointer addition (ADS)
|
||||||
|
to do the increment.
|
||||||
|
.PP
|
||||||
|
SR maintains a set of all expressions that have already
|
||||||
|
been recognized in the present loop.
|
||||||
|
Such expressions are said to be \fIavailable\fR.
|
||||||
|
If an expression is recognized that is
|
||||||
|
already available,
|
||||||
|
no new temporary local variable is allocated for it,
|
||||||
|
and the code to initialize and increment the local
|
||||||
|
is not generated.
|
28
doc/ego/sr/sr4
Normal file
|
@ -0,0 +1,28 @@
|
||||||
|
.NH 2
|
||||||
|
Source files of SR
|
||||||
|
.PP
|
||||||
|
The sources of SR are in the following files
|
||||||
|
and packages:
|
||||||
|
.IP sr.h: 14
|
||||||
|
declarations of global variables and
|
||||||
|
data structures
|
||||||
|
.IP sr.c:
|
||||||
|
the routine main; a driving routine to process
|
||||||
|
(possibly nested) loops in the right order
|
||||||
|
.IP iv:
|
||||||
|
implements a procedure that finds the induction variables
|
||||||
|
of a loop
|
||||||
|
.IP reduce:
|
||||||
|
implements a procedure that finds optimizable expressions
|
||||||
|
and that does the transformations
|
||||||
|
.IP cand:
|
||||||
|
implements a procedure that finds the candidate induction
|
||||||
|
variables; used to implement iv
|
||||||
|
.IP xform:
|
||||||
|
implements several useful routines that transform
|
||||||
|
lists of EM text or a CFG; used to implement reduce
|
||||||
|
.IP expr:
|
||||||
|
implements a procedure that parses iv-expressions
|
||||||
|
.IP aux:
|
||||||
|
implements several auxiliary procedures.
|
||||||
|
.LP
|
58
doc/ego/ud/ud1
Normal file
|
@ -0,0 +1,58 @@
|
||||||
|
.bp
|
||||||
|
.NH 1
|
||||||
|
Use-Definition analysis
|
||||||
|
.NH 2
|
||||||
|
Introduction
|
||||||
|
.PP
|
||||||
|
The "Use-Definition analysis" phase (UD) consists of two related optimization
|
||||||
|
techniques that both depend on "Use-Definition" information.
|
||||||
|
The techniques are Copy Propagation and Constant Propagation.
|
||||||
|
They are best explained via an example (see Figs. 11.1 and 11.2).
|
||||||
|
.DS
|
||||||
|
(1) A := B A := B
|
||||||
|
... --> ...
|
||||||
|
(2) use(A) use(B)
|
||||||
|
|
||||||
|
Fig. 11.1 An example of Copy Propagation
|
||||||
|
.DE
|
||||||
|
.DS
|
||||||
|
(1) A := 12 A := 12
|
||||||
|
... --> ...
|
||||||
|
(2) use(A) use(12)
|
||||||
|
|
||||||
|
Fig. 11.2 An example of Constant Propagation
|
||||||
|
.DE
|
||||||
|
Both optimizations have to check that the value of A at line (2)
|
||||||
|
can only be obtained at line (1).
|
||||||
|
Copy Propagation also has to assure that the value of B is
|
||||||
|
the same at line (1) as at line (2).
|
||||||
|
.PP
|
||||||
|
One purpose of both transformations is to introduce
|
||||||
|
opportunities for the Dead Code Elimination optimization.
|
||||||
|
If the variable A is used nowhere else, the assignment A := B
|
||||||
|
becomes useless and can be eliminated.
|
||||||
|
.sp 0
|
||||||
|
If B is less expensive to access than A (e.g. this is sometimes the case
|
||||||
|
if A is a local variable and B is a global variable),
|
||||||
|
Copy Propagation directly improves the code itself.
|
||||||
|
If A is cheaper to access the transformation will not be performed.
|
||||||
|
Likewise, a constant as operand may be cheaper than a variable.
|
||||||
|
Having a constant as operand may also facilitate other optimizations.
|
||||||
|
.PP
|
||||||
|
The design of UD is based on the theory described in sections
|
||||||
|
14.1 and 14.3 of.
|
||||||
|
.[
|
||||||
|
aho compiler design
|
||||||
|
.]
|
||||||
|
As a main departure from that theory,
|
||||||
|
we do not demand that the statement A := B become redundant after
|
||||||
|
Copy Propagation.
|
||||||
|
If B is cheaper to access than A, the optimization is always performed;
|
||||||
|
if B is more expensive than A, we never do the transformation.
|
||||||
|
If A and B are equally expensive UD uses the heuristic rule to
|
||||||
|
replace infrequently used variables by frequently used ones.
|
||||||
|
This rule increases the chances of the assignment becoming useless.
|
||||||
|
.PP
|
||||||
|
In the next section we will give a brief outline of the data
|
||||||
|
flow theory used
|
||||||
|
for the implementation of UD.
|
64
doc/ego/ud/ud2
Normal file
|
@ -0,0 +1,64 @@
|
||||||
|
.NH 2
|
||||||
|
Data flow information
|
||||||
|
.PP
|
||||||
|
.NH 3
|
||||||
|
Use-Definition information
|
||||||
|
.PP
|
||||||
|
A \fIdefinition\fR of a variable A is an assignment to A.
|
||||||
|
A definition is said to \fIreach\fR a point p if there is a
|
||||||
|
path in the control flow graph from the definition to p, such that
|
||||||
|
A is not redefined on that path.
|
||||||
|
.PP
|
||||||
|
For every basic block b, we define the following sets:
|
||||||
|
.IP GEN[b] 9
|
||||||
|
the set of definitions in b that reach the end of b.
|
||||||
|
.IP KILL[b]
|
||||||
|
the set of definitions outside b that define a variable that
|
||||||
|
is changed in b.
|
||||||
|
.IP IN[b]
|
||||||
|
the set of all definitions reaching the beginning of b.
|
||||||
|
.IP OUT[b]
|
||||||
|
the set of all definitions reaching the end of b.
|
||||||
|
.LP
|
||||||
|
GEN and KILL can be determined by inspecting the code of the procedure.
|
||||||
|
IN and OUT are computed by solving the following data flow equations:
|
||||||
|
.DS
|
||||||
|
(1) OUT[b] = IN[b] - KILL[b] + GEN[b]
|
||||||
|
(2) IN[b] = OUT[p1] + ... + OUT[pn],
|
||||||
|
where PRED(b) = {p1, ... , pn}
|
||||||
|
.DE
|
||||||
|
.NH 3
|
||||||
|
Copy information
|
||||||
|
.PP
|
||||||
|
A \fIcopy\fR is a definition of the form "A := B".
|
||||||
|
A copy is said to be \fIgenerated\fR in a basic block n if
|
||||||
|
it occurs in n and there is no subsequent assignment to B in n.
|
||||||
|
A copy is said to be \fIkilled\fR in n if:
|
||||||
|
.IP (i)
|
||||||
|
it occurs in n and there is a subsequent assignment to B within n, or
|
||||||
|
.IP (ii)
|
||||||
|
it occurs outside n, the definition A := B reaches the beginning of n
|
||||||
|
and B is changed in n (note that a copy also is a definition).
|
||||||
|
.LP
|
||||||
|
A copy \fIreaches\fR a point p, if there are no assignments to B
|
||||||
|
on any path in the control flow graph from the copy to p.
|
||||||
|
.PP
|
||||||
|
We define the following sets:
|
||||||
|
.IP C_GEN[b] 11
|
||||||
|
the set of all copies generated in b.
|
||||||
|
.IP C_KILL[b]
|
||||||
|
the set of all copies killed in b.
|
||||||
|
.IP C_IN[b]
|
||||||
|
the set of all copies reaching the beginning of b.
|
||||||
|
.IP C_OUT[b]
|
||||||
|
the set of all copies reaching the end of b.
|
||||||
|
.LP
|
||||||
|
C_IN and C_OUT are computed by solving the following equations:
|
||||||
|
(root is the entry node of the current procedure; '*' denotes
|
||||||
|
set intersection)
|
||||||
|
.DS
|
||||||
|
(1) C_OUT[b] = C_IN[b] - C_KILL[b] + C_GEN[b]
|
||||||
|
(2) C_IN[b] = C_OUT[p1] * ... * C_OUT[pn],
|
||||||
|
where PRED(b) = {p1, ... , pn} and b /= root
|
||||||
|
C_IN[root] = {all copies}
|
||||||
|
.DE
|
26
doc/ego/ud/ud3
Normal file
|
@ -0,0 +1,26 @@
|
||||||
|
.NH 2
|
||||||
|
Pointers and subroutine calls
|
||||||
|
.PP
|
||||||
|
The theory outlined above assumes that variables can
|
||||||
|
only be changed by a direct assignment.
|
||||||
|
This condition does not hold for EM.
|
||||||
|
In case of an assignment through a pointer variable,
|
||||||
|
it is in general impossible to see which variable is affected
|
||||||
|
by the assignment.
|
||||||
|
Similar problems occur in the presence of procedure calls.
|
||||||
|
Therefore we distinguish two kinds of definitions:
|
||||||
|
.IP -
|
||||||
|
an \fIexplicit\fR definition is a direct assignment to one
|
||||||
|
specific variable
|
||||||
|
.IP -
|
||||||
|
an \fIimplicit\fR definition is the potential alteration of
|
||||||
|
a variable as a result of a procedure call or an indirect assignment.
|
||||||
|
.LP
|
||||||
|
An indirect assignment causes implicit definitions to
|
||||||
|
all variables that may be accessed indirectly, i.e.
|
||||||
|
all local variables for which no register message was generated
|
||||||
|
and all global variables.
|
||||||
|
If a procedure contains an indirect assignment, a call to it may change the
|
||||||
|
same set of variables; otherwise it may change some global variables directly.
|
||||||
|
The KILL, GEN, IN and OUT sets contain explicit as well
|
||||||
|
as implicit definitions.
|
78
doc/ego/ud/ud4
Normal file
|
@ -0,0 +1,78 @@
|
||||||
|
.NH 2
|
||||||
|
Implementation
|
||||||
|
.PP
|
||||||
|
UD first builds a number of tables:
|
||||||
|
.IP locals: 9
|
||||||
|
contains information about the local variables of the
|
||||||
|
current procedure (offset,size,whether a register message was found
|
||||||
|
for it and, if so, the score field of that message)
|
||||||
|
.IP defs:
|
||||||
|
a table of all explicit definitions appearing in the
|
||||||
|
current procedure.
|
||||||
|
.IP copies:
|
||||||
|
a table of all copies appearing in the
|
||||||
|
current procedure.
|
||||||
|
.LP
|
||||||
|
Every variable (local as well as global), definition and copy
|
||||||
|
is identified by a unique number, which is the index
|
||||||
|
in the table.
|
||||||
|
All tables are constructed by traversing the EM code.
|
||||||
|
A fourth table, "vardefs" is used, indexed by a 'variable number',
|
||||||
|
which contains for every variable the set of explicit definitions of it.
|
||||||
|
Also, for each basic block b, the set CHGVARS containing all variables
|
||||||
|
changed by it is computed.
|
||||||
|
.PP
|
||||||
|
The GEN sets are obtained in one scan over the EM text,
|
||||||
|
by analyzing every EM instruction.
|
||||||
|
The KILL set of a basic block b is computed by looking at the
|
||||||
|
set of variables
|
||||||
|
changed by b (i.e. CHGVARS[b]).
|
||||||
|
For every such variable v, all explicit definitions to v
|
||||||
|
(i.e. vardefs[v]) that are not in GEN[b] are added to KILL[b].
|
||||||
|
Also, the implicit definition of v is added to KILL[b].
|
||||||
|
Next, the data flow equations for use-definition information
|
||||||
|
are solved,
|
||||||
|
using a straightforward, iterative algorithm.
|
||||||
|
All sets are represented as bitvectors, so the operations
|
||||||
|
on sets (union, difference) can be implemented efficiently.
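.PP
A minimal sketch of such a solver is shown below, with every set
represented as an array of machine words.
The sizes and the block representation are assumptions of the sketch,
not the data structures of UD.
.DS
#define NDEFS	512		/* max. number of definitions (assumption) */
#define NWORDS	(NDEFS / 32)

typedef unsigned int bitvec[NWORDS];

struct block {
	bitvec	b_gen, b_kill, b_in, b_out;
	struct block *b_pred[8];	/* predecessors (at most 8 assumed) */
	int	b_npred;
	struct block *b_next;
};

void solve_ud(struct block *blocks)
{
	struct block *b;
	unsigned int out;
	int change = 1, i, p;

	while (change) {
		change = 0;
		for (b = blocks; b != 0; b = b->b_next) {
			for (i = 0; i < NWORDS; i++) {
				b->b_in[i] = 0;	/* IN = union of OUT of preds */
				for (p = 0; p < b->b_npred; p++)
					b->b_in[i] |= b->b_pred[p]->b_out[i];
				out = (b->b_in[i] & ~b->b_kill[i]) | b->b_gen[i];
				if (out != b->b_out[i]) {
					b->b_out[i] = out;
					change = 1;
				}
			}
		}
	}
}
.DE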
|
||||||
|
.PP
|
||||||
|
The C_GEN and C_KILL sets are computed simultaneously in one scan
|
||||||
|
over the EM text.
|
||||||
|
For every copy A := B appearing in basic block b we do
|
||||||
|
the following:
|
||||||
|
.IP 1.
|
||||||
|
for every basic block n /= b that changes B, see if the definition A := B
|
||||||
|
reaches the beginning of n (i.e. check if the index number of A := B in
|
||||||
|
the "defs" table is an element of IN[n]);
|
||||||
|
if so, add the copy to C_KILL[n]
|
||||||
|
.IP 2.
|
||||||
|
if B is redefined later on in b, add the copy to C_KILL[b], else
|
||||||
|
add it to C_GEN[b]
|
||||||
|
.LP
|
||||||
|
C_IN and C_OUT are computed from C_GEN and C_KILL via the second set of
|
||||||
|
data flow equations.
|
||||||
|
.PP
|
||||||
|
Finally, in one last scan all opportunities for optimization are
|
||||||
|
detected.
|
||||||
|
For every use u of a variable A, we check if
|
||||||
|
there is a unique explicit definition d reaching u.
|
||||||
|
.sp
|
||||||
|
If the definition is a copy A := B and B has the same value at d as
|
||||||
|
at u, then the use of A at u may be changed into B.
|
||||||
|
The latter condition can be verified as follows:
|
||||||
|
.IP -
|
||||||
|
if u and d are in the same basic block, see if there is
|
||||||
|
any assignment to B in between d and u
|
||||||
|
.IP -
|
||||||
|
if u and d are in different basic blocks, the condition is
|
||||||
|
satisfied if there is no assignment to B in the block of u prior to u
|
||||||
|
and d is in C_IN[b].
|
||||||
|
.LP
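.PP
These two checks can be sketched as follows;
the helper predicates are assumptions of the sketch.
.DS
extern int same_block(int d, int u);
extern int assigns_between(int B, int d, int u);   /* within one basic block */
extern int assigns_before(int B, int b, int u);    /* in b, before u */
extern int in_c_in(int copy, int b);               /* copy in C_IN[b] ? */

/* May the use of A at u be replaced by B, given that the copy
 * d: "A := B" is the only explicit definition reaching u?
 */
int may_propagate(int d, int u, int B, int b, int copy)
{
	if (same_block(d, u))
		return !assigns_between(B, d, u);
	return !assigns_before(B, b, u) && in_c_in(copy, b);
}
.DE
.PP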
|
||||||
|
Before the transformation is actually done, UD first makes sure the
|
||||||
|
alteration is really desirable, as described before.
|
||||||
|
The information needed for this purpose (access costs of local and
|
||||||
|
global variables) is read from a machine descriptor file.
|
||||||
|
.sp
|
||||||
|
If the only definition reaching u has the form "A := constant", the use
|
||||||
|
of A at u is replaced by the constant.
|
||||||
|
|
19
doc/ego/ud/ud5
Normal file
|
@ -0,0 +1,19 @@
|
||||||
|
|
||||||
|
.NH 2
|
||||||
|
Source files of UD
|
||||||
|
.PP
|
||||||
|
The sources of UD are in the following files and packages:
|
||||||
|
.IP ud.h: 14
|
||||||
|
declarations of global variables and data structures
|
||||||
|
.IP ud.c:
|
||||||
|
the routine main; initialization of target machine dependent tables
|
||||||
|
.IP defs:
|
||||||
|
routines to compute the GEN and KILL sets and routines to analyse
|
||||||
|
EM instructions
|
||||||
|
.IP const:
|
||||||
|
routines involved in constant propagation
|
||||||
|
.IP copy:
|
||||||
|
routines involved in copy propagation
|
||||||
|
.IP aux:
|
||||||
|
contains auxiliary routines
|
||||||
|
.LP
|