.NH 2
The register allocation phase
.PP
.NH 3
Overview
.PP
The RA phase deals with one procedure at a time.
For every procedure, it first determines which entities
may be put in a register.
Such an entity is called an \fIitem\fR.
For every item it decides during which parts of the procedure it
might be assigned a register.
Such a region is called a \fItimespan\fR.
For any item, several (possibly overlapping) timespans may
be considered.
A pair (item,timespan) is called an \fIallocation\fR.
If the items of two allocations are both live at some
point in the intersection of their timespans,
these allocations are said to be \fIrivals\fR of each other,
as they cannot be assigned the same register.
The rivals-set of every allocation is computed.
Next, the gains of assigning a register to an allocation are estimated
for every allocation.
With all this information, RA decides which allocations
to store in which registers (\fIpacking\fR).
Finally, the EM text is transformed to reflect these decisions.
.NH 3
The item recognition subphase
.PP
RA tries to put the following entities in a register:
.IP -
a local variable for which a register message was found
.IP -
the address of a local variable for which no
register message was found
.IP -
the address of a global variable
.IP -
the address of a procedure
.IP -
a numeric constant.
.LP
Only the \fIaddress\fR of a global variable
may be put in a register, not the variable itself.
This approach avoids the very complex problems that would be
caused by procedure calls and indirect pointer references (see
.[~[
aho design compiler
.] sections 14.7 and 14.8]
and
.[~[
spillman side-effects
.]]).
Still, on most machines accessing a global variable using indirect
addressing through a register is much cheaper than
accessing it via its address.
Similarly, if the address of a procedure is put in a register, the
procedure can be called via an indirect call.
.PP
With every item we associate a register type.
This type is
.DS
for local variables: the type contained in the register message
for addresses of variables and procedures: the pointer type
for constants: the general type
.DE
An entity other than a local variable is not taken to be an item
if it is used only once within the current procedure.
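.PP
As a minimal sketch of the two rules above, the register type of an item
and the used-once filter could be written as follows.
The enumerations and function names are hypothetical and are not taken
from the RA sources; the set of register types is likewise only an assumption.
.DS
```c
#include <stdbool.h>

/* Hypothetical item kinds, mirroring the five entities listed above. */
enum item_kind { LOCAL_VAR, LOCAL_ADDR, GLOBAL_ADDR, PROC_ADDR, CONSTANT };

/* Hypothetical register types; a real target may define others. */
enum reg_type { REG_GENERAL, REG_POINTER, REG_FLOAT };

/* Register type associated with an item.  msg_type is the type taken
 * from the register message; it is only used for local variables. */
enum reg_type item_reg_type(enum item_kind kind, enum reg_type msg_type)
{
    switch (kind) {
    case LOCAL_VAR: return msg_type;
    case CONSTANT:  return REG_GENERAL;
    default:        return REG_POINTER;   /* all address items */
    }
}

/* An entity other than a local variable is only taken to be an item
 * if it is used more than once in the current procedure.  A local
 * variable is assumed to have a register message already. */
bool is_item(enum item_kind kind, int n_uses)
{
    return kind == LOCAL_VAR || n_uses > 1;
}
```
.DE
.LP
Under these assumptions, the address of a procedure that is called only
once is rejected by is_item, whereas a local variable is always accepted.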
.PP
An item is said to be \fIlive\fR at some point of the program text
if its value may be used before it is changed.
As addresses and constants are never changed, all items but local
variables are always live.
The region of text during which a local variable is live is
determined via the live/dead messages generated by the
Live Variable analysis phase of the Global Optimizer.
.NH 3
The allocation determination subphase
.PP
If a procedure has more items than registers,
it may be advantageous to put an item in a register
only during those parts of the procedure where it is most
heavily used.
Such a part will be called a timespan.
With every item we may associate a set of timespans.
If two timespans of an item overlap,
at most one of them may be granted a register,
as there is no use in putting the same item in two
registers simultaneously.
If two timespans of an item are disjoint,
both may be chosen;
the item will possibly be put in two
different registers during different parts of the procedure.
A timespan may also consist
of the whole procedure.
.PP
A list of (item,timespan) pairs (allocations)
is built, which will be the input to the decision making
subphase of RA (packing subphase).
This allocation list is the main data structure of RA.
The description of the remainder of RA will be in terms
of allocations rather than items.
The phrase "to assign a register to an allocation" means "to assign
a register to the item of the allocation for the duration of
the timespan of the allocation".
Subsequent subphases will add more information
to this list.
.PP
Several factors must be taken into account when a
timespan for an item is constructed:
.IP 1.
At any \fIentry point\fR of the timespan where the
item is live,
the register must be initialized with the item.
.IP 2.
At any exit point of the timespan where the item is live,
the item must be updated with the contents of the register.
.LP
In order to decrease these costs, we will only consider timespans with
one entry point
and no live exit points.
.NH 3
The rivals computation subphase
.PP
As stated before, several different items may be put in the
same register, provided they are not live simultaneously.
For every allocation we determine the intersection
of its timespan and the lifetime of its item (i.e. the part of the
procedure during which the item is live).
The allocation is said to be busy during this intersection.
If two allocations are ever busy simultaneously they are
said to be rivals of each other.
The rivals information is added to the allocation list.
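.PP
The busy region and the rivals test can be sketched with bit sets.
The representation below is an assumption made for illustration only:
program points (for instance basic blocks) are numbered 0 to 63, so that
a region of text fits in one 64-bit mask.
The type and function names are hypothetical.
.DS
```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: program points are numbered 0..63, one bit per point. */
typedef uint64_t region_t;

struct allocation {
    region_t timespan;   /* where the item might be kept in a register */
    region_t lifetime;   /* where the item is live */
};

/* The part of the procedure during which the allocation is busy:
 * the intersection of its timespan and the lifetime of its item. */
region_t busy(const struct allocation *a)
{
    return a->timespan & a->lifetime;
}

/* Two allocations are rivals if they are ever busy simultaneously. */
bool are_rivals(const struct allocation *a, const struct allocation *b)
{
    return (busy(a) & busy(b)) != 0;
}
```
.DE
.LP
Note that two allocations with overlapping timespans are not rivals in
this sketch as long as their busy regions do not meet.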
.NH 3
The profits computation subphase
.PP
To make good decisions, the packing subphase needs to
know which allocations can be assigned the same register
(rivals information) and how much is gained by
granting an allocation a register.
.PP
Besides the gains of using a register instead of an
item,
two kinds of overhead costs must be
taken into account:
.IP -
the register must be initialized with the item
.IP -
the register must be saved at procedure entry
and restored at procedure exit.
.LP
The latter costs should not be charged to a single
allocation, as several allocations can be assigned the same register.
These costs are dealt with after packing has been done.
They do not influence the decisions of the packing algorithm;
they may only undo them.
.PP
The actual profits consist of improvements
of execution time and code size.
As the former is far more difficult to estimate, we will
discuss code size improvements first.
.PP
The gains of putting a certain item in a register
depend on how the item is used.
Suppose the item is
a pointer variable.
On machines that do not have a
double-indirect addressing mode,
two instructions are needed to dereference the variable
if it is not in a register, but only one if it is put in a register.
If the variable is not dereferenced, but simply copied, one instruction
may be sufficient in both cases.
So the gains of putting a pointer variable in a register are higher
if the variable is dereferenced often.
.PP
To make accurate estimates, detailed knowledge of
the target machine and of the code generator
would be needed.
Therefore, a simplification has been made that substantially limits
the amount of target machine information that is needed.
The estimation of the number of bytes saved does
not take into account how an item is used.
Rather, an average number is used.
So these gains are computed as follows:
.DS
#bytes_saved = #occurrences * gains_per_occurrence
.DE
The number of occurrences is derived from
the EM code.
Note that this is not exact either,
as there is no one-to-one correspondence between occurrences in
the EM code and in the assembler code.
.PP
The gains of one occurrence depend on:
.IP 1.
the type of the item
.IP 2.
the size of the item
.IP 3.
the type of the register
.LP
and for local variables and addresses of local variables:
.IP 4.
the type of the local variable
.IP 5.
the offset of the variable in the stackframe
.LP
For every allocation we try two types of registers: the register type
of the item and the general register type.
Only the type with the highest profits will subsequently be used.
This type is added to the allocation information.
.PP
To compute the gains, RA uses a machine-dependent table
that is read from a machine descriptor file.
By means of this table the number of bytes saved can be computed
as a function of the five properties.
.PP
The cost of initializing a register with an item
is determined in a similar way.
The cost of one initialization is also
obtained from the descriptor file.
Note that there can be at most one initialization for any
allocation.
.PP
To summarize, the number of bytes a certain allocation would
save is computed as follows:
.DS
.TS
l l.
net_bytes_saved = bytes_saved - init_cost
bytes_saved = #occurrences * gains_per_occurrence
init_cost = #initializations * costs_per_init
.TE
.DE
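.PP
The summary above translates directly into code.
The following is only a sketch: the structure and its field names are
hypothetical, and in RA the per-occurrence gain and the per-initialization
cost would be looked up in the machine descriptor table rather than
stored like this.
.DS
```c
/* Hypothetical size figures for one allocation. */
struct size_estimate {
    int occurrences;           /* static occurrences in the EM code */
    int gains_per_occurrence;  /* bytes saved per occurrence */
    int initializations;       /* 0 or 1 for any allocation */
    int costs_per_init;        /* bytes needed for one initialization */
};

/* net_bytes_saved = bytes_saved - init_cost */
int net_bytes_saved(const struct size_estimate *e)
{
    int bytes_saved = e->occurrences * e->gains_per_occurrence;
    int init_cost = e->initializations * e->costs_per_init;
    return bytes_saved - init_cost;
}
```
.DE
.LP
With four occurrences saving two bytes each and one initialization of
three bytes, the net saving is five bytes; a single occurrence with an
expensive initialization yields a negative result.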
.PP
It is inherently more difficult to estimate the execution
time saved by putting an item in a register,
because it is impossible to predict how
many times an item will be used dynamically.
If an occurrence is part of a loop,
it may be executed many times.
If it is part of a conditional statement,
it may never be executed at all.
In the latter case, the speed of the program may even get
worse if an initialization is needed.
As a clear example, consider the piece of "C" code in Fig. 13.1.
.DS
switch(expr) {
      case 1:  p(); break;
      case 2:  p(); p(); break;
      case 3:  p(); break;
      default: break;
}

Fig. 13.1 A "C" switch statement
.DE
Lots of bytes may be saved by putting the address of procedure p
in a register, as p is called four times (statically).
Dynamically, p will be called zero, one or two times,
depending on the value of the expression.
.PP
The optimizer uses the following strategy for optimizing
execution time:
.IP 1.
try to put items in registers during \fIloops\fR first
.IP 2.
always keep the initializing code outside the loop
.IP 3.
if an item is not used in a loop, do not put it in a register if
the initialization costs may be higher than the gains
.LP
The last condition can be checked by determining the
minimal number of usages (dynamically) of the item during the procedure,
via a shortest path algorithm.
In the example above, this minimal number is zero, so the address of
p is not put in a register.
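.PP
The shortest path computation can be sketched as follows.
This is not the algorithm used in the RA sources, which the text does not
describe; it is a plain Bellman-Ford relaxation over a hypothetical
control flow graph in which every basic block is weighted with the number
of occurrences of the item it contains.
All names and the array-based graph representation are assumptions.
.DS
```c
#include <limits.h>

#define MAX_BLOCKS 16

/* Hypothetical control flow graph: uses[b] is the number of
 * occurrences of the item in basic block b; edge e runs from
 * block from[e] to block to[e]. */
struct flow_graph {
    int n_blocks;
    int uses[MAX_BLOCKS];
    int n_edges;
    int from[MAX_BLOCKS * MAX_BLOCKS];
    int to[MAX_BLOCKS * MAX_BLOCKS];
};

/* Minimal number of uses on any path from block entry to block
 * exit_block.  All weights are non-negative, so n_blocks - 1
 * relaxation rounds suffice.  Returns INT_MAX if exit_block is
 * unreachable. */
int min_dynamic_uses(const struct flow_graph *g, int entry, int exit_block)
{
    int dist[MAX_BLOCKS];

    for (int b = 0; b < g->n_blocks; b++)
        dist[b] = INT_MAX;
    dist[entry] = g->uses[entry];

    for (int round = 1; round < g->n_blocks; round++) {
        for (int e = 0; e < g->n_edges; e++) {
            int u = g->from[e], v = g->to[e];
            if (dist[u] != INT_MAX && dist[u] + g->uses[v] < dist[v])
                dist[v] = dist[u] + g->uses[v];
        }
    }
    return dist[exit_block];
}
```
.DE
.LP
For the switch statement of Fig. 13.1, modelled as a head block, one block
per case and a join block, the path through the default case contains no
call of p, so the result is zero.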
.PP
The cost of one occurrence is estimated as described above for
code size.
The number of dynamic occurrences is guessed by looking at the
loop nesting level of every occurrence.
If the item is never used in a loop,
the minimal number of occurrences is used.
From these facts, the execution time improvement is assessed
for every allocation.
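.PP
One possible way to turn loop nesting levels into a guess is sketched below.
The text does not state which weighting RA applies, so the factor of 10
per nesting level is purely an illustrative assumption, as are the
function and parameter names.
.DS
```c
/* Assumed weight per loop nesting level (illustrative only). */
#define LOOP_WEIGHT 10

/* Guess the number of dynamic occurrences of an item.  depth[i] is
 * the loop nesting level of static occurrence i.  If no occurrence
 * lies inside a loop, the minimal number of dynamic usages
 * (min_uses) is returned instead. */
long estimated_dynamic_uses(const int *depth, int n_occ, long min_uses)
{
    long total = 0;
    int in_loop = 0;

    for (int i = 0; i < n_occ; i++) {
        long weight = 1;
        for (int d = 0; d < depth[i]; d++)
            weight *= LOOP_WEIGHT;
        if (depth[i] > 0)
            in_loop = 1;
        total += weight;
    }
    return in_loop ? total : min_uses;
}
```
.DE
.LP
Multiplying the result by the estimated gain of one occurrence then gives
an assessment of the execution time improvement for the allocation.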
.NH 3
The packing subphase
.PP
The packing subphase takes as input the allocation
list and outputs a
description of which allocations should be put
in which registers.
So it is essentially the decision making part of RA.
.PP
The packing system tries to assign a register to allocations one
at a time, in some yet to be defined order.
For every allocation A, it first checks if there is a register
(of the right type)
that is already assigned to one or more allocations,
none of which are rivals of A.
In this case A is assigned the same register.
Otherwise, A is assigned a new register, if one exists.
A table containing the number of free registers for every type
is maintained.
It is initialized with the number of non-scratch registers of
the target computer and updated whenever a
new register is handed out.
The packing algorithm stops when no more allocations can
or need be assigned a register.
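.PP
The step performed for one allocation can be sketched as follows.
The sketch handles a single register type and leaves the order in which
allocations are offered to the caller; the structure, the fixed-size
rivals matrix and all names are hypothetical.
.DS
```c
#include <stdbool.h>

#define MAX_ALLOCS 32
#define NO_REG (-1)

/* Hypothetical packing state for one register type.  rival[i][j] is
 * true if allocations i and j are rivals; reg_of[i] is the register
 * given to allocation i and must start out as NO_REG. */
struct packing {
    int n_allocs;
    bool rival[MAX_ALLOCS][MAX_ALLOCS];
    int reg_of[MAX_ALLOCS];
    int regs_in_use;   /* registers handed out so far */
    int regs_total;    /* non-scratch registers of this type */
};

/* True if register r holds no rival of allocation a. */
static bool reg_fits(const struct packing *p, int a, int r)
{
    for (int i = 0; i < p->n_allocs; i++)
        if (p->reg_of[i] == r && p->rival[a][i])
            return false;
    return true;
}

/* Pack allocation a: reuse a register already handed out if none of
 * its occupants is a rival of a, otherwise take a new register if
 * one is left.  Returns the register number or NO_REG. */
int pack_one(struct packing *p, int a)
{
    for (int r = 0; r < p->regs_in_use; r++) {
        if (reg_fits(p, a, r)) {
            p->reg_of[a] = r;
            return r;
        }
    }
    if (p->regs_in_use < p->regs_total) {
        p->reg_of[a] = p->regs_in_use++;
        return p->reg_of[a];
    }
    p->reg_of[a] = NO_REG;
    return NO_REG;
}
```
.DE
.LP
With three allocations of which only the first two are rivals, the third
one shares a register with the first, so two registers suffice.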
.PP
After an allocation A has been packed,
all allocations of the same item with non-disjoint timespans (including
A itself) are removed from the allocation list.
.PP
In case the number of items exceeds the number of registers, it
is important to choose the most profitable allocations.
Due to the possibility of having several allocations
occupying the same register,
this problem is quite complex.
Our packing algorithm uses simple heuristic rules
and avoids any combinatorial search.
It has distinct rules for different cost measures.
.PP
If object code size is the most important factor,
the algorithm is greedy and chooses allocations in
decreasing order of their profits attribute.
It does not take into account the fact that
other allocations may be passed over because of
this decision.
.PP
If execution time is the prime concern, the algorithm
first considers allocations whose timespans consist of loops.
After all these have been packed, it considers the remaining
allocations.
Within the two subclasses, it considers allocations
with the highest profits first.
When assigning a register to an allocation with a loop
as timespan, the algorithm checks if the item has
already been put in a register during another loop.
If so, it tries to use the same register for the
new allocation.
After all packing has been done,
it checks if the item has always been assigned the same
register (although not necessarily during all loops).
If so, it tries to put the item in that register during
the entire procedure.
This is possible
if the allocation (item,whole_procedure) is not a rival
of any allocation with a different item that has been
assigned to the same register.
Note that this approach is essentially 'bottom up',
as registers are first assigned over small regions
of text which are later collapsed into larger regions.
The advantage of this approach is the fact that
the decisions for one loop can be made independently
of all other loops.
.PP
After the entire packing process has been completed,
we compute for each register how much is gained in using
this register, by simply adding the net profits
of all allocations assigned to it.
This total yield should outweigh the costs of
saving/restoring the register at procedure entry/exit.
As most modern processors (e.g. 68000, Vax) have special
instructions to save/restore several registers,
the differential costs of saving one extra register are by
no means constant.
The costs are read from the machine descriptor file and
compared to the total yields of the registers.
As a consequence of this analysis, some allocations
may have their registers taken away.
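.PP
The comparison of yields and save/restore costs might look like the
sketch below.
The text does not spell out the exact rule RA applies, so this is only
one plausible reading: registers are sorted by decreasing yield, and the
least profitable register is given up as long as its yield is lower than
the extra cost of saving one more register.
The cost table indexed by the number of saved registers is an assumed
layout, not the format of the machine descriptor file.
.DS
```c
/* yield[0..n-1] holds the summed net profits per register, sorted
 * in decreasing order.  save_cost[k] is the assumed cost of saving
 * and restoring k registers, with save_cost[0] == 0; it need not
 * grow linearly.  Returns the number of registers worth keeping. */
int registers_worth_keeping(const int *yield, const int *save_cost, int n)
{
    int keep = n;

    while (keep > 0 &&
           yield[keep - 1] < save_cost[keep] - save_cost[keep - 1])
        keep--;
    return keep;
}
```
.DE
.LP
Allocations assigned to a register beyond the returned count would have
their registers taken away.
Because the differential costs are not constant, dropping registers one
at a time in this way is a heuristic and need not find the best subset.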
.NH 3
The transformation subphase
.PP
The final subphase of RA transforms the EM text according to the
decisions made by the packing system.
It traverses the text of the currently optimized procedure and
changes all occurrences of items at points where
they are assigned a register.
It also clears the score field of the register messages for
normal local variables and emits register messages with a very
high score for the pseudo locals.
At points where registers have to be initialized with items,
it generates EM code to do so.
Finally it tries to decrease the size of the stackframe
of the procedure by looking at which local variables need not
be given memory locations.