383 lines
		
	
	
	
		
			14 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
			
		
		
	
	
			383 lines
		
	
	
	
		
			14 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
.NH 2
 | 
						|
The register allocation phase
 | 
						|
.PP
 | 
						|
.NH 3
 | 
						|
Overview
 | 
						|
.PP
 | 
						|
The RA phase deals with one procedure at a time.
 | 
						|
For every procedure, it first determines which entities
 | 
						|
may be put in a register. Such an entity
 | 
						|
is called an \fIitem\fR.
 | 
						|
For every item it decides during which parts of the procedure it
 | 
						|
might be assigned a register.
 | 
						|
Such a region is called a \fItimespan\fR.
 | 
						|
For any item, several (possibly overlapping) timespans may
 | 
						|
be considered.
 | 
						|
A pair (item,timespan) is called an \fIallocation\fR.
 | 
						|
If the items of two allocations are both live at some
 | 
						|
point of time in the intersections of their timespans,
 | 
						|
these allocations are said to be \fIrivals\fR of each other,
 | 
						|
as they cannot be assigned the same register.
 | 
						|
The rivals-set of every allocation is computed.
 | 
						|
Next, the gains of assigning a register to an allocation are estimated,
 | 
						|
for every allocation.
 | 
						|
With all this information, decisions are made which allocations
 | 
						|
to store in which registers (\fIpacking\fR).
 | 
						|
Finally, the EM text is transformed to reflect these decisions.
 | 
						|
.NH 3
 | 
						|
The item recognition subphase
 | 
						|
.PP
 | 
						|
RA tries to put the following entities in a register:
 | 
						|
.IP -
 | 
						|
a local variable for which a register message was found
 | 
						|
.IP -
 | 
						|
the address of a local variable for which no
 | 
						|
register message was found
 | 
						|
.IP -
 | 
						|
the address of a global variable
 | 
						|
.IP -
 | 
						|
the address of a procedure
 | 
						|
.IP -
 | 
						|
a numeric constant.
 | 
						|
.LP
 | 
						|
Only the \fIaddress\fR of a global variable
 | 
						|
may be put in a register, not the variable itself.
 | 
						|
This approach avoids the very complex problems that would be
 | 
						|
caused by procedure calls and indirect pointer references (see
 | 
						|
.[~[
 | 
						|
aho design compiler
 | 
						|
.] sections 14.7 and 14.8]
 | 
						|
and 
 | 
						|
.[~[
 | 
						|
spillman side-effects
 | 
						|
.]]).
 | 
						|
Still, on most machines accessing a global variable using indirect
 | 
						|
addressing through a register is much cheaper than
 | 
						|
accessing it via its address.
 | 
						|
Similarly, if the address of a procedure is put in a register, the
 | 
						|
procedure can be called via an indirect call.
 | 
						|
.PP
 | 
						|
With every item we associate a register type.
 | 
						|
This type is
 | 
						|
.DS
 | 
						|
for local variables: the type contained in the register message
 | 
						|
for addresses of variables and procedures: the pointer type
 | 
						|
for constants: the general type
 | 
						|
.DE
 | 
						|
An entity other than a local variable is not taken to be an item
 | 
						|
if it is used only once within the current procedure.
 | 
						|
.PP
 | 
						|
An item is said to be \fIlive\fR at some point of the program text
 | 
						|
if its value may be used before it is changed.
 | 
						|
As addresses and constants are never changed, all items but local
 | 
						|
variables are always live.
 | 
						|
The region of text during which a local variable is live is
 | 
						|
determined via the live/dead messages generated by the
 | 
						|
Live Variable analysis phase of the Global Optimizer.
 | 
						|
.NH 3
 | 
						|
The allocation determination subphase
 | 
						|
.PP
 | 
						|
If a procedure has more items than registers,
 | 
						|
it may be advantageous to put an item in a register
 | 
						|
only during those parts of the procedure where it is most
 | 
						|
heavily used.
 | 
						|
Such a part will be called a timespan.
 | 
						|
With every item we may associate a set of timespans.
 | 
						|
If two timespans of an item overlap,
 | 
						|
at most one of them may be granted a register,
 | 
						|
as there is no use in putting the same item in two
 | 
						|
registers simultaneously.
 | 
						|
If two timespans of an item are distinct,
 | 
						|
both may be chosen;
 | 
						|
the item will possibly be put in two
 | 
						|
different registers during different parts of the procedure.
 | 
						|
The timespan may also consist
 | 
						|
of the whole procedure.
 | 
						|
.PP
 | 
						|
A list of (item,timespan) pairs (allocations)
 | 
						|
is build, which will be the input to the decision making
 | 
						|
subphase of RA (packing subphase).
 | 
						|
This allocation list is the main data structure of RA.
 | 
						|
The description of the remainder of RA will be in terms
 | 
						|
of allocations rather than items.
 | 
						|
The phrase "to assign a register to an allocation" means "to assign
 | 
						|
a register to the item of the allocation for the duration of
 | 
						|
the timespan of the allocation".
 | 
						|
Subsequent subphases will add more information
 | 
						|
to this list.
 | 
						|
.PP
 | 
						|
Several factors must be taken into account when a
 | 
						|
timespan for an item is constructed:
 | 
						|
.IP 1.
 | 
						|
At any \fIentry point\fR of the timespan where the
 | 
						|
item is live,
 | 
						|
the register must be initialized with the item
 | 
						|
.IP 2.
 | 
						|
At any exit point of the timespan where the item is live,
 | 
						|
the item must be updated.
 | 
						|
.LP
 | 
						|
In order to decrease these costs, we will only consider timespans with
 | 
						|
one entry point
 | 
						|
and no live exit points.
 | 
						|
.NH 3
 | 
						|
The rivals computation subphase
 | 
						|
.PP
 | 
						|
As stated before, several different items may be put in the
 | 
						|
same register, provided they are not live simultaneously.
 | 
						|
For every allocation we determine the intersection
 | 
						|
of its timespan and the lifetime of its item (i.e. the part of the
 | 
						|
procedure during which the item is live).
 | 
						|
The allocation is said to be busy during this intersection.
 | 
						|
If two allocations are ever busy simultaneously they are
 | 
						|
said to be rivals of each other.
 | 
						|
The rivals information is added to the allocation list.
 | 
						|
.NH 3
 | 
						|
The profits computation subphase
 | 
						|
.PP
 | 
						|
To make good decisions, the packing subphase needs to
 | 
						|
know which allocations can be assigned the same register
 | 
						|
(rivals information) and how much is gained by
 | 
						|
granting an allocation a register.
 | 
						|
.PP
 | 
						|
Besides the gains of using a register instead of an
 | 
						|
item,
 | 
						|
two kinds of overhead costs must be
 | 
						|
taken into account:
 | 
						|
.IP -
 | 
						|
the register must be initialized with the item
 | 
						|
.IP -
 | 
						|
the register must be saved at procedure entry
 | 
						|
and restored at procedure exit.
 | 
						|
.LP
 | 
						|
The latter costs should not be due to a single
 | 
						|
allocation, as several allocations can be assigned the same register.
 | 
						|
These costs are dealt with after packing has been done.
 | 
						|
They do not influence the decisions of the packing algorithm,
 | 
						|
they may only undo them.
 | 
						|
.PP
 | 
						|
The actual profits consist of improvements
 | 
						|
of execution time and code size.
 | 
						|
As the former is far more difficult to estimate , we will 
 | 
						|
discuss code size improvements first.
 | 
						|
.PP
 | 
						|
The gains of putting a certain item in a register
 | 
						|
depends on how the item is used.
 | 
						|
Suppose the item is
 | 
						|
a pointer variable.
 | 
						|
On machines that do not have a
 | 
						|
double-indirect addressing mode,
 | 
						|
two instructions are needed to dereference the variable
 | 
						|
if it is not in a register, but only one if it is put in a register.
 | 
						|
If the variable is not dereferenced, but simply copied, one instruction
 | 
						|
may be sufficient in both cases.
 | 
						|
So  the gains of putting a pointer variable in a register are higher
 | 
						|
if the variable is dereferenced often.
 | 
						|
.PP
 | 
						|
To make accurate estimates, detailed knowledge of
 | 
						|
the target machine and of the code generator
 | 
						|
would be needed.
 | 
						|
Therefore, a simplification has been made that substantially limits
 | 
						|
the amount of target machine information that is needed.
 | 
						|
The estimation of the number of bytes saved does
 | 
						|
not take into account how an item is used.
 | 
						|
Rather, an average number is used.
 | 
						|
So these gains are computed as follows:
 | 
						|
.DS
 | 
						|
#bytes_saved = #occurrences * gains_per_occurrence
 | 
						|
.DE
 | 
						|
The number of occurrences is derived from
 | 
						|
the EM code.
 | 
						|
Note that this is not exact either,
 | 
						|
as there is no one-to-one correspondence between occurrences in
 | 
						|
the EM code and in the assembler code.
 | 
						|
.PP
 | 
						|
The gains of one occurrence depend on:
 | 
						|
.IP 1.
 | 
						|
the type of the item
 | 
						|
.IP 2.
 | 
						|
the size of the item
 | 
						|
.IP 3.
 | 
						|
the type of the register
 | 
						|
.LP
 | 
						|
and for local variables and addresses of local variables:
 | 
						|
.IP 4.
 | 
						|
the type of the local variable
 | 
						|
.IP 5.
 | 
						|
the offset of the variable in the stackframe
 | 
						|
.LP
 | 
						|
For every allocation we try two types of registers: the register type
 | 
						|
of the item and the general register type.
 | 
						|
Only the type with the highest profits will subsequently be used.
 | 
						|
This type is added to the allocation information.
 | 
						|
.PP
 | 
						|
To compute the gains, RA uses a machine-dependent table
 | 
						|
that is read from a machine descriptor file.
 | 
						|
By means of this table the number of bytes saved can be computed
 | 
						|
as a function of the five properties.
 | 
						|
.PP
 | 
						|
The costs of initializing a register with an item
 | 
						|
is determined in a similar way.
 | 
						|
The cost of one initialization is also
 | 
						|
obtained from the descriptor file.
 | 
						|
Note that there can be at most one initialization for any
 | 
						|
allocation.
 | 
						|
.PP
 | 
						|
To summarize, the number of bytes a certain allocation would
 | 
						|
save is computed as follows:
 | 
						|
.DS
 | 
						|
net_bytes_saved =  bytes_saved - init_cost
 | 
						|
bytes_saved =      #occurrences * gains_per_occ
 | 
						|
init_cost =        #initializations * costs_per_init
 | 
						|
.DE
 | 
						|
.PP
 | 
						|
It is inherently more difficult to estimate the execution
 | 
						|
time saved by putting an item in a register,
 | 
						|
because it is impossible to predict how
 | 
						|
many times an item will be used dynamically.
 | 
						|
If an occurrence is part of a loop,
 | 
						|
it may be executed many times.
 | 
						|
If it is part of a conditional statement, 
 | 
						|
it may never be executed at all.
 | 
						|
In the latter case, the speed of the program may even get
 | 
						|
worse if an initialization is needed.
 | 
						|
As a clear example, consider the piece of "C" code in Fig. 13.1.
 | 
						|
.DS
 | 
						|
switch(expr) {
 | 
						|
      case 1:  p(); break;
 | 
						|
      case 2:  p(); p(); break;
 | 
						|
      case 3:  p(); break;
 | 
						|
      default: break;
 | 
						|
}
 | 
						|
 | 
						|
Fig. 13.1 A "C" switch statement
 | 
						|
.DE
 | 
						|
Lots of bytes may be saved by putting the address of procedure p
 | 
						|
in a register, as p is called four times (statically).
 | 
						|
Dynamically, p will be called zero, one or two times,
 | 
						|
depending on the value of the expression.
 | 
						|
.PP
 | 
						|
The optimizer uses the following strategy for optimizing
 | 
						|
execution time:
 | 
						|
.IP 1.
 | 
						|
try to put items in registers during \fIloops\fR first
 | 
						|
.IP 2.
 | 
						|
always keep the initializing code outside the loop
 | 
						|
.IP 3.
 | 
						|
if an item is not used in a loop, do not put it in a register if
 | 
						|
the initialization costs may be higher than the gains
 | 
						|
.LP
 | 
						|
The latter condition can be checked by determining the 
 | 
						|
minimal number of usages (dynamically) of the item during the procedure,
 | 
						|
via a shortest path algorithm.
 | 
						|
In the example above, this minimal number is zero, so the address of
 | 
						|
p is not put in a register.
 | 
						|
.PP
 | 
						|
The costs of one occurrence is estimated as described above for the
 | 
						|
code size.
 | 
						|
The number of dynamic occurrences is guessed by looking at the
 | 
						|
loop nesting level of every occurrence.
 | 
						|
If the item is never used in a loop,
 | 
						|
the minimal number of occurrences is used.
 | 
						|
From these facts, the execution time improvement is assessed
 | 
						|
for every allocation.
 | 
						|
.NH 3
 | 
						|
The packing subphase
 | 
						|
.PP
 | 
						|
The packing subphase takes as input the allocation
 | 
						|
list and outputs a
 | 
						|
description of which allocations should be put
 | 
						|
in which registers.
 | 
						|
So it is essentially the decision making part of RA.
 | 
						|
.PP
 | 
						|
The packing system tries to assign a register to allocations one
 | 
						|
at a time, in some yet to be defined order.
 | 
						|
For every allocation A, it first checks if there is a register
 | 
						|
(of the right type)
 | 
						|
that is already assigned to one or more allocations,
 | 
						|
none of which are rivals of A.
 | 
						|
In this case A is assigned the same register.
 | 
						|
Else, A is assigned a new register, if one exists.
 | 
						|
A table containing the number of free registers for every type
 | 
						|
is maintained.
 | 
						|
It is initialized with the number of non-scratch registers of
 | 
						|
the target computer and updated whenever a
 | 
						|
new register is handed out.
 | 
						|
The packing algorithm stops when no more allocations can 
 | 
						|
or need be assigned a register.
 | 
						|
.PP
 | 
						|
After an allocation A has been packed,
 | 
						|
all allocations with non-disjunct timespans (including
 | 
						|
A itself) are removed from the allocation list.
 | 
						|
.PP
 | 
						|
In case the number of items exceeds the number of registers, it
 | 
						|
is important to choose the most profitable allocations.
 | 
						|
Due to the possibility of having several allocations
 | 
						|
occupying the same register,
 | 
						|
this problem is quite complex.
 | 
						|
Our packing algorithm uses simple heuristic rules
 | 
						|
and avoids any combinatorial search.
 | 
						|
It has distinct rules for different costs measures.
 | 
						|
.PP
 | 
						|
If object code size is the most important factor,
 | 
						|
the algorithm is greedy and chooses allocations in
 | 
						|
decreasing order of their profits attribute.
 | 
						|
It does not take into account the fact that
 | 
						|
other allocations may be passed over because of
 | 
						|
this decision.
 | 
						|
.PP
 | 
						|
If execution time is at prime stake, the algorithm
 | 
						|
first considers allocations whose timespans consist of loops.
 | 
						|
After all these have been packed, it considers the remaining
 | 
						|
allocations.
 | 
						|
Within the two subclasses, it considers allocations
 | 
						|
with the highest profits first.
 | 
						|
When assigning a register to an allocation with a loop
 | 
						|
as timespan, the algorithm checks if the item has
 | 
						|
already been put in a register during another loop.
 | 
						|
If so, it tries to use the same register for the
 | 
						|
new allocation.
 | 
						|
After all packing has been done,
 | 
						|
it checks if the item has always been assigned the same
 | 
						|
register (although not necessarily during all loops).
 | 
						|
If so, it tries to put the item in that register during
 | 
						|
the entire procedure. This is possible
 | 
						|
if the allocation (item,whole_procedure) is not a rival
 | 
						|
of any allocation with a different item that has been
 | 
						|
assigned to the same register.
 | 
						|
Note that this approach is essentially 'bottom up',
 | 
						|
as registers are first assigned over small regions
 | 
						|
of text which are later collapsed into larger regions.
 | 
						|
The advantage of this approach is the fact that
 | 
						|
the decisions for one loop can be made independently
 | 
						|
of all other loops.
 | 
						|
.PP
 | 
						|
After the entire packing process has been completed,
 | 
						|
we compute for each register how much is gained in using
 | 
						|
this register, by simply adding the net profits
 | 
						|
of all allocations assigned to it.
 | 
						|
This total yield should outweigh the costs of
 | 
						|
saving/restoring the register at procedure entry/exit.
 | 
						|
As most modern processors (e.g. 68000, Vax) have special
 | 
						|
instructions to save/restore several registers,
 | 
						|
the differential costs of saving one extra register are by
 | 
						|
no means constant.
 | 
						|
The costs are read from the machine descriptor file and
 | 
						|
compared to the total yields of the registers.
 | 
						|
As a consequence of this analysis, some allocations 
 | 
						|
may have their registers taken away.
 | 
						|
.NH 3
 | 
						|
The transformation subphase
 | 
						|
.PP
 | 
						|
The final subphase of RA transforms the EM text according to the
 | 
						|
decisions made by the packing system.
 | 
						|
It traverses the text of the currently optimized procedure and
 | 
						|
changes all occurrences of items at points where
 | 
						|
they are assigned a register.
 | 
						|
It also clears the score field of the register messages for
 | 
						|
normal local variables and emits register messages with a very
 | 
						|
high score for the pseudo locals.
 | 
						|
At points where registers have to be initialized with items,
 | 
						|
it generates EM code to do so.
 | 
						|
Finally it tries to decrease the size of the stackframe
 | 
						|
of the procedure by looking at which local variables need not
 | 
						|
be given memory locations.
 |