| .NH 2
 | |
| The register allocation phase
 | |
| .PP
 | |
| .NH 3
 | |
| Overview
 | |
| .PP
 | |
| The RA phase deals with one procedure at a time.
 | |
| For every procedure, it first determines which entities
 | |
| may be put in a register. Such an entity
 | |
| is called an \fIitem\fR.
 | |
| For every item it decides during which parts of the procedure it
 | |
| might be assigned a register.
 | |
| Such a region is called a \fItimespan\fR.
 | |
| For any item, several (possibly overlapping) timespans may
 | |
| be considered.
 | |
| A pair (item,timespan) is called an \fIallocation\fR.
 | |
| If the items of two allocations are both live at some
 | |
| point of time in the intersections of their timespans,
 | |
| these allocations are said to be \fIrivals\fR of each other,
 | |
| as they cannot be assigned the same register.
 | |
| The rivals-set of every allocation is computed.
 | |
| Next, the gains of assigning a register to an allocation are estimated,
 | |
| for every allocation.
 | |
| With all this information, decisions are made about which allocations
 | |
| to store in which registers (\fIpacking\fR).
 | |
| Finally, the EM text is transformed to reflect these decisions.
 | |
| .NH 3
 | |
| The item recognition subphase
 | |
| .PP
 | |
| RA tries to put the following entities in a register:
 | |
| .IP -
 | |
| a local variable for which a register message was found
 | |
| .IP -
 | |
| the address of a local variable for which no
 | |
| register message was found
 | |
| .IP -
 | |
| the address of a global variable
 | |
| .IP -
 | |
| the address of a procedure
 | |
| .IP -
 | |
| a numeric constant.
 | |
| .LP
 | |
| Only the \fIaddress\fR of a global variable
 | |
| may be put in a register, not the variable itself.
 | |
| This approach avoids the very complex problems that would be
 | |
| caused by procedure calls and indirect pointer references (see
 | |
| .[~[
 | |
| aho design compiler
 | |
| .] sections 14.7 and 14.8]
 | |
| and 
 | |
| .[~[
 | |
| spillman side-effects
 | |
| .]]).
 | |
| Still, on most machines accessing a global variable using indirect
 | |
| addressing through a register is much cheaper than
 | |
| accessing it via its address.
 | |
| Similarly, if the address of a procedure is put in a register, the
 | |
| procedure can be called via an indirect call.
 | |
| .PP
 | |
| With every item we associate a register type.
 | |
| This type is
 | |
| .DS
 | |
| for local variables: the type contained in the register message
 | |
| for addresses of variables and procedures: the pointer type
 | |
| for constants: the general type
 | |
| .DE
 | |
| An entity other than a local variable is not taken to be an item
 | |
| if it is used only once within the current procedure.
 | |
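The selection rule and the register type rule above can be sketched as follows. This is a minimal illustration, not RA's actual code; the names `Candidate`, `register_type` and `select_items`, the kind constants, and the type strings are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical entity kinds; the names are illustrative only.
LOCAL = "local"              # local variable with a register message
LOCAL_ADDR = "local_addr"    # address of a local without a register message
GLOBAL_ADDR = "global_addr"  # address of a global variable
PROC_ADDR = "proc_addr"      # address of a procedure
CONST = "const"              # numeric constant


@dataclass(frozen=True)
class Candidate:
    kind: str
    name: str
    uses: int                           # static uses in the current procedure
    message_type: Optional[str] = None  # type from the register message (locals only)


def register_type(c):
    """Register type associated with an item, following the rules in the text."""
    if c.kind == LOCAL:
        return c.message_type
    if c.kind == CONST:
        return "general"
    return "pointer"  # addresses of variables and procedures


def select_items(candidates):
    """Keep all local variables; keep other entities only if used more than once."""
    return [c for c in candidates if c.kind == LOCAL or c.uses > 1]
```

With this rule, a global address that occurs once is dropped, while a local variable that occurs once is still an item.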
| .PP
 | |
| An item is said to be \fIlive\fR at some point of the program text
 | |
| if its value may be used before it is changed.
 | |
| As addresses and constants are never changed, all items but local
 | |
| variables are always live.
 | |
| The region of text during which a local variable is live is
 | |
| determined via the live/dead messages generated by the
 | |
| Live Variable analysis phase of the Global Optimizer.
 | |
| .NH 3
 | |
| The allocation determination subphase
 | |
| .PP
 | |
| If a procedure has more items than registers,
 | |
| it may be advantageous to put an item in a register
 | |
| only during those parts of the procedure where it is most
 | |
| heavily used.
 | |
| Such a part will be called a timespan.
 | |
| With every item we may associate a set of timespans.
 | |
| If two timespans of an item overlap,
 | |
| at most one of them may be granted a register,
 | |
| as there is no use in putting the same item in two
 | |
| registers simultaneously.
 | |
| If two timespans of an item are disjoint,
 | |
| both may be chosen;
 | |
| the item will possibly be put in two
 | |
| different registers during different parts of the procedure.
 | |
| The timespan may also consist
 | |
| of the whole procedure.
 | |
| .PP
 | |
| A list of (item,timespan) pairs (allocations)
 | |
| is built, which will be the input to the decision-making
 | |
| subphase of RA (packing subphase).
 | |
| This allocation list is the main data structure of RA.
 | |
| The description of the remainder of RA will be in terms
 | |
| of allocations rather than items.
 | |
| The phrase "to assign a register to an allocation" means "to assign
 | |
| a register to the item of the allocation for the duration of
 | |
| the timespan of the allocation".
 | |
| Subsequent subphases will add more information
 | |
| to this list.
 | |
| .PP
 | |
| Several factors must be taken into account when a
 | |
| timespan for an item is constructed:
 | |
| .IP 1.
 | |
| At any \fIentry point\fR of the timespan where the
 | |
| item is live,
 | |
| the register must be initialized with the item.
 | |
| .IP 2.
 | |
| At any exit point of the timespan where the item is live,
 | |
| the item must be updated.
 | |
| .LP
 | |
| In order to decrease these costs, we will only consider timespans with
 | |
| one entry point
 | |
| and no live exit points.
 | |
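The restriction to one entry point and no live exit points can be checked on a control flow graph roughly as shown here. The sketch assumes that a timespan is a set of basic blocks, that `succ` maps each block to its successors, and that `live_in` is the set of blocks at whose start the item is live; these names and the block-level granularity are assumptions, not taken from RA.

```python
def entry_points(timespan, succ, proc_entry):
    """Blocks of the timespan that can be reached from outside it."""
    entries = set()
    if proc_entry in timespan:
        entries.add(proc_entry)  # control enters here from the caller
    for block, targets in succ.items():
        if block not in timespan:
            entries.update(t for t in targets if t in timespan)
    return entries


def live_exit_edges(timespan, succ, live_in):
    """Edges that leave the timespan into a block where the item is live."""
    return {
        (b, t)
        for b in timespan
        for t in succ.get(b, ())
        if t not in timespan and t in live_in
    }


def is_admissible(timespan, succ, proc_entry, live_in):
    """True if the timespan has one entry point and no live exit points."""
    one_entry = len(entry_points(timespan, succ, proc_entry)) == 1
    return one_entry and not live_exit_edges(timespan, succ, live_in)
```

For a simple loop whose item is dead after the loop, the set of loop blocks passes this check; the loop body alone does not, because the back edge leaves it while the item is still live.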
| .NH 3
 | |
| The rivals computation subphase
 | |
| .PP
 | |
| As stated before, several different items may be put in the
 | |
| same register, provided they are not live simultaneously.
 | |
| For every allocation we determine the intersection
 | |
| of its timespan and the lifetime of its item (i.e. the part of the
 | |
| procedure during which the item is live).
 | |
| The allocation is said to be busy during this intersection.
 | |
| If two allocations are ever busy simultaneously they are
 | |
| said to be rivals of each other.
 | |
| The rivals information is added to the allocation list.
 | |
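The busy region and the rivals relation described above can be sketched as set operations. This is an illustrative model only: program points are represented as integers, and the `Allocation` class and `compute_rivals` function are hypothetical names, not RA's data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Allocation:
    item: str
    timespan: frozenset  # program points covered by the timespan
    lifetime: frozenset  # program points at which the item is live
    rivals: set = field(default_factory=set)  # indices into the allocation list

    @property
    def busy(self):
        """Intersection of the timespan and the lifetime of the item."""
        return self.timespan & self.lifetime


def compute_rivals(allocations):
    """Mark every pair of allocations that are ever busy simultaneously."""
    for i, a in enumerate(allocations):
        for j in range(i + 1, len(allocations)):
            b = allocations[j]
            if a.busy & b.busy:
                a.rivals.add(j)
                b.rivals.add(i)
```

Two local variables with non-overlapping lifetimes are not rivals even if their timespans coincide, whereas an address or constant, being always live, is a rival of every allocation whose busy region meets its timespan.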
| .NH 3
 | |
| The profits computation subphase
 | |
| .PP
 | |
| To make good decisions, the packing subphase needs to
 | |
| know which allocations can be assigned the same register
 | |
| (rivals information) and how much is gained by
 | |
| granting an allocation a register.
 | |
| .PP
 | |
| Besides the gains of using a register instead of an
 | |
| item,
 | |
| two kinds of overhead costs must be
 | |
| taken into account:
 | |
| .IP -
 | |
| the register must be initialized with the item
 | |
| .IP -
 | |
| the register must be saved at procedure entry
 | |
| and restored at procedure exit.
 | |
| .LP
 | |
| The latter costs should not be attributed to a single
 | |
| allocation, as several allocations can be assigned the same register.
 | |
| These costs are dealt with after packing has been done.
 | |
| They do not influence the decisions of the packing algorithm;
 | |
| they may only undo them.
 | |
| .PP
 | |
| The actual profits consist of improvements
 | |
| of execution time and code size.
 | |
| As the former is far more difficult to estimate, we will
 | |
| discuss code size improvements first.
 | |
| .PP
 | |
| The gains of putting a certain item in a register
 | |
| depend on how the item is used.
 | |
| Suppose the item is
 | |
| a pointer variable.
 | |
| On machines that do not have a
 | |
| double-indirect addressing mode,
 | |
| two instructions are needed to dereference the variable
 | |
| if it is not in a register, but only one if it is put in a register.
 | |
| If the variable is not dereferenced, but simply copied, one instruction
 | |
| may be sufficient in both cases.
 | |
| So the gains of putting a pointer variable in a register are higher
 | |
| if the variable is dereferenced often.
 | |
| .PP
 | |
| To make accurate estimates, detailed knowledge of
 | |
| the target machine and of the code generator
 | |
| would be needed.
 | |
| Therefore, a simplification has been made that substantially limits
 | |
| the amount of target machine information that is needed.
 | |
| The estimation of the number of bytes saved does
 | |
| not take into account how an item is used.
 | |
| Rather, an average number is used.
 | |
| So these gains are computed as follows:
 | |
| .DS
 | |
| #bytes_saved = #occurrences * gains_per_occurrence
 | |
| .DE
 | |
| The number of occurrences is derived from
 | |
| the EM code.
 | |
| Note that this is not exact either,
 | |
| as there is no one-to-one correspondence between occurrences in
 | |
| the EM code and in the assembler code.
 | |
| .PP
 | |
| The gains of one occurrence depend on:
 | |
| .IP 1.
 | |
| the type of the item
 | |
| .IP 2.
 | |
| the size of the item
 | |
| .IP 3.
 | |
| the type of the register
 | |
| .LP
 | |
| and for local variables and addresses of local variables:
 | |
| .IP 4.
 | |
| the type of the local variable
 | |
| .IP 5.
 | |
| the offset of the variable in the stackframe
 | |
| .LP
 | |
| For every allocation we try two types of registers: the register type
 | |
| of the item and the general register type.
 | |
| Only the type with the highest profits will subsequently be used.
 | |
| This type is added to the allocation information.
 | |
| .PP
 | |
| To compute the gains, RA uses a machine-dependent table
 | |
| that is read from a machine descriptor file.
 | |
| By means of this table the number of bytes saved can be computed
 | |
| as a function of the five properties.
 | |
| .PP
 | |
| The costs of initializing a register with an item
 | |
| are determined in a similar way.
 | |
| The cost of one initialization is also
 | |
| obtained from the descriptor file.
 | |
| Note that there can be at most one initialization for any
 | |
| allocation.
 | |
| .PP
 | |
| To summarize, the number of bytes a certain allocation would
 | |
| save is computed as follows:
 | |
| .DS
 | |
| .TS
 | |
| l l.
 | |
| net_bytes_saved =	bytes_saved - init_cost
 | |
| bytes_saved =	#occurrences * gains_per_occ
 | |
| init_cost =	#initializations * costs_per_init
 | |
| .TE
 | |
| .DE
 | |
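The summary formula translates directly into a small function. The function name and the numbers in the usage note are made up for illustration; in RA the per-occurrence gains and the initialization cost come from the machine descriptor file.

```python
def net_bytes_saved(occurrences, gains_per_occ, initializations, costs_per_init):
    """Net code size gain of one allocation, following the formula in the text."""
    if initializations not in (0, 1):
        # The text states that an allocation has at most one initialization.
        raise ValueError("an allocation has at most one initialization")
    bytes_saved = occurrences * gains_per_occ
    init_cost = initializations * costs_per_init
    return bytes_saved - init_cost
```

With hypothetical values of 2 bytes gained per occurrence and 4 bytes per initialization, an item with four occurrences nets 4 bytes, while an item with a single occurrence loses 2 bytes.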
| .PP
 | |
| It is inherently more difficult to estimate the execution
 | |
| time saved by putting an item in a register,
 | |
| because it is impossible to predict how
 | |
| many times an item will be used dynamically.
 | |
| If an occurrence is part of a loop,
 | |
| it may be executed many times.
 | |
| If it is part of a conditional statement, 
 | |
| it may never be executed at all.
 | |
| In the latter case, the speed of the program may even get
 | |
| worse if an initialization is needed.
 | |
| As a clear example, consider the piece of "C" code in Fig. 13.1.
 | |
| .DS
 | |
| switch(expr) {
 | |
|       case 1:  p(); break;
 | |
|       case 2:  p(); p(); break;
 | |
|       case 3:  p(); break;
 | |
|       default: break;
 | |
| }
 | |
| 
 | |
| Fig. 13.1 A "C" switch statement
 | |
| .DE
 | |
| Lots of bytes may be saved by putting the address of procedure p
 | |
| in a register, as p is called four times (statically).
 | |
| Dynamically, p will be called zero, one or two times,
 | |
| depending on the value of the expression.
 | |
| .PP
 | |
| The optimizer uses the following strategy for optimizing
 | |
| execution time:
 | |
| .IP 1.
 | |
| try to put items in registers during \fIloops\fR first
 | |
| .IP 2.
 | |
| always keep the initializing code outside the loop
 | |
| .IP 3.
 | |
| if an item is not used in a loop, do not put it in a register if
 | |
| the initialization costs may be higher than the gains
 | |
| .LP
 | |
| The latter condition can be checked by determining the 
 | |
| minimal number of usages (dynamically) of the item during the procedure,
 | |
| via a shortest path algorithm.
 | |
| In the example above, this minimal number is zero, so the address of
 | |
| p is not put in a register.
 | |
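The minimal number of dynamic usages can be found with a shortest path search in which each basic block costs as many units as it contains usages of the item. The sketch below uses Dijkstra's algorithm; the graph representation and the names `succ`, `uses` and `min_dynamic_uses` are assumptions, and the text does not say which shortest path algorithm RA uses.

```python
import heapq


def min_dynamic_uses(succ, uses, entry, exits):
    """Fewest usages of an item on any path from entry to an exit block.

    succ maps a block to its successor blocks; uses maps a block to the
    number of usages of the item inside it (0 if absent).
    """
    best = {entry: uses.get(entry, 0)}
    heap = [(best[entry], entry)]
    while heap:
        cost, block = heapq.heappop(heap)
        if cost > best[block]:
            continue  # stale queue entry
        if block in exits:
            return cost  # first exit popped is the cheapest one
        for nxt in succ.get(block, ()):
            new_cost = cost + uses.get(nxt, 0)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return None  # no exit is reachable from entry
```

Modelling the switch statement of Fig. 13.1 with one block per case, the path through the default case contains no call of p, so the minimum is zero even though p occurs four times statically.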
| .PP
 | |
| The cost of one occurrence is estimated as described above for the
 | |
| code size.
 | |
| The number of dynamic occurrences is guessed by looking at the
 | |
| loop nesting level of every occurrence.
 | |
| If the item is never used in a loop,
 | |
| the minimal number of occurrences is used.
 | |
| From these facts, the execution time improvement is assessed
 | |
| for every allocation.
 | |
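One way to turn loop nesting levels into a guessed number of dynamic occurrences is to weight each occurrence by a fixed factor per nesting level. The text does not say how RA weights nesting levels, so the factor of 10 and the names below are purely assumptions chosen for illustration.

```python
LOOP_WEIGHT = 10  # assumed number of iterations per loop level; not from the source


def estimated_dynamic_occurrences(nesting_levels, min_uses):
    """Guess how often an item is used at run time.

    nesting_levels holds the loop nesting level of every static occurrence
    (0 means outside any loop); min_uses is the minimal number of dynamic
    usages, which is used when the item never occurs in a loop.
    """
    if all(level == 0 for level in nesting_levels):
        return min_uses
    return sum(LOOP_WEIGHT ** level for level in nesting_levels)
```

Under this assumed weighting, occurrences at nesting levels 0, 1 and 2 count as 1, 10 and 100 dynamic usages respectively.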
| .NH 3
 | |
| The packing subphase
 | |
| .PP
 | |
| The packing subphase takes as input the allocation
 | |
| list and outputs a
 | |
| description of which allocations should be put
 | |
| in which registers.
 | |
| So it is essentially the decision making part of RA.
 | |
| .PP
 | |
| The packing system tries to assign a register to allocations one
 | |
| at a time, in some yet to be defined order.
 | |
| For every allocation A, it first checks if there is a register
 | |
| (of the right type)
 | |
| that is already assigned to one or more allocations,
 | |
| none of which are rivals of A.
 | |
| In this case A is assigned the same register.
 | |
| Else, A is assigned a new register, if one exists.
 | |
| A table containing the number of free registers for every type
 | |
| is maintained.
 | |
| It is initialized with the number of non-scratch registers of
 | |
| the target computer and updated whenever a
 | |
| new register is handed out.
 | |
| The packing algorithm stops when no more allocations can 
 | |
| or need be assigned a register.
 | |
| .PP
 | |
| After an allocation A has been packed,
 | |
| all allocations with non-disjoint timespans (including
 | |
| A itself) are removed from the allocation list.
 | |
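The packing loop described above can be sketched as follows. The data layout and the names `pack`, `allocs`, `rivals` and `free_regs` are hypothetical. The sketch also assumes that the removal step applies to allocations of the same item whose timespans overlap that of A, which is an interpretation based on the earlier remark that overlapping timespans of one item cannot both be granted a register.

```python
def pack(order, allocs, rivals, free_regs):
    """Assign registers to allocations, considering them in the given order.

    allocs maps an allocation id to a dict with the keys 'item',
    'timespan' (a frozenset of program points) and 'reg_type'.
    rivals maps an allocation id to the set of ids of its rivals.
    free_regs maps a register type to the number of non-scratch registers.
    Returns a dict from allocation id to register number.
    """
    free = dict(free_regs)
    registers = []  # registers[n] = (register type, ids of allocations using it)
    assigned = {}
    pending = list(order)
    while pending:
        a = pending.pop(0)
        rtype = allocs[a]["reg_type"]
        # Reuse a register whose current allocations are all non-rivals of a.
        reg = next(
            (n for n, (t, members) in enumerate(registers)
             if t == rtype and not (rivals[a] & set(members))),
            None,
        )
        if reg is None:
            if free.get(rtype, 0) == 0:
                continue  # no register of this type left; a is passed over
            free[rtype] -= 1
            registers.append((rtype, []))
            reg = len(registers) - 1
        registers[reg][1].append(a)
        assigned[a] = reg
        # Drop allocations of the same item with overlapping timespans.
        pending = [
            b for b in pending
            if not (allocs[b]["item"] == allocs[a]["item"]
                    and allocs[b]["timespan"] & allocs[a]["timespan"])
        ]
    return assigned
```

The order is passed in explicitly because the text defines it separately for each cost measure.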
| .PP
 | |
| In case the number of items exceeds the number of registers, it
 | |
| is important to choose the most profitable allocations.
 | |
| Due to the possibility of having several allocations
 | |
| occupying the same register,
 | |
| this problem is quite complex.
 | |
| Our packing algorithm uses simple heuristic rules
 | |
| and avoids any combinatorial search.
 | |
| It has distinct rules for different cost measures.
 | |
| .PP
 | |
| If object code size is the most important factor,
 | |
| the algorithm is greedy and chooses allocations in
 | |
| decreasing order of their profits attribute.
 | |
| It does not take into account the fact that
 | |
| other allocations may be passed over because of
 | |
| this decision.
 | |
| .PP
 | |
| If execution time is the primary concern, the algorithm
 | |
| first considers allocations whose timespans consist of loops.
 | |
| After all these have been packed, it considers the remaining
 | |
| allocations.
 | |
| Within the two subclasses, it considers allocations
 | |
| with the highest profits first.
 | |
| When assigning a register to an allocation with a loop
 | |
| as timespan, the algorithm checks if the item has
 | |
| already been put in a register during another loop.
 | |
| If so, it tries to use the same register for the
 | |
| new allocation.
 | |
| After all packing has been done,
 | |
| it checks if the item has always been assigned the same
 | |
| register (although not necessarily during all loops).
 | |
| If so, it tries to put the item in that register during
 | |
| the entire procedure. This is possible
 | |
| if the allocation (item,whole_procedure) is not a rival
 | |
| of any allocation with a different item that has been
 | |
| assigned to the same register.
 | |
| Note that this approach is essentially 'bottom up',
 | |
| as registers are first assigned over small regions
 | |
| of text which are later collapsed into larger regions.
 | |
| The advantage of this approach is the fact that
 | |
| the decisions for one loop can be made independently
 | |
| of all other loops.
 | |
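The two ordering rules, one for code size and one for execution time, amount to different sort keys over the allocation list. The sketch below shows only the ordering, not the later attempt to extend an item to the whole procedure; the field names `profit` and `is_loop` and the mode strings are assumptions.

```python
def packing_order(allocs, optimize_for):
    """Return the indices of allocs in the order in which they are packed.

    allocs is a list of dicts with the keys 'profit' and 'is_loop', where
    'is_loop' tells whether the timespan of the allocation is a loop.
    """
    if optimize_for == "size":
        # Greedy: decreasing order of profits.
        def key(a):
            return -a["profit"]
    elif optimize_for == "time":
        # Loop timespans first; within each class, highest profits first.
        def key(a):
            return (not a["is_loop"], -a["profit"])
    else:
        raise ValueError("optimize_for must be 'size' or 'time'")
    return sorted(range(len(allocs)), key=lambda i: key(allocs[i]))
```

A highly profitable whole-procedure allocation is thus packed first when optimizing for size, but only after all loop allocations when optimizing for time.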
| .PP
 | |
| After the entire packing process has been completed,
 | |
| we compute for each register how much is gained in using
 | |
| this register, by simply adding the net profits
 | |
| of all allocations assigned to it.
 | |
| This total yield should outweigh the costs of
 | |
| saving/restoring the register at procedure entry/exit.
 | |
| As most modern processors (e.g. 68000, Vax) have special
 | |
| instructions to save/restore several registers,
 | |
| the differential costs of saving one extra register are by
 | |
| no means constant.
 | |
| The costs are read from the machine descriptor file and
 | |
| compared to the total yields of the registers.
 | |
| As a consequence of this analysis, some allocations 
 | |
| may have their registers taken away.
 | |
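Because the cost of saving one extra register is not constant, the final check has to compare yields against a cost that depends on how many registers are kept. The text does not spell out how RA performs this comparison. The sketch below shows one plausible policy under stated assumptions: registers are ranked by total yield, and the number kept is the one that maximizes total yield minus total save/restore cost. The names and the cost table layout are hypothetical.

```python
def choose_registers(yields, total_save_cost):
    """Pick the registers whose combined yield best outweighs saving them.

    yields maps a register to the sum of the net profits of its allocations.
    total_save_cost[k] is the cost of saving and restoring k registers,
    with total_save_cost[0] == 0.
    """
    ranked = sorted(yields, key=yields.get, reverse=True)
    best_k, best_net, running = 0, 0, 0
    for k, reg in enumerate(ranked, start=1):
        running += yields[reg]
        net = running - total_save_cost[k]
        if net > best_net:
            best_k, best_net = k, net
    return set(ranked[:best_k])
```

Allocations assigned to a register that is not in the returned set would have their registers taken away.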
| .NH 3
 | |
| The transformation subphase
 | |
| .PP
 | |
| The final subphase of RA transforms the EM text according to the
 | |
| decisions made by the packing system.
 | |
| It traverses the text of the currently optimized procedure and
 | |
| changes all occurrences of items at points where
 | |
| they are assigned a register.
 | |
| It also clears the score field of the register messages for
 | |
| normal local variables and emits register messages with a very
 | |
| high score for the pseudo locals.
 | |
| At points where registers have to be initialized with items,
 | |
| it generates EM code to do so.
 | |
| Finally it tries to decrease the size of the stackframe
 | |
| of the procedure by looking at which local variables need not
 | |
| be given memory locations.
 |