.NH 2 The register allocation phase .PP .NH 3 Overview .PP The RA phase deals with one procedure at a time. For every procedure, it first determines which entities may be put in a register. Such an entity is called an \fIitem\fR. For every item it decides during which parts of the procedure it might be assigned a register. Such a region is called a \fItimespan\fR. For any item, several (possibly overlapping) timespans may be considered. A pair (item,timespan) is called an \fIallocation\fR. If the items of two allocations are both live at some point of time in the intersections of their timespans, these allocations are said to be \fIrivals\fR of each other, as they cannot be assigned the same register. The rivals-set of every allocation is computed. Next, the gains of assigning a register to an allocation are estimated, for every allocation. With all this information, decisions are made which allocations to store in which registers (\fIpacking\fR). Finally, the EM text is transformed to reflect these decisions. .NH 3 The item recognition subphase .PP RA tries to put the following entities in a register: .IP - a local variable for which a register message was found .IP - the address of a local variable for which no register message was found .IP - the address of a global variable .IP - the address of a procedure .IP - a numeric constant. .LP Only the \fIaddress\fR of a global variable may be put in a register, not the variable itself. This approach avoids the very complex problems that would be caused by procedure calls and indirect pointer references (see .[~[ aho design compiler .] sections 14.7 and 14.8] and .[~[ spillman side-effects .]]). Still, on most machines accessing a global variable using indirect addressing through a register is much cheaper than accessing it via its address. Similarly, if the address of a procedure is put in a register, the procedure can be called via an indirect call. .PP With every item we associate a register type. This type is .DS for local variables: the type contained in the register message for addresses of variables and procedures: the pointer type for constants: the general type .DE An entity other than a local variable is not taken to be an item if it is used only once within the current procedure. .PP An item is said to be \fIlive\fR at some point of the program text if its value may be used before it is changed. As addresses and constants are never changed, all items but local variables are always live. The region of text during which a local variable is live is determined via the live/dead messages generated by the Live Variable analysis phase of the Global Optimizer. .NH 3 The allocation determination subphase .PP If a procedure has more items than registers, it may be advantageous to put an item in a register only during those parts of the procedure where it is most heavily used. Such a part will be called a timespan. With every item we may associate a set of timespans. If two timespans of an item overlap, at most one of them may be granted a register, as there is no use in putting the same item in two registers simultaneously. If two timespans of an item are distinct, both may be chosen; the item will possibly be put in two different registers during different parts of the procedure. The timespan may also consist of the whole procedure. .PP A list of (item,timespan) pairs (allocations) is build, which will be the input to the decision making subphase of RA (packing subphase). This allocation list is the main data structure of RA. The description of the remainder of RA will be in terms of allocations rather than items. The phrase "to assign a register to an allocation" means "to assign a register to the item of the allocation for the duration of the timespan of the allocation". Subsequent subphases will add more information to this list. .PP Several factors must be taken into account when a timespan for an item is constructed: .IP 1. At any \fIentry point\fR of the timespan where the item is live, the register must be initialized with the item .IP 2. At any exit point of the timespan where the item is live, the item must be updated. .LP In order to decrease these costs, we will only consider timespans with one entry point and no live exit points. .NH 3 The rivals computation subphase .PP As stated before, several different items may be put in the same register, provided they are not live simultaneously. For every allocation we determine the intersection of its timespan and the lifetime of its item (i.e. the part of the procedure during which the item is live). The allocation is said to be busy during this intersection. If two allocations are ever busy simultaneously they are said to be rivals of each other. The rivals information is added to the allocation list. .NH 3 The profits computation subphase .PP To make good decisions, the packing subphase needs to know which allocations can be assigned the same register (rivals information) and how much is gained by granting an allocation a register. .PP Besides the gains of using a register instead of an item, two kinds of overhead costs must be taken into account: .IP - the register must be initialized with the item .IP - the register must be saved at procedure entry and restored at procedure exit. .LP The latter costs should not be due to a single allocation, as several allocations can be assigned the same register. These costs are dealt with after packing has been done. They do not influence the decisions of the packing algorithm, they may only undo them. .PP The actual profits consist of improvements of execution time and code size. As the former is far more difficult to estimate , we will discuss code size improvements first. .PP The gains of putting a certain item in a register depends on how the item is used. Suppose the item is a pointer variable. On machines that do not have a double-indirect addressing mode, two instructions are needed to dereference the variable if it is not in a register, but only one if it is put in a register. If the variable is not dereferenced, but simply copied, one instruction may be sufficient in both cases. So the gains of putting a pointer variable in a register are higher if the variable is dereferenced often. .PP To make accurate estimates, detailed knowledge of the target machine and of the code generator would be needed. Therefore, a simplification has been made that substantially limits the amount of target machine information that is needed. The estimation of the number of bytes saved does not take into account how an item is used. Rather, an average number is used. So these gains are computed as follows: .DS #bytes_saved = #occurrences * gains_per_occurrence .DE The number of occurrences is derived from the EM code. Note that this is not exact either, as there is no one-to-one correspondence between occurrences in the EM code and in the assembler code. .PP The gains of one occurrence depend on: .IP 1. the type of the item .IP 2. the size of the item .IP 3. the type of the register .LP and for local variables and addresses of local variables: .IP 4. the type of the local variable .IP 5. the offset of the variable in the stackframe .LP For every allocation we try two types of registers: the register type of the item and the general register type. Only the type with the highest profits will subsequently be used. This type is added to the allocation information. .PP To compute the gains, RA uses a machine-dependent table that is read from a machine descriptor file. By means of this table the number of bytes saved can be computed as a function of the five properties. .PP The costs of initializing a register with an item is determined in a similar way. The cost of one initialization is also obtained from the descriptor file. Note that there can be at most one initialization for any allocation. .PP To summarize, the number of bytes a certain allocation would save is computed as follows: .DS .TS l l. net_bytes_saved = bytes_saved - init_cost bytes_saved = #occurrences * gains_per_occ init_cost = #initializations * costs_per_init .TE .DE .PP It is inherently more difficult to estimate the execution time saved by putting an item in a register, because it is impossible to predict how many times an item will be used dynamically. If an occurrence is part of a loop, it may be executed many times. If it is part of a conditional statement, it may never be executed at all. In the latter case, the speed of the program may even get worse if an initialization is needed. As a clear example, consider the piece of "C" code in Fig. 13.1. .DS switch(expr) { case 1: p(); break; case 2: p(); p(); break; case 3: p(); break; default: break; } Fig. 13.1 A "C" switch statement .DE Lots of bytes may be saved by putting the address of procedure p in a register, as p is called four times (statically). Dynamically, p will be called zero, one or two times, depending on the value of the expression. .PP The optimizer uses the following strategy for optimizing execution time: .IP 1. try to put items in registers during \fIloops\fR first .IP 2. always keep the initializing code outside the loop .IP 3. if an item is not used in a loop, do not put it in a register if the initialization costs may be higher than the gains .LP The latter condition can be checked by determining the minimal number of usages (dynamically) of the item during the procedure, via a shortest path algorithm. In the example above, this minimal number is zero, so the address of p is not put in a register. .PP The costs of one occurrence is estimated as described above for the code size. The number of dynamic occurrences is guessed by looking at the loop nesting level of every occurrence. If the item is never used in a loop, the minimal number of occurrences is used. From these facts, the execution time improvement is assessed for every allocation. .NH 3 The packing subphase .PP The packing subphase takes as input the allocation list and outputs a description of which allocations should be put in which registers. So it is essentially the decision making part of RA. .PP The packing system tries to assign a register to allocations one at a time, in some yet to be defined order. For every allocation A, it first checks if there is a register (of the right type) that is already assigned to one or more allocations, none of which are rivals of A. In this case A is assigned the same register. Else, A is assigned a new register, if one exists. A table containing the number of free registers for every type is maintained. It is initialized with the number of non-scratch registers of the target computer and updated whenever a new register is handed out. The packing algorithm stops when no more allocations can or need be assigned a register. .PP After an allocation A has been packed, all allocations with non-disjunct timespans (including A itself) are removed from the allocation list. .PP In case the number of items exceeds the number of registers, it is important to choose the most profitable allocations. Due to the possibility of having several allocations occupying the same register, this problem is quite complex. Our packing algorithm uses simple heuristic rules and avoids any combinatorial search. It has distinct rules for different costs measures. .PP If object code size is the most important factor, the algorithm is greedy and chooses allocations in decreasing order of their profits attribute. It does not take into account the fact that other allocations may be passed over because of this decision. .PP If execution time is at prime stake, the algorithm first considers allocations whose timespans consist of loops. After all these have been packed, it considers the remaining allocations. Within the two subclasses, it considers allocations with the highest profits first. When assigning a register to an allocation with a loop as timespan, the algorithm checks if the item has already been put in a register during another loop. If so, it tries to use the same register for the new allocation. After all packing has been done, it checks if the item has always been assigned the same register (although not necessarily during all loops). If so, it tries to put the item in that register during the entire procedure. This is possible if the allocation (item,whole_procedure) is not a rival of any allocation with a different item that has been assigned to the same register. Note that this approach is essentially 'bottom up', as registers are first assigned over small regions of text which are later collapsed into larger regions. The advantage of this approach is the fact that the decisions for one loop can be made independently of all other loops. .PP After the entire packing process has been completed, we compute for each register how much is gained in using this register, by simply adding the net profits of all allocations assigned to it. This total yield should outweigh the costs of saving/restoring the register at procedure entry/exit. As most modern processors (e.g. 68000, Vax) have special instructions to save/restore several registers, the differential costs of saving one extra register are by no means constant. The costs are read from the machine descriptor file and compared to the total yields of the registers. As a consequence of this analysis, some allocations may have their registers taken away. .NH 3 The transformation subphase .PP The final subphase of RA transforms the EM text according to the decisions made by the packing system. It traverses the text of the currently optimized procedure and changes all occurrences of items at points where they are assigned a register. It also clears the score field of the register messages for normal local variables and emits register messages with a very high score for the pseudo locals. At points where registers have to be initialized with items, it generates EM code to do so. Finally it tries to decrease the size of the stackframe of the procedure by looking at which local variables need not be given memory locations.