.NH 2
Implementation
.PP
A major factor in the implementation
of Inline Substitution is the requirement
not to use an excessive amount of memory.
IL essentially analyzes the entire program;
it makes decisions based on which procedure calls
appear in the whole program.
Yet, because of the memory restriction, it is
not feasible to read the entire program
into main memory.
To solve this problem, the IL phase has been
split up into three subphases that are executed sequentially:
.IP 1.
analyze every procedure; see how it accesses its parameters;
simultaneously collect all calls
appearing in the whole program and put them
in a \fIcall-list\fR.
.IP 2.
use the call-list and decide which calls will be substituted
in line.
.IP 3.
take the decisions of subphase 2 and modify the
program accordingly.
.LP
Subphases 1 and 3 scan the input program; only
subphase 3 modifies it.
It is essential that the decisions can be made
in subphase 2
without using the input program,
provided that subphase 1 puts enough information
in the call-list.
Subphase 2 keeps the entire call-list in main memory
and repeatedly scans it, to
find the next best candidate for expansion.
.PP
We will specify the
data structures used by IL before
describing the subphases.
.NH 3
Data structures
.NH 4
The procedure table
.PP
In subphase 1 information is gathered about every procedure
and added to the procedure table.
This information is used by the heuristic rules.
A proctable entry for procedure p has the following
extra information:
.IP -
is it allowed to substitute an invocation of p in line?
.IP -
is it allowed to put any parameter of such a call in line?
.IP -
the size of p (number of EM instructions)
.IP -
does p 'fall through'?
.IP -
a description of the formal parameters that p accesses; this information
is obtained by looking at the code of p.
For every parameter f, we record:
.RS
.IP -
the offset of f
.IP -
the type of f (word, double word, pointer)
.IP -
may the corresponding actual parameter be put in line?
.IP -
is f ever accessed indirectly?
.IP -
is f used: never, once or more than once?
.RE
.IP -
the number of times p is called (see below)
.IP -
the file address of its call-count information (see below).
.LP
.NH 4
Call-count information
.PP
As a result of Inline Substitution, some procedures may
become useless, because all their invocations have been
substituted in line.
One of the tasks of IL is to keep track of which
procedures are no longer called.
Note that IL is especially keen on procedures that are
called only once
(possibly as a result of expanding all other calls to it).
So we want to know how many times a procedure
is called \fIduring\fR Inline Substitution.
It is not good enough to compute this
information afterwards.
The task is rather complex, because
the number of times a procedure is called
varies during the entire process:
.IP 1.
If a call to p is substituted in line,
the number of calls to p gets decremented by 1.
.IP 2.
If a call to p is substituted in line,
and p contains n calls to q, then the number of calls to q
gets incremented by n.
.IP 3.
If a procedure p is removed (because it is
no longer called) and p contains n calls to q,
then the number of calls to q gets decremented
by n.
.LP
(Note that p may be the same as q, if p is recursive).
.sp 0
So we actually want to have the following information:
.DS
NRCALL(p,q) = number of calls to q appearing in p,
for all procedures p and q that may be put in line.
.DE
This information, called \fIcall-count information\fR,
is computed by the first subphase.
It is stored in a file.
It is represented as a number of lists, rather than as
a (very sparse) matrix.
Every procedure has a list of (proc,count) pairs,
telling which procedures it calls, and how many times.
The file address of its call-count list is stored
in its proctable entry.
Whenever this information is needed, it is fetched from
the file, using direct access.
The proctable entry also contains the number of times
a procedure is called, at any moment.
.NH 4
The call-list
.PP
The call-list is the major data structure used by IL.
Every item of the list describes one procedure call.
It contains the following attributes:
.IP -
the calling procedure (caller)
.IP -
the called procedure (callee)
.IP -
identification of the CAL instruction (sequence number)
.IP -
the loop nesting level; our heuristic rules favor
calls inside a loop (or even inside a loop nested inside
another loop, etc.) over other calls
.IP -
the actual parameter expressions involved in the call;
for every actual, we record:
.RS
.IP -
the EM code of the expression
.IP -
the number of bytes of its result (size)
.IP -
an indication of whether the actual may be put in line
.RE
.LP
The structure of the call-list is rather complex.
Whenever a call is expanded in line, new calls
suddenly appear in the program
that were not contained in the original body
of the calling subroutine.
These calls are inherited from the called procedure.
We will refer to these invocations as \fInested calls\fR
(see Fig. 5.1).
.DS
procedure p is
begin                        .
   a();                      .
   b();                      .
end;

procedure r is               procedure r is
begin                        begin
   x();                         x();
   p();  -- in line             a();  -- nested call
   y();                         b();  -- nested call
end;                            y();
                             end;

Fig. 5.1 Example of nested procedure calls
.DE
Nested calls may subsequently be put in line too
(probably resulting in a yet deeper nesting level, etc.).
So the call-list does not always reflect the source program,
but changes dynamically, as decisions are made.
If a call to p is expanded, all calls appearing in p
will be added to the call-list.
.sp 0
A convenient and elegant way to represent
the call-list is to use a LISP-like list.
.[
poel lisp trac
.]
Calls that appear at the same level
are linked in the CDR direction.
If a call C to a procedure p is expanded,
all calls appearing in p are put in a sub-list
of C, i.e. in its CAR.
In the example above, before the decision
to expand the call to p is made, the
call-list of procedure r looks like:
.DS
(call-to-x, call-to-p, call-to-y)
.DE
After the decision, it looks like:
.DS
(call-to-x, (call-to-p*, call-to-a, call-to-b), call-to-y)
.DE
The call to p is marked, because it has been
substituted.
Whenever IL wants to traverse the call-list of some procedure,
it uses the well-known LISP technique of
recursion in the CAR direction and iteration
in the CDR direction
(see page 1.19-2 of
.[
poel lisp trac
.]
).
All list traversals look like:
.DS
traverse(list)
{
    for (c = first(list); c != 0; c = CDR(c)) {
        if (c is marked) {
            traverse(CAR(c));
        } else {
            do something with c
        }
    }
}
.DE
The entire call-list consists of a number of LISP-like lists,
one for every procedure.
The proctable entry of a procedure contains a pointer
to the beginning of the list.
.NH 3
The first subphase: procedure analysis
.PP
The tasks of the first subphase are to determine
several attributes of every procedure
and to construct the basic call-list,
i.e. without nested calls.
The size of a procedure is determined
by simply counting its EM instructions.
Pseudo instructions are skipped.
A procedure does not 'fall through' if its CFG
contains a basic block
that is not the last block of the CFG and
that ends on a RET instruction.
The formal parameters of a procedure are determined
by inspection of its code.
.PP
The call-list is constructed by looking at all CAL instructions
appearing in the program.
The call-list should only contain calls to procedures
that may be put in line.
This fact is only known if the procedure was
analyzed earlier.
If a call to a procedure p appears in the program
before the body of p,
the call will always be put in the call-list.
If p is later found to be unsuitable,
the call will be removed from the list by the
second subphase.
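.PP
The 'fall through' test described above can be sketched in C as follows.
This is only an illustration under stated assumptions, not the code used by IL:
the types \fIopcode\fR and \fIbasic_block\fR and the function \fIfalls_through\fR
are hypothetical, and the sketch assumes that the basic blocks of the CFG
are stored in an array whose final element is the last block of the CFG.
.DS
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcode tags; only RET matters for this check. */
enum opcode { OP_OTHER, OP_BRA, OP_RET };

/* Hypothetical summary of a basic block: the opcode of its last instruction. */
struct basic_block {
    enum opcode last_op;
};

/*
 * Returns true if the procedure falls through, i.e. if no basic block
 * other than the last one (assumed to be blocks[nblocks - 1]) ends on RET.
 */
bool falls_through(const struct basic_block *blocks, size_t nblocks)
{
    size_t i;

    assert(blocks != NULL || nblocks == 0);
    for (i = 0; i + 1 < nblocks; i++) {
        if (blocks[i].last_op == OP_RET)
            return false;   /* a RET somewhere in the middle of the text */
    }
    return true;
}
```
.DE
.LP
A RET at the end of the last block does not count,
so a procedure with a single return at its end is reported as falling through.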
.PP
An important issue is the recognition
of the actual parameter expressions of the call.
The front ends produce messages telling how many
bytes of formal parameters every procedure accesses.
(If there is no such message for a procedure, it
cannot be put in line).
The actual parameters together must account for
the same number of bytes.
A recursive descent parser is used
to parse side-effect free EM expressions.
It uses a table and some
auxiliary routines to determine
how many bytes every EM instruction pops from the stack
and how many bytes it pushes onto the stack.
These numbers depend on the EM instruction, its argument,
and the wordsize and pointersize of the target machine.
Initially, the parser has to recognize the
number of bytes specified in the formals-message,
say N.
Assume the first instruction before the CAL pops S bytes
and pushes R bytes.
If R > N, too many bytes are recognized
and the parser fails.
Otherwise, it calls itself recursively to recognize the
S bytes used as operand of the instruction.
If it succeeds in doing so, it continues with the next instruction,
i.e. the first instruction before the code recognized by
the recursive call, to recognize N-R more bytes.
The result is a number of EM instructions that collectively push N bytes.
If the parser comes across an instruction that has side-effects
(e.g. a store or a procedure call) or for which R and S cannot
be computed statically (e.g. a LOS), it fails.
.sp 0
Note that the parser traverses the code backwards.
As EM code is essentially postfix code, the parser works top down.
.PP
If the parser fails to recognize the parameters, the call will not
be substituted in line.
If the parameters can be determined, they still have to
match the formal parameters of the called procedure.
This check is performed by the second subphase; it cannot be
done here, because it is possible that the called
procedure has not been analyzed yet.
.PP
The entire call-list is written to a file,
to be processed by the second subphase.
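.PP
The backward recognition of actual parameters described in this section
can be sketched in C as follows.
This is a minimal illustration, not the parser used by IL:
the type \fIem_instr\fR and the function \fIrecognize\fR are hypothetical,
and the pop and push counts, which IL derives from a table and
auxiliary routines, are assumed to be filled in beforehand.
.DS
```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-instruction summary, filled in by the caller. */
struct em_instr {
    const char *mnem;   /* mnemonic, for readability only */
    int pops;           /* S: bytes popped from the stack */
    int pushes;         /* R: bytes pushed onto the stack */
    bool side_effect;   /* e.g. a store or a procedure call */
    bool is_static;     /* false if S and R are not known statically */
};

/*
 * Scans backwards from code[end - 1] for instructions that together
 * push exactly nbytes bytes. Returns the index of the first recognized
 * instruction (end itself if nbytes is 0), or -1 on failure.
 */
int recognize(const struct em_instr *code, int end, int nbytes)
{
    assert(nbytes >= 0);
    while (nbytes > 0) {
        const struct em_instr *ins;

        if (end <= 0)
            return -1;              /* ran out of code */
        ins = &code[end - 1];
        if (ins->side_effect || !ins->is_static)
            return -1;              /* not a usable expression */
        if (ins->pushes > nbytes)
            return -1;              /* R > N: too many bytes */
        /* Recognize the S bytes used as operands of this instruction. */
        end = recognize(code, end - 1, ins->pops);
        if (end < 0)
            return -1;
        nbytes -= ins->pushes;      /* N-R bytes remain to be found */
    }
    return end;
}
```
.DE
.LP
As an example, assume a two-byte word and the sequence LOL, LOL, ADI, LOL
in front of a CAL.
Asking for 4 bytes recognizes all four instructions,
whereas asking for 2 bytes recognizes only the final LOL.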
.NH 3
The second subphase: making decisions
.PP
The task of the second subphase is quite easy
to understand.
It reads the call-list file,
builds an in-core call-list and deletes every
call that may not be expanded in line (either because the called
procedure may not be put in line, or because the actual parameters
of the call do not match the formal parameters of the called procedure).
It assigns a \fIpay-off\fR to every call,
indicating how desirable it is to expand it.
.PP
The subphase repeatedly scans the call-list and takes
the call with the highest ratio.
The chosen one gets marked,
and the call-list is extended with the nested calls,
as described above.
These nested calls are also assigned a ratio,
and will be considered too during the next scans.
.sp 0
After every decision the number of times
every procedure is called is updated, using
the call-count information.
Meanwhile, the subphase keeps track of the amount of space left
available.
If all space is used, or if there are no more calls left to
be expanded, it exits this loop.
Finally, calls to procedures that are called only
once are also chosen.
.PP
The actual parameters of a call are only needed by
this subphase to assign a ratio to a call.
To save some space, these actuals are not kept in main memory.
They are removed after the call has been read and a ratio
has been assigned to it.
So this subphase works with \fIabstracts\fR of calls.
After all work has been done, the actual parameters
of the chosen calls are retrieved from a file,
as they are needed by the transformation subphase.
.NH 3
The third subphase: doing transformations
.PP
The third subphase makes the actual modifications to
the EM text.
It is directed by the decisions made in the previous subphase,
as expressed via the call-list.
The call-list read by this subphase contains
only calls that were selected for expansion.
The list is ordered in the same way as the EM text,
i.e.
if a call C1 appears before a call C2 in the call-list,
C1 also appears before C2 in the EM text.
So the EM text is traversed linearly,
the calls that have to be substituted are determined
and the modifications are made.
If a procedure is encountered that is no longer needed,
it is simply not written to the output EM file.
The substitution of a call takes place in distinct steps:
.IP "change the calling sequence" 7
.sp 0
The actual parameter expressions are changed.
Parameters that are put in line are removed.
All remaining ones must store their result in a
temporary local variable, rather than
push it on the stack.
The CAL instruction and any ASP (to pop actual parameters)
or LFR (to fetch the result of a function)
are deleted.
.IP "fetch the text of the called procedure"
.sp 0
Direct disk access is used to read the text of the called procedure.
The file offset is obtained from the proctable entry.
.IP "allocate bytes for locals and temporaries"
.sp 0
The local variables of the called procedure will be put in the
stack frame of the calling procedure.
The same applies to any temporary variables
that hold the result of parameters
that were not put in line.
The proctable entry of the caller is updated.
.IP "put a label after the CAL"
.sp 0
If the called procedure contains a RET (return) instruction
somewhere in the middle of its text (i.e. it does
not fall through), the RET must be changed into
a BRA (branch) to this label, to jump over the
remainder of the text.
This label is not needed if the called
procedure falls through.
.IP "copy the text of the called procedure and modify it"
.sp 0
References to local variables of the called routine
and to parameters that are not put in line
are changed to refer to the
new local of the caller.
References to in line parameters are replaced
by the actual parameter expression.
Returns (RETs) are either deleted or
replaced by a BRA.
Messages containing information about local
variables or parameters are changed.
Global data declarations and the PRO and END pseudos
are removed.
Instruction labels and references to them are
changed to make sure they do not have the
same identifying number as
labels in the calling procedure.
.IP "insert the modified text"
.sp 0
The pseudos of the called procedure are put after the pseudos
of the calling procedure.
The real text of the callee is put at
the place where the CAL was.
.IP "take care of nested substitutions"
.sp 0
The expanded procedure may contain calls that
have to be expanded too (nested calls).
If the descriptor of such a call contains actual parameter expressions,
the code of the expressions has to be changed
the same way as the code of the callee was changed.
Next, the entire process of finding CALs and doing
the substitutions is repeated recursively.
.LP
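.PP
Two of the modifications made while copying the text of the called procedure,
the renumbering of instruction labels and the treatment of RETs,
can be sketched in C as follows.
This is an illustration under assumptions, not the transformation code of IL:
the types and the function \fIcopy_callee_text\fR are hypothetical,
labels are made unique by adding a fixed offset,
and a RET is taken to be deleted only when it is the final instruction
and replaced by a BRA to the label after the CAL otherwise.
.DS
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much simplified instruction representation. */
enum opcode { OP_OTHER, OP_LAB, OP_BRA, OP_RET };

struct instr {
    enum opcode op;
    int label;      /* label number for OP_LAB and OP_BRA, unused otherwise */
};

/*
 * Copies the callee text src[0..n) into dst, which must have room for
 * n instructions, and returns the number of instructions written.
 * Label numbers are shifted by label_offset (assumed to exceed every
 * label number of the caller). A RET in the middle of the text becomes
 * a BRA to exit_label; a RET at the very end is dropped.
 */
size_t copy_callee_text(const struct instr *src, size_t n,
                        struct instr *dst, int label_offset, int exit_label)
{
    size_t i, out = 0;

    assert(n == 0 || (src != NULL && dst != NULL));
    for (i = 0; i < n; i++) {
        struct instr ins = src[i];

        switch (ins.op) {
        case OP_LAB:
        case OP_BRA:
            ins.label += label_offset;  /* avoid clashes with the caller */
            break;
        case OP_RET:
            if (i + 1 == n)
                continue;               /* final RET: simply fall through */
            ins.op = OP_BRA;            /* jump over the remaining text */
            ins.label = exit_label;
            break;
        default:
            break;
        }
        dst[out++] = ins;
    }
    return out;
}
```
.DE
.LP
The sketch leaves out the rewriting of references to locals and parameters
and the handling of messages and pseudos, which the steps above also require.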