ack/doc/ego/il/il5

.NH 2
Implementation
.PP
A major factor in the implementation
of Inline Substitution is the requirement
not to use an excessive amount of memory.
IL essentially analyzes the entire program;
it makes decisions based on which procedure calls
appear in the whole program.
Yet, because of the memory restriction, it is
not feasible to read the entire program
in main memory.
To solve this problem, the IL phase has been
split up into three subphases that are executed sequentially:
.IP 1.
analyze every procedure; see how it accesses its parameters;
simultaneously collect all calls
appearing in the whole program an put them
in a \fIcall-list\fR.
.IP 2.
use the call-list and decide which calls will be substituted
in line.
.IP 3.
take the decisions of subphase 2 and modify the
program accordingly.
.LP
Subphases 1 and 3 scan the input program; only
subphase 3 modifies it.
It is essential that the decisions can be made
in subphase 2
without using the input program,
provided that subphase 1 puts enough information
in the call-list.
Subphase 2 keeps the entire call-list in main memory
and repeatedly scans it, to
find the next best candidate for expansion.
.PP
We will specify the
data structures used by IL before
describing the subphases.
.NH 3
Data structures
.NH 4
The procedure table
.PP
In subphase 1 information is gathered about every procedure
and added to the procedure table.
This information is used by the heuristic rules.
A proctable entry for procedure p has
the following extra information:
.IP -
is it allowed to substitute an invocation of p in line?
.IP -
is it allowed to put any parameter of such a call in line?
.IP -
the size of p (number of EM instructions)
.IP -
does p 'fall through'?
.IP -
a description of the formal parameters that p accesses; this information
is obtained by looking at the code of p. For every parameter f,
we record:
.RS
.IP -
the offset of f
.IP -
the type of f (word, double word, pointer)
.IP -
may the corresponding actual parameter be put in line?
.IP -
is f ever accessed indirectly?
.IP -
if f used: never, once or more than once?
.RE
.IP -
the number of times p is called (see below)
.IP -
the file address of its call-count information (see below).
.LP
.NH 4
Call-count information
.PP
As a result of Inline Substitution, some procedures may
become useless, because all their invocations have been
substituted in line.
One of the tasks of IL is to keep track which
procedures are no longer called.
Note that IL is especially keen on procedures that are
called only once
(possibly as a result of expanding all other calls to it).
So we want to know how many times a procedure
is called \fIduring\fR Inline Substitution.
It is not good enough to compute this
information afterwards.
The task is rather complex, because
the number of times a procedure is called
varies during the entire process:
.IP 1.
If a call to p is substituted in line,
the number of calls to p gets decremented by 1.
.IP 2.
If a call to p is substituted in line,
and p contains n calls to q, then the number of calls to q
gets incremented by n.
.IP 3.
If a procedure p is removed (because it is no
longer called) and p contains n calls to q,
then the number of calls to q gets decremented by n.
.LP
(Note that p may be the same as q, if p is recursive).
.sp 0
So we actually want to have the following information:
.DS
NRCALL(p,q) = number of call to q appearing in p,

for all procedures p and q that may be put in line.
.DE
This information, called \fIcall-count information\fR is
computed by the first subphase.
It is stored in a file.
It is represented as a number of lists, rather than as
a (very sparse) matrix.
Every procedure has a list of (proc,count) pairs,
telling which procedures it calls, and how many times.
The file address of its call-count list is stored
in its proctable entry.
Whenever this information is needed, it is fetched from
the file, using direct access.
The proctable entry also contains the number of times
a procedure is called, at any moment.
.NH 4
The call-list
.PP
The call-list is the major data structure use by IL.
Every item of the list describes one procedure call.
It contains the following attributes:
.IP -
the calling procedure (caller)
.IP -
the called procedure (callee)
.IP -
identification of the CAL instruction (sequence number)
.IP -
the loop nesting level; our heuristic rules appreciate
calls inside a loop (or even inside a loop nested inside
another loop, etc.) more than other calls
.IP -
the actual parameter expressions involved in the call;
for every actual, we record:
.RS
.IP -
the EM code of the expression
.IP -
the number of bytes of its result (size)
.IP -
an indication if the actual may be put in line
.RE
.LP
The structure of the call-list is rather complex.
Whenever a call is expanded in line, new calls
will suddenly appear in the program,
that were not contained in the original body
of the calling subroutine.
These calls are inherited from the called procedure.
We will refer to these invocations as \fInested calls\fR
(see Fig. 5.1).
.DS
.TS
lw(2.5i) l.
procedure p is
begin	.
     a();	.
     b();	.
end;
.TE

.TS
lw(2.5i) l.
procedure r is	procedure r is
begin	begin
     x();	    x();
     p();  -- in line	    a();  -- nested call
     y();	    b();  -- nested call
end;	    y();
	end;
.TE

Fig. 5.1 Example of nested procedure calls
.DE
Nested calls may subsequently be put in line too
(probably resulting in a yet deeper nesting level, etc.).
So the call-list does not always reflect the source program,
but changes dynamically, as decisions are made.
If a call to p is expanded, all calls appearing in p
will be added to the call-list.
.sp 0
A convenient and elegant way to represent
the call-list is to use a LISP-like list.
.[
poel lisp trac
.]
Calls that appear at the same level
are linked in the CDR direction. If a call C
to a procedure p is expanded,
all calls appearing in p are put in a sub-list
of C, i.e. in its CAR.
In the example above, before the decision
to expand the call to p is made, the
call-list of procedure r looks like:
.DS
(call-to-x, call-to-p, call-to-y)
.DE
After the decision, it looks like:
.DS
(call-to-x, (call-to-p*, call-to-a, call-to-b), call-to-y)
.DE
The call to p is marked, because it has been
substituted.
Whenever IL wants to traverse the call-list of some procedure,
it uses the well-known LISP technique of
recursion in the CAR direction and
iteration in the CDR direction
(see page 1.19-2 of
.[
poel lisp trac
.]
).
All list traversals look like:
.DS
traverse(list)
{
    for (c = first(list); c != 0; c = CDR(c)) {
        if (c is marked) {
            traverse(CAR(c));
        } else {
            do something with c
        }
    }
}
.DE
The entire call-list consists of a number of LISP-like lists,
one for every procedure.
The proctable entry of a procedure contains a pointer
to the beginning of the list.
.NH 3
The first subphase: procedure analysis
.PP
The tasks of the first subphase are to determine
several attributes of every procedure
and to construct the basic call-list,
i.e. without nested calls.
The size of a procedure is determined
by simply counting its EM instructions.
Pseudo instructions are skipped.
A procedure does not 'fall through' if its CFG
contains a basic block
that is not the last block of the CFG and
that ends on a RET instruction.
The formal parameters of a procedure are determined
by inspection of
its code.
.PP
The call-list in constructed by looking at all CAL instructions
appearing in the program.
The call-list should only contain calls to procedures
that may be put in line.
This fact is only known if the procedure was
analyzed earlier.
If a call to a procedure p appears in the program
before the body of p,
the call will always be put in the call-list.
If p is later found to be unsuitable,
the call will be removed from the list by the
second subphase.
.PP
An important issue is the recognition
of the actual parameter expressions of the call.
The front ends produces messages telling how many
bytes of formal parameters every procedure accesses.
(If there is no such message for a procedure, it
cannot be put in line).
The actual parameters together must account for
the same number of bytes.A recursive descent parser is used
to parse side-effect free EM expressions.
It uses a table and some
auxiliary routines to determine
how many bytes every EM instruction pops from the stack
and how many bytes it pushes onto the stack.
These numbers depend on the EM instruction, its argument,
and the wordsize and pointersize of the target machine.
Initially, the parser has to recognize the
number of bytes specified in the formals-message,
say N.
Assume the first instruction before the CAL pops S bytes
and pushes R bytes.
If R > N, too many bytes are recognized
and the parser fails.
Else, it calls itself recursively to recognize the
S bytes used as operand of the instruction.
If it succeeds in doing so, it continues with the next instruction,
i.e. the first instruction before the code recognized by
the recursive call, to recognize N-R more bytes.
The result is a number of EM instructions that collectively push N bytes.
If an instruction is come across that has side-effects
(e.g. a store or a procedure call) or of which R and S cannot
be computed statically (e.g. a LOS), it fails.
.sp 0
Note that the parser traverses the code backwards.
As EM code is essentially postfix code, the parser works top down.
.PP
If the parser fails to recognize the parameters, the call will not
be substituted in line.
If the parameters can be determined, they still have to
match the formal parameters of the called procedure.
This check is performed by the second subphase; it cannot be
done here, because it is possible that the called
procedure has not been analyzed yet.
.PP
The entire call-list is written to a file,
to be processed by the second subphase.
.NH 3
The second subphase: making decisions
.PP
The task of the second subphase is quite easy
to understand.
It reads the call-list file,
builds an incore call-list and deletes every
call that may not be expanded in line (either because the called
procedure may not be put in line, or because the actual parameters
of the call do not match the formal parameters of the called procedure).
It assigns a \fIpay-off\fR to every call,
indicating how desirable it is to expand it.
.PP
The subphase repeatedly scans the call-list and takes
the call with the highest ratio.
The chosen one gets marked,
and the call-list is extended with the nested calls,
as described above.
These nested calls are also assigned a ratio,
and will be considered too during the next scans.
.sp 0
After every decision the number of times
every procedure is called is updated, using
the call-count information.
Meanwhile, the subphase keeps track of the amount of space left
available.
If all space is used, or if there are no more calls left to
be expanded, it exits this loop.
Finally, calls to procedures that are called only
once are also chosen.
.PP
The actual parameters of a call are only needed by
this subphase to assign a ratio to a call.
To save some space, these actuals are not kept in main memory.
They are removed after the call has been read and a ratio
has been assigned to it.
So this subphase works with \fIabstracts\fR of calls.
After all work has been done,
the actual parameters of the chosen calls are retrieved
from a file,
as they are needed by the transformation subphase.
.NH 3
The third subphase: doing transformations
.PP
The third subphase makes the actual modifications to
the EM text.
It is directed by the decisions made in the previous subphase,
as expressed via the call-list.
The call-list read by this subphase contains
only calls that were selected for expansion.
The list is ordered in the same way as the EM text,
i.e. if a call C1 appears before a call C2 in the call-list,
C1 also appears before C2 in the EM text.
So the EM text is traversed linearly,
the calls that have to be substituted are determined
and the modifications are made.
If a procedure is come across that is no longer needed,
it is simply not written to the output EM file.
The substitution of a call takes place in distinct steps:
.IP "change the calling sequence" 7
.sp 0
The actual parameter expressions are changed.
Parameters that are put in line are removed.
All remaining ones must store their result in a
temporary local variable, rather than
push it on the stack.
The CAL instruction and any ASP (to pop actual parameters)
or LFR (to fetch the result of a function)
are deleted.
.IP "fetch the text of the called procedure"
.sp 0
Direct disk access is used to to read the text of the
called procedure.
The file offset is obtained from the proctable entry.
.IP "allocate bytes for locals and temporaries"
.sp 0
The local variables of the called procedure will be put in the
stack frame of the calling procedure.
The same applies to any temporary variables
that hold the result of parameters
that were not put in line.
The proctable entry of the caller is updated.
.IP "put a label after the CAL"
.sp 0
If the called procedure contains a RET (return) instruction
somewhere in the middle of its text (i.e. it does
not fall through), the RET must be changed into
a BRA (branch), to jump over the
remainder of the text.
This label is not needed if the called
procedure falls through.
.IP "copy the text of the called procedure and modify it"
.sp 0
References to local variables of the called routine
and to parameters that are not put in line
are changed to refer to the
new local of the caller.
References to in line parameters are replaced
by the actual parameter expression.
Returns (RETs) are either deleted or
replaced by a BRA.
Messages containing information about local
variables or parameters are changed.
Global data declarations and the PRO and END pseudos
are removed.
Instruction labels and references to them are
changed to make sure they do not have the
same identifying number as
labels in the calling procedure.
.IP "insert the modified text"
.sp 0
The pseudos of the called procedure are put after the pseudos
of the calling procedure.
The real text of the callee is put at
the place where the CAL was.
.IP "take care of nested substitutions"
.sp 0
The expanded procedure may contain calls that
have to be expanded too (nested calls).
If the descriptor of this call contains actual
parameter expressions,
the code of the expressions has to be changed
the same way as the code of the callee was changed.
Next, the entire process of finding CALs and doing
the substitutions is repeated recursively.
.LP