.NH 2
Heuristic rules
.PP
Using the information described
in the previous section,
we can find all calls that can
be expanded in line, and for which
this expansion is desirable.
In general, we cannot expand all these calls,
so we have to choose the 'best' ones.
With every CAL instruction
that may be expanded, we associate
a \fIpay off\fR,
which expresses how desirable it is
to expand this specific CAL.
.sp
Let Tc denote the portion of EM text involved
in a specific call, i.e. the pushing of the actual
parameter expressions, the CAL itself,
the popping of the parameters and the
pushing of the result (if any, via an LFR).
Let Te denote the EM text that would be obtained
by expanding the call in line.
Let Pc be the original program and Pe the program
with Te substituted for Tc.
The pay off of the CAL depends on two factors:
.IP -
T = execution_time(Pe) - execution_time(Pc)
.IP -
S = code_size(Pe) - code_size(Pc)
.LP
The change in execution time (T) depends on:
.IP -
T1 = execution_time(Te) - execution_time(Tc)
.IP -
N = number of times Te or Tc get executed.
.LP
We assume that T1 will be the same every
time the code gets executed.
This is a reasonable assumption.
(Note that we are talking about one CAL,
not about different calls to the same procedure).
Hence
.DS
T = N * T1
.DE
T1 can be estimated by a careful analysis
of the transformations that are performed.
Below, we list everything that will be
different when a call is expanded in line:
.IP -
The CAL instruction is not executed.
This saves a subroutine jump.
.IP -
The instructions in the procedure prolog
are not executed.
These instructions, generated from the PRO pseudo,
save some machine registers
(including the old LB), set the new LB and allocate space
for the locals of the called routine.
The savings may be less if there are no
locals to allocate.
.IP -
In line parameters are not evaluated before the call
and are not pushed on the stack.
.IP -
All remaining parameters are stored in local variables,
instead of being pushed on the stack.
.IP -
If the number of parameters is nonzero,
the ASP instruction after the CAL is not executed.
.IP -
Every reference to an in line parameter is
substituted by the parameter expression.
.IP -
RET (return) instructions are replaced by
BRA (branch) instructions.
If the called procedure 'falls through'
(i.e. it has only one RET, at the end of its code),
even the BRA is not needed.
.IP -
The LFR (fetch function result) is not executed.
.PP
Besides these changes, which are caused directly by IL,
other changes may occur as IL influences other optimization
techniques, such as Register Allocation and Constant Propagation.
Our heuristic rules do not take into account the quite
unpredictable effects on Register Allocation.
They do, however, favour calls that have numeric \fIconstants\fR
as parameters; the constant "0" as an in line
parameter scores especially high,
as further optimizations are then often possible.
.PP
It cannot be determined statically how often a CAL instruction gets
executed.
We will use \fIloop nesting\fR information here.
The nesting level of the loop in which
the CAL appears (if any) will be used as an
indication for the number of times it gets executed.
.PP
Based on all these facts,
the pay off of a call will be computed.
The following model was developed empirically.
Assume procedure P calls procedure Q.
The call takes place in basic block B.
.DS
.TS
l l l.
ZP	\&=	# zero parameters
CP	\&=	# constant parameters - ZP
LN	\&=	Loop Nesting level (0 if outside any loop)
F	\&=	\fIif\fR # formal parameters of Q > 0 \fIthen\fR 1 \fIelse\fR 0
FT	\&=	\fIif\fR Q falls through \fIthen\fR 1 \fIelse\fR 0
S	\&=	size(Q) - 1 - # inline_parameters - F
L	\&=	\fIif\fR # local variables of P > 0 \fIthen\fR 0 \fIelse\fR -1
A	\&=	CP + 2 * ZP
N	\&=	\fIif\fR LN=0 and P is never called from a loop \fIthen\fR 0 \fIelse\fR (LN+1)**2
FM	\&=	\fIif\fR B is a firm block \fIthen\fR 2 \fIelse\fR 1

pay_off	\&=	(100/S + FT + F + L + A) * N * FM
.TE
.DE
S stands for the size increase of the program,
which is slightly less than the size of Q.
The size of a procedure is taken to be its number
of (non-pseudo) EM instructions.
The terms "loop nesting level" and "firm" were defined
in the chapter on the Intermediate Code (section "loop tables").
If a call is not inside a loop and the calling procedure
is itself never called from a loop (transitively),
then the call will probably be executed at most once.
Such a call is never expanded in line (its pay off is zero).
If the calling procedure doesn't have local variables, a penalty (L)
is introduced, as it will most likely get local variables if the
call gets expanded.
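.PP
The model above can be written out as a small function.
The following is only an illustrative sketch in Python, not the code used by the optimizer.
The record type Call and all of its field names are hypothetical;
they merely name the quantities from the table.
Two further assumptions are not taken from the text:
the division 100/S is carried out as a real division,
and S is clamped to at least 1 to avoid dividing by zero for very small procedures.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """Hypothetical record of the facts the model needs about one CAL."""
    zero_params: int               # ZP: actual parameters that are the constant 0
    const_params: int              # all constant actual parameters, zeros included
    loop_nesting: int              # LN: 0 if the CAL is outside any loop
    callee_formals: int            # number of formal parameters of Q
    callee_falls_through: bool     # Q has a single RET, at the end of its code
    callee_size: int               # size(Q) in non-pseudo EM instructions
    inline_params: int             # number of in line parameters
    caller_locals: int             # number of local variables of P
    caller_called_from_loop: bool  # P is (transitively) called from a loop
    in_firm_block: bool            # basic block B is a firm block


def pay_off(c: Call) -> float:
    """Evaluate (100/S + FT + F + L + A) * N * FM for one call."""
    zp = c.zero_params
    cp = c.const_params - zp
    f = 1 if c.callee_formals > 0 else 0
    ft = 1 if c.callee_falls_through else 0
    s = c.callee_size - 1 - c.inline_params - f
    s = max(s, 1)  # assumption: clamp, the text does not cover S <= 0
    loc_pen = 0 if c.caller_locals > 0 else -1  # the term L of the model
    a = cp + 2 * zp
    if c.loop_nesting == 0 and not c.caller_called_from_loop:
        n = 0  # probably executed at most once: never expanded
    else:
        n = (c.loop_nesting + 1) ** 2
    fm = 2 if c.in_firm_block else 1
    return (100 / s + ft + f + loc_pen + a) * n * fm


# Hypothetical call: in a firm block at loop nesting level 1, callee of
# 13 instructions with 2 formals, actuals are the constants 0 and 5.
example = Call(zero_params=1, const_params=2, loop_nesting=1,
               callee_formals=2, callee_falls_through=True,
               callee_size=13, inline_params=1, caller_locals=3,
               caller_called_from_loop=False, in_firm_block=True)
print(pay_off(example))
```

For the hypothetical call shown, S = 13 - 1 - 1 - 1 = 10, A = 1 + 2 * 1 = 3 and N = (1+1)**2 = 4,
so the pay off is (10 + 1 + 1 + 0 + 3) * 4 * 2 = 120.
The same call placed outside any loop, in a procedure that is never called from a loop,
would get N = 0 and hence a pay off of zero.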