.NH 2
Heuristic rules
.PP
Using the information described
in the previous section,
we can find all calls that can
be expanded in line, and for which
this expansion is desirable.
In general, we cannot expand all these calls,
so we have to choose the 'best' ones.
With every CAL instruction
that may be expanded, we associate
a \fIpay off\fR,
which expresses how desirable it is
to expand this specific CAL.
.sp
Let Tc denote the portion of EM text involved
in a specific call, i.e. the pushing of the actual
parameter expressions, the CAL itself,
the popping of the parameters and the
pushing of the result (if any, via an LFR).
Let Te denote the EM text that would be obtained
by expanding the call in line.
Let Pc be the original program and Pe the program
with Te substituted for Tc.
The pay off of the CAL depends on two factors:
.IP -
T = execution_time(Pe) - execution_time(Pc)
.IP -
S = code_size(Pe) - code_size(Pc)
.LP
The change in execution time (T) depends on:
.IP -
T1 = execution_time(Te) - execution_time(Tc)
.IP -
N = number of times Te or Tc gets executed.
.LP
We assume that T1 will be the same every
time the code gets executed.
This is a reasonable assumption.
(Note that we are talking about one CAL,
not about different calls to the same procedure).
Hence
.DS
T = N * T1
.DE
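.PP
As a small illustration of this relation, the following Python sketch
computes T from T1 and N.
The function name and the sample numbers are hypothetical;
they only show the sign convention:
a negative T1 means that the expanded text Te is faster than
the original call text Tc.
.DS
```python
def total_time_change(t1, n):
    """Return T = N * T1, the estimated change in execution time.

    t1: execution_time(Te) - execution_time(Tc) for one execution
        (negative when the in-line text is faster).
    n:  number of times the call site gets executed.
    """
    return n * t1


# Hypothetical numbers: the expansion saves 6 time units per execution
# and the call site is executed 50 times.
print(total_time_change(-6, 50))  # prints -300
```
.DE
.LP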
T1 can be estimated by a careful analysis
of the transformations that are performed.
Below, we list everything that will be
different when a call is expanded in line:
.IP -
The CAL instruction is not executed.
This saves a subroutine jump.
.IP -
The instructions in the procedure prolog
are not executed.
These instructions, generated from the PRO pseudo,
save some machine registers
(including the old LB), set the new LB and allocate space
for the locals of the called routine.
The savings may be less if there are no
locals to allocate.
.IP -
In line parameters are not evaluated before the call
and are not pushed on the stack.
.IP -
All remaining parameters are stored in local variables,
instead of being pushed on the stack.
.IP -
If the number of parameters is nonzero,
the ASP instruction after the CAL is not executed.
.IP -
Every reference to an in line parameter is
substituted by the parameter expression.
.IP -
RET (return) instructions are replaced by
BRA (branch) instructions.
If the called procedure 'falls through'
(i.e. it has only one RET, at the end of its code),
even the BRA is not needed.
.IP -
The LFR (fetch function result) is not executed.
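.PP
The list above can be turned into a simple check of which instructions
of one call are no longer executed after expansion.
The Python sketch below is only an illustration:
the function and its three arguments are hypothetical,
and it assigns no costs to the instructions,
because the text gives none.
.DS
```python
def instructions_not_executed(num_params, returns_value, falls_through):
    """List the instructions of one call that disappear after expansion.

    All three arguments are hypothetical inputs describing the call;
    the rules themselves follow the list in the text.
    """
    saved = ["CAL", "procedure prolog (PRO)"]
    if num_params > 0:
        saved.append("ASP")  # the parameters need not be popped
    if returns_value:
        saved.append("LFR")  # the result need not be fetched
    if falls_through:
        saved.append("RET")  # single RET at the end: no BRA is needed
    return saved


# A call with two parameters to a function that falls through.
for name in instructions_not_executed(2, True, True):
    print(name)
```
.DE
If the called procedure does not fall through,
its RET instructions are replaced by BRA instructions,
so they do not count as pure savings in this sketch.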
.PP
Besides these changes, which are caused directly by IL,
other changes may occur as IL influences other optimization
techniques, such as Register Allocation and Constant Propagation.
Our heuristic rules do not take into account the quite
unpredictable effects on Register Allocation.
They do, however, favour calls that have numeric \fIconstants\fR
as parameters; especially the constant "0" as an inline
parameter gets high scores,
as further optimizations may often be possible.
.PP
It cannot be determined statically how often a CAL instruction gets
executed.
We will use \fIloop nesting\fR information here.
The nesting level of the loop in which
the CAL appears (if any) will be used as an
indication of the number of times it gets executed.
.PP
Based on all these facts,
the pay off of a call will be computed.
The following model was developed empirically.
Assume procedure P calls procedure Q.
The call takes place in basic block B.
.DS
ZP = # zero parameters
CP = # constant parameters - ZP
LN = Loop Nesting level (0 if outside any loop)
F = \fIif\fR # formal parameters of Q > 0 \fIthen\fR 1 \fIelse\fR 0
FT = \fIif\fR Q falls through \fIthen\fR 1 \fIelse\fR 0
S = size(Q) - 1 - # inline_parameters - F
L = \fIif\fR # local variables of P > 0 \fIthen\fR 0 \fIelse\fR -1
A = CP + 2 * ZP
N = \fIif\fR LN=0 and P is never called from a loop \fIthen\fR 0 \fIelse\fR (LN+1)**2
FM = \fIif\fR B is a firm block \fIthen\fR 2 \fIelse\fR 1

pay_off = (100/S + FT + F + L + A) * N * FM
.DE
S stands for the size increase of the program,
which is slightly less than the size of Q.
The size of a procedure is taken to be its number
of (non-pseudo) EM instructions.
The terms "loop nesting level" and "firm" were defined
in the chapter on the Intermediate Code (section "loop tables").
If a call is not inside a loop and the calling procedure
is itself never called from a loop (transitively),
then the call will probably be executed at most once.
Such a call is never expanded in line (its pay off is zero).
If the calling procedure doesn't have local variables, a penalty (L)
is introduced, as it will most likely get local variables if the
call gets expanded.
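.PP
The model can be written out as a short Python sketch.
This is an illustration under stated assumptions, not the code of the optimizer:
the class, its field names and the sample values are hypothetical,
"constant parameters" is taken to include the zero parameters,
100/S is computed as a real (not integer) division,
and the case S <= 0, which the text does not discuss, is rejected.
.DS
```python
from dataclasses import dataclass


@dataclass
class Call:
    """Inputs of the pay-off model for one CAL (field names are assumptions)."""
    size_q: int               # number of non-pseudo EM instructions of Q
    formals_q: int            # number of formal parameters of Q
    inline_params: int        # number of in-line parameters of this call
    const_params: int         # constant parameters, zeros included (assumed)
    zero_params: int          # parameters that are the constant 0
    q_falls_through: bool     # Q has a single RET, at the end of its code
    locals_p: int             # number of local variables of the caller P
    loop_nesting: int         # LN, 0 if the call is outside any loop
    p_called_from_loop: bool  # P is (transitively) called from a loop
    firm_block: bool          # basic block B is a firm block


def pay_off(c):
    zp = c.zero_params
    cp = c.const_params - zp
    f = 1 if c.formals_q > 0 else 0
    ft = 1 if c.q_falls_through else 0
    s = c.size_q - 1 - c.inline_params - f
    local_penalty = 0 if c.locals_p > 0 else -1   # L in the model
    a = cp + 2 * zp
    if c.loop_nesting == 0 and not c.p_called_from_loop:
        n = 0
    else:
        n = (c.loop_nesting + 1) ** 2
    fm = 2 if c.firm_block else 1
    if n == 0:
        return 0.0   # executed at most once: never expanded
    if s <= 0:
        # Assumption: the text does not cover this case.
        raise ValueError("model not defined for S <= 0 in this sketch")
    return (100 / s + ft + f + local_penalty + a) * n * fm


# Hypothetical call: S = 23 - 1 - 1 - 1 = 20, so 100/S = 5.
example = Call(size_q=23, formals_q=2, inline_params=1, const_params=2,
               zero_params=1, q_falls_through=True, locals_p=3,
               loop_nesting=1, p_called_from_loop=False, firm_block=True)
print(pay_off(example))  # prints 80.0
```
.DE
.LP
In the example, the sum between parentheses is 5 + 1 + 1 + 0 + 3 = 10,
N is (1+1)**2 = 4 and FM is 2, which gives a pay off of 80.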